GeneveProxy - an AWS Gateway Load Balancer reference application

The AWS Gateway Load Balancer (GWLB) allows AWS users to route VPC traffic through a centralized appliance. This appliance can perform monitoring, throttling and deep packet inspection. To achieve this, the appliance needs to support Geneve encapsulation and decapsulation. In this post we will provide a blueprint for a Python application with full Geneve support.

Reading the AWS Gateway Load Balancer product page and documentation, you will find that the product is strongly marketed towards third-party appliances. In other words, you purchase an appliance from a third-party vendor (often at a hefty price) and use their product to monitor and inspect your traffic. But as Corey Quinn wrote in his post “What I Don’t Get about the AWS Gateway Load Balancer”:

A (very!) careful reading of the documentation indicates that you aren’t required to go cross-account with these devices, and that there’s no requirement that the appliances actually be third party.

For this article I have built exactly that: a first-party application, written in Python, which receives traffic from the GWLB, decapsulates the packet, inspects it, re-encapsulates it and returns it to the GWLB. The application is called GeneveProxy, and its full source code can be found on the GitHub GeneveProxy project page. Any code in this article is taken directly from this project. The purpose of this application is to provide a reference on how to process Geneve headers and how to interact with the AWS Gateway Load Balancer.

In the article below we will describe the full process of routing, encapsulation and packet inspection. We will do this by following an example packet from a source EC2 instance, through PrivateLink, to the GWLB, to the appliance, back to the GWLB and PrivateLink, and out to the internet.

Overview

I will not go into how to build the infrastructure, because this is pretty well documented on the Gateway Load Balancer Getting Started page.

Packets and headers

When the source EC2 instance executes a GET http://google.com, a new packet is sent out on the wire (1). This packet consists of the HTTP request with a TCP header and an IPv4 header:

IPv4 Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL=5 |    DSCP   |ECN|        Total Length=60        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live | Protocol = 6  |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Source Address=10.1.0.152                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Destination Address=209.85.202.138                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TCP header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port=51714       |      Destination Port=80      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

HTTP data
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The route tables for the source EC2 instance determine this packet needs to be routed to the public internet. The route tables have been configured to forward this traffic to the PrivateLink endpoint for the Gateway Load Balancer.

When the packet arrives at the GWLB, the load balancer adds three headers: first the Geneve header, then a UDP header, and finally an outer IPv4 header. Then the packet is put on the wire again (3). The new headers direct the traffic towards the Appliance EC2 instance:

Outer IPv4 Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL=5 |    DSCP   |ECN|        Total Length=128       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live | Protocol = 17 |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Source Address=10.0.0.230                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Destination Address=10.0.10.132                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

UDP Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port=24810       |     Destination Port=6081     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Length=108           |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Geneve Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver| Opt Len=8 |O|C|    Rsvd.  |  Protocol Type=0x0800 (IPv4)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Virtual Network Identifier (VNI)=0      |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Option Class=0x0108     |     Type=1    |R|R|R| Length=2|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      64-bit GWLBE ENI ID                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Option Class=0x0108     |     Type=2    |R|R|R| Length=2|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              64-bit Customer Visible Attachment ID            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Option Class=0x0108     |     Type=3    |R|R|R| Length=1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      32-bit Flow Cookie                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Inner IPv4 Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL=5 |    DSCP   |ECN|        Total Length=60        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |  Protocol = 6 |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Source Address=10.1.0.152                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Destination Address=209.85.202.138                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TCP header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port=51714       |      Destination Port=80      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

HTTP data
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The stack above is the exact packet that arrives at the appliance.

Receiving raw packets with Python

The AWS article Integrate your custom logic or appliance with AWS Gateway Load Balancer states:

When the appliance intends to forward the packet, it must do the following: […] swap the source and destination IP addresses in outer IPv4 header (i.e. Source IP = appliance IP address. Destination IP = GWLB IP address) […] update the IP checksum in outer IPv4 header.

To be able to do this, we need access to the raw IP headers. A normal UDP socket is created like so:

bind_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
bind_sock.bind((UDP_IP, UDP_PORT))

With a socket like this, you only receive the data after the UDP header (the Geneve header starts at byte 0). To overcome this, we create a raw socket instead (GitHub link):

geneve_sock = socket.socket(
    socket.AF_INET,
    socket.SOCK_RAW,
    socket.IPPROTO_UDP
)

This socket will receive all UDP data directed to the appliance EC2 instance, so it’s up to us to only process packets arriving at the Geneve port (6081).

Parsing the data, header by header

Through the raw socket we have access to the full packet sent by the GWLB. As described above, this packet has six sections (outer IPv4, UDP, Geneve, inner IPv4, TCP, data). The appliance is built to read these sections and to determine whether a packet should be forwarded or dropped.

Flow Diagram

The outer IPv4 header

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL=5 |    DSCP   |ECN|        Total Length=128       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live | Protocol = 17 |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Source Address=10.0.0.230                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Destination Address=10.0.10.132                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The outer IPv4 header is read with outer_ipv4_header = Ipv4Header(data). This constructor reads all the bits and bytes of the IPv4 header (GitHub link):

ipv4_header_first_word = unpack('!BBH', header_bytes[:4])
ipv4_header_second_word = unpack('!HH', header_bytes[4:8])
ipv4_header_third_word = unpack('!BBH', header_bytes[8:12])

self.ihl = ipv4_header_first_word[0] & 0xF
self.dscp = ipv4_header_first_word[1] >> 2
self.ecn = ipv4_header_first_word[1] & 0x3
self.total_length = ipv4_header_first_word[2]

self.identification = ipv4_header_second_word[0]
self.flags = ipv4_header_second_word[1] >> 13
self.fragment_offset = ipv4_header_second_word[1] & 0x1FFF

self.ttl = ipv4_header_third_word[0]
self.protocol = ipv4_header_third_word[1]
self.header_checksum = ipv4_header_third_word[2]

self.source_ip = header_bytes[12:16]
self.destination_ip = header_bytes[16:20]

The ihl field, for Internet Header Length, contains the length of the IPv4 header (in 32-bit words). It is usually 5, which translates to 5 * 4 = 20 bytes of data. When the IPv4 header has been read, we pop these bytes from the received data, bringing the next header to the top of the data stack: data = data[outer_ipv4_header.ihl * 4:] (GitHub link).
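
To make the bit arithmetic concrete, here is a minimal, self-contained sketch that builds a header resembling the example by hand and reads a few fields back. The helper name parse_ipv4_basics is hypothetical and not part of GeneveProxy; the TTL, total length and checksum values are placeholders.

```python
import socket
from struct import pack, unpack

def parse_ipv4_basics(header_bytes):
    """Return (version, ihl, protocol, source, destination) from an IPv4 header."""
    version_ihl, _tos, _total_length = unpack('!BBH', header_bytes[:4])
    _ttl, protocol, _checksum = unpack('!BBH', header_bytes[8:12])
    version = version_ihl >> 4      # high nibble of the first byte
    ihl = version_ihl & 0xF         # low nibble, counted in 32-bit words
    source = socket.inet_ntoa(header_bytes[12:16])
    destination = socket.inet_ntoa(header_bytes[16:20])
    return version, ihl, protocol, source, destination

# Hand-built header: version 4, IHL 5, protocol 17 (UDP).
# Total length, TTL and checksum are placeholder values.
example = pack(
    '!BBHHHBBH4s4s',
    (4 << 4) | 5, 0, 0,     # version/IHL, DSCP/ECN, total length
    0, 0,                   # identification, flags/fragment offset
    64, 17, 0,              # TTL, protocol, checksum
    socket.inet_aton('10.0.0.230'),
    socket.inet_aton('10.0.10.132'),
)

version, ihl, protocol, source, destination = parse_ipv4_basics(example)
header_length_bytes = ihl * 4   # number of bytes to strip before the UDP header
```

With IHL=5 the sketch strips 20 bytes. A header carrying IPv4 options would report a larger IHL, which is why the slice uses ihl * 4 rather than a hard-coded 20.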

The IPv4 header also specifies which protocol follows next in the protocol field. With GWLB, this will always be UDP (0x11, or 17 in decimal).

The outer UDP header

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port=24810       |     Destination Port=6081     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Length=108           |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

With the IPv4 header stripped from the data, we can read the UDP header with udp_header = UdpHeader(data). Much like the IPv4 header, this constructor reads the bits and bytes from the UDP header (GitHub link):

udp_header_first_word = unpack('!HH', header_bytes[:4])
self.source_port = udp_header_first_word[0]
self.destination_port = udp_header_first_word[1]

udp_header_second_word = unpack('!HH', header_bytes[4:8])
self.length = udp_header_second_word[0]
self.checksum = udp_header_second_word[1]

The UDP header has a fixed length of two 32-bit words (8 bytes), so we can pop the header from the data with data = data[2 * 4:] (GitHub link). This brings the Geneve header to the top of the data.
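
As a small worked example, the sketch below parses the UDP header from the example packet and derives the payload size from the Length field. The function name parse_udp_header and the zeroed checksum are assumptions made for illustration.

```python
from struct import pack, unpack

UDP_HEADER_BYTES = 8  # fixed: source port, destination port, length, checksum

def parse_udp_header(header_bytes):
    """Return (source_port, destination_port, length, checksum)."""
    return unpack('!HHHH', header_bytes[:UDP_HEADER_BYTES])

# Values from the example packet; the checksum is a placeholder.
example = pack('!HHHH', 24810, 6081, 108, 0)
source_port, destination_port, length, checksum = parse_udp_header(example)

# The Length field counts the UDP header plus everything that follows it.
payload_bytes = length - UDP_HEADER_BYTES
```

For the example packet this leaves 100 bytes of payload: 40 bytes of Geneve header followed by the 60-byte original packet.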

The Geneve header

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver| Opt Len=8 |O|C|    Rsvd.  |  Protocol Type=0x0800 (IPv4)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Virtual Network Identifier (VNI)=0      |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Option Class=0x0108     |     Type=1    |R|R|R| Length=2|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      64-bit GWLBE ENI ID                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Option Class=0x0108     |     Type=2    |R|R|R| Length=2|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              64-bit Customer Visible Attachment ID            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Option Class=0x0108     |     Type=3    |R|R|R| Length=1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      32-bit Flow Cookie                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Geneve header consists of two 32-bit words, followed by a variable number of options. The total amount of data occupied by the option fields (in 32-bit words) is set in the Opt Len field. The Geneve header sent by GWLB always has option length 8, which means that the two-word header is followed by eight words of options, for a total of ten words or 40 bytes.
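
The two fixed words can be read with the same unpack approach used for the other headers. The following is a sketch under the field layout shown in the diagram above; the function name parse_geneve_fixed_header and the dictionary keys are hypothetical, not taken from GeneveProxy.

```python
from struct import pack, unpack

def parse_geneve_fixed_header(header_bytes):
    """Parse the first two 32-bit words of a Geneve header."""
    first_byte, second_byte, protocol_type = unpack('!BBH', header_bytes[:4])
    (vni_and_reserved,) = unpack('!I', header_bytes[4:8])
    return {
        'version': first_byte >> 6,             # top 2 bits
        'opt_len': first_byte & 0x3F,           # low 6 bits, in 32-bit words
        'control': bool(second_byte & 0x80),    # O bit
        'critical': bool(second_byte & 0x40),   # C bit
        'protocol_type': protocol_type,         # 0x0800 for IPv4
        'vni': vni_and_reserved >> 8,           # upper 24 bits; low 8 are reserved
    }

# Fixed header as sent by GWLB in the example: version 0, Opt Len 8, IPv4, VNI 0.
example = pack('!BBHI', (0 << 6) | 8, 0, 0x0800, 0 << 8)
fields = parse_geneve_fixed_header(example)

# Two fixed words plus opt_len words of options, four bytes per word.
total_header_bytes = (2 + fields['opt_len']) * 4
```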

Each option (called a Type, Length, Value triplet or TLV) has a one-word header, followed by one or more words of data. The length of this data is specified in the length field of the TLV. These options can thus be parsed as follows (GitHub link):

parsed_length = 0

# Loop over the options until the amount of words processed matches
# the length set in opt_len.
while parsed_length != opt_len:
    # Unpack the first word (static header for this option)
    first_word = unpack('!HBB', header_bytes[:4])

    option_class = first_word[0]
    option_type = first_word[1]
    reserved = first_word[2] >> 5
    length = first_word[2] & 0x1f # in words (4 bytes)

    tunnel_options_length = (1 + length) # 1 word for the header + data length
    parsed_length += tunnel_options_length

    header_bytes = header_bytes[4:] # drop the 4 header bytes
    data = header_bytes[:length * 4]
    header_bytes = header_bytes[length * 4:] # drop the data bytes

    self.tunnel_options.append(GeneveTunnelOptions(
        option_class=option_class,
        option_type=option_type,
        reserved=reserved,
        length=length,
        data=data
    ))

The Geneve header also specifies the protocol of the data it encapsulates, in the Protocol Type field. In our example this is IPv4, because our original source packet (the GET request sent by the source EC2 instance) was an IPv4 request.

The original packet

Once the Geneve header has been parsed we know its length (2 * 4 bytes for the header + opt_len * 4 bytes for the options). When we pop these bytes from the top of the data (data = data[geneve_header.length_bytes:] - GitHub link), the data that remains is the original packet sent by our EC2 instance. We will inspect that packet to determine if it should be allowed or dropped. More about that later.
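
Putting the three length calculations together, the whole decapsulation can be sketched as a single slicing function. This is an illustration rather than the project's code: split_gwlb_packet is a hypothetical name, and the test packet is zero padding with only the length-bearing fields filled in.

```python
from struct import pack

def split_gwlb_packet(packet):
    """Split a raw GWLB packet into (outer_ipv4, udp, geneve, inner) byte strings."""
    ipv4_len = (packet[0] & 0x0F) * 4           # IHL, in 32-bit words
    udp_end = ipv4_len + 8                      # the UDP header is always 8 bytes
    opt_len = packet[udp_end] & 0x3F            # Geneve Opt Len, in 32-bit words
    geneve_end = udp_end + (2 + opt_len) * 4    # 2 fixed words + options
    return (
        packet[:ipv4_len],
        packet[ipv4_len:udp_end],
        packet[udp_end:geneve_end],
        packet[geneve_end:],
    )

# Synthetic packet: only the fields that determine lengths are meaningful.
outer_ipv4 = bytes([0x45]) + bytes(19)          # version 4, IHL 5
udp = pack('!HHHH', 24810, 6081, 108, 0)
geneve = bytes([8]) + bytes(7) + bytes(8 * 4)   # Opt Len 8, then 32 option bytes
inner = b'\x45' + bytes(59)                     # stand-in for the 60-byte original packet

parts = split_gwlb_packet(outer_ipv4 + udp + geneve + inner)
section_lengths = [len(part) for part in parts]
```

Keeping the outer UDP and Geneve sections as untouched byte strings is convenient later on, because they are returned to the GWLB unchanged.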

About Flow

The third TLV in the Geneve options has option class 0x0108 (AWS) and option type 3 (32-bit Flow Cookie). This cookie is the same for every packet in a flow. So what is a flow?

Any transfer of data consists of an exchange of a number of packets. A well-known example is the TCP 3-way handshake, in which the client sends a SYN packet, the server responds with a SYN-ACK packet, and the client responds with an ACK packet. After these three packets, the actual data in the request will be transmitted.

Let’s say our proxy should only allow outbound traffic. This would allow the first outbound packet (SYN), but it would drop the response (SYN-ACK) because that is an inbound packet. Dropping this packet breaks the transaction. However, these packets all share the same 32-bit Flow Cookie. This allows us to analyze the first packet (SYN), and then store the Flow Cookie locally. When the SYN-ACK arrives, we check our known Flow Cookies and see that this flow has been previously allowed. This means that we should allow any subsequent packet in the flow as well, which allows response packets to pass the proxy.

Additionally, without Flow Cookies we would need to inspect every packet, which would be extremely resource intensive. By storing and querying the Flow Cookies, we only need to inspect the first packet of every flow.
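
The idea of inspecting only the first packet of a flow can be sketched with a dictionary keyed on the flow cookie. The class below is a simplified illustration, not the project's flow_stack: the names FlowTable and decide are hypothetical, and the sketch never expires old entries, which a long-running appliance would probably need.

```python
ALLOWED = 'allowed'
BLOCKED = 'blocked'

class FlowTable:
    """Remember the verdict for each flow cookie."""

    def __init__(self):
        self._verdicts = {}

    def decide(self, flow_cookie, inspect_first_packet):
        """Return True if the packet's flow is allowed.

        `inspect_first_packet` is only called for cookies not seen before.
        """
        if flow_cookie not in self._verdicts:
            allowed = inspect_first_packet()
            self._verdicts[flow_cookie] = ALLOWED if allowed else BLOCKED
        return self._verdicts[flow_cookie] == ALLOWED

inspections = []

def inspect_syn():
    inspections.append('inspected')
    return True  # pretend the outbound SYN passed every check

table = FlowTable()
cookie = b'\x12\x34\x56\x78'                         # made-up 32-bit flow cookie
syn_allowed = table.decide(cookie, inspect_syn)      # first packet: inspected
syn_ack_allowed = table.decide(cookie, inspect_syn)  # response: verdict reused
```

The SYN-ACK is allowed without a second inspection because its cookie is already in the table.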

The first packet in every flow also determines the direction of the flow: if the source address in the inner IPv4 header is local (a non-public IP), the direction is outbound. Conversely, if the source address is a public IP address, the direction is inbound (GitHub link):

direction = Flow.DIR_OUTBOUND if source_address.is_private else Flow.DIR_INBOUND
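
A standalone version of this check, assuming source_address is an object from Python's ipaddress module, could look as follows. The constants and the function name flow_direction are stand-ins for the project's Flow class.

```python
import ipaddress

DIR_OUTBOUND = 'outbound'
DIR_INBOUND = 'inbound'

def flow_direction(source_ip_bytes):
    """Classify a flow by the 4 source-address bytes of the inner IPv4 header."""
    source_address = ipaddress.IPv4Address(source_ip_bytes)
    return DIR_OUTBOUND if source_address.is_private else DIR_INBOUND

from_instance = flow_direction(bytes([10, 1, 0, 152]))      # the source EC2 instance
from_internet = flow_direction(bytes([209, 85, 202, 138]))  # the external host
```

Note that is_private covers more than the RFC 1918 ranges (loopback and link-local addresses, for example), which is usually harmless for this purpose but worth knowing.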

Packet inspection

GeneveProxy applies three types of verification on every flow:

  • Direction allowed
  • Transport (TCP / UDP / ICMP) allowed
  • Destination port allowed

Direction

Through a config file, the application can be configured to drop inbound flows, drop outbound flows or allow both (technically it could also drop both directions, but well…).

if direction == Flow.DIR_OUTBOUND and self._config.outbound.get('drop_all_traffic'):
    print('Flow dropped because all outbound traffic is blocked')
    flow_stack.set_flow(flow_cookie, direction_allowed=False) # block this flow
    return False
if direction == Flow.DIR_INBOUND and self._config.inbound.get('drop_all_traffic'):
    print('Flow dropped because all inbound traffic is blocked')
    flow_stack.set_flow(flow_cookie, direction_allowed=False) # block this flow
    return False

(GitHub link)

Transport

When the direction is allowed, the application uses an allow list and a block list to determine if the transport protocol (TCP / UDP / ICMP) is allowed. The config looks like this:

# If `allowed_transport_protocols` is set and has values, the proxy
# will drop any outbound flows using transport protocols not present in the list.
allowed_transport_protocols:
- 0x0006 # TCP
- 0x0011 # UDP

# If `blocked_transport_protocols` is set and has values, the proxy
# will drop any outbound flows using transport protocols present in the list.
blocked_transport_protocols:
- 0x0001 # ICMP

(GitHub link)

Destination port

Finally, when the transport protocol is permitted, the destination port is checked. This too uses an allow list and a block list:

# If `allowed_application_ports` is set and has values, the proxy
# will drop any outbound flows using application protocols not present in the list.
allowed_application_ports:
- 443 # HTTPS
- 80 # HTTP

# If `blocked_application_ports` is set and has values, the proxy
# will drop any outbound flows using application protocols present in the list.
blocked_application_ports:
- 53 # DNS

(GitHub link)
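
The allow list and block list semantics described in the config comments can be captured in one small function. This is a sketch of the rule as worded above, not the project's implementation; is_permitted is a hypothetical name.

```python
def is_permitted(value, allowed=None, blocked=None):
    """Apply allow-list and block-list rules to a protocol number or port.

    An empty or missing list imposes no restriction.
    """
    if allowed and value not in allowed:
        return False    # an allow list exists and the value is not on it
    if blocked and value in blocked:
        return False    # the value is explicitly blocked
    return True

allowed_application_ports = [443, 80]
blocked_application_ports = [53]

https_ok = is_permitted(443, allowed_application_ports, blocked_application_ports)
dns_ok = is_permitted(53, allowed_application_ports, blocked_application_ports)
alt_http_ok = is_permitted(8080, allowed_application_ports, blocked_application_ports)
```

With both lists populated as in the example config, the block list is redundant for port 53, since the allow list already excludes it. The block list matters mainly when the allow list is left empty.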

Keep in mind that GeneveProxy is a reference application. While these three checks allow for rudimentary packet filtering, they won’t block most malicious traffic. To achieve effective filtering the application would likely need to perform deep packet inspection, for example to verify that port 443 is actually used for HTTPS (and not a VPN or anything else), and that the domains visited through HTTP and HTTPS are known to be safe.

When the packet passes all checks, its flow cookie is stored with an allowed state, and the application returns the packet back to the Gateway Load Balancer.

Return packet

Returning the packet to the GWLB works slightly differently from most network exchanges. Our application received a packet on its destination port (6081), from a source port (24810, in our example). Usually, return traffic would originate from the application’s local port (6081) and go back to the source’s port (24810). In the case of GWLB, we don’t switch these ports. In other words, our return traffic is sent to the GWLB’s port 6081, from our source port 24810.

To achieve this, we switch the source and destination addresses in the outer IPv4 header, while maintaining the UDP source and destination ports. The original headers we received were:

Outer IPv4 Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL=5 |    DSCP   |ECN|        Total Length=128       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live | Protocol = 17 |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Source Address=10.0.0.230                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Destination Address=10.0.10.132                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

UDP Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port=24810       |     Destination Port=6081     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Length=108           |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

What we’re sending back is the following:

Outer IPv4 Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL=5 |    DSCP   |ECN|        Total Length=128       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live | Protocol = 17 |   (Updated) Header Checksum   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Source Address=10.0.10.132                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Destination Address=10.0.0.230                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

UDP Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port=24810       |     Destination Port=6081     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Length=108           |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

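The AWS guidance quoted earlier asks us to update the checksum in the outer IPv4 header. Swapping the two addresses on its own does not change the checksum, because it is a sum of 16-bit words and does not depend on their order, but we also decrement the TTL, which does change it. We therefore recalculate the checksum before returning the traffic (GitHub link):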

# Swap source and destination for response
outer_ipv4_header.swap_source_dest()
outer_ipv4_header.ttl -= 1
outer_ipv4_header.update_checksum()
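
The article does not show what update_checksum does internally. A minimal sketch of the standard Internet checksum (RFC 1071), applied to a header whose checksum field has been zeroed, is given below. The function name ipv4_checksum and the sample header bytes are assumptions chosen for illustration; they are not taken from GeneveProxy or from the example packet.

```python
from struct import iter_unpack

def ipv4_checksum(header_bytes):
    """Internet checksum over an IPv4 header whose checksum field is zero."""
    total = sum(word for (word,) in iter_unpack('!H', header_bytes))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                  # one's complement of the folded sum

# Sample 20-byte header (192.168.0.1 -> 192.168.0.199, UDP, TTL 64),
# with bytes 10-11 (the checksum field) set to zero.
header = bytes.fromhex('450000730000400040110000c0a80001c0a800c7')
checksum = ipv4_checksum(header)

# Swap source (bytes 12-16) and destination (bytes 16-20).
swapped = header[:12] + header[16:20] + header[12:16]
checksum_after_swap = ipv4_checksum(swapped)

# Decrement the TTL (byte 8) as the proxy does before returning the packet.
decremented = swapped[:8] + bytes([swapped[8] - 1]) + swapped[9:]
checksum_after_ttl = ipv4_checksum(decremented)
```

In this sketch the swap leaves the value at 0xB861, while the TTL decrement moves it to 0xB961, so the recalculation is needed whenever any header field other than the address order changes.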

When this is done, we pack all the headers and data back together again (GitHub link):

# Prepare the response packet to return to the GWLB.
response_packet = b''.join([
    outer_ipv4_header.as_bytes(),
    original_udp_header_bytes,
    original_geneve_header_bytes,
    data
])

The result is a full stack of six sections (five headers + data), just like we initially received:

Outer IPv4 Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL=5 |    DSCP   |ECN|        Total Length=128       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live | Protocol = 17 |   (Updated) Header Checksum   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Source Address=10.0.10.132                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Destination Address=10.0.0.230                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

UDP Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port=24810       |     Destination Port=6081     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Length=108           |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Geneve Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver| Opt Len=8 |O|C|    Rsvd.  |  Protocol Type=0x0800 (IPv4)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Virtual Network Identifier (VNI)=0      |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Option Class=0x0108     |     Type=1    |R|R|R| Length=2|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      64-bit GWLBE ENI ID                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Option Class=0x0108     |     Type=2    |R|R|R| Length=2|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              64-bit Customer Visible Attachment ID            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Option Class=0x0108     |     Type=3    |R|R|R| Length=1|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      32-bit Flow Cookie                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Inner IPv4 Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL=5 |    DSCP   |ECN|        Total Length=60        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |  Protocol = 6 |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Source Address=10.1.0.152                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Destination Address=209.85.202.138                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TCP header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port=51714       |      Destination Port=80      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

HTTP data
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

These bytes are put back on the wire (5), which sends them to the GWLB. The Load Balancer strips the Geneve headers, and forwards the packet to the PrivateLink endpoint (6). From there, the packet travels over an Internet Gateway to the public internet (7). When a response is received from the external host, the entire process is repeated, but in reverse.

Python specifics

To send a raw packet (including outer IPv4 and UDP headers), we need to set the IP_HDRINCL (header included) option for the socket. Without this option, the operating system adds these headers itself, which messes up our communication. To set this socket option, we add the line geneve_sock.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1) to our code (GitHub link).

As described earlier in the article, we’re listening for UDP packets on a raw socket. This means we also receive packets for other UDP ports like 53 (DNS) and 123 (NTP). We add the following code to make sure we only process packets at port 6081 (Geneve) (GitHub link):

if udp_header.destination_port != UDP_PORT:
    # Only process port 6081 packets
    return None

Another side effect of listening on a raw socket is that the operating system does not know we’re listening on port 6081. Because of this, the OS will respond with a “udp port 6081 unreachable” ICMP message to every inbound packet. We solve this by adding a second, non-raw socket that actively listens on port 6081. This socket receives exactly the same packets as the raw socket, but without the IPv4 and UDP headers. We can’t use these packets, so we simply discard them without sending a response (GitHub link):

while True:
    read_sockets, _, _ = select.select(
        [geneve_sock, bind_sock, health_check_socket], [], []
    )
    for selected_sock in read_sockets:
        if selected_sock == geneve_sock:
            data, addr = selected_sock.recvfrom(65565)
            # Only process messages on the geneve_sock.
            response = parse_udp_packet(data, flow_stack, packet_inspector)

            # If `response` is None the packet should be dropped.
            # If the response is not None, it should be returned to the GWLB.
            if response:
                selected_sock.sendto(response, addr)
        if selected_sock == bind_sock:
            selected_sock.recvfrom(65565) # Drop packets on the bind_sock
        if selected_sock == health_check_socket:
            conn, _ = selected_sock.accept()
            conn.recv(65565)
            conn.send(hc_response().encode('utf-8'))

Health check

The code above already hinted at a health_check_socket. This is a third socket listening on port 80. When a request arrives on this port, a simple ‘Healthy’ message is returned, letting the GWLB know the application is running and ready to accept traffic. The full response is generated like this (GitHub link):

def hc_response():
    """Generate a health check response."""
    response = 'HTTP/1.1 200 OK\n'
    response_body = 'Healthy'

    response_headers = {
        'Content-Type': 'text/html; encoding=utf8',
        'Content-Length': len(response_body),
        'Connection': 'close',
    }

    response += ''.join(f'{k}: {v}\n' for k, v in response_headers.items())
    response += f'\n{response_body}'
    return response
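
HTTP/1.1 formally terminates header lines with CRLF, and the usual media-type parameter is charset rather than encoding. Many clients tolerate bare line feeds, so the function above may work fine in practice. If you prefer a stricter response, a variant could look like the sketch below. The name hc_response_crlf is hypothetical, and unlike hc_response it returns bytes, so the caller would not need to call encode on the result.

```python
def hc_response_crlf(body='Healthy'):
    """Build a health check response with CRLF line endings, as bytes."""
    body_bytes = body.encode('utf-8')
    headers = {
        'Content-Type': 'text/html; charset=utf-8',
        'Content-Length': len(body_bytes),   # length in bytes, not characters
        'Connection': 'close',
    }
    head = 'HTTP/1.1 200 OK\r\n'
    head += ''.join(f'{k}: {v}\r\n' for k, v in headers.items())
    # A blank line (CRLF) separates the headers from the body.
    return head.encode('ascii') + b'\r\n' + body_bytes

response = hc_response_crlf()
```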

Conclusion

When we run the application on an EC2 instance connected to a Gateway Load Balancer, it logs the dropped traffic:

Listening
Dropped Inbound flow because port 5504 is not in the allow list
Dropped Inbound flow because port 80 is not in the allow list
Dropped Inbound flow because port 5253 is not in the allow list
Dropped Inbound flow because port 1000 is not in the allow list
Dropped Inbound flow because port 53 is not in the allow list
Dropped Inbound flow because protocol 17 is not in the allow list
Dropped Outbound flow because port 123 is not in the allow list
Dropped Inbound flow because port 7443 is not in the allow list
Dropped Inbound flow because port 5903 is not in the allow list
Dropped Inbound flow because port 8080 is not in the allow list
Dropped Inbound flow because port 27137 is not in the allow list
Dropped Inbound flow because port 1004 is not in the allow list
Dropped Inbound flow because port 444 is not in the allow list
Dropped Outbound flow because protocol 1 is not in the allow list
Dropped Inbound flow because port 8088 is not in the allow list
Dropped Inbound flow because port 9720 is not in the allow list
Dropped Inbound flow because port 8983 is not in the allow list
Dropped Inbound flow because port 445 is not in the allow list
Dropped Outbound flow because port 8080 is not in the allow list
Dropped Outbound flow because port 53 is not in the allow list

There is quite a lot of inbound traffic; these are all scanners trying to identify vulnerable systems on public IP addresses. It’s a good thing we’re blocking all of this!

We’re also seeing some blocked outbound flows: port 123 (NTP), protocol 1 (ICMP), port 8080 and port 53 (DNS). What you want to block depends on your use case, and as said before, you probably want to extend your proxy to do more than just filter protocols and ports.

What this post has shown is how to build a proxy from scratch, removing the need to purchase expensive third-party appliances and providing the flexibility to implement whatever monitoring, throttling or filtering feature your environments require. Again, the full source can be found on GitHub.

I share posts like these and smaller news articles on Twitter. Follow me there for regular updates! If you have questions or remarks, or would just like to get in touch, you can also find me on LinkedIn.

Luc van Donkersgoed