AWS Transit Gateway for connecting to on-premise: A thorough study

August 21, 2020

AWS released the Transit Gateway (TGW) in 2018. It was a breakthrough in letting customers connect Amazon Virtual Private Clouds (VPCs) and their on-premises networks through a single gateway. On its own, the TGW is a powerful service, but when paired with other services such as AWS Direct Connect (DX), some limitations start to appear.

Example Architecture

To simplify things, the diagram below shows an architecture that we will study for both its strengths and its limitations.

[Figure: TGW DX VPN]

In this example architecture, you see the usage of a TGW to connect the on-premise environment with VPCs that live within different AWS accounts. It also enables those VPCs to talk to each other. The connectivity between on-premise and AWS is facilitated by a DX connection, with a Site-to-Site VPN as a failover.

Associated Resources

To achieve the setup visualized above, follow these steps:

  1. Request a DX connection of at least 1 Gbps to the relevant AWS Region within the TGW account.
  2. Create a DX Gateway.
  3. Create a TGW.
  4. Create a VPN Customer Gateway and VPN Connections directly to the TGW.
  5. Associate the TGW with the DX Gateway. At this point, the allowed prefixes also need to be advertised to the on-premise network.
  6. Create a Transit Virtual Interface for the DX Connection-Gateway pair.
  7. Share the TGW resource with the relevant AWS Accounts using the AWS Resource Access Manager (RAM) service.
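Steps 2 to 7 can be sketched with boto3, as a rough illustration rather than a production script. The function name `build_hybrid_connectivity`, the resource names, and the ASN values are assumptions made for this example. Step 1 is assumed to be done already, so the DX connection ID is passed in. The clients are injected, and waiting for each resource to become available is left out.

```python
def build_hybrid_connectivity(ec2, dx, ram, *, dx_connection_id, onprem_public_ip,
                              onprem_bgp_asn, vlan, allowed_prefixes, account_ids):
    """Run steps 2-7 against injected boto3 clients and return the created IDs.

    Hypothetical helper: it does not wait for resources to become available,
    which a real deployment would need to do between the steps.
    """
    # Step 2: DX Gateway. Its ASN must differ from the TGW's ASN,
    # so both are set explicitly (values are arbitrary private ASNs).
    dxgw_id = dx.create_direct_connect_gateway(
        directConnectGatewayName="example-dxgw",
        amazonSideAsn=64513,
    )["directConnectGateway"]["directConnectGatewayId"]

    # Step 3: TGW.
    tgw = ec2.create_transit_gateway(
        Description="example-tgw",
        Options={"AmazonSideAsn": 64512},
    )["TransitGateway"]
    tgw_id = tgw["TransitGatewayId"]

    # Step 4: Customer Gateway and a VPN connection attached directly to the TGW.
    cgw_id = ec2.create_customer_gateway(
        BgpAsn=onprem_bgp_asn, PublicIp=onprem_public_ip, Type="ipsec.1",
    )["CustomerGateway"]["CustomerGatewayId"]
    vpn_id = ec2.create_vpn_connection(
        CustomerGatewayId=cgw_id, Type="ipsec.1", TransitGatewayId=tgw_id,
    )["VpnConnection"]["VpnConnectionId"]

    # Step 5: associate the TGW with the DX Gateway, listing the allowed prefixes.
    dx.create_direct_connect_gateway_association(
        directConnectGatewayId=dxgw_id,
        gatewayId=tgw_id,
        addAllowedPrefixesToDirectConnectGateway=[{"cidr": c} for c in allowed_prefixes],
    )

    # Step 6: Transit Virtual Interface for the DX Connection-Gateway pair.
    dx.create_transit_virtual_interface(
        connectionId=dx_connection_id,
        newTransitVirtualInterface={
            "virtualInterfaceName": "example-transit-vif",
            "vlan": vlan,
            "asn": onprem_bgp_asn,
            "directConnectGatewayId": dxgw_id,
        },
    )

    # Step 7: share the TGW with the other accounts through RAM.
    ram.create_resource_share(
        name="example-tgw-share",
        resourceArns=[tgw["TransitGatewayArn"]],
        principals=list(account_ids),
    )

    return {"dx_gateway": dxgw_id, "tgw": tgw_id,
            "customer_gateway": cgw_id, "vpn": vpn_id}
```

With real credentials, the three clients would be `boto3.client("ec2")`, `boto3.client("directconnect")` and `boto3.client("ram")`, all created in the TGW account and Region.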

DX Connection – VPN Failover

Once you have everything deployed and working, it is easy to test the failover of the DX Connection over to the VPN. At this point, the TGW Route Table should have at least a DX Gateway and a VPN associated with it. In the AWS Console, you can trigger a Failover test under the Transit Virtual Interface by selecting “Bring down BGP”.

[Figure: DX BGP failover]

By closely inspecting the TGW Route Table, one can then see the preferred route switch from the one over the DX Gateway to the one that targets the VPN connection.
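The same check can be scripted instead of done in the console. The sketch below is a minimal example, assuming a boto3 EC2 client is passed in. The helper name `active_targets` is hypothetical, and the route table ID and on-premise CIDR used with it are placeholders. It asks the TGW Route Table for one prefix and returns the resource type of each attachment behind an active route.

```python
def active_targets(ec2, route_table_id, cidr):
    """Return the attachment resource types of the active routes for one prefix."""
    response = ec2.search_transit_gateway_routes(
        TransitGatewayRouteTableId=route_table_id,
        Filters=[{"Name": "route-search.exact-match", "Values": [cidr]}],
    )
    targets = []
    for route in response["Routes"]:
        # Skip blackholed or otherwise inactive routes.
        if route.get("State") != "active":
            continue
        for attachment in route.get("TransitGatewayAttachments", []):
            targets.append(attachment["ResourceType"])
    return targets
```

Called before and during the failover test for an on-premise prefix, the result would be expected to change from `direct-connect-gateway` to `vpn`, which matches what the console shows.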

Limitations of the described solution

DX Connections

Transit virtual interfaces are only available over dedicated connections or hosted connections with speeds of 1 Gbps or greater. Transit virtual interfaces are not available for hosted AWS Direct Connect connections with speeds of 500 Mbps and below, also known as a sub-1 Gbps hosted AWS Direct Connect connection.
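A small pre-flight check can catch this before a connection is ordered. The sketch below assumes bandwidth strings in the form `500Mbps` or `1Gbps`; the helper names `bandwidth_mbps` and `supports_transit_vif` are made up for this example.

```python
import re

# Conversion factors to Mbps for the units this sketch understands.
_UNITS_IN_MBPS = {"Mbps": 1, "Gbps": 1000}


def bandwidth_mbps(bandwidth):
    """Parse a string such as '500Mbps' or '1Gbps' into an integer number of Mbps."""
    match = re.fullmatch(r"(\d+)\s*(Mbps|Gbps)", bandwidth.strip())
    if match is None:
        raise ValueError(f"unrecognised bandwidth: {bandwidth!r}")
    return int(match.group(1)) * _UNITS_IN_MBPS[match.group(2)]


def supports_transit_vif(bandwidth):
    """True if the connection speed is 1 Gbps or greater, as the rule above requires."""
    return bandwidth_mbps(bandwidth) >= 1000
```

For example, `supports_transit_vif("500Mbps")` is false, while `supports_transit_vif("1Gbps")` is true.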

Propagation of VPC CIDR blocks to on-premise

When a new VPC is associated with the TGW, the route table of the TGW is automatically updated. Advertising the CIDR block of each VPC to the on-premise network, however, requires a manual action. This is related to the allowed prefixes mentioned earlier, in the step that associates the TGW with the DX Gateway. One needs to explicitly update that list to include the new CIDR block. Furthermore, the exact CIDR block needs to be advertised and not a superset of it. For more information you can check here.

Another limitation related to the propagation of CIDR blocks to on-premise is that the maximum number of allowed prefixes is capped at 20. This means that you can connect a maximum of 20 VPCs to on-premise. This is a major limitation considering the recommendation from AWS to isolate workloads into different accounts and VPCs as much as possible. It becomes an even bigger problem when DTAP (Development, Testing, Acceptance, Production) comes into the picture.
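Both constraints on the allowed prefixes can be checked before touching the DX Gateway association. The sketch below uses only the standard library and assumes IPv4 CIDRs; the function name `plan_allowed_prefixes` and the shape of its result are hypothetical, and the cap of 20 is the figure quoted in this article.

```python
import ipaddress

MAX_ALLOWED_PREFIXES = 20  # cap quoted in this article (August 2020)


def plan_allowed_prefixes(vpc_cidrs, allowed_prefixes, limit=MAX_ALLOWED_PREFIXES):
    """Report which VPC CIDRs are missing from the allowed prefixes list.

    Following the article, a VPC only counts as advertised when its exact
    CIDR block is in the list; a covering supernet is flagged separately.
    """
    vpcs = {ipaddress.ip_network(c) for c in vpc_cidrs}
    allowed = {ipaddress.ip_network(c) for c in allowed_prefixes}

    missing = sorted(vpcs - allowed)
    supernet_only = [n for n in missing if any(n.subnet_of(a) for a in allowed)]
    total_after_update = len(allowed | vpcs)

    return {
        "to_add": [str(n) for n in missing],
        "covered_only_by_supernet": [str(n) for n in supernet_only],
        "total_after_update": total_after_update,
        "over_limit": total_after_update > limit,
    }
```

Running it with the CIDR blocks of all attached VPCs and the current allowed prefixes shows what still has to be added by hand, and whether the update would exceed the cap.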

The cap on allowed prefixes can be tackled in two ways:

  1. Follow the DTAP approach also for the TGW implementation and have a Production and a non-Production TGW.
    An example architecture of this is shown below.

[Figure: TGW DX VPN DTAP]

  2. Connect up to 3 TGWs to the same DX Gateway.
    An example abstracted architecture of this is shown below.

[Figure: Multiple TGWs for DX Gateway]

Both of these solutions suffer from the same problem: as of the time of writing this article (August 2020), you cannot peer TGWs in the same region. The fact that all these TGWs are connected to the same DX Gateway does not create the magic bridge one would hope for. This automatically means that some of your VPCs won’t be able to talk to other VPCs, unless they are associated with the same TGW.
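To see the impact on a concrete design, the rule can be modelled in a few lines: two VPCs can talk only when they are attached to the same TGW. This is a simplified model, assuming each VPC is attached to exactly one TGW; the function name and the VPC and TGW labels are made up for the example.

```python
from itertools import combinations


def unreachable_pairs(vpc_to_tgw):
    """List the VPC pairs that cannot talk because they sit on different TGWs."""
    return [
        (a, b)
        for a, b in combinations(sorted(vpc_to_tgw), 2)
        if vpc_to_tgw[a] != vpc_to_tgw[b]
    ]


attachments = {
    "prod-a": "tgw-prod",
    "prod-b": "tgw-prod",
    "dev-a": "tgw-nonprod",
}
# Only the two production VPCs share a TGW, so dev-a is isolated from both.
print(unreachable_pairs(attachments))
```

Listing the pairs up front makes it easier to decide which workloads must share a TGW before the split is made.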

Conclusion

Using the TGW to connect the on-premise environment to VPCs on AWS could simplify things a lot. On the other hand, one needs to carefully plan and execute this solution taking into consideration the growth that is expected for the solution on AWS, in order to avoid hiccups down the road. Knowing AWS, these limitations should one by one be tackled in future announcements, but for now, this is the state of things.

Konstantinos Bessas

Cloud Systems Engineer