Interactive AWS NAT Gateway

If you see a question on any AWS certification exam where the choice is between a Managed NAT Gateway and a NAT Instance, the answer is invariably going to be the Managed NAT Gateway.

For most cases, I think this is indeed the correct answer. It's scalable, very easy to deploy, and best of all, it is managed for you. However, the downside, as with any managed service, is that it becomes expensive at some point. Usually that point is when processing terabytes of data, but it could also be when there is little to no traffic at all. Consider, for example, using Managed NAT Gateways in every non-production account across hundreds of VPCs: that cost will soon add up. The point of this blog post is to look at those situations and explore alternative solutions.

This blog post is interactive. As you click buttons or complete steps, keep an eye on the textboxes as their contents will change when you interact with the page.

Deploy

Before looking at costs, it is worth remembering why a NAT device is required.

Below are a public and a private subnet deployed in a VPC in an unspecified region and availability zone. The private subnet contains EC2 instances running applications that need to send data to some third-party application on the internet. However, with the current setup, data is not reaching the third-party application.

It is failing because resources in the private subnet cannot reach resources on the internet directly; only resources running in the public subnet can do that. This is where a NAT device comes in.

A NAT device, like a Managed NAT Gateway, allows applications in the private subnet to connect to other applications on the internet, but crucially it does not accept incoming requests from applications on the internet. If this were not a requirement, then a private subnet would not even be needed!
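The one-way behaviour described above can be sketched with a toy model. This is not how AWS implements a NAT Gateway; the class name `ToyNat`, the IP addresses, and the starting port are all invented for illustration. The idea is simply that outbound connections create a mapping, and inbound traffic is only let through if it matches one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNat:
    """Toy model of source NAT; not the AWS implementation."""

    public_ip: str
    next_port: int = 1024  # hypothetical start of the translated port range
    # Maps a public port back to the private (ip, port) that opened the flow.
    table: dict = field(default_factory=dict)

    def outbound(self, private_ip: str, private_port: int) -> tuple:
        """A private instance opens a connection, so record a mapping."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (private_ip, private_port)
        return (self.public_ip, public_port)

    def inbound(self, public_port: int):
        """Return the private destination, or None if no flow was opened."""
        return self.table.get(public_port)


nat = ToyNat(public_ip="203.0.113.10")
src = nat.outbound("10.0.1.25", 44321)  # instance calls the third party
reply = nat.inbound(src[1])             # the reply is translated back
unsolicited = nat.inbound(8080)         # nobody opened this flow, so no match
```

The reply finds its way back to the instance because the instance started the conversation. The unsolicited request has no entry in the table, so there is nowhere to send it.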

For the demo below, the chosen NAT device is a Managed NAT Gateway. Open source solutions are available and will be discussed in the cost section of this blog. For now, work through the demo using AWS native solutions to enable traffic flow from the EC2 instances to the internet.

Step 1: Deploy NAT Gateway

The EC2 instances in the private subnet need to reach the internet through the Internet Gateway.

The first step is to deploy the NAT Gateway into the VPC. Drag the NAT Gateway below into the correct subnet.

The AWS console does not provide a drag-and-drop interface, but deploying a NAT Gateway using the console or an IaC tool is still quite straightforward. Best of all, there is nothing else to do after deployment: it simply works unless certain limits are hit, which will be discussed later. There are no security groups or IAM policies involved and, as the demo shows, configuration occurs entirely through route tables, although this route-based management might present challenges depending on an organisation's security requirements.
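To make the route table idea concrete, here is a minimal sketch of how a route lookup decides where a packet goes. The CIDR ranges and the target names (`nat-gateway`, `internet-gateway`) are placeholders I made up, and real VPC routing has more to it than this, but the most-specific-match rule is the core of it.

```python
import ipaddress

# Hypothetical route tables: destination CIDR -> target.
PRIVATE_ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-gateway",
}
PUBLIC_ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "internet-gateway",
}


def next_hop(routes: dict, destination: str):
    """Pick the most specific route that contains the destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix is the most specific match.
    return max(matches, key=lambda item: item[0].prefixlen)[1]


print(next_hop(PRIVATE_ROUTES, "198.51.100.7"))  # traffic bound for the internet
print(next_hop(PRIVATE_ROUTES, "10.0.2.15"))     # traffic that stays inside the VPC
```

In this sketch the private subnet sends internet-bound traffic to the NAT Gateway, and the public subnet (where the NAT Gateway lives) sends it on to the Internet Gateway.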

On the other hand, if an organisation decides to use NAT instances or an open source solution, then ongoing management will be required. The difficulty of this task will vary between organisations and depends on many factors, some of which may not be related to patching the instances. But, in some cases, a self-managed solution can be a sensible choice.

The next section examines the cost of using a NAT Gateway at different data processing volumes and explores possible solutions to reduce cost.

Cost

Pricing for NAT Gateway is straightforward. AWS charges based on the following:

Data processed by the NAT Gateway (in GB)
Number of hours the NAT Gateway has been running
Number of hours the Elastic IP has been leased
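The three charges above can be combined into a rough monthly estimate. The rates in this sketch are assumptions for illustration only (they vary by region and change over time, and the demo may use different figures), so check the AWS pricing page before relying on them. The function name and the 730-hour month are also my own choices.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def nat_gateway_monthly_cost(
    data_gb: float,
    hourly_rate: float = 0.045,     # assumed USD per NAT Gateway hour
    per_gb_rate: float = 0.045,     # assumed USD per GB processed
    ip_hourly_rate: float = 0.005,  # assumed USD per Elastic IP hour
    hours: int = HOURS_PER_MONTH,
) -> dict:
    """Break a month's NAT Gateway bill into its three components."""
    gateway_hours = hourly_rate * hours
    ip_hours = ip_hourly_rate * hours
    processing = per_gb_rate * data_gb
    return {
        "gateway_hours": gateway_hours,
        "ip_hours": ip_hours,
        "processing": processing,
        "total": gateway_hours + ip_hours + processing,
    }


idle = nat_gateway_monthly_cost(0)
busy = nat_gateway_monthly_cost(512)
```

Note that the first two components are charged even when no data flows, which is why an idle NAT Gateway still shows up on the bill.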

The demo below shows how this pricing structure works on a monthly basis. It simulates a system downloading different amounts of data from the internet using the same setup as before. For organisations that heavily use NAT Gateways, data processed often accounts for a large portion of the bill.

Try increasing the monthly volume of data processed using the slider below and see how the costs change. As you adjust the volume, check the textbox for tips to reduce costs at different volumes.

[Interactive demo: monthly cost by data processed. Slider marks: 0 GB, 512 GB, 256 TB, 1 PB. Current selection: 0 GB, $35 per month.]

Use serverless solutions such as DynamoDB and Lambda and save $35 per month.

If hosting on AWS is required, use f*k-nat, created by Andrew Guenther. This is a self-managed solution but, unlike AWS NAT Instances (also self-managed), the AMIs used by f*k-nat are patched with the latest security updates. The smallest instance (t4g.nano) costs only $3 per month.

It is a good alternative for organisations with multiple non-production AWS accounts that contain many private subnets. At $35 per NAT Gateway, costs can add up in decentralised environments.
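A quick back-of-the-envelope calculation shows how this adds up. The $35 and $3 figures are the ones quoted in this post; the count of 100 VPCs is just an example, and the function name is made up. The sketch only covers the fixed monthly cost and ignores the engineering time a self-managed solution needs.

```python
def fleet_monthly_cost(vpc_count: int, cost_per_nat: float, nats_per_vpc: int = 1) -> float:
    """Fixed monthly cost of running NAT devices across many VPCs."""
    return vpc_count * nats_per_vpc * cost_per_nat


NAT_GATEWAY_IDLE = 35.0  # USD per month, figure quoted in this post
NANO_INSTANCE = 3.0      # USD per month, figure quoted in this post

managed = fleet_monthly_cost(100, NAT_GATEWAY_IDLE)
self_managed = fleet_monthly_cost(100, NANO_INSTANCE)
saving = managed - self_managed
```

Setting `nats_per_vpc` to the number of Availability Zones in use gives the figure for a highly available setup.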

The demo above focuses on traffic entering AWS. While costs are still high at large volumes, sending data out of AWS will double the cost. Below is the same demo as earlier, but with the traffic direction reversed.

[Interactive demo: monthly cost by data processed, with traffic leaving AWS. Slider marks: 0 GB, 512 GB, 256 TB, 1 PB.]

Egress costs are high across all cloud providers, but at least providers use a tiered pricing model where the per-GB rate drops as data volume grows. Organisations with low egress volumes may not notice these costs because AWS includes 100 GB of free outbound data transfer per month.
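Tiered pricing is easier to follow with a small calculation. The tier sizes and rates below are assumptions for illustration (real rates depend on region and change over time); only the 100 GB free allowance comes from the text above. Each tier is filled in order, and whatever is left over spills into the next, cheaper tier.

```python
# Assumed illustrative tiers: (tier size in GB, USD per GB). None = unbounded.
EGRESS_TIERS = [
    (100, 0.0),        # free allowance mentioned in the text
    (10_240, 0.09),    # assumed rate
    (40_960, 0.085),   # assumed rate
    (102_400, 0.07),   # assumed rate
    (None, 0.05),      # assumed rate for everything beyond
]


def egress_cost(data_gb: float, tiers=EGRESS_TIERS) -> float:
    """Walk the tiers in order, billing each slice at its own rate."""
    remaining = data_gb
    total = 0.0
    for size, rate in tiers:
        if remaining <= 0:
            break
        billed = remaining if size is None else min(remaining, size)
        total += billed * rate
        remaining -= billed
    return total


small = egress_cost(50)        # fits inside the free allowance
medium = egress_cost(1_100)    # 1,000 GB billed at the first paid rate
large = egress_cost(262_144)   # 256 TB spans every tier
```

Dividing `large` by its volume gives an average per-GB rate that is lower than the first paid tier, which is the "costs drop as volume grows" effect.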

So far, the focus has been on data processing costs, but the hourly cost of the NAT Gateways themselves can also be significant if they are deployed in every VPC. In most production environments, high availability is a requirement, which typically means deploying a NAT Gateway in each of three or more Availability Zones. This is considered good practice because relying on a single NAT Gateway introduces operational risks, such as hitting service limits. Some of these limits are explored in the next section.

Limits

AWS NAT Gateway has a few limitations. This section examines the maximum packet rate. Adjust the average packet rate below; assume each packet has a size of 1500 bytes.
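It can help to translate a packet rate into bandwidth, since limits are often quoted in both units. This is plain arithmetic with the 1500-byte packet size assumed above; the function name is mine, and the result should be compared against the limits in the AWS documentation rather than taken as a statement of them.

```python
def throughput_gbps(packets_per_second: float, packet_bytes: int = 1500) -> float:
    """Convert a packet rate into gigabits per second."""
    bits_per_second = packets_per_second * packet_bytes * 8
    return bits_per_second / 1e9


low = throughput_gbps(1_000_000)    # 1M packets per second
high = throughput_gbps(10_000_000)  # 10M packets per second
```

With smaller packets the same packet rate moves far less data, so a workload can hit a packet-rate limit long before it reaches a bandwidth limit.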

[Interactive demo: NAT Gateway metrics chart of PacketsDropCount (count over time). Controls: average packet rate, 1M to 10M packets per second; average packet size, 1500 bytes. Status shown: System Healthy.]

When a NAT Gateway is deployed, AWS automatically publishes its metrics to the AWS/NATGateway namespace in CloudWatch. If there is an issue, the metrics within this namespace provide relevant information. For this demo, the PacketsDropCount metric has been selected, but other metrics can also offer useful insights. If PacketsDropCount is nonzero and consistently high, it indicates a problem. Understanding the root cause of dropped packets would require analysing the VPC Flow Logs.
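As a rough sketch of what "nonzero and consistently high" could look like in practice, the code below builds the parameters for a CloudWatch query and checks a series of datapoints for a sustained run of drops. The helper names, the threshold, and the "three periods in a row" rule are my own assumptions, not AWS guidance. The namespace `AWS/NATGateway` and the `NatGatewayId` dimension are what CloudWatch uses for these metrics.

```python
from datetime import datetime, timedelta, timezone


def drop_count_query(nat_gateway_id: str, hours: int = 1, period: int = 300) -> dict:
    """Build keyword arguments for a CloudWatch get_metric_statistics call."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/NATGateway",
        "MetricName": "PacketsDropCount",
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": ["Sum"],
    }


def sustained_drops(sums: list, threshold: float, min_periods: int = 3) -> bool:
    """True if `min_periods` consecutive datapoints exceed the threshold."""
    run = 0
    for value in sums:
        run = run + 1 if value > threshold else 0
        if run >= min_periods:
            return True
    return False


params = drop_count_query("nat-0123456789abcdef0")  # placeholder ID
```

With boto3 installed and credentials configured, the dictionary can be passed as `boto3.client("cloudwatch").get_metric_statistics(**params)`, and the `Sum` values from the returned datapoints (sorted by timestamp) fed into `sustained_drops`.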

Conclusion

This is my first blog post, and to be honest, an article about NAT Gateways was not my first choice. I originally planned to write an article on Transit Gateways but decided to start with something simpler. I am glad I did, because even this article turned out to be a challenge! I will get to Transit Gateways eventually.