Azure - Machine TCP settings

This article explains how to configure the operating system of an Azure virtual machine so that it properly handles connections going through the Azure-provided outbound proxy/NAT.

1 - Context

When you create an Azure virtual machine with a public IP address, the machine can connect directly to the internet because it has a publicly routable IP address. When the machine is created inside a private network with only a private IP address, its outbound connections need to go through a proxy/NAT that has a public IP address.

To simplify the setup of private virtual machines, Azure provides a shared NAT to all customers. However, this shared NAT enforces a timeout on idle connections to ensure that bandwidth is shared equally between customers.

Because of this restriction, you may see errors in your application logs indicating that the connection to a remote service has been lost.

For example, with Rails:

ActiveRecord::StatementInvalid: Mysql2::Error: MySQL server has gone away

This kind of error means that a connection stayed idle for longer than the NAT allows: the NAT silently dropped it while the operating system and the application still considered it open. This issue can be fixed by applying the configuration settings below.
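
To check whether a Linux machine is exposed to this problem, you can compare its current keep-alive idle time with the NAT's 4-minute (240-second) idle timeout. The sketch below reads the three kernel values from /proc/sys/net/ipv4 (readable without root) and reports whether the first keep-alive probe would be sent before the NAT drops an idle connection. The function name `check_keepalive` and the `NAT_IDLE_TIMEOUT` constant are hypothetical names chosen for illustration, not part of any Azure tooling.

```shell
#!/bin/sh
# Hypothetical helper: reports whether the kernel's TCP keep-alive idle time is
# shorter than the NAT idle timeout (assumed to be 240 s, i.e. 4 minutes).
NAT_IDLE_TIMEOUT=240

check_keepalive() {
    # Directory holding the three sysctl values; defaults to the live kernel
    # values, but can point at any directory containing the same file names.
    dir="${1:-/proc/sys/net/ipv4}"
    idle=$(cat "$dir/tcp_keepalive_time") || return 2
    intvl=$(cat "$dir/tcp_keepalive_intvl") || return 2
    probes=$(cat "$dir/tcp_keepalive_probes") || return 2

    echo "tcp_keepalive_time=$idle tcp_keepalive_intvl=$intvl tcp_keepalive_probes=$probes"
    if [ "$idle" -lt "$NAT_IDLE_TIMEOUT" ]; then
        echo "OK: first probe after ${idle}s, before the ${NAT_IDLE_TIMEOUT}s NAT timeout"
        return 0
    fi
    echo "AT RISK: the NAT may drop idle connections before the first probe (${idle}s)"
    return 1
}
```

Running `check_keepalive` with no argument inspects the live kernel values. On a kernel still using the upstream default tcp_keepalive_time of 7200 seconds (2 hours), it reports AT RISK and returns 1.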

2 - OS Configuration

To prevent idle connections from being dropped after the 4-minute timeout, either make sure your application keeps the session alive, or configure the underlying operating system to do so.

On Linux, change the following kernel variables (as root):

sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=8

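The settings above ensure that a keep-alive probe is sent after 2 minutes (120 seconds) of idle time, well within the 4-minute NAT timeout. If a probe goes unanswered, further probes are sent every 30 seconds, and once 8 consecutive probes fail the session is dropped, so a dead connection is detected after roughly 120 + 8 × 30 = 360 seconds. Note that these kernel settings only apply to sockets on which keep-alive is enabled (the SO_KEEPALIVE socket option), which is up to the application or its client library.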
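
The `sysctl -w` commands only change the running kernel, so the values are lost on reboot. A common way to persist them is a drop-in file under /etc/sysctl.d/, which systemd-based distributions load at boot. The sketch below assumes such a distribution; the file name `60-azure-tcp-keepalive.conf` and the helper name `write_keepalive_conf` are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: writes the keep-alive settings to a sysctl drop-in file
# so they survive a reboot. The default path is an assumed name; files ending
# in .conf under /etc/sysctl.d/ are read at boot on systemd-based systems.
write_keepalive_conf() {
    conf="${1:-/etc/sysctl.d/60-azure-tcp-keepalive.conf}"
    cat > "$conf" <<'EOF'
# Send TCP keep-alive probes before the Azure NAT's 4-minute idle timeout.
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 8
EOF
}
```

Run it as root, then load the file without rebooting using `sysctl -p /etc/sysctl.d/60-azure-tcp-keepalive.conf` (or `sysctl --system` to reload every configuration file). To confirm that a given connection actually has keep-alive enabled, `ss -tno` shows a `timer:(keepalive,...)` entry for such sockets.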

3 - Further details

Original Azure article on this issue: https://github.com/Microsoft/azure-docs/blob/master/includes/guidance-tcp-session-timeout-include.md