How do you use TCP keepalive on Linux?

What is tcp-keepalive

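1. tcp-keepalive, as the name suggests, tries to keep a TCP connection "alive", or declares a TCP connection "dead" when the peer no longer responds.

2. In some environments, a firewall automatically drops TCP connections that have been idle for a long time. After a connection has been idle for a while, tcp-keepalive sends an empty ACK so that the firewall does not close the connection.

3. Sometimes the remote server crashes or the network is interrupted. tcp-keepalive helps tear down these unresponsive connections.

4. tcp-keepalive has to be enabled by the application, on each socket it uses. The operating system cannot force every socket to enable tcp-keepalive. (This article was tested on CentOS/RHEL 6/7.)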

Related parameters in Linux

On Linux, tcp-keepalive can be tuned with the following three kernel parameters:

net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_intvl = 5

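1. After the last activity on a TCP connection, the kernel waits tcp_keepalive_time (30 s) and then starts sending TCP keepalive probes.

2. If a probe gets no reply from the peer, it is retried every tcp_keepalive_intvl (5 s). At most tcp_keepalive_probes (2) probes are sent; if none of them is answered, the connection is considered dead and is closed.

A probe is simply an ACK sent to the peer. This ACK packet carries no actual data.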
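
The timing described above can be worked through with a small calculation. The following is a minimal sketch, not part of the original setup: the function names are hypothetical, and the formula assumes the kernel declares the peer dead one interval after the last unanswered probe, so the result is an approximation.

```c
#include <stdio.h>

/* Approximate seconds of silence before an idle, unresponsive peer is
 * declared dead: the idle wait plus one interval per unanswered probe.
 * (Assumption: the kernel gives up one interval after the last probe.) */
long keepalive_detect_seconds(long idle, long intvl, long probes)
{
    return idle + intvl * probes;
}

/* Read a single integer from a file such as
 * /proc/sys/net/ipv4/tcp_keepalive_time. Returns -1 on failure. */
long read_long_from_file(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Print the detection time for the running system's settings.
 * Returns 0 on success, -1 if any value could not be read. */
int print_system_keepalive(void)
{
    long idle   = read_long_from_file("/proc/sys/net/ipv4/tcp_keepalive_time");
    long intvl  = read_long_from_file("/proc/sys/net/ipv4/tcp_keepalive_intvl");
    long probes = read_long_from_file("/proc/sys/net/ipv4/tcp_keepalive_probes");

    if (idle < 0 || intvl < 0 || probes < 0)
        return -1;
    printf("dead peer detected after about %ld s\n",
           keepalive_detect_seconds(idle, intvl, probes));
    return 0;
}
```

With the values used in this article, keepalive_detect_seconds(30, 5, 2) gives 40 seconds. With the stock Linux defaults (7200, 75, 9) the same formula gives 7875 seconds, which is why these parameters are usually lowered when dead peers need to be noticed quickly.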

How to enable TCP keepalive

tcp-keepalive has to be turned on by the application. Taking C as an example, the application first creates a socket:

   /* Create the socket */
   if((s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) {
      perror("socket()");
      exit(EXIT_FAILURE);
   }

Then it sets the SO_KEEPALIVE option on this socket, which enables tcp-keepalive:

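   int optval = 1;                      /* 1 enables the option, 0 disables it */
   socklen_t optlen = sizeof(optval);

   /* Turn on keepalive for this socket */
   if(setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &optval, optlen) < 0) {
      perror("setsockopt()");
      close(s);
      exit(EXIT_FAILURE);
   }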
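
The two steps above can be wrapped into small helpers. The sketch below is illustrative only: enable_keepalive() and keepalive_enabled() are hypothetical names, and the per-socket options TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options that override the three system-wide parameters for one socket. Using them is optional, and the article's own example only sets SO_KEEPALIVE.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux) */
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable keepalive on fd and override the system-wide
 * timers for this socket only.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Idle seconds before the first probe (like tcp_keepalive_time) */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    /* Seconds between probes (like tcp_keepalive_intvl) */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    /* Number of unanswered probes allowed (like tcp_keepalive_probes) */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}

/* Returns 1 if SO_KEEPALIVE is set on fd, 0 if it is not, -1 on error. */
int keepalive_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A caller would create the socket as shown earlier, call enable_keepalive(s, 30, 5, 2) before or after connecting, and could use keepalive_enabled(s) to confirm that the option took effect.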

We can look at openssh-server as a reference. It enables tcp-keepalive by default:

# man 5 sshd_config

TCPKeepAlive

     Specifies whether the system should send TCP keepalive messages to the other side.  If they are sent, death of the connection or crash of one
     of the machines will be properly noticed.  However, this means that connections will die if the route is down temporarily, and some people
     find it annoying.  On the other hand, if TCP keepalives are not sent, sessions may hang indefinitely on the server, leaving “ghost” users and
     consuming server resources.


     The default is “yes” (to send TCP keepalive messages), and the server will notice if the network goes down or the client host crashes.  This
     avoids infinitely hanging sessions.

     To disable TCP keepalive messages, the value should be set to “no”.

In the openssh-server source code, we can see that it uses the SO_KEEPALIVE option:

/* openssh-5.3p1/sshd.c */

        /* Set SO_KEEPALIVE if requested. */
        if (options.tcp_keep_alive && packet_connection_is_on_socket() &&
            setsockopt(sock_in, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
                error("setsockopt SO_KEEPALIVE: %.100s", strerror(errno));

Checking tcp-keepalive status

We can use the `netstat -no` command to see which TCP connections currently have TCP keepalive enabled.

[root@rhel674 ~]# netstat -no | grep ESTAB
tcp        0      0 192.168.122.91:22           192.168.122.1:52646         ESTABLISHED keepalive (3187.25/0/0) <<--- sshd enables keepalive by default.
tcp        0      0 192.168.122.91:22           192.168.122.1:52649         ESTABLISHED keepalive (4466.22/0/0)
tcp        0      0 192.168.122.91:22           192.168.122.1:52648         ESTABLISHED keepalive (4458.21/0/0) 
tcp        0      0 192.168.122.91:8000         192.168.122.204:35201       ESTABLISHED off (0.00/0/0)   <<--- some other program that did not enable keepalive.
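
The last column of this output can also be picked apart programmatically. The sketch below is a hedged illustration: struct timer_info and parse_netstat_timer() are hypothetical names, and the reading of the three numbers as (seconds remaining / retransmissions / keepalive probes sent) is an assumption based on how net-tools formats this column, not something stated in the output itself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for the trailing timer column of `netstat -no`. */
struct timer_info {
    char   name[16];    /* e.g. "keepalive" or "off" */
    double remaining;   /* assumed: seconds until the timer fires */
    long   retrans;     /* assumed: retransmissions so far */
    int    probes;      /* assumed: keepalive probes sent so far */
};

/* Parse the timer column from one line of netstat output.
 * Returns 0 on success, -1 if no "name (x/y/z)" field is found. */
int parse_netstat_timer(const char *line, struct timer_info *out)
{
    const char *paren = strchr(line, '(');
    const char *end, *start;
    size_t len;

    if (paren == NULL || paren == line)
        return -1;

    /* Walk back over the spaces between the timer name and '(' */
    end = paren;
    while (end > line && end[-1] == ' ')
        end--;
    /* Walk back to the start of the timer name */
    start = end;
    while (start > line && start[-1] != ' ')
        start--;

    len = (size_t)(end - start);
    if (len == 0 || len >= sizeof(out->name))
        return -1;
    memcpy(out->name, start, len);
    out->name[len] = '\0';

    if (sscanf(paren, "(%lf/%ld/%d)",
               &out->remaining, &out->retrans, &out->probes) != 3)
        return -1;
    return 0;
}
```

Feeding it the first sshd line above yields the name "keepalive" with about 3187 seconds left before the next probe, while the port 8000 line yields "off".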

References

TCP Keepalive HOWTO