What is tcp-keepalive
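1. As the name suggests, tcp-keepalive tries to keep a TCP connection “alive”, or declares a connection “dead” when the peer has stopped responding.
2. In some environments, a firewall automatically drops TCP connections that have been idle for a long time. Once a connection has been idle for a while, tcp-keepalive sends an empty ACK so that the firewall does not close it.
3. Sometimes the peer's server crashes or its network goes down. tcp-keepalive helps tear down these unresponsive connections.
4. tcp-keepalive must be enabled by the application on each socket it uses. The operating system cannot force every socket to use tcp-keepalive. (This article was tested on CentOS/RHEL 6/7.)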
Related parameters in Linux
In Linux, the following three parameters tune tcp-keepalive:
    net.ipv4.tcp_keepalive_time = 30
    net.ipv4.tcp_keepalive_probes = 2
    net.ipv4.tcp_keepalive_intvl = 5
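1. After the last activity on a TCP connection, the kernel waits tcp_keepalive_time (30s) and then starts sending TCP keepalive probes.
2. If a probe gets no reply from the peer, the kernel retries every tcp_keepalive_intvl (5s), sending tcp_keepalive_probes (2) probes in total. If none of them is answered, the connection is declared dead.
A probe is simply an ACK sent to the peer that carries no payload.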
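The timing described above can be turned into a small calculation. The following is a minimal sketch, not part of the original test setup: the helper names read_keepalive_sysctl and keepalive_detection_seconds are hypothetical, and the formula assumes the kernel waits one full tcp_keepalive_intvl after the last probe before giving up.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer value from /proc/sys/net/ipv4/.
 * Returns -1 if the file cannot be opened or parsed. */
static long read_keepalive_sysctl(const char *name)
{
    char path[128];
    long value = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/sys/net/ipv4/%s", name);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical helper: idle seconds before an unresponsive peer is
 * declared dead. The first probe goes out after time_s seconds, and
 * (by assumption) the kernel then waits intvl_s seconds for each of
 * the `probes` probes. */
static long keepalive_detection_seconds(long time_s, long intvl_s, long probes)
{
    return time_s + intvl_s * probes;
}
```

With the values used in this article, keepalive_detection_seconds(30, 5, 2) gives 40, so a silent peer would be dropped roughly 40 seconds after the last activity. The live values can be fetched with read_keepalive_sysctl("tcp_keepalive_time") and the two sibling names.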
How to enable TCP keepalive
tcp-keepalive has to be enabled by the application. Taking C as an example, the application first creates a socket:
    /* Create the socket */
    if ((s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) {
        perror("socket()");
        exit(EXIT_FAILURE);
    }
It then sets the SO_KEEPALIVE option on that socket, which enables tcp-keepalive:
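    /* optval = 1 switches the option on; optlen is the size of optval */
    int optval = 1;
    socklen_t optlen = sizeof(optval);

    if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &optval, optlen) < 0) {
        perror("setsockopt()");
        close(s);
        exit(EXIT_FAILURE);
    }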
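To confirm that the option actually took effect, the application can read it back with getsockopt. The sketch below wraps both calls in two helpers. The names enable_keepalive and keepalive_enabled are hypothetical and chosen only for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: switch SO_KEEPALIVE on for socket fd.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
    int optval = 1;

    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &optval, sizeof(optval));
}

/* Hypothetical helper: report whether SO_KEEPALIVE is set on fd.
 * Returns 1 if enabled, 0 if disabled, -1 if getsockopt fails. */
int keepalive_enabled(int fd)
{
    int optval = 0;
    socklen_t optlen = sizeof(optval);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &optval, &optlen) < 0)
        return -1;
    return optval != 0;
}
```

On a freshly created TCP socket, keepalive_enabled(s) is expected to return 0, and 1 after enable_keepalive(s) succeeds.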
openssh-server is a useful reference. It enables tcp-keepalive by default:
    # man 5 sshd_config
    TCPKeepAlive
            Specifies whether the system should send TCP keepalive
            messages to the other side. If they are sent, death of the
            connection or crash of one of the machines will be properly
            noticed. However, this means that connections will die if
            the route is down temporarily, and some people find it
            annoying. On the other hand, if TCP keepalives are not
            sent, sessions may hang indefinitely on the server, leaving
            “ghost” users and consuming server resources.

            The default is “yes” (to send TCP keepalive messages), and
            the server will notice if the network goes down or the
            client host crashes. This avoids infinitely hanging
            sessions.

            To disable TCP keepalive messages, the value should be set
            to “no”.
The openssh-server source shows that it uses the SO_KEEPALIVE option:
    /* openssh-5.3p1/sshd.c */
    /* Set SO_KEEPALIVE if requested. */
    if (options.tcp_keep_alive && packet_connection_is_on_socket() &&
        setsockopt(sock_in, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        error("setsockopt SO_KEEPALIVE: %.100s", strerror(errno));
Checking tcp-keepalive status
The `# netstat -no` command shows which TCP connections currently have tcp-keepalive enabled.
    [root@rhel674 ~]# netstat -no | grep ESTAB
    tcp   0   0 192.168.122.91:22     192.168.122.1:52646     ESTABLISHED keepalive (3187.25/0/0)   <<--- sshd enables keepalive by default.
    tcp   0   0 192.168.122.91:22     192.168.122.1:52649     ESTABLISHED keepalive (4466.22/0/0)
    tcp   0   0 192.168.122.91:22     192.168.122.1:52648     ESTABLISHED keepalive (4458.21/0/0)
    tcp   0   0 192.168.122.91:8000   192.168.122.204:35201   ESTABLISHED off (0.00/0/0)            <<--- another program that has not enabled keepalive.
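If this check needs to be scripted, the timer column can be parsed with sscanf. The sketch below is only an illustration: the struct and function names are hypothetical, and the meaning given to the two trailing counters (retransmissions and probes sent) is an assumption that the article itself does not state. The first number is taken to be the seconds left on the timer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one netstat -o timer field. */
struct netstat_timer {
    char   kind[16];     /* timer name, e.g. "keepalive" or "off" */
    double seconds;      /* seconds remaining on the timer */
    int    retransmits;  /* assumed: retransmission count */
    int    probes;       /* assumed: keepalive probes already sent */
};

/* Parse text such as "keepalive (3187.25/0/0)".
 * Returns 0 on success, -1 if the text does not match that shape. */
int parse_netstat_timer(const char *text, struct netstat_timer *out)
{
    int fields = sscanf(text, "%15s (%lf/%d/%d)",
                        out->kind, &out->seconds,
                        &out->retransmits, &out->probes);

    return fields == 4 ? 0 : -1;
}

/* Returns 1 if the parsed timer is a keepalive timer, 0 otherwise. */
int timer_is_keepalive(const struct netstat_timer *t)
{
    return strcmp(t->kind, "keepalive") == 0;
}
```

Fed the last two columns of the output above, the sshd connections would be reported as keepalive timers, while the connection on port 8000 parses with the kind "off".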