listen() 的 backlog 及 TCP 相关参数

@2014-05-17 新版功能: 创建

listen 函数的原型为:int listen(int sockfd, int backlog);, 其中 backlog 的解释在 man 2 listen 里说明如下:

The backlog argument defines the maximum length to which  the  queue  of
pending  connections  for  sockfd  may  grow.   If  a connection request
arrives when the queue is full, the client may receive an error with  an
indication  of  ECONNREFUSED  or,  if  the  underlying protocol supports
retransmission, the request may be ignored so that a later reattempt  at
connection succeeds.

The behavior of the backlog argument on TCP sockets changed  with  Linux
2.2.  Now it specifies the queue length for completely established sock-
ets waiting to be accepted, instead of the number of incomplete  connec-
tion  requests.   The maximum length of the queue for incomplete sockets
can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog.  When  syncook-
ies  are  enabled there is no logical maximum length and this setting is
ignored.  See tcp(7) for more information.

If   the   backlog   argument   is   greater   than   the    value    in
/proc/sys/net/core/somaxconn,  then  it  is  silently  truncated to that
value; the default value in this file is 128.  In kernels before 2.4.25,
this limit was a hard coded value, SOMAXCONN, with the value 128.

可见在 2.2 之后,backlog 对应于已经完成握手但还未被 accept 的连接数。 但该值的大小不能超过 net.core.somaxconn。

somaxconn - INTEGER
    Limit of socket listen() backlog, known in userspace as SOMAXCONN.
    Defaults to 128.  See also tcp_max_syn_backlog for additional tuning
    for TCP sockets.

内核里的代码在 net/socket.c:

SYSCALL_DEFINE2(listen, int, fd, int, backlog)
{
    ... ...

    if (sock) {
        somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
        if ((unsigned int)backlog > somaxconn)
            backlog = somaxconn;

    ... ...
}

未完成 TCP 握手的的连接队列长度由另一个参数 net.ipv4.tcp_max_syn_backlog 控制。

tcp_max_syn_backlog - INTEGER
    Maximal number of remembered connection requests, which have not
    received an acknowledgment from connecting client.
    The minimal value is 128 for low memory machines, and it will
    increase in proportion to the memory of machine.
    If server suffers from overload, try increasing this number.

man 7 tcp 里的解释如下:

tcp_max_syn_backlog (integer; default: see below; since Linux 2.2)
       The maximum number of queued connection requests which have still
       not  received  an acknowledgement from the connecting client.  If
       this number is exceeded, the kernel will begin dropping requests.
       The  default  value  of  256 is increased to 1024 when the memory
       present in the system is adequate  or  greater  (>=  128Mb),  and
       reduced  to 128 for those systems with very low memory (<= 32Mb).
       It is recommended that if this needs to be increased above  1024,
       TCP_SYNQ_HSIZE   in   include/net/tcp.h   be   modified  to  keep
       TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the kernel be  recom-
       piled.

该值的实际大小参考 net/core/request_sock.c 里的代码:

int reqsk_queue_alloc(struct request_sock_queue *queue,
              unsigned int nr_table_entries)
{
    ... ...

    nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
    nr_table_entries = max_t(u32, nr_table_entries, 8);
    nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

    ... ...
}

nr_table_entries 的初始值为 listen 的 backlog 值,经过一系列调整后该值大小为 2 的幂,最小值为 16。上述的 nr_table_entries + 1 也正是某些文档里推荐 listen 的 backlog 值为 511 的原因。在 2.6.20 之前该值为固定值,相应的 commit 参考 这里

在 listen(2) 的文档里也提到了,如果打开了 syncookie (net.ipv4.tcp_syncookies=1) 选项,则 net.ipv4.tcp_max_syn_backlog 的值等价于被忽略。 事实上上述代码的相关分析可以从 net/ipv4/tcp_ipv4.c 里的函数 tcp_syn_flood_action() 处开始切入。SYN Cookie 工作时,内核会输出如下信息:

TCP: TCP: Possible SYN flooding on port 80. Sending cookies.  Check SNMP counters.

SYN Flood 防护相关的几个参数附带说明如下:

tcp_synack_retries - INTEGER
    Number of times SYNACKs for a passive TCP connection attempt will
    be retransmitted. Should not be higher than 255. Default value
    is 5, which corresponds to 31seconds till the last retransmission
    with the current initial RTO of 1second. With this the final timeout
    for a passive TCP connection will happen after 63seconds.

tcp_syncookies - BOOLEAN
    Only valid when the kernel was compiled with CONFIG_SYN_COOKIES
    Send out syncookies when the syn backlog queue of a socket
    overflows. This is to prevent against the common 'SYN flood attack'
    Default: 1

    Note, that syncookies is fallback facility.
    It MUST NOT be used to help highly loaded servers to stand
    against legal connection rate. If you see SYN flood warnings
    in your logs, but investigation>shows that they occur
    because of overload with legal connections, you should tune
    another parameters until this warning disappear.
    See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.

    syncookies seriously violate TCP protocol, do not allow
    to use TCP extensions, can result in serious degradation
    of some services (f.e. SMTP relaying), visible not by you,
    but your clients and relays, contacting you. While you see
    SYN flood warnings in logs not being really flooded, your server
    is seriously misconfigured.

    If you want to test which effects syncookies have to your
    network connections you can set this knob to 2 to enable
    unconditionally generation of syncookies.