How can we verify the resource limits set with ulimit? How can we check current usage?

ulimit can restrict a process's resource usage. After setting a limit with ulimit, how do we verify that it actually takes effect? And how do we check how much of each resource a process is currently using? This article walks through several of the settings with small demo programs.

Are ulimit settings per-user or per-process?

The values shown by ulimit are all per-process: every process carries its own set of limits.

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 46898
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 46898
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

What is the relationship between ulimit, limits.conf and pam_limits?

First of all, limit values are per-process. On Linux, every unprivileged process can call getrlimit() to inspect its own limits, and setrlimit() to change its own soft limits. Raising a hard limit requires the CAP_SYS_RESOURCE capability. A child created by fork() inherits its parent's limits. See the getrlimit/setrlimit/prlimit man pages for details.

`ulimit` is a shell builtin. When we run `ulimit`, the shell itself calls getrlimit()/setrlimit() to read or change its own limits. Any program we then launch from that shell inherits the shell's limits.

So who sets a shell's initial limits? Usually pam_limits does. As the name suggests, pam_limits is a PAM module: when a user logs in, it applies the values defined in limits.conf to the user's shell. We can enable pam_limits debugging to watch this happen:

[root@rhel674 ~]# cat /etc/security/limits.conf | grep -v ^# | grep -v ^$
test        hard    memlock         102410

[root@rhel674 ~]# grep pam_limits /etc/pam.d/password-auth-ac
session     required      pam_limits.so debug

[root@rhel674 ~]# tail /var/log/secure
Sep  4 00:12:49 rhel674 sshd[3151]: Accepted publickey for test from 192.168.122.1 port 41556 ssh2
Sep  4 00:12:49 rhel674 sshd[3151]: pam_limits(sshd:session): reading settings from '/etc/security/limits.conf'
Sep  4 00:12:49 rhel674 sshd[3151]: pam_limits(sshd:session): process_limit: processing hard memlock 102410 for USER
Sep  4 00:12:49 rhel674 sshd[3151]: pam_limits(sshd:session): reading settings from '/etc/security/limits.d/90-nproc.conf'
Sep  4 00:12:49 rhel674 sshd[3151]: pam_limits(sshd:session): process_limit: processing soft nproc 1024 for DEFAULT
Sep  4 00:12:49 rhel674 sshd[3151]: pam_unix(sshd:session): session opened for user test by (uid=0)

In limits.conf, the test user's hard memlock limit is set to 102410. When the user logs in over SSH, pam_limits sets memlock=102410 for that session.

So the relationship between ulimit, limits.conf and pam_limits is roughly:
1. The user logs in, which triggers pam_limits;
2. pam_limits reads limits.conf and sets the limits of the user's shell accordingly;
3. In the shell, the user can view or modify the shell's limits with the ulimit command;
4. Any program the user runs from the shell inherits the shell's limits. That is how the limits end up applying to processes.

How to view a process's limits and its current usage?

A process can call getrlimit()/prlimit() to obtain its own limits. A system administrator can read /proc/<pid>/limits to see the limits of any process.

A process's current resource usage can usually be read from /proc/<pid>/stat and /proc/<pid>/status. The meaning of each field is documented in the proc man page.

A process can also call getrusage() to learn its own resource usage.

The sections below use small programs to verify each limit in turn.

core - Max core file size

RLIMIT_CORE
      Maximum size of core file.  When 0 no core dump files are created.  When non-zero, larger dumps are truncated to this size.

When a process terminates abnormally, it usually leaves a core file in its current working directory (CWD): a dump of the process image that lets us investigate the cause of death afterwards. The core limit caps the size of that core file.

The current working directory can be checked like this:

[root@rhel674 ~]# ll -d /proc/1899/cwd
lrwxrwxrwx. 1 root root 0 Sep  3 21:27 /proc/1899/cwd -> /root

The system-wide location and naming of core files can be configured via the core_pattern sysctl; see kernel-doc-2.6.32/Documentation/sysctl/kernel.txt.

The following small program helps verify the core limit. It allocates a 10 MB array, so when the process is killed, the core file should be a bit over 10 MB.

/*******************************************
 * Filename: big-core.c
 * Usage: ./big-core
 * Author: feichashao@gmail.com
 * Description:
 *    Demo for `ulimit -c`.
 *    This program will allocate a large array and keep writing on it.
 *    We could send signal to kill this process and see if we can collect
 *    the coredump.
 *
 *******************************************/


#include <stdio.h>
#include <stdlib.h>

#define SIZE 10*1024*1024 // 10MB

int main(int argc, char *argv[])
{
    static char str[SIZE]; /* static: a 10 MB automatic array could overflow the default 8 MB stack */
    unsigned long int i;
    // Keep writing useless things.
    while(1){
        for(i = 0; i < SIZE; ++i){
            str[i] = 'a';
        }
    }
    return 0;
}

With no core size limit, the core file is 11M:

[test@rhel674 tmp]$ ulimit -c
unlimited
[test@rhel674 tmp]$ ./big-core &
[1] 3781
[test@rhel674 tmp]$ kill -4 3781
[test@rhel674 tmp]$
[1]+  Illegal instruction     (core dumped) ./big-core
[test@rhel674 tmp]$ ls -lh core.3781
-rw-------. 1 test test 11M Sep  4 00:46 core.3781

After setting the core limit to 5000 KB, the generated core is truncated to 4.9M:

[test@rhel674 tmp]$ ulimit -c
5000
[test@rhel674 tmp]$ ./big-core &
[1] 3836
[test@rhel674 tmp]$ kill -4 3836
[test@rhel674 tmp]$
[1]+  Illegal instruction     (core dumped) ./big-core
[test@rhel674 tmp]$ ls -lh core.3836
-rw-------. 1 test test 4.9M Sep  4 00:49 core.3836

nice - Max nice priority

RLIMIT_NICE (since Linux 2.6.12, but see BUGS below)
      Specifies a ceiling to which the process’s nice value can be raised using setpriority(2) or nice(2).  The actual ceiling for the nice  value is calculated as 20 - rlim_cur.  (This strangeness occurs because negative numbers cannot be specified as resource limit values, since they typically have special meanings.  For example, RLIM_INFINITY typically is the same as -1.)

nice limits the highest priority to which a user's processes can be raised. Nice values range over [-20, 19]; the smaller the number, the higher the priority. In limits.conf we can specify a value in [-20, 19] directly. In ulimit, negative numbers cannot be used, so the value to set is 20 minus the desired ceiling (e.g. a ceiling of -20 is written as 40). To adjust a running process's priority from the shell, use the renice command. The tests below were done on RHEL 6.7.

Set the nice limit to 0, i.e. the process's priority cannot be raised above 0 (smaller number = higher priority):

[root@rhel674 ~]# cat /etc/security/limits.conf | grep -v ^# | grep -v ^$
test        soft    nice    0
test        hard    nice    0

[test@rhel674 ~]$ ulimit -e
20

[test@rhel674 ~]$ sleep 600s &
[1] 6588

[test@rhel674 ~]$ cat /proc/6588/stat | awk '{print $19}' ### check the process's current nice value
0

[test@rhel674 ~]$ renice +10 6588  ### renice the process to 10.
6588: old priority 0, new priority 10

[test@rhel674 ~]$ cat /proc/6588/stat | awk '{print $19}'
10

[test@rhel674 ~]$ renice 0 6588  ### renice the process to 0.
6588: old priority 10, new priority 0

[test@rhel674 ~]$ renice -1 6588  ### renice to -1: exceeds the limit and fails.
renice: 6588: setpriority: Permission denied

If the nice limit is set to -20, the process can be reniced to any priority in [-20, 19]:

[root@rhel674 ~]# cat /etc/security/limits.conf | grep -v ^# | grep -v ^$
test        soft    nice    -20
test        hard    nice    -20

[test@rhel674 ~]$ ulimit -e
40

[test@rhel674 ~]$ sleep 600s &
[1] 6683

[test@rhel674 ~]$ cat /proc/6683/stat | awk '{print $19}'
0

[test@rhel674 ~]$ renice -20 6683
6683: old priority 0, new priority -20

[test@rhel674 ~]$ renice +20 6683
6683: old priority -20, new priority 19

fsize - Max file size

RLIMIT_FSIZE
      The maximum size of files that the process may create.  Attempts to extend a file beyond this limit result in delivery of a SIGXFSZ  signal.
      By  default,  this  signal  terminates a process, but a process can catch this signal instead, in which case the relevant system call (e.g. write(2), truncate(2)) fails with the error EFBIG.

fsize caps the size of files the process may write. For example, with fsize set to 10 MB, a process that opens an existing 20 MB file cannot append to it at all, while a 9 MB file can still grow by 1 MB.

The following program creates a file and appends 1 MB to it per write:

/**************************
 * Filename: big_file.c
 * Usage: ./big_file
 * Author: feichashao@gmail.com
 * Description:
 *  This program will create a big file gradually.
 *  Intend to test `ulimit -f`.
 *************************/


#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE 1*1024*1024  // 1M per write.
#define NR_WRITE 50           // Write how many times.

int main(int argc, char *argv[])
{
    unsigned long int i;
    int j;
    int fd;
    char buf[BUF_SIZE+1];
   
    for( i = 0; i < BUF_SIZE; ++i ){
        buf[i] = 'a';
    }
    buf[BUF_SIZE] = '\0';
   
    fd = open("/tmp/big.txt", O_APPEND | O_CREAT | O_WRONLY, S_IRWXU | S_IRWXG | S_IRWXO);
    if(fd < 0){
        printf("Fail to open file.\n");
        return -1;
    }

    for( j = 0; j < NR_WRITE; ++j ){
        if( write(fd, buf, BUF_SIZE) < 0 ){
            printf("write error.\n");
            return -1;
        }
        printf("Written %d times.\n", j);
    }

    close(fd);
    return 0;
}

With fsize limited to 10000 KB, the program is killed after about ten writes:

[test@rhel674 tmp]$ ulimit -f
10000

[test@rhel674 tmp]$ ./big_file
Written 0 times.
Written 1 times.
Written 2 times.
Written 3 times.
Written 4 times.
Written 5 times.
Written 6 times.
Written 7 times.
Written 8 times.
Written 9 times.
File size limit exceeded

[test@rhel674 tmp]$ ls -lh big.txt
-rwxrwxr-x. 1 test test 9.8M Sep  4 17:35 big.txt

The strace output shows the program was killed by SIGXFSZ:

write(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 1048576) = -1 EFBIG (File too large)
+++ killed by SIGXFSZ +++

cpu - Max cpu time

RLIMIT_CPU
      CPU time limit in seconds.  When the process reaches the soft limit, it is sent a SIGXCPU signal.  The default action for this signal is  to terminate the process.  However, the signal can be caught, and the handler can return control to the main program.  If the process continues to consume CPU time, it will be sent SIGXCPU once per second until the hard limit is reached, at which time it is sent SIGKILL.  (This  latter  point  describes  Linux  2.2 through 2.6 behavior.  Implementations vary in how they treat processes which continue to consume CPU time after reaching the soft limit.  Portable applications that need to catch this signal  should  perform  an  orderly  termination  upon  first receipt of SIGXCPU.)

cpu limits the CPU time a process may consume.

The following program runs a busy loop; we can use it to check whether CPU time is limited:

/***********************
 * Filename: pure_loop.c
 * Usage: ./pure_loop
 * Author: feichashao@gmail.com
 * Description: Useless loop. Test `ulimit -t`.
 *
 ***********************/


#define NR 100000

int main(int argc, char *argv[])
{
    volatile unsigned long int i,j; /* volatile: keep the compiler from optimizing the empty loops away */
    for(i = 0; i < NR; ++i){
        for(j = 0; j < NR; ++j){
            // do nothing.
        }
    }
    return 0;
}

With no limit, the program takes about 24 seconds to finish:

real    0m23.990s
user    0m23.982s
sys 0m0.001s

Note that RLIMIT_CPU applies to the process's total CPU time, user plus system.

With the cpu limit set to 10s, the program is killed after 10 seconds of CPU time:

[test@rhel674 ~]$ cat test.sh
#!/bin/bash
ulimit -t 10
/tmp/pure_loop

[test@rhel674 ~]$ time bash test.sh
test.sh: line 4:  8016 Killed                  /tmp/pure_loop

real    0m10.008s
user    0m10.002s
sys 0m0.004s

memlock - Max locked memory

RLIMIT_MEMLOCK
      The maximum number of bytes of memory that may be locked into RAM.  In effect this limit is rounded down to the nearest multiple of the system  page  size.   This  limit affects mlock(2) and mlockall(2) and the mmap(2) MAP_LOCKED operation.  Since Linux 2.6.9 it also affects the shmctl(2) SHM_LOCK operation, where it sets a maximum on the total bytes in shared memory segments (see shmget(2)) that may be locked by the real  user  ID  of  the calling process.  The shmctl(2) SHM_LOCK locks are accounted for separately from the per-process memory locks established by mlock(2), mlockall(2), and mmap(2) MAP_LOCKED; a process can lock bytes up to this limit in each  of  these  two  categories.   In Linux  kernels before 2.6.9, this limit controlled the amount of memory that could be locked by a privileged process.  Since Linux 2.6.9, no limits are placed on the amount of memory that a privileged process may lock, and this limit instead governs the amount of  memory  that  an unprivileged process may lock.

First, what is locked memory?
Some programs do not want their memory swapped out to disk, so they request memory with the MAP_LOCKED flag (or lock it afterwards with mlock()), obtaining locked memory.
memlock limits how much locked memory a process may have.

The program below requests 10 MB of locked memory:

/*******************
 * Filename: memlock.c
 * Usage: ./memlock
 * Author: feichashao@gmail.com
 * Description: Allocate some locked memory.
 *    Test `ulimit -l`
 *******************/

#include <stdio.h>
#include <sys/mman.h>
#include <stdlib.h>

#define SIZE 10*1024*1024 // 10M

int main(int argc, char *argv[])
{
    char *p;
    p = mmap (NULL, SIZE, PROT_READ | PROT_WRITE, MAP_LOCKED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED){
        printf("mmap failed.\n");
        return -1;
    }
    printf("memory mapped.\n");
    printf("press any key to end.");
    getchar();
    munmap(p, SIZE);
    return 0;
}

When memlock is not limiting, the program obtains its 10 MB (VmLck = 10240 kB):

[root@rhel674 ~]# ./memlock
memory mapped.

[root@rhel674 ~]# cat /proc/8127/status | grep VmLck
VmLck:     10240 kB

With memlock limited to 10 KB, the allocation fails:

[test@rhel674 ~]$ ulimit -l
102400
[test@rhel674 ~]$ /tmp/memlock
memory mapped.


[test@rhel674 ~]$ ulimit -l
10
[test@rhel674 ~]$ /tmp/memlock
mmap failed.  #### VmLck = 0 KB.

as - Max address space

RLIMIT_AS
      The maximum size of the process’s virtual memory (address space) in bytes.  This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit.  Also automatic stack expansion will fail (and generate a SIGSEGV that kills the  process  if  no  alternate stack has been made available via sigaltstack(2)).  Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.

as limits the size of the process's address space. Here, virtual memory (address space) means the amount of memory the process has requested, not its resident set (RSS) in physical memory.

The program below allocates a 1000 MB region:

/********************
 * Filename: big_as.c
 * Usage: ./big_as
 * Author: feichashao@gmail.com
 * Description: Do a malloc job to test 'ulimit -v'
 ********************/


#include <stdlib.h>
#include <stdio.h>

#define SIZE 1000*1024*1024

int main(int argc, char *argv[])
{
    char *p;
    p = malloc(SIZE);
    if(!p){
        printf("malloc failed.\n");
        return 1;
    }
    getchar();
    free(p);
    return 0;
}

Without an as limit, the process successfully allocates the 1000 MB:

[root@rhel674 ~]# ulimit -v
unlimited
[root@rhel674 ~]# cat /proc/8503/status | grep VmSize
VmSize:  1027928 kB

With as limited to 500 MB (ulimit -v 500000, in KB), the allocation fails:

[test@rhel674 ~]$ ulimit -v
500000
[test@rhel674 ~]$ /tmp/big_as
malloc failed.

nproc - Max processes

RLIMIT_NPROC
      The  maximum  number  of  processes  (or, more precisely on Linux, threads) that can be created for the real user ID of the calling process.
      Upon encountering this limit, fork(2) fails with the error EAGAIN.

nproc limits the number of processes (and threads) a user may have. The limit sounds per-user, but the value itself is per-process: each process carries its own nproc value, which is checked against the per-user count.

In one terminal, with nproc set to 1024, start 6 processes:

[test@rhel674 ~]$ ulimit -u
1024
[test@rhel674 ~]$ sleep 100s &
[1] 8665
[test@rhel674 ~]$ sleep 100s &
[2] 8666
[test@rhel674 ~]$ sleep 100s &
[3] 8667
[test@rhel674 ~]$ sleep 100s &
[4] 8668
[test@rhel674 ~]$ sleep 100s &
[5] 8669
[test@rhel674 ~]$ sleep 100s &
[6] 8670

In another terminal, with nproc set to 5, try to start a process:

[test@rhel674 ~]$ ulimit -u
5
[test@rhel674 ~]$ sleep 99s &  ### fails
-bash: fork: retry: Resource temporarily unavailable

In fact, nproc is enforced at fork() time: the kernel counts the processes/threads belonging to the process's real user and compares that count against the forking process's own limit. In the example above, the first terminal's nproc is 1024, so starting 6 processes is no problem; the second terminal's nproc is 5, and since the user already has more than 5 processes, even the first fork fails. The copy_process() snippet below shows the check.

Note that root, and processes with the CAP_SYS_ADMIN or CAP_SYS_RESOURCE capability, are exempt from the nproc limit.

# man fork

EAGAIN It was not possible to create a new process because the caller’s RLIMIT_NPROC resource limit was encountered. To exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.
// Filename: linux-3.10.0-327.18.2.el7.x86_64/kernel/fork.c:1219

static struct task_struct *copy_process(unsigned long clone_flags,
                    unsigned long stack_start,
                    unsigned long stack_size,
                    int __user *child_tidptr,
                    struct pid *pid,
                    int trace)
{
    /* ... */
    retval = -EAGAIN;
    if (atomic_read(&p->real_cred->user->processes) >=
            task_rlimit(p, RLIMIT_NPROC)) {
        if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
            p->real_cred->user != INIT_USER)  //<----------------
            goto bad_fork_free;
    }
    current->flags &= ~PF_NPROC_EXCEEDED;
    /* ... */
}

nofile - Max open files

RLIMIT_NOFILE
      Specifies a value one greater than the maximum file descriptor number that can be opened  by  this  process.   Attempts  (open(2),  pipe(2), dup(2), etc.)  to exceed this limit yield the error EMFILE.  (Historically, this limit was named RLIMIT_OFILE on BSD.)

nofile limits the number of files a process can have open at once.

This program tries to open 30 files:

/******************
 * Filename: nofile.c
 * Usage: ./nofile
 * Author: feichashao@gmail.com
 * Description:
 *   Create/open many files.
 *   Test `ulimit -n`
 *****************/


#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>

#define NR_FILES 30

int main(int argc, char *argv[])
{
    int fds[NR_FILES];
    char filename[30];
    int i;
    for( i = 0; i < NR_FILES; ++i ){
        sprintf(filename,"/tmp/somefile%d", i);
        fds[i] = open(filename, O_RDWR | O_CREAT, S_IRWXU | S_IRWXG | S_IRWXO);
        if (fds[i] < 0){
            printf("Failed to open file %s\n", filename);
        }else{
            printf("Open %s\n", filename);
        }
    }
    printf("Press any key to continue...");
    getchar();
    for( i = 0; i < NR_FILES; ++i ){
        if (fds[i] >= 0)    /* only close descriptors that were actually opened */
            close(fds[i]);
    }
    return 0;
}

With nofile set to 10, the process can only open 7 files:

[test@rhel674 ~]$ ulimit -n
10
[test@rhel674 ~]$ /tmp/nofile
Open /tmp/somefile0
Open /tmp/somefile1
Open /tmp/somefile2
Open /tmp/somefile3
Open /tmp/somefile4
Open /tmp/somefile5
Open /tmp/somefile6
Failed to open file /tmp/somefile7
Failed to open file /tmp/somefile8
Failed to open file /tmp/somefile9
Failed to open file /tmp/somefile10
Failed to open file /tmp/somefile11
Failed to open file /tmp/somefile12
Failed to open file /tmp/somefile13
Failed to open file /tmp/somefile14
Failed to open file /tmp/somefile15
Failed to open file /tmp/somefile16
Failed to open file /tmp/somefile17
Failed to open file /tmp/somefile18
Failed to open file /tmp/somefile19
Failed to open file /tmp/somefile20
Failed to open file /tmp/somefile21
Failed to open file /tmp/somefile22
Failed to open file /tmp/somefile23
Failed to open file /tmp/somefile24
Failed to open file /tmp/somefile25
Failed to open file /tmp/somefile26
Failed to open file /tmp/somefile27
Failed to open file /tmp/somefile28
Failed to open file /tmp/somefile29

The other 3 descriptors are stdin, stdout and stderr:

[root@rhel674 ~]# ls -l /proc/9000/fd
total 0
lrwx------. 1 test test 64 Sep  4 20:44 0 -> /dev/pts/3
lrwx------. 1 test test 64 Sep  4 20:44 1 -> /dev/pts/3
lrwx------. 1 test test 64 Sep  4 20:43 2 -> /dev/pts/3
lrwx------. 1 test test 64 Sep  4 20:44 3 -> /tmp/somefile0
lrwx------. 1 test test 64 Sep  4 20:44 4 -> /tmp/somefile1
lrwx------. 1 test test 64 Sep  4 20:44 5 -> /tmp/somefile2
lrwx------. 1 test test 64 Sep  4 20:44 6 -> /tmp/somefile3
lrwx------. 1 test test 64 Sep  4 20:44 7 -> /tmp/somefile4
lrwx------. 1 test test 64 Sep  4 20:44 8 -> /tmp/somefile5
lrwx------. 1 test test 64 Sep  4 20:44 9 -> /tmp/somefile6

What about multithreaded processes?

So, if a process has multiple threads, how do the limits apply?

Normally, the threads of a process share resources such as the address space and the FD table. So we usually only need to look at the process as a whole: its usage is the sum over all its threads.

Take memlock as an example. The program below creates 20 threads, each of which tries to allocate 10 MB of locked memory.

/**************************
 * Filename: thread1.c
 * Usage: ./thread1
 * Author: feichashao@gmail.com
 *    Part of this code is from unknown source.
 * Description:
 *  1. Create multiple threads.
 *  2. Each thread allocate some locked memory.
 *  3. We could see the memory space are shared.
 *************************/


#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>

#define NR_THREADS 20       //  create 20 threads.
#define SIZE (10*1024*1024) //  each thread allocate 10M

void *thread_function(void *arg);
int main()
{
    int res;
    pthread_t threads[NR_THREADS];
    void *thread_result;
    int i;
   
    for( i = 0; i < NR_THREADS; ++i){
        res = pthread_create(&threads[i], NULL, thread_function, (void *)(long)i);
        if(res != 0) {
            perror("Thread creation failed");
            return 1;
        }
    }

    printf("Waiting for thread to finish...\n");
    for ( i = 0; i < NR_THREADS; ++i){
        res = pthread_join(threads[i], &thread_result);
        if (res != 0 ) {
            perror("Thread join failed");
            return 1;
        }
        printf("Thread joined, it returned %s\n", (char *)thread_result);
    }

    return 0;
}

void *thread_function(void *arg){
    (void)arg;
    char *p = mmap (NULL, SIZE, PROT_READ | PROT_WRITE, MAP_LOCKED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); // try removing MAP_LOCKED flag, and see the result.
    if (p == MAP_FAILED)
        printf("mmap failed in this thread.\n");
    sleep(120);
    pthread_exit("Thank you for the CPU time");
}

With memlock limited to 100,000 KB, only 9 of the threads manage to lock their memory (9 × 10240 kB = 92160 kB):

[test@rhel674 ~]$ ulimit -l 100000
[root@rhel674 ~]# cat /proc/8858/status | grep VmLck
VmLck:     92160 kB

Notes

1. Due to the author's limited expertise, many other limits settings are not covered here.

2. Services managed by SysV init scripts (e.g. /etc/init.d/httpd) do not go through pam_limits, so limits.conf does not apply to them. To restrict such a service's resource usage, add ulimit settings to its /etc/init.d/* script.

3. systemd can set resource limits for its services; see the systemd.resource-control man page.