How to request contiguous physical memory in Linux?

Background

A customer reported an issue: his system had more than 100GB of free memory (truly free, not buffer/cache), yet at some point it started reclaiming pages from buffer/cache and swapping pages out.

The system log showed messages like the following at the moment the reclaiming started.
[cc lang=”text”]
May 2 10:03:56 rhel68-kmalloc kernel: insmod: page allocation failure. order:10, mode:0xd0
May 2 10:03:56 rhel68-kmalloc kernel: Pid: 22319, comm: insmod Not tainted 2.6.32-642.el6.x86_64 #1
May 2 10:03:56 rhel68-kmalloc kernel: Call Trace:
May 2 10:03:56 rhel68-kmalloc kernel: [] ? __alloc_pages_nodemask+0x7dc/0x950
May 2 10:03:56 rhel68-kmalloc kernel: [] ? kmem_getpages+0x62/0x170
May 2 10:03:56 rhel68-kmalloc kernel: [] ? fallback_alloc+0x1ba/0x270
May 2 10:03:56 rhel68-kmalloc kernel: [] ? cache_grow+0x2cf/0x320
May 2 10:03:56 rhel68-kmalloc kernel: [] ? ____cache_alloc_node+0x99/0x160
[/cc]

The message “page allocation failure. order:10, mode:0xd0” indicates that the kernel requested an order-10 block, that is 2^10 contiguous 4KB pages (4MB of contiguous physical memory), and the allocation failed.
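
To make the two numbers in that message concrete, here is a minimal sketch that converts an order to a block size and decodes the low bits of the mode. The function names are made up for illustration, and the flag bit values are an assumption based on the 2.6.32-era include/linux/gfp.h; they differ in other kernel versions.

```bash
# Size in KB of a buddy block of the given order (page size defaults to 4KB).
order_to_kb() {
    echo $(( (1 << $1) * ${2:-4} ))
}

# Decode the low bits of a gfp mask.
# ASSUMPTION: bit values as in 2.6.32's include/linux/gfp.h.
decode_gfp() {
    mask=$(( $1 ))
    out=""
    [ $(( mask & 0x10 )) -ne 0 ] && out="$out __GFP_WAIT"
    [ $(( mask & 0x20 )) -ne 0 ] && out="$out __GFP_HIGH"
    [ $(( mask & 0x40 )) -ne 0 ] && out="$out __GFP_IO"
    [ $(( mask & 0x80 )) -ne 0 ] && out="$out __GFP_FS"
    echo "${out# }"
}

echo "order 10 = $(order_to_kb 10) KB"
echo "mode 0xd0 = $(decode_gfp 0xd0)"
```

Under that assumption, 0xd0 is the combination that GFP_KERNEL expands to, which matches the flag the test module below passes to kmalloc().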

This could be what triggered the buffer/cache reclaiming and swapping. The kernel may treat such a ‘page allocation failure’ as a sign of being low on memory and run the PFRA (page frame reclaiming algorithm), which can reclaim buffer/cache and swap pages out.
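
One way to check whether reclaim and swapping are really happening is to watch the counters in /proc/vmstat before and after the event. The sketch below sums a few of them. The counter names (pgscan_*, allocstall*, pswpout) are assumptions; they vary between kernel versions, so adjust them to what your /proc/vmstat actually lists. The function name reclaim_counters is hypothetical.

```bash
# Summarise reclaim-related counters from /proc/vmstat-style input on stdin.
# ASSUMPTION: counter names; check your kernel's /proc/vmstat.
reclaim_counters() {
    awk '
        $1 ~ /^pgscan_/    { scan += $2 }   # pages scanned by reclaim
        $1 ~ /^allocstall/ { stall += $2 }  # allocations that entered direct reclaim
        $1 == "pswpout"    { swpout = $2 }  # pages swapped out
        END { printf "pgscan=%.0f allocstall=%.0f pswpout=%.0f\n", scan, stall, swpout }'
}

if [ -r /proc/vmstat ]; then
    reclaim_counters < /proc/vmstat
fi
```

Running it twice, once before and once after the suspicious allocation, and comparing the values shows whether the counters moved.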

Can we reproduce this situation?

Linux’s buddy system

In RHEL5/6/7, the page size is 4KB.
[cc lang=”bash”]
# getconf PAGESIZE
4096
[/cc]

The system’s /proc/buddyinfo shows how fragmented the free memory is.
[cc lang=”bash”]
# cat /proc/buddyinfo
Node 0, zone DMA 2 2 1 1 1 0 1 0 1 1 3
Node 0, zone DMA32 97 54 32 81 51 16 23 23 11 10 403
[/cc]
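Each column counts the free blocks of one order, from order 0 (4KB) on the left to order 10 (4096KB) on the right. For example, in Node 0’s DMA32 zone, the free memory consists of 97*4KB + 54*8KB + 32*16KB + 81*32KB + 51*64KB + 16*128KB + 23*256KB + 23*512KB + 11*1024KB + 10*2048KB + 403*4096KB.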
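
The same sum can be done with a few lines of awk. This is only a sketch: the function name buddy_free_kb is made up, and it assumes a 4KB page size unless another value is passed as the first argument.

```bash
# Print the free memory per zone, in KB, from /proc/buddyinfo-style input.
# The i-th count after the zone name is the number of free blocks of order i.
buddy_free_kb() {
    awk -v page_kb="${1:-4}" '
        $1 == "Node" {
            sub(",", "", $2)                 # "0," -> "0"
            total = 0
            for (i = 5; i <= NF; i++)
                total += $i * page_kb * 2 ^ (i - 5)
            printf "node %s zone %s: %d KB\n", $2, $4, total
        }'
}

# DMA32 line copied from the output above; on a live system run:
#   buddy_free_kb < /proc/buddyinfo
echo "Node 0, zone DMA32 97 54 32 81 51 16 23 23 11 10 403" | buddy_free_kb
```

For the DMA32 line above this gives 1709332 KB, almost all of it in order-10 blocks.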

Write a kernel module to request contiguous physical memory

In Linux, a user-space process cannot request contiguous physical memory directly, except through HugePages, if I recall correctly.

To request contiguous physical memory, we can call the kmalloc() function in a kernel module.

Here’s the source code, adapted from a hello world module. When the module is loaded, it requests 30 blocks of 4096KB contiguous physical memory each. The size passed to kmalloc() should be less than or equal to 4096KB, as the largest contiguous block the buddy system provides here is 4096KB (order 10).

[cc lang=”C”]// Filename: big_mem.c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

MODULE_AUTHOR("siwu_from_CEE");
MODULE_DESCRIPTION("Request for high order contiguous memory");
MODULE_LICENSE("GPL");

#define NR_PAGES 30
#define PER_PAGE_SIZE (4096 * 1024) // 4096KB, i.e. one order-10 block

// Module parameters left over from the hello world module.
int irq;
module_param(irq, int, 0644);
int sample;
module_param_named(test, sample, int, 0644);

int arr_data[10];
int arr_cnt;
module_param_array(arr_data, int, &arr_cnt, 0644);

void *stuff[NR_PAGES];

static int hello_init(void)
{
    int i;
    int allocated = 0;
    size_t size = 0;

    for (i = 0; i < NR_PAGES; i++) {
        stuff[i] = kmalloc(PER_PAGE_SIZE, GFP_KERNEL);
        if (!stuff[i])
            continue;
        if (!size)
            size = ksize(stuff[i]); // actual size of the first block obtained
        allocated++;
    }

    if (!allocated) {
        printk(KERN_ERR "Failed to allocate memory\n");
        return -ENOMEM;
    }

    // On partial failure, stay loaded so that hello_exit() frees what we got.
    printk(KERN_INFO "%zu bytes of memory allocated %d of %d times\n",
           size, allocated, NR_PAGES);
    return 0;
}

static void hello_exit(void)
{
    int i;

    for (i = 0; i < NR_PAGES; i++)
        kfree(stuff[i]); // kfree(NULL) is a no-op

    printk(KERN_INFO "Bye. Bye..%d\n", irq);
}

module_init(hello_init);
module_exit(hello_exit);
[/cc]

Compile the kernel module

First, we need to install the proper kernel headers. On RHEL/CentOS, we can do so by executing:
[cc lang=”bash”]
# yum install kernel-headers kernel-devel
[/cc]
Make sure the versions of kernel-headers and kernel-devel match the currently running kernel (`uname -r`).
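
A quick way to catch a version mismatch before running make is to check that a build tree exists for the running kernel. The sketch below is an assumption about the usual layout, where /lib/modules/<release>/build points at the tree installed by kernel-devel; the function name has_build_tree is hypothetical, and its second argument exists only to make it easy to try against another directory.

```bash
# Return 0 if a kernel build tree exists for the given release.
# ASSUMPTION: kernel-devel provides <root>/<release>/build with a Makefile.
has_build_tree() {
    release=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    [ -e "$root/$release/build/Makefile" ]
}

if has_build_tree; then
    echo "build tree found for $(uname -r)"
else
    echo "no build tree for $(uname -r); install the matching kernel-devel"
fi
```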

Second, create a Makefile in the same directory as big_mem.c.
[cc lang=”bash”]
obj-m += big_mem.o
[/cc]

Last, execute the command below to compile the module.
[cc lang=”bash”]
# make -C /lib/modules/2.6.32-642.el6.x86_64/build/ M=`pwd` modules
[/cc]
‘2.6.32-642.el6.x86_64’ is the version of my running kernel; substitute the output of `uname -r` on your system.

Insert the module

After running the above ‘make’ command, a module file ‘big_mem.ko’ is generated in the same directory.

Let’s insert it and observe the memory usage.
[cc lang=”bash”]
[root@rhel68-kmalloc ~]# cat /proc/buddyinfo ; free -m
Node 0, zone DMA 2 2 1 1 1 0 1 0 1 1 3
Node 0, zone DMA32 2 7 11 34 24 4 22 27 12 11 365
total used free shared buffers cached
Mem: 1877 345 1531 0 59 149
-/+ buffers/cache: 136 1740
Swap: 1023 0 1023

[root@rhel68-kmalloc kmalloc-2]# insmod big_mem.ko

[root@rhel68-kmalloc ~]# cat /proc/buddyinfo ; free -m
Node 0, zone DMA 2 2 1 1 1 0 1 0 1 1 3
Node 0, zone DMA32 2 7 11 34 24 4 22 27 12 11 335
total used free shared buffers cached
Mem: 1877 465 1411 0 59 149
-/+ buffers/cache: 256 1620
Swap: 1023 0 1023
[/cc]
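
Comparing the two snapshots, the order-10 count in DMA32 fell from 365 to 335, exactly the 30 blocks the module requested, and ‘used’ rose from 345MB to 465MB (30 * 4MB = 120MB). The arithmetic can be sketched as below; the function name buddy_order_count is made up for illustration.

```bash
# Print the free-block count of one order for one zone from buddyinfo input.
buddy_order_count() {
    awk -v zone="$1" -v order="$2" \
        '$1 == "Node" && $4 == zone { print $(5 + order) }'
}

# DMA32 lines copied from the two snapshots above.
before="Node 0, zone DMA32 2 7 11 34 24 4 22 27 12 11 365"
after="Node 0, zone DMA32 2 7 11 34 24 4 22 27 12 11 335"

b=$(echo "$before" | buddy_order_count DMA32 10)
a=$(echo "$after" | buddy_order_count DMA32 10)
echo "order-10 blocks consumed: $(( b - a )) ($(( (b - a) * 4096 / 1024 )) MB)"
```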

Reproduce the issue

To reproduce the customer’s issue, I used the following steps.

1. Generate some read/write traffic (for example, copy some large files), so that buffer/cache grows and the number of free high-order contiguous blocks drops.

2. Insert the module, which requests a large amount of contiguous physical memory.
[cc lang=”bash”]
[root@rhel68-kmalloc ~]# cat /proc/buddyinfo ; free -m
Node 0, zone DMA 2 2 1 2 1 1 1 2 1 1 0
Node 0, zone DMA32 331 548 624 373 225 47 24 11 12 0 1
total used free shared buffers cached
Mem: 996 917 78 0 84 318
-/+ buffers/cache: 514 481
Swap: 1023 1 1022

[root@rhel68-kmalloc kmalloc-1]# insmod big_mem.ko

[root@rhel68-kmalloc ~]# cat /proc/buddyinfo ; free -m
Node 0, zone DMA 17 11 12 13 11 7 3 2 1 1 2
Node 0, zone DMA32 4343 2720 1731 1019 627 324 189 113 54 13 3
total used free shared buffers cached
Mem: 996 608 387 0 84 16
-/+ buffers/cache: 507 488
Swap: 1023 37 986
[/cc]

After the insert, ‘cached’ fell from 318MB to 16MB and swap usage rose from 1MB to 37MB, which matches the reclaiming and swapping the customer saw.

3. Check the system log, where we can see “page allocation failure”.
[cc lang=”text”]
May 2 10:03:56 rhel68-kmalloc kernel: insmod: page allocation failure. order:10, mode:0xd0
May 2 10:03:56 rhel68-kmalloc kernel: Pid: 22319, comm: insmod Not tainted 2.6.32-642.el6.x86_64 #1
May 2 10:03:56 rhel68-kmalloc kernel: Call Trace:
May 2 10:03:56 rhel68-kmalloc kernel: [] ? __alloc_pages_nodemask+0x7dc/0x950
May 2 10:03:56 rhel68-kmalloc kernel: [] ? kmem_getpages+0x62/0x170
May 2 10:03:56 rhel68-kmalloc kernel: [] ? fallback_alloc+0x1ba/0x270
May 2 10:03:56 rhel68-kmalloc kernel: [] ? cache_grow+0x2cf/0x320
May 2 10:03:56 rhel68-kmalloc kernel: [] ? ____cache_alloc_node+0x99/0x160
May 2 10:03:56 rhel68-kmalloc kernel: [] ? kmem_cache_alloc_trace+0x137/0x1c0
May 2 10:03:56 rhel68-kmalloc kernel: [] ? hello_init+0x26/0x84 [big_mem]
May 2 10:03:56 rhel68-kmalloc kernel: [] ? hello_init+0x0/0x84 [big_mem]
May 2 10:03:56 rhel68-kmalloc kernel: [] ? do_one_initcall+0xc0/0x280
May 2 10:03:56 rhel68-kmalloc kernel: [] ? sys_init_module+0xe1/0x250
May 2 10:03:56 rhel68-kmalloc kernel: [] ? system_call_fastpath+0x16/0x1b
[/cc]
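
To pull such failures out of a busy log, a short sed filter is enough. This is a sketch that assumes the syslog line format shown above (`kernel: <comm>: page allocation failure. order:N, mode:0x…`); other kernels word the message differently, and the function name alloc_failures is hypothetical.

```bash
# Print "<comm> order=<N> mode=<mask>" for each allocation failure on stdin.
# ASSUMPTION: syslog lines formatted as in the log excerpt above.
alloc_failures() {
    sed -n 's/.*kernel: \([^:]*\): page allocation failure\. order:\([0-9]*\), mode:\(0x[0-9a-f]*\).*/\1 order=\2 mode=\3/p'
}

# First line of the log above; on a live system run, for example:
#   alloc_failures < /var/log/messages
echo "May 2 10:03:56 rhel68-kmalloc kernel: insmod: page allocation failure. order:10, mode:0xd0" | alloc_failures
```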
