Background
A customer reported an issue: his system had more than 100GB of free memory (truly free, not buffer/cache), yet at some point it started reclaiming pages from buffer/cache and swapping pages out.
In the system log, we can see messages like the following at the moment of reclaim.
[cc lang=”text”]
May 2 10:03:56 rhel68-kmalloc kernel: insmod: page allocation failure. order:10, mode:0xd0
May 2 10:03:56 rhel68-kmalloc kernel: Pid: 22319, comm: insmod Not tainted 2.6.32-642.el6.x86_64 #1
May 2 10:03:56 rhel68-kmalloc kernel: Call Trace:
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
[/cc]
The message “page allocation failure. order:10, mode:0xd0” indicates that the kernel requested an order-10 allocation (2^10 contiguous 4KB pages, i.e. 4MB of physically contiguous memory) and that the allocation failed.
This allocation failure could be what triggered the buffer/cache reclaim and the swapping. The system might treat such a ‘page allocation failure’ as being ‘low on memory’ and invoke the PFRA (page frame reclaiming algorithm), which might swap out pages and reclaim buffer/cache.
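To make the numbers in that log line concrete, here is a minimal Python sketch that extracts the order and mode from a ‘page allocation failure’ message, converts the order to a size, and decodes the mode bits. The helper names (`parse_failure`, `order_to_bytes`, `decode_gfp`) are made up for this illustration, and the flag table assumes the 2.6.32-era GFP bit values; newer kernels use different values, so treat it as an assumption rather than a general decoder.

```python
import re

PAGE_SIZE = 4096  # bytes; the page size of the system discussed in this article

# Assumption: low GFP flag bits as defined in 2.6.32-era kernels.
# These values changed in later kernels, so do not reuse them blindly.
GFP_FLAGS_2_6_32 = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

FAILURE_RE = re.compile(
    r"page allocation failure\. order:(\d+), mode:(0x[0-9a-fA-F]+)"
)


def parse_failure(line):
    """Return (order, mode) from a 'page allocation failure' line, or None."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2), 16)


def order_to_bytes(order, page_size=PAGE_SIZE):
    """An order-N request asks for 2**N physically contiguous pages."""
    return (1 << order) * page_size


def decode_gfp(mode, table=GFP_FLAGS_2_6_32):
    """List the names of the flag bits set in mode, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if mode & bit]


line = ("May 2 10:03:56 rhel68-kmalloc kernel: insmod: "
        "page allocation failure. order:10, mode:0xd0")
order, mode = parse_failure(line)
print("request size:", order_to_bytes(order) // 1024, "KB")
print("flags:", " | ".join(decode_gfp(mode)))
```

Under that assumption, 0xd0 decodes to `__GFP_WAIT | __GFP_IO | __GFP_FS`, which is the combination `GFP_KERNEL` expands to on that kernel series.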
Can we reproduce this situation?
Linux’s buddy system
On RHEL 5/6/7 (x86_64), the page size is 4KB.
[cc lang=”bash”]
# getconf PAGESIZE
4096
[/cc]
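The system’s /proc/buddyinfo shows how fragmented the free memory is. Each column is the number of free blocks of a given order, from order 0 (4KB) on the left to order 10 (4096KB) on the right.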
[cc lang=”bash”]
# cat /proc/buddyinfo
Node 0, zone DMA 2 2 1 1 1 0 1 0 1 1 3
Node 0, zone DMA32 97 54 32 81 51 16 23 23 11 10 403
[/cc]
For example, in Node 0’s DMA32 zone, the free memory consists of 97*4KB + 54*8KB + 32*16KB + 81*32KB + 51*64KB + 16*128KB + 23*256KB + 23*512KB + 11*1024KB + 10*2048KB + 403*4096KB, which adds up to 1,709,332KB (about 1,669MB).
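The sum above can be checked with a short Python sketch. It parses one /proc/buddyinfo row and computes the free memory per order and in total. The function names are hypothetical, and the 4KB page size is taken from the `getconf PAGESIZE` output above.

```python
PAGE_KB = 4  # page size in KB, per `getconf PAGESIZE` (4096 bytes)


def parse_buddyinfo_line(line):
    """Split one /proc/buddyinfo row into (node, zone, free-block counts)."""
    head, rest = line.split(", zone")
    node = int(head.split()[1])
    fields = rest.split()
    zone = fields[0]
    counts = [int(value) for value in fields[1:]]
    return node, zone, counts


def free_kb_per_order(counts, page_kb=PAGE_KB):
    """Free KB held at each order: count * page size * 2**order."""
    return [n * page_kb * (1 << order) for order, n in enumerate(counts)]


def total_free_kb(counts, page_kb=PAGE_KB):
    """Total free KB in the zone, summed over all orders."""
    return sum(free_kb_per_order(counts, page_kb))


row = "Node 0, zone DMA32 97 54 32 81 51 16 23 23 11 10 403"
node, zone, counts = parse_buddyinfo_line(row)

for order, kb in enumerate(free_kb_per_order(counts)):
    block_kb = PAGE_KB << order
    print(f"order {order:2d}: {counts[order]:4d} blocks x {block_kb:4d}KB = {kb}KB")

# The right-most column is the largest block the buddy system hands out.
largest_block_kb = PAGE_KB << (len(counts) - 1)
print(f"Node {node} {zone}: total free {total_free_kb(counts)}KB, "
      f"largest block {largest_block_kb}KB")
```

With 11 columns (orders 0 to 10), the largest block works out to 4096KB, which is why a single physically contiguous request cannot exceed that size on this system.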
Write a kernel module to request contiguous physical memory
In Linux, a user-space process cannot request physically contiguous memory directly, except through HugePages, if I recall correctly.
To request physically contiguous memory, we can call kmalloc() from a kernel module.
Here’s the source code, adapted from a hello world module. When the module is loaded, it requests 30 chunks of 4096KB of physically contiguous memory each (120MB in total). The size passed to kmalloc() must be less than or equal to 4096KB, because the largest contiguous block the buddy system provides is 4096KB.
[cc lang=”C”]// Filename: big_mem.c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/slab.h>   // kmalloc(), kfree(), ksize()

MODULE_AUTHOR("siwu_from_CEE");
MODULE_DESCRIPTION("Request for high order contiguous memory");
MODULE_LICENSE("GPL");

#define NR_PAGES 30                  // number of chunks to request
#define PER_PAGE_SIZE (4096 * 1024)  // 4096KB per chunk

// Module parameters left over from the hello world template.
int irq;
module_param(irq, int, 0644);
int sample;
module_param_named(test, sample, int, 0644);
int arr_data[10];
int arr_cnt;
module_param_array(arr_data, int, &arr_cnt, 0644);

void *stuff[NR_PAGES];

static int hello_init(void)
{
    int i;
    int allocated = 0;

    // Each kmalloc() call asks for one physically contiguous 4096KB chunk.
    for (i = 0; i < NR_PAGES; i++) {
        stuff[i] = kmalloc(PER_PAGE_SIZE, GFP_KERNEL);
        if (stuff[i])
            allocated++;
    }
    if (!stuff[0]) {
        printk(KERN_ERR "Failed to allocate memory\n");
        // Returning 0 keeps the module loaded so that hello_exit() can free
        // any later chunk that did succeed; a production module would clean
        // up and return -ENOMEM here.
        return 0;
    }
    printk(KERN_INFO " %zu bytes of memory allocated for %d times\n",
           ksize(stuff[0]), allocated);
    return 0;
}

static void hello_exit(void)
{
    int i;

    for (i = 0; i < NR_PAGES; i++)
        kfree(stuff[i]);  // kfree(NULL) is a no-op
    printk(KERN_INFO "Bye. Bye..%d\n", irq);
}

module_init(hello_init);
module_exit(hello_exit);
[/cc]
Compile the kernel module
First, we need to install the proper kernel headers. On RHEL/CentOS, we can do so by executing:
[cc lang=”bash”]
# yum install kernel-headers kernel-devel
[/cc]
Be careful: the versions of kernel-headers and kernel-devel must match the currently running kernel (`uname -r`).
Second, create a Makefile in the same directory as big_mem.c.
[cc lang=”bash”]
obj-m += big_mem.o
[/cc]
Last, execute the command below to compile the module.
[cc lang=”bash”]
# make -C /lib/modules/2.6.32-642.el6.x86_64/build/ M=`pwd` modules
[/cc]
‘2.6.32-642.el6.x86_64’ is the version of my currently running kernel.
After running the above ‘make’ command, a module file ‘big_mem.ko’ is generated in the same directory.
Insert the module
Let’s insert it and observe the memory usage.
[cc lang=”bash”]
[root@rhel68-kmalloc ~]# cat /proc/buddyinfo ; free -m
Node 0, zone DMA 2 2 1 1 1 0 1 0 1 1 3
Node 0, zone DMA32 2 7 11 34 24 4 22 27 12 11 365
total used free shared buffers cached
Mem: 1877 345 1531 0 59 149
-/+ buffers/cache: 136 1740
Swap: 1023 0 1023
[root@rhel68-kmalloc kmalloc-2]# insmod big_mem.ko
[root@rhel68-kmalloc ~]# cat /proc/buddyinfo ; free -m
Node 0, zone DMA 2 2 1 1 1 0 1 0 1 1 3
Node 0, zone DMA32 2 7 11 34 24 4 22 27 12 11 335
total used free shared buffers cached
Mem: 1877 465 1411 0 59 149
-/+ buffers/cache: 256 1620
Swap: 1023 0 1023
[/cc]
After loading the module, the number of free order-10 (4096KB) blocks in the DMA32 zone drops from 365 to 335, exactly the 30 chunks the module requested, and used memory grows by 120MB (from 345MB to 465MB).
Reproduce the issue
To reproduce the customer’s issue, I use the tricks below.
1. Generate some read/write traffic, so that buffer/cache grows and the number of high-order contiguous free blocks decreases (for example, copy some large files).
2. Insert the module, which requests a large amount of physically contiguous memory.
[cc lang=”bash”]
[root@rhel68-kmalloc ~]# cat /proc/buddyinfo ; free -m
Node 0, zone DMA 2 2 1 2 1 1 1 2 1 1 0
Node 0, zone DMA32 331 548 624 373 225 47 24 11 12 0 1
total used free shared buffers cached
Mem: 996 917 78 0 84 318
-/+ buffers/cache: 514 481
Swap: 1023 1 1022
[root@rhel68-kmalloc kmalloc-1]# insmod big_mem.ko
[root@rhel68-kmalloc ~]# cat /proc/buddyinfo ; free -m
Node 0, zone DMA 17 11 12 13 11 7 3 2 1 1 2
Node 0, zone DMA32 4343 2720 1731 1019 627 324 189 113 54 13 3
total used free shared buffers cached
Mem: 996 608 387 0 84 16
-/+ buffers/cache: 507 488
Swap: 1023 37 986
[/cc]
Before the insmod, only one order-10 block was free in DMA32 (and none in DMA). Afterwards, cached memory has dropped from 318MB to 16MB and swap usage has grown from 1MB to 37MB, which matches what the customer saw.
3. Check the system log, and we can see “page allocation failure”.
[cc lang=”text”]
May 2 10:03:56 rhel68-kmalloc kernel: insmod: page allocation failure. order:10, mode:0xd0
May 2 10:03:56 rhel68-kmalloc kernel: Pid: 22319, comm: insmod Not tainted 2.6.32-642.el6.x86_64 #1
May 2 10:03:56 rhel68-kmalloc kernel: Call Trace:
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
May 2 10:03:56 rhel68-kmalloc kernel: [
[/cc]
Reference
1. The Linux Kernel Module Programming Guide
2. 8.1. The Real Story of kmalloc