Linux Kernel 的双向链表实现

在一个太阳毒辣的周五,刚到公司的我还没吃完手里的面包,肯尼斯把我抓到他的屏幕前:“你来说说这段代码什么意思。”

#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );})

我凝视屏幕一分钟,问道:“这...有注释吗?”
见状,肯尼斯渐渐露出笑容。我仿佛从肯尼斯的笑脸中,看到了下一分钟认怂的我。

一分钟后,肯尼斯从 Kernel 的双向链表给我讲起...

测试环境

下面的 Kernel 源码以及测试环境均为 RHEL7.3:
kernel-3.10.0-514.el7.x86_64
gcc-4.8.5-11.el7.x86_64

通常的双向链表实现

我理解的双向链表是长这个样子的:

struct GList {
  gpointer data;
  GList *next;
  GList *prev;
};

双向链表由一个个的节点组成,每个节点都包含着数据以及分别之前前后节点的指针。画个示意图:

普通双向链表

Linux Kernel 的双向链表实现

Kernel 的链表是通过 "Intrusive list" 来实现的。也就是这样:

struct list_head {
    struct list_head *next, *prev;
};

好吧,数据存哪? 事实上,这个链表本身不存数据,它只负责连接数据。画个示意图:Kernel 双向链表

数据是由单独的数据结构定义,只需要在这些数据结构中加入 list_head, 并把 list_head 放到链表里即可。这样做的一个好处是,一个数据结构可以同时存在于多个链表中。如上图, Data1可以有两个不同的 list_head,两个 list_head 在不同的两个链表中。

问题来了,通过这些 list_head,怎样能找到其所在的数据结构?比如,得到了上图"Data1"中"List1"的地址,怎样可以得到"Data1"的地址?这就用到了最开始的那个宏函数了。(它是有注释的)

/**
 * container_of - cast a member of a structure out to the containing structure
 * @ptr:        the pointer to the member.
 * @type:       the type of the container struct this is embedded in.
 * @member:     the name of the member within the struct.
 *
 */
#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );})

给出一个 type 里的 member 的指针 ptr, container_of 可以返回该 type 的起始地址。 offsetof() 用于找到 member 相对 type 的偏移 offset,将 member 的地址与 offset 相减,即可得到 type 的起始地址。举个例子:

#include <stdio.h>
#include <stdlib.h>

#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)

struct student{
    int id;
    int age;
};

int main(int agrc, char **argv)
{
    struct student lihua;
    struct student *tmp;
    lihua.id = 2001;
    lihua.age = 10;

    printf("Relative offset of age to student = %d\n", ((size_t)&((struct student *)0)->age));

    const typeof( ((struct student *)0)->age ) *__mptr = &lihua.age;
    printf("Pointer to lihua.age = %p\n", __mptr);

    tmp = (struct student *)( (char *)__mptr - offsetof(struct student, age));
    printf("Calculated pointer to lihua = %p\n", tmp);
    printf("The actual pointer = %p\n", &lihua);

    return 0;
}

上面的程序首先定义一个 struct student, 里面有 id 和 age 两个 member。定义一个类型为 struct student 的变量 lihua, 如果能知道 lihua.age 的地址,即可推算出 lihua 的地址。

$ ./test 
Relative offset of age to student = 4
Pointer to lihua.age = 0x7fffdb408b84
Calculated pointer to lihua = 0x7fffdb408b80
The actual pointer = 0x7fffdb408b80

另外,从 container_of 的第一条语句可以知道 __mptr 的值与 ptr 是相等的,这条语句有什么存在意义呢?

const typeof( ((type *)0)->member ) *__mptr = (ptr); 

它的一个作用是,它可以验证 member 是否存在于 type 之中,避免犯错。例如,捏造一个叫 score 的 member,编译的时候就会报错。

const typeof( ((struct student *)0)->score ) *__mptr = &lihua.age;

$ gcc -Wall -o test test.c
test.c:19:40: error: ‘struct student’ has no member named ‘score’
     const typeof( ((struct student *)0)->score ) *__mptr = &lihua.age;

Kernel 的应用例子

双向链表在 Kernel 中大量存在,下面将用 task_struct 进行验证。

对于每一个进程(task),Kernel 都会用一个 task_struct 来记录相关信息。在 task_struct 中,有一个 struct list_head tasks 的定义。接下来将验证能否通过这个 tasks 链表,找到其他 task_struct 的信息。

struct task_struct {
    volatile long state;
    void *stack;
    atomic_t usage;
  <--snip-->
    struct list_head tasks;
}

随便找个进程作为例子:

crash> task_struct 0xffff88007bc53ec0 -ox
  [ffff88007bc53ec0] volatile long state;
  <--snip-->
  [ffff88007bc542f0] struct list_head tasks; <---------

可以算出,struct list head tasks 相对于 task_struct 起始地址的 offset 是 0x430.

crash> eval 0xffff88007bc542f0-0xffff88007bc53ec0
hexadecimal: 430  

从这个 task_struct 的 list_head tasks 中,可以读取下一个 list_head tasks 的地址。

crash> list_head 0xffff88007bc542f0 -ox
struct list_head {
  [ffff88007bc542f0] struct list_head *next;
  [ffff88007bc542f8] struct list_head *prev;
}

crash> list_head 0xffff88007bc542f0 
struct list_head {
  next = 0xffff88007a2d8430, 
  prev = 0xffff88003657a390
}

可以看到,它指向的下一个 list_head tasks 的地址是 0xffff88007a2d8430,进而可以算出对应 task_struct 的地址:

crash> eval 0xffff88007a2d8430-0x430
hexadecimal: ffff88007a2d8000

从这个 task_struct 的 list_head tasks 中,又可以看到它的 prev 其实是指向原先的 task_struct 的 tasks (0xffff88007bc542f0):

 
crash> task_struct 0xffff88007a2d8000 -ox
struct task_struct {
  [ffff88007a2d8000] volatile long state;
  <--snip-->
  [ffff88007a2d8430] struct list_head tasks;

crash> list_head 0xffff88007a2d8430
struct list_head {
  next = 0xffff880036712390, 
  prev = 0xffff88007bc542f0  <---------
}

从另一个方向同样可以验证:

crash> eval 0xffff88003657a390-0x430
hexadecimal: ffff880036579f60  

crash> task_struct 0xffff880036579f60 -ox
struct task_struct {
  [ffff880036579f60] volatile long state;
  <--snip-->
  [ffff88003657a390] struct list_head tasks;

crash> list_head 0xffff88003657a390
struct list_head {
  next = 0xffff88007bc542f0, <---------
  prev = 0xffff88003657e250
}

使用示例

道理都懂,但是这样的双向链表要怎么用?举个例子吧。

首先,从 kernel 代码中抠出双向链表的部分实现,放到 list.h 中。其中,LIST_HEAD() 用于初始化链表, list_add() 用于向链表中添加一个 entry,list_entry 用于获取 entry 所在的数据结构。

# cat list.h 
#define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)

/** include/linux/kernel.h **/

/**
 * container_of - cast a member of a structure out to the containing structure
 * @ptr:        the pointer to the member.
 * @type:       the type of the container struct this is embedded in.
 * @member:     the name of the member within the struct.
 *
 */
#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );})


/** include/linux/types.h **/

struct list_head {
        struct list_head *next, *prev;
};


/** include/linux/list.h **/

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define LIST_HEAD(name) \
        struct list_head name = LIST_HEAD_INIT(name)

static inline void INIT_LIST_HEAD(struct list_head *list)
{
        list->next = list;
        list->prev = list;
}


/*
 * Insert a new entry between two known consecutive entries.
 *
 * This is only for internal list manipulation where we know
 * the prev/next entries already!
 */
static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
        next->prev = new;
        new->next = next;
        new->prev = prev;
        prev->next = new;
}


/**
 * list_add - add a new entry
 * @new: new entry to be added
 * @head: list head to add it after
 *
 * Insert a new entry after the specified head.
 * This is good for implementing stacks.
 */
static inline void list_add(struct list_head *new, struct list_head *head)
{
        __list_add(new, head, head->next);
}


/**
 * list_entry - get the struct for this entry
 * @ptr:        the &struct list_head pointer.
 * @type:       the type of the struct this is embedded in.
 * @member:     the name of the list_struct within the struct.
 */
#define list_entry(ptr, type, member) \
        container_of(ptr, type, member)

下面的程序进行以下操作:
- 定义 struct person,里面包含 struct list_head list;
- 初始化链表 persons;
- 创建两个 person,并添加到 persons 链表中;
- 读取 persons 链表中每一个 person 的信息;

# cat test.c 
#include <stdio.h>
#include <stdlib.h>

#include "list.h"

struct person{
	unsigned int id;
	unsigned int height;
	unsigned int weight;
	struct list_head list;
};

int main(int argc, char **argv)
{
	struct person *tmp;
	struct list_head *pos;
	struct person persons;
	INIT_LIST_HEAD(&persons.list);

	// add 1st person
	tmp = (struct person *)malloc(sizeof(struct person));
	tmp->id = 1;
	tmp->height = 170;
	tmp->weight = 65;
	printf("%d, %d, %d\n", tmp->id, tmp->height, tmp->weight);
	list_add(&(tmp->list), &(persons.list));

	// add 2nd person
	tmp = (struct person *)malloc(sizeof(struct person));
	tmp->id = 2;
	tmp->height = 160;
	tmp->weight = 60;
	printf("%d, %d, %d\n", tmp->id, tmp->height, tmp->weight);
	list_add(&(tmp->list), &(persons.list));

	// read persons
	pos = persons.list.next;
	tmp = list_entry(pos, struct person, list);
	printf("item1 %d, %d, %d \n", tmp->id, tmp->height, tmp->weight);

	pos = pos->next;
        tmp = list_entry(pos, struct person, list);
        printf("item2 %d, %d, %d \n", tmp->id, tmp->height, tmp->weight);

	pos = pos->next;
        tmp = list_entry(pos, struct person, list);
        printf("item3 %d, %d, %d \n", tmp->id, tmp->height, tmp->weight);

	return 0;
}

从输出可以验证,只要知道 person 中 list 的地址,就能读取到每个 person 的内容。至于 item3 是链表的头,这个程序没给它进行初始化,所以数值比较奇怪。

# ./a.out 
1, 170, 65
2, 160, 60
item1 2, 160, 60 
item2 1, 170, 65 
item3 1, 0, 0 

参考资料

[1] Data Structures in the Linux Kernel - Linux Inside
https://0xax.gitbooks.io/linux-insides/DataStructures/dlist.html

[2] Linux Kernel Linked List Explained
https://isis.poly.edu/kulesh/stuff/src/klist/

思考题

为什么作者在开头用“太阳毒辣”来形容周五?