Linux – Page 5 – 肥叉烧 feichashao.com

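Comparing the compression methods in kdump's makedumpfile

Background

When a kernel panic occurs, kdump collects a vmcore so that the cause of the failure can be analyzed afterwards. However, servers in recent years commonly have more than a hundred gigabytes of memory, so dumping a vmcore takes a relatively long time and adds to the downtime.

In RHEL7/CentOS7, kdump offers three compression methods (zlib, lzo, snappy), and choosing a faster one shortens the time spent collecting a vmcore.

The compression method is set on the makedumpfile line in kdump.conf.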

# man makedumpfile

-c,-l,-p
Compress dump data by each page using zlib for -c option, lzo for -l option or snappy for -p option. (-l option needs USELZO=on and -p option needs USESNAPPY=on when building)
A user cannot specify this option with -E option, because the ELF format does not support compressed data.
Example:
# makedumpfile -c -d 31 -x vmlinux /proc/vmcore dumpfile

zlib is the compression method used by gzip. It is generally slower than lzo and snappy, but its compression ratio is slightly higher.
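
To get a feel for the per-page compression that the `-c` option describes, the speed/ratio trade-off can be sketched with Python's standard `zlib` module. This is only an illustration and not how makedumpfile is implemented: the 4 KiB page size, the synthetic sample data, and the helper name `compress_pages` are assumptions, and lzo and snappy are left out because they are not in the standard library.

```python
import os
import time
import zlib

PAGE_SIZE = 4096  # assumed page size (typical on x86_64)


def compress_pages(data, level):
    """Compress `data` one page at a time, as a per-page dump format would.

    Returns (total compressed bytes, elapsed seconds).
    """
    start = time.perf_counter()
    total = 0
    for offset in range(0, len(data), PAGE_SIZE):
        total += len(zlib.compress(data[offset:offset + PAGE_SIZE], level))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic "memory": zero-filled pages plus incompressible random pages.
    sample = bytes(PAGE_SIZE * 256) + os.urandom(PAGE_SIZE * 256)
    for level in (1, 6, 9):
        size, seconds = compress_pages(sample, level)
        print(f"level={level} ratio={size / len(sample):.3f} time={seconds:.4f}s")
```

Real memory contents compress very differently from this synthetic sample, so the numbers only show the direction of the trade-off, not what a real vmcore would give.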

How to request contiguous physical memory in Linux?

Background

A customer noticed an issue: his system had more than 100 GB of free memory (truly free, not buffer/cache), yet at some point it started reclaiming pages from buffer/cache and swapping pages out.

At the moment of reclaim, the system log shows messages like the following.

May 2 10:03:56 rhel68-kmalloc kernel: insmod: page allocation failure. order:10, mode:0xd0
May 2 10:03:56 rhel68-kmalloc kernel: Pid: 22319, comm: insmod Not tainted 2.6.32-642.el6.x86_64 #1
May 2 10:03:56 rhel68-kmalloc kernel: Call Trace:
May 2 10:03:56 rhel68-kmalloc kernel: [<ffffffff8113e77c>] ? __alloc_pages_nodemask+0x7dc/0x950
May 2 10:03:56 rhel68-kmalloc kernel: [<ffffffff8117f132>] ? kmem_getpages+0x62/0x170
May 2 10:03:56 rhel68-kmalloc kernel: [<ffffffff8117fd4a>] ? fallback_alloc+0x1ba/0x270
May 2 10:03:56 rhel68-kmalloc kernel: [<ffffffff8117f79f>] ? cache_grow+0x2cf/0x320
May 2 10:03:56 rhel68-kmalloc kernel: [<ffffffff8117fac9>] ? ____cache_alloc_node+0x99/0x160
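
The `order:10` in the first log line means the kernel was asked for 2^10 physically contiguous pages in a single block. A short sketch of that arithmetic, and of how the number of free blocks per order can be read from a `/proc/buddyinfo` line, follows. The 4 KiB page size is an assumption, the helper names are hypothetical, and the sample buddyinfo line is made up for illustration.

```python
PAGE_SIZE = 4096  # assumed page size


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of one allocation of the given buddy-allocator order."""
    return (1 << order) * page_size


def parse_buddyinfo_line(line):
    """Parse one /proc/buddyinfo line into (node, zone, free block counts).

    counts[i] is the number of free blocks of order i.
    """
    fields = line.split()
    # Layout: "Node", "<n>,", "zone", "<name>", then one count per order.
    node = int(fields[1].rstrip(","))
    zone = fields[3]
    counts = [int(x) for x in fields[4:]]
    return node, zone, counts


def can_satisfy(counts, order):
    """True if the zone has a free block of at least `order`."""
    return any(n > 0 for n in counts[order:])


if __name__ == "__main__":
    print(order_to_bytes(10))  # 1024 pages of 4 KiB = 4 MiB
    # Made-up sample: plenty of small blocks, nothing at order 9 or 10.
    sample = "Node 0, zone   Normal  9000 4000 1500 600 200 50 10 2 1 0 0"
    node, zone, counts = parse_buddyinfo_line(sample)
    print(zone, can_satisfy(counts, 10))
```

With counts like these, a zone can hold a lot of free memory in total and still have no free block large enough for an order-10 request, which is one way a failure like the one above can arise.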


Why do services like sendmail/httpd still query outdated DNS servers after resolv.conf is changed?

Question

One of my colleagues raised a question: after resolv.conf is changed, why does sendmail still query the old resolver?

Steps:

1. Set 192.168.122.65 as the nameserver in resolv.conf, start the sendmail service, send a mail to root@hat.com, and take a tcpdump. We can see a query asking 192.168.122.65 for the MX record of hat.com.

2. Change the nameserver from 192.168.122.65 to 192.168.122.72 in resolv.conf, send a mail to root@hat.com, and take a tcpdump again. This time we expect a query to 192.168.122.72 (new), but the actual query still goes to 192.168.122.65 (old).

3. Restart the sendmail service and send a mail to root@hat.com. This time we see a query to 192.168.122.72 (new), as expected.

While we were trying to work out what was wrong with sendmail, another colleague joined the discussion and mentioned that httpd and some other long-running services behave the same way.

Since sendmail is not the only case, I think we should take a look at glibc.
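
One way to picture the suspected behaviour is a resolver that parses resolv.conf once and keeps the result in process memory until it is told to re-read it. The sketch below is a toy model written for this note, not glibc's code; `CachingResolver`, `read_nameservers` and `demo` are hypothetical names, and the file it edits is a temporary copy, not the real /etc/resolv.conf.

```python
import os
import tempfile


def read_nameservers(path):
    """Return the addresses on the `nameserver` lines of a resolv.conf-style file."""
    servers = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "nameserver":
                servers.append(fields[1])
    return servers


class CachingResolver:
    """Toy resolver that reads its configuration once, at construction time."""

    def __init__(self, path):
        self.path = path
        self.nameservers = read_nameservers(path)

    def reload(self):
        """Re-read the configuration, comparable to restarting the service."""
        self.nameservers = read_nameservers(self.path)


def demo():
    """Change the file under a live resolver and show the stale and fresh views."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "resolv.conf")
        with open(path, "w") as f:
            f.write("nameserver 192.168.122.65\n")
        resolver = CachingResolver(path)  # the long-running service starts
        with open(path, "w") as f:
            f.write("nameserver 192.168.122.72\n")
        stale = list(resolver.nameservers)  # still the old server
        resolver.reload()
        fresh = list(resolver.nameservers)  # the new server, after a reload
        return stale, fresh


if __name__ == "__main__":
    print(demo())
```

In this model the three steps above map onto constructing the resolver, rewriting the file, and calling `reload()`.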


How to use mwm as Window Manager in RHEL7?

A dirty trick.

Install packages

# yum groupinstall "Server with GUI"
# yum install motif xterm
# cat /etc/sysconfig/desktop 
PREFERRED=/usr/bin/mwm

Start X from multi-user.target

# systemctl isolate multi-user.target
# startx

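The relationship between pcs and pcsd in the Pacemaker management tools

When debugging pacemaker/pcs/pcsd, we often need to know what actually happens after a `pcs xxxx xxxx` command is entered.

In a pacemaker cluster managed with pcs, every node runs a pcsd daemon listening on port 2224/tcp. From any node, we can then manage the whole cluster with the pcs command.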
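
When debugging, a quick first check is whether pcsd is reachable on 2224/tcp from the node where pcs is run. A minimal sketch using only the standard `socket` module follows; the helper name `is_port_open` and the host names in the usage part are hypothetical.

```python
import socket

PCSD_PORT = 2224  # the port pcsd listens on, as stated above


def is_port_open(host, port=PCSD_PORT, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical node names; replace them with the cluster's own.
    for node in ("node-a", "node-b"):
        print(node, "pcsd reachable:", is_port_open(node))
```

A successful connection only shows that something accepts connections on that port. It does not prove that the listener is pcsd or that authentication between the nodes works.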

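The misconception

Following the usual pattern, one would expect a client/server architecture: the pcs command-line tool sends a request to pcsd on the relevant node, and pcsd carries out the action on that node.

In reality it works somewhat differently.

How it actually works

In fact, what really performs operations on pacemaker is the pcs command-line tool itself. pcsd is responsible for receiving requests from other nodes and then invoking the local pcs tool, and it is the local pcs that finally performs the operation.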
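
The division of labour described above can be written down as a small toy model. It is purely illustrative: the `Node` class and its methods are hypothetical, nothing here talks to a real cluster, and the forwarding step from pcs to a remote pcsd is an assumption about how a request reaches another node.

```python
class Node:
    """A cluster node with a local pcs tool and a pcsd daemon (toy model)."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def pcs(self, *args, target=None):
        """Model of the pcs CLI: act locally, or forward to a remote pcsd."""
        if target is None or target is self:
            self.log.append(("pcs executes", args))
            return f"{self.name}: executed {' '.join(args)}"
        # Remote case (assumed): hand the request to the target node's pcsd.
        return target.pcsd(args)

    def pcsd(self, args):
        """Model of pcsd: accept a request and run the local pcs tool."""
        self.log.append(("pcsd received", tuple(args)))
        return self.pcs(*args)


if __name__ == "__main__":
    node_a, node_b = Node("node-a"), Node("node-b")
    print(node_a.pcs("cluster", "start"))                 # local: pcs ---> execute
    print(node_a.pcs("cluster", "start", target=node_b))  # remote: pcsd ---> pcs ---> execute
    print(node_b.log)
```

In both paths the operation is performed by a pcs call on the node where it takes effect; pcsd only relays the request.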

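Local command example

Take `pcs cluster start` as an example. Running `pcs cluster start` on Node A starts the cluster-related services locally on Node A.

This operation does not go through pcsd at all; that is, pcs ---> execute. The detailed process is as follows.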

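How to verify the resource limits in ulimit? How to check current usage?

ulimit can limit the resources a process uses. After setting ulimit, how do we verify that each limit is effective? How do we see how much of each resource a process is currently using? This post tries to demonstrate several of the settings.

Are ulimit settings per-user or per-process?

The values we see in ulimit are all per-process. In other words, every process has its own limits.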

$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 46898
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 46898
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
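
Because the limits belong to each process, they can also be read for any process from `/proc/<pid>/limits`. Below is a minimal parsing sketch. The helper names are hypothetical, and the embedded sample text is an illustrative excerpt in the file's format, not output captured from the system above.

```python
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            65536                65536                files
Max locked memory         65536                65536                bytes
Max nice priority         0                    0
"""


def parse_limits(text):
    """Parse /proc/<pid>/limits text into {name: (soft, hard, units)}."""
    limits = {}
    for line in text.splitlines():
        tokens = line.split()
        # The first numeric or "unlimited" token is the soft limit;
        # everything before it is the (multi-word) limit name.
        idx = next(
            (i for i, t in enumerate(tokens) if t == "unlimited" or t.isdigit()),
            None,
        )
        if idx is None or idx + 1 >= len(tokens):
            continue  # header row or unexpected line
        name = " ".join(tokens[:idx])
        units = tokens[idx + 2] if len(tokens) > idx + 2 else ""
        limits[name] = (tokens[idx], tokens[idx + 1], units)
    return limits


def read_limits(pid="self"):
    """Read and parse the limits of a process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_limits(f.read())


if __name__ == "__main__":
    for name, (soft, hard, units) in parse_limits(SAMPLE).items():
        print(f"{name}: soft={soft} hard={hard} {units}")
```

On a Linux system, `read_limits(1234)` would return the limits of PID 1234, where 1234 is a placeholder for the process of interest.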

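How are ulimit, limits.conf and pam_limits related?

First of all, limit values are per-process. On Linux, any ordinary process can call getrlimit() to read its own limits, and can call setrlimit() to change its own soft limits within the range allowed by the hard limit. An unprivileged process can also lower its hard limit, but raising a hard limit requires the CAP_SYS_RESOURCE capability. In addition, a child process created with fork() inherits its parent's limits. See the getrlimit/setrlimit/prlimit man pages for details.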
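
The same calls are exposed by Python's standard `resource` module, which makes the soft-limit change and the inheritance by child processes easy to observe. This is a minimal sketch: the helper names are hypothetical and the value 256 is an arbitrary choice for the demonstration.

```python
import resource
import subprocess
import sys


def set_soft_nofile(new_soft):
    """Set this process's soft RLIMIT_NOFILE, keeping the hard limit as is."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def child_soft_nofile():
    """Start a child process and report the soft RLIMIT_NOFILE it sees."""
    code = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    return int(subprocess.check_output([sys.executable, "-c", code]))


if __name__ == "__main__":
    soft, hard = set_soft_nofile(256)
    print("parent:", soft, hard)
    print("child :", child_soft_nofile())
```

The child reports the same soft limit as the parent because limits are copied when the child process is created.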

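`ulimit` is a shell builtin. When `ulimit` is run, it is the shell itself that calls getrlimit()/setrlimit() to read or change its own limits. When we start an application from the shell, the resulting process inherits the limits of the current shell.

So who sets a shell's initial limits? Usually pam_limits. As the name suggests, pam_limits is a PAM module; when a user logs in, pam_limits applies the values defined in limits.conf to the user's shell. We can enable pam_limits debugging to see roughly how this works: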

[root@rhel674 ~]# cat /etc/security/limits.conf | grep -v ^# | grep -v ^$
test hard memlock 102410

[root@rhel674 ~]# grep pam_limits /etc/pam.d/password-auth-ac
session required pam_limits.so debug

[root@rhel674 ~]# tail /var/log/secure
Sep 4 00:12:49 rhel674 sshd[3151]: Accepted publickey for test from 192.168.122.1 port 41556 ssh2
Sep 4 00:12:49 rhel674 sshd[3151]: pam_limits(sshd:session): reading settings from '/etc/security/limits.conf'
Sep 4 00:12:49 rhel674 sshd[3151]: pam_limits(sshd:session): process_limit: processing hard memlock 102410 for USER
Sep 4 00:12:49 rhel674 sshd[3151]: pam_limits(sshd:session): reading settings from '/etc/security/limits.d/90-nproc.conf'
Sep 4 00:12:49 rhel674 sshd[3151]: pam_limits(sshd:session): process_limit: processing soft nproc 1024 for DEFAULT
Sep 4 00:12:49 rhel674 sshd[3151]: pam_unix(sshd:session): session opened for user test by (uid=0)

In limits.conf, the hard limit of memlock for user test is set to 102410. After the user logs in over SSH, we can see that pam_limits set memlock=102410 for that session.
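
One detail to keep in mind when verifying this value from inside the session: limits.conf expresses memlock in KB, while getrlimit() reports RLIMIT_MEMLOCK in bytes. A small sketch of the comparison follows; the helper names are hypothetical.

```python
import resource


def limits_conf_kb_to_bytes(kb):
    """Convert a limits.conf memlock value (KB) to the bytes getrlimit() reports."""
    return kb * 1024


def hard_memlock_matches(expected_kb):
    """True if this process's hard RLIMIT_MEMLOCK equals the limits.conf value."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return hard == limits_conf_kb_to_bytes(expected_kb)


if __name__ == "__main__":
    print(limits_conf_kb_to_bytes(102410))  # 104867840 bytes
    print(hard_memlock_matches(102410))
```

Run inside the test user's SSH session, `hard_memlock_matches(102410)` would be expected to return True if pam_limits applied the setting; `ulimit -l` shows the soft limit in kbytes and `ulimit -H -l` the hard limit.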

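How are the values in /proc/interrupts obtained?

A while ago, to confirm how the first column of /proc/interrupts is indented, I read through the relevant source code. Here are some notes.

How many interrupts does the system have in total?

The number of interrupts available to the system is determined mainly by the architecture. For x86, the exact number follows from the definitions below.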

/* kernel/irq/irqdesc.c */

int nr_irqs = NR_IRQS;
EXPORT_SYMBOL_GPL(nr_irqs);

/* arch/x86/include/asm/irq_vectors.h */

#define NR_IRQS_LEGACY            16

#define IO_APIC_VECTOR_LIMIT        ( 32 * MAX_IO_APICS )

#ifdef CONFIG_X86_IO_APIC
# define CPU_VECTOR_LIMIT       (64 * NR_CPUS)
# define NR_IRQS                    \
    (CPU_VECTOR_LIMIT > IO_APIC_VECTOR_LIMIT ?  \
        (NR_VECTORS + CPU_VECTOR_LIMIT)  :  \
        (NR_VECTORS + IO_APIC_VECTOR_LIMIT))
#else /* !CONFIG_X86_IO_APIC: */
# define NR_IRQS            NR_IRQS_LEGACY
#endif
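
The macro arithmetic above can be replayed in a few lines to see what NR_IRQS works out to for a given configuration. This is a sketch under stated assumptions: NR_VECTORS is taken as 256, and the NR_CPUS and MAX_IO_APICS values in the usage part are example inputs chosen for illustration, since neither constant is defined in the excerpt. The function name `nr_irqs` mirrors the kernel variable but is otherwise hypothetical.

```python
NR_VECTORS = 256     # assumed value on x86
NR_IRQS_LEGACY = 16  # from the definition above


def nr_irqs(nr_cpus, max_io_apics, io_apic=True, nr_vectors=NR_VECTORS):
    """Evaluate the NR_IRQS macro for the given build-time constants."""
    if not io_apic:
        # !CONFIG_X86_IO_APIC: only the legacy IRQs exist.
        return NR_IRQS_LEGACY
    cpu_vector_limit = 64 * nr_cpus
    io_apic_vector_limit = 32 * max_io_apics
    # The ?: in the macro picks the larger of the two limits.
    return nr_vectors + max(cpu_vector_limit, io_apic_vector_limit)


if __name__ == "__main__":
    # Example inputs only; real values depend on the kernel configuration.
    print(nr_irqs(nr_cpus=8, max_io_apics=128))    # 256 + 32 * 128 = 4352
    print(nr_irqs(nr_cpus=256, max_io_apics=128))  # 256 + 64 * 256 = 16640
```

The result is the compile-time value that `nr_irqs` is initialized with in kernel/irq/irqdesc.c.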
