
kipmi0 using 100% CPU

Recently I noticed on one machine that the kipmi0 kernel thread was using 100% of a CPU.

The quickest temporary workaround is:

# rmmod ipmi_si

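Obviously, this comes at the cost of some functionality, since unloading the driver disables IPMI access through that interface. A real fix requires understanding why kipmi0 uses so much CPU in the first place. The relevant part of the kernel documentation, Documentation/IPMI.txt, reads: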

If your IPMI interface does not support interrupts and is a KCS or
SMIC interface, the IPMI driver will start a kernel thread for the
interface to help speed things up.  This is a low-priority kernel
thread that constantly polls the IPMI driver while an IPMI operation
is in progress.  The force_kipmid module parameter will allow the user to
force this thread on or off.  If you force it off and don't have
interrupts, the driver will run VERY slowly.  Don't blame me,
these interfaces suck.

Unfortunately, this thread can use a lot of CPU depending on the
interface's performance.  This can waste a lot of CPU and cause
various issues with detecting idle CPU and using extra power.  To
avoid this, the kipmid_max_busy_us sets the maximum amount of time, in
microseconds, that kipmid will spin before sleeping for a tick.  This
value sets a balance between performance and CPU waste and needs to be
tuned to your needs.  Maybe, someday, auto-tuning will be added, but
that's not a simple thing and even the auto-tuning would need to be
tuned to the user's desired performance.

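As the excerpt shows, the root cause is that some IPMI interfaces do not support interrupts, so the driver has to poll them for data. The module parameter kipmid_max_busy_us caps how long, in microseconds, the thread may spin in each polling burst before it sleeps for a tick. If the module is already loaded, the value can be changed at runtime: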

# echo 100 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us
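
The runtime change can also be wrapped in a small helper that validates the value and prints the setting before and after. This is only a sketch: the function name set_kipmid_busy and its optional second argument (an alternative parameter file, handy for trying it out without the module loaded) are made up for illustration, and it assumes a POSIX shell run as root.

```shell
#!/bin/sh
# Hypothetical helper, not part of any package.
# Usage: set_kipmid_busy VALUE [PARAM_FILE]
KIPMID_PARAM=/sys/module/ipmi_si/parameters/kipmid_max_busy_us

set_kipmid_busy() {
    kb_value=$1
    kb_file=${2:-$KIPMID_PARAM}

    # Accept only a non-negative integer number of microseconds.
    case $kb_value in
        ''|*[!0-9]*)
            echo "value must be a non-negative integer" >&2
            return 2
            ;;
    esac

    # The sysfs file exists only while ipmi_si is loaded.
    if [ ! -w "$kb_file" ]; then
        echo "$kb_file is not writable (module not loaded, or not root?)" >&2
        return 1
    fi

    echo "old: $(cat "$kb_file")"
    echo "$kb_value" > "$kb_file"
    echo "new: $(cat "$kb_file")"
}

# Example (as root, with ipmi_si loaded):
#   set_kipmid_busy 100
```

A value written this way lasts only until the module is unloaded or the machine reboots.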

To have the parameter set automatically whenever the module is loaded, put it in a modprobe configuration file:

# cat /etc/modprobe.d/ipmi.conf
options ipmi_si kipmid_max_busy_us=100
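
If the file is managed by a script rather than by hand, something like the following could add the line only when it is missing. It is a sketch under assumptions: the function name ensure_kipmid_option is hypothetical, the default path is the one used above, and it deliberately leaves an existing kipmid_max_busy_us setting untouched instead of rewriting it.

```shell
#!/bin/sh
# Hypothetical helper, not part of any package.
# Usage: ensure_kipmid_option VALUE [CONF_FILE]
ensure_kipmid_option() {
    ko_value=$1
    ko_conf=${2:-/etc/modprobe.d/ipmi.conf}

    # If some line already sets the parameter, show it and change nothing.
    if [ -f "$ko_conf" ] && grep -q 'kipmid_max_busy_us=' "$ko_conf"; then
        echo "already configured in $ko_conf:"
        grep 'kipmid_max_busy_us=' "$ko_conf"
        return 0
    fi

    # Otherwise append the option line (creates the file if needed).
    echo "options ipmi_si kipmid_max_busy_us=$ko_value" >> "$ko_conf"
}

# Example (as root):
#   ensure_kipmid_option 100
```

The option takes effect the next time ipmi_si is loaded, for example after a reboot.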

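Also, as the documentation above notes, although kipmi0 can consume a lot of CPU, it runs as a low-priority kernel thread, so its impact on overall system performance is limited.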
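
One way to check both the CPU usage and the priority on a given machine is to filter ps output for the kipmi threads. This is a sketch: the filter name show_kipmi_threads is made up here, and it assumes a ps that accepts -e and -o with the pid, nice, pcpu and comm columns (procps does; minimal ps implementations may not).

```shell
#!/bin/sh
# Hypothetical filter: reads four-column ps output on stdin and keeps
# the header line plus any kipmiN kernel threads (kipmi0, kipmi1, ...).
show_kipmi_threads() {
    awk 'NR == 1 || $4 ~ /^kipmi[0-9]+$/'
}

# Example:
#   ps -eo pid,nice,pcpu,comm | show_kipmi_threads
```

A high nice value in the second column means the thread yields to normal-priority work, while the third column shows how much CPU it is actually burning.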

