Modify

Opened 5 years ago

Closed 3 years ago

#12765 closed defect (fixed)

High CPU utilisation due to /sbin/hotplug-call with r35047

Reported by: hnyman <hannu.nyman@…> Owned by: developers
Priority: normal Milestone: Chaos Calmer 15.05
Component: base system Version: Trunk
Keywords: CPU load hotplug Cc:

Description

I am experiencing high CPU utilisation with 35047.
Load shoots to 1.0 right after the boot with an idle router.
htop shows /sbin/hotplug-call to consume all the CPU time.

The same router worked ok with 35016 build.

I have already reverted to default settings with firstboot and the firmware was built after "make dirclean".

When looking to the recent trunk changes, there is nothing obvious. First suspect to come into mind is the libubox version bump (or the gcc-linaro version bump?)

ar71xx/wndr3800
Ubuntu 12.10 x64 in Virtualbox as the build environment.

-----------------------------------------------------
 BARRIER BREAKER (Bleeding Edge, r35047)
 -----------------------------------------------------
  * 1/2 oz Galliano         Pour all ingredients into
  * 4 oz cold Coffee        an irish coffee mug filled
  * 1 1/2 oz Dark Rum       with crushed ice. Stir.
  * 2 tsp. Creme de Cacao
 -----------------------------------------------------
root@OpenWrt:/# htop

  CPU[|||||||||||||||||********************100.0%]     Tasks: 26, 3 thr; 2 running
  Mem[||||||##****                       16/123MB]     Load average: 1.29 0.56 0.21
  Swp[                                      0/0MB]     Uptime: 00:02:17

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
 1323 root       23   3  1596   388   228 R 98.1  0.3  1:09.29 /bin/sh /sbin/hotplug-call block
 2107 root       25   5  1352   740   560 R  1.3  0.6  0:00.07 htop
    1 root       20   0  1524   436   360 S  0.0  0.3  0:04.13 init
  731 root       20   0  1524   232   156 S  0.0  0.2  0:00.00 init
  828 root       20   0  1576   368   304 S  0.0  0.3  0:00.01 /sbin/syslogd -C64
  830 root       20   0  1512   296   224 S  0.0  0.2  0:00.01 /sbin/klogd
  832 root       20   0   864   388   264 S  0.0  0.3  0:00.63 /sbin/hotplug2 --override --persistent --se
  838 root       20   0  1016   356   228 S  0.0  0.3  0:00.11 /sbin/procd
  841 root       20   0   864   292   168 S  0.0  0.2  0:00.02 /sbin/hotplug2 --override --persistent --se
  843 root       20   0  1012   404   280 S  0.0  0.3  0:00.03 /usr/sbin/6distributed
  847 root        0 -20   876   268   208 S  0.0  0.2  0:00.04 ubusd
  862 root       20   0  1528   684   496 S  0.0  0.5  0:00.10 /sbin/netifd
 1118 root       20   0  1528   432   348 S  0.0  0.3  0:00.00 udhcpc -p /var/run/udhcpc-eth1.pid -s /lib/
 1162 root       20   0  1520   328   252 S  0.0  0.3  0:00.03 /sbin/watchdog -t 5 /dev/watchdog
 1242 root       20   0  1560   460   348 S  0.0  0.4  0:00.01 /bin/sh /sbin/hotplug-call block
 1310 root       20   0  1596   468   312 S  0.0  0.4  0:00.04 /bin/sh /sbin/hotplug-call block
 1312 root       20   0   796   360   292 S  0.0  0.3  0:00.02 /usr/sbin/6relayd -S . br-lan

root@OpenWrt:/# ps
  PID USER       VSZ STAT COMMAND
    1 root      1524 S    init
    2 root         0 SW   [kthreadd]
    3 root         0 SW   [ksoftirqd/0]
    4 root         0 SW   [kworker/0:0]
    5 root         0 SW<  [kworker/0:0H]
    6 root         0 SW   [kworker/u:0]
    7 root         0 SW<  [kworker/u:0H]
    8 root         0 SW<  [khelper]
    9 root         0 SW   [kworker/u:1]
   67 root         0 SW   [bdi-default]
   69 root         0 SW<  [kblockd]
  101 root         0 SW   [kswapd0]
  146 root         0 SW   [fsnotify_mark]
  160 root         0 SW<  [ath79-spi]
  171 root         0 SW   [mtdblock0]
  176 root         0 SW   [mtdblock1]
  181 root         0 SW   [mtdblock2]
  186 root         0 SW   [mtdblock3]
  191 root         0 SW   [mtdblock4]
  196 root         0 SW   [mtdblock5]
  201 root         0 SW   [mtdblock6]
  247 root         0 SW<  [deferwq]
  248 root         0 SW   [kworker/0:1]
  455 root         0 SWN  [jffs2_gcd_mtd4]
  457 root         0 SW   [flush-mtd-unmap]
  480 root         0 SW   [khubd]
  561 root         0 SW<  [cfg80211]
  587 root         0 SW<  [cifsiod]
  731 root      1524 S    init
  828 root      1576 S    /sbin/syslogd -C64
  830 root      1512 S    /sbin/klogd
  832 root       864 S    /sbin/hotplug2 --override --persistent --set-rules-file /etc/hotplug2.rules --
  838 root      1016 S    /sbin/procd
  841 root       864 S    /sbin/hotplug2 --override --persistent --set-rules-file /etc/hotplug2.rules --
  843 root      1012 S    /usr/sbin/6distributed
  847 root       876 S <  ubusd
  862 root      1528 S    /sbin/netifd
 1118 root      1528 S    udhcpc -p /var/run/udhcpc-eth1.pid -s /lib/netifd/dhcp.script -f -t 0 -i eth1
 1162 root      1520 S    /sbin/watchdog -t 5 /dev/watchdog
 1242 root      1560 S    {hotplug-call} /bin/sh /sbin/hotplug-call block
 1310 root      1596 S    {hotplug-call} /bin/sh /sbin/hotplug-call block
 1312 root       796 S    /usr/sbin/6relayd -S . br-lan
 1323 root      1596 R N  {hotplug-call} /bin/sh /sbin/hotplug-call block
 1709 root       788 S    /usr/sbin/odhcp6c -s/lib/ipv6/dhcpv6.sh -P0 eth1
 1827 root      1520 S    /usr/sbin/ntpd -n -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.ope
 1990 root      1516 S    /usr/sbin/telnetd -l /bin/login.sh
 2001 root      2984 S    /usr/sbin/uhttpd -f -h /www -r OpenWrt -x /cgi-bin -t 60 -T 30 -A 1 -n 3 -R -p
 2007 root      1024 S    /usr/sbin/vsftpd
 2036 nobody     952 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf
 2061 root      9124 S N  /usr/sbin/collectd
 2088 root      1156 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 22
 2089 root      1528 S N  /bin/ash --login
 2106 root      1516 R N  ps

Attachments (0)

Change History (18)

comment:1 Changed 5 years ago by nbd

it's probably not the libubox update, since that lib is not used by busybox/hotplug2.
i'll do a fresh build to see if i can reproduce it.

in the mean time, could you please run a test with latest trunk and gcc reverted to the older version?

comment:2 Changed 5 years ago by John Thomas <openwrt-devel-2012@…>

I also experienced this bug. I went back to older builds and all is well. I am happy to test proposed solutions.

comment:3 Changed 5 years ago by hnyman <hannu.nyman@…>

Reverting gcc-4.6-linaro and gcc-4.7-linaro version bumps (r35043 and r35044) did not help. Still the full utilisation.

comment:4 Changed 5 years ago by hnyman <hannu.nyman@…>

I have narrowed down the regression range somewhat:

  • 35016 works ok
  • 35028 has already the high CPU utilisation

I built today both versions after make dirclean (and a clean .config recipe for make defconfig).

Changes between them: https://dev.openwrt.org/changeset?new=35028%40%2F&old=35016%40%2F

When looking for changes possibly affecting a ar71xx/wndr3800 build with kernel 3.6.11, there aren't so many possibilities. There some Regmap config changes, a few zram-related busybox changes. Not much else.

The Regmap changes did not have effect on my .config, as the following comparison shows. There is not much difference, mainly the two busybox utilities being built(?):

--- D:/FTP/WNDR3700/CPUloadtest/WNDR3700-trunk-r35016-2013-01-08-2221.config	Tue Jan 08 22:22:47 2013
+++ D:/FTP/WNDR3700/CPUloadtest/WNDR3700-trunk-r35028-2013-01-08-2123.config	Tue Jan 08 21:25:01 2013
@@ -3,2 +3,2 @@
-# OpenWrt version: Barrier Breaker (r35016)
-# Tue Jan  8 21:35:19 2013
+# OpenWrt version: Barrier Breaker (r35028)
+# Tue Jan  8 20:34:25 2013
@@ -333,0 +334 @@
+# CONFIG_TARGET_ramips_rt288x_BR6524N is not set
@@ -1019 +1020,2 @@
-# CONFIG_BUSYBOX_CONFIG_MKSWAP is not set
+CONFIG_BUSYBOX_CONFIG_MKSWAP=y
+# CONFIG_BUSYBOX_CONFIG_FEATURE_MKSWAP_UUID is not set
@@ -1038 +1040,2 @@
-# CONFIG_BUSYBOX_CONFIG_SWAPONOFF is not set
+CONFIG_BUSYBOX_CONFIG_SWAPONOFF=y
+# CONFIG_BUSYBOX_CONFIG_FEATURE_SWAPON_PRI is not set
@@ -1339,0 +1343 @@
+# CONFIG_PACKAGE_zram-swap is not set
@@ -1932,0 +1937 @@
+# CONFIG_PACKAGE_kmod-zram is not set

I will continue looking for the culprit changeset.

Ps. One small obervation: with a faulty version, the F10 key does not work to exit 'htop'. In a normally working version it works.


comment:5 Changed 5 years ago by hnyman <hannu.nyman@…>

Could it be some kind of conflict between several versions of "mkswap" or libraries used by it?

r35025 changed busybox to include its version of mkswap.
https://dev.openwrt.org/changeset/35025/#file0

But my system already has a /usr/sbin/mkswap in the system due the package block-mount, which depends on "swap-utils" that provides /usr/sbin/mkswap.
https://dev.openwrt.org/browser/trunk/package/block-mount/Makefile#L24
https://dev.openwrt.org/browser/trunk/package/util-linux/Makefile#L336

Thus there are now two mkswap utilities.
(An old bug #9373 mentions this.)

I have just built and flashed the version 35024 and it works ok. The regression is between 35024 and 35028.
https://dev.openwrt.org/changeset?new=35028%40%2F&old=35024%40%2F

comment:6 Changed 5 years ago by hnyman <hannu.nyman@…>

My last night's guess was right:
r35025 is the culprit for the high CPU load.

The reason is apparently some conflict between /usr/sbin/mkswap provided by swap-utils due to block-mount and /sbin/mkswap provided by busybox after r35025.

"which mkswap" says /sbin/mkswap, so the busybox version gets preferred in the path.

Manually removing symlink /sbin/mkswap (leading to busybox) fixes things. After a reboot the CPU load is normal.

I haven't yet figured out what exactly causes that incompatibility. (where does mkswap get called if zram is not enabled?)

Ps.
zram Makefile seems to have a conditional fallback for the swap-utils,

DEPENDS:=+kmod-zram +!(BUSYBOX_CONFIG_MKSWAP&&BUSYBOX_CONFIG_SWAPONOFF):swap-utils

Currently that fallback never gets turned on, as the new default busybox setting is to always enable mkswap. I wonder about the wisdom to force mkswap in busybox config. Should it be vice versa? (If swap-utils is not already enabled, then add dependency for busybox options...)

comment:7 Changed 5 years ago by hnyman <hannu.nyman@…>

I found an additional way to cure this:

If I enable also the "UUID support" option for the busybox mkswap, the CPU utilisation gets fixed even if the busybox mkswap is present in the path. Apparently the missing support for UUID is the underlying cause for the failure and something getting stuck.

Busybox mkswap UUID option, 'n' as the current default:
https://dev.openwrt.org/browser/trunk/package/busybox/config/util-linux/Config.in#L488

Busybox mkswap option, currently on by default:
https://dev.openwrt.org/browser/trunk/package/busybox/config/util-linux/Config.in#L475

Enabling UUID support adds some 350 bytes to the binary:

Busybox mkswap, but NO UUID included
root@OpenWrt:~# ls -l /bin/busybox
-rwxr-xr-x 1 root root 444272 Jan 9 19:44 /bin/busybox

Busybox mkswap with UUID option enabled:
root@OpenWrt:~# ls -l /bin/busybox
-rwxr-xr-x 1 root root 444624 Jan 9 21:13 /bin/busybox

I will probably turn busybox mkswap off for my build, but this conflict caused by the current defaults should be fixed.

comment:8 Changed 5 years ago by hnyman <hannu.nyman@…>

Well, I compiled busybox mkswap turned off with:

# CONFIG_BUSYBOX_CONFIG_MKSWAP is not set

And that failed. There was no busybox mkswap, but there was still high CPU utilisation after boot.

Then I compiled with:

CONFIG_BUSYBOX_CONFIG_MKSWAP=y
CONFIG_BUSYBOX_CONFIG_FEATURE_MKSWAP_UUID=y

and that succeeded. Normal CPU usage after boot.

After that I am not quite sure about the real reason for the symptoms, is it a conflict or what, but just hope that adding the UUID support will continue to work for me...

comment:9 Changed 5 years ago by hnyman <hannu.nyman@…>

Actually, one more notion:

  • with the high CPU utilisation, there are always two /sbin/hotplug2 processes shown by ps. (832 and 841 in the example above).
  • in a normally working router, there is just one such process.

comment:10 Changed 5 years ago by hnyman <hannu.nyman@…>

The more I consider that "two /sbin/hotplug2 processes" thing, the more I think that my previous assumption about mkswap is wrong.

From the example above:

  830 root      1512 S    /sbin/klogd
  832 root       864 S    /sbin/hotplug2 --override --persistent --set-rules-file /etc/hotplug2.rules --
  838 root      1016 S    /sbin/procd
  841 root       864 S    /sbin/hotplug2 --override --persistent --set-rules-file /etc/hotplug2.rules --
  843 root      1012 S    /usr/sbin/6distributed
...
 1323 root      1596 R N  {hotplug-call} /bin/sh /sbin/hotplug-call block

There are two "/sbin/hotplug2 --override" proceses and procd is between them in the ps output. And the only place in the source code with "/sbin/hotplug2 --override" is the line 88 in the start() function in /etc/init.d/boot (of base-files)
https://dev.openwrt.org/browser/trunk/package/base-files/files/etc/init.d/boot#L87

As far as I understand, the boot start() should be called only once, right?

Additionally, there is "killall -q hotplug2" on the previous line 87, so any possible other hotplug2 processes should be killed before launching a new one, right? How can there be two active processes at the same time?

The hang happens later with "hotplug-call" from process 1323, but why there are two "/sbin/hotplug2 --override" processes in the first place? Maybe the presence of several hotplug2 processes causes some lock to hang.

comment:11 Changed 5 years ago by jow

Note that there's hotplug-call lingering around which processes events for the "block" subsystem.
Do you have any references to mkswap in /etc/hotplug.d/block/ ?

comment:12 Changed 5 years ago by hnyman <hannu.nyman@…>

I have not introduced any calls to mkswap and I can't find such ones from the Openwrt source code in trunk. (I haven't yet check from the build_dir or /staging_dir in buildroot, if some packages introduce calls)

But from that perspective, maybe one of the fstab defaults from block-mount is a suspect:
my /etc/config/fstab includes the defaults from block-mount, which is:
https://dev.openwrt.org/browser/trunk/package/block-mount/files/fstab.config

I have modified the "mount" section, but I have not touched the others.

Could it be that either the disabled "swap" section or the "autoswap" section causes a hangup if the busybox mkswap without UUID support gets called? If that is so, then the defaults in block-mount should maybe be modified to reflect incompatibility with the new busybox defaults, or the block-mount scripts should be modified to explicitly call its "own" mkswap from util-linux. https://dev.openwrt.org/browser/trunk/package/block-mount/files?order=name

Autoswap is not documented to any large extent:
http://wiki.openwrt.org/doc/uci/fstab#autoswap

I will test with a failing firmware the removal of those sections.

comment:13 Changed 5 years ago by hnyman <hannu.nyman@…>

editing /etc/config/fstab did not help. Removing those sections has no effect on the problem. Additionally I tested removing the hotplug files from/etc/hotplug.d/block one by one, but no effect.

In all the failing scenarios there were those two hotplug2 daemons running at the same time. Any ideas how to debug the creation of them?

comment:14 Changed 5 years ago by cchhat01@…

I will add my own experience in this as well.
I too am often hit by the high utilization by hotplug2 call. However, for me, the situation is a bit more random. As in if I make a few changes to the router config (etc, wireless config for example), and reboot, I *MAY* find a hotplug call using up 100% CPU, or maybe not.
My solution has simply been to reboot the device and hope that it doesn't cause the hotplug call.

I'm currently running build r35330 on a Netgear WNDR3700v2 and a Buffalo WZR-HP-AG300H (in wds client and ap mode configuration through wireless pseudobridge via relayd, respectively).

I can share my openwrt .config files (they are identical, with the exception of the WNDR3700v2 leds modules).

.config: https://docs.google.com/file/d/0B8cmL6zE2dvDNksybk1xNk5JZGs/edit?usp=sharing

Hope I can help in getting this sorted out.

comment:15 Changed 5 years ago by arokh <trondah@…>

Same issue here. Setting CONFIG_BUSYBOX_CONFIG_FEATURE_MKSWAP_UUID does not help.

comment:16 follow-up: Changed 5 years ago by nbd

please test if r35857/r35859 fixes the issue

comment:17 in reply to: ↑ 16 Changed 5 years ago by hnyman <hannu.nyman@…>

Replying to nbd:

please test if r35857/r35859 fixes the issue

I have built r35864 in two different ways:

  • with CONFIG_BUSYBOX_CONFIG_FEATURE_MKSWAP_UUID. Works normally in my WNDR3800
  • without CONFIG_BUSYBOX_CONFIG_FEATURE_MKSWAP_UUID. Causes steady high CPU usage by the hotplug process fir my WNDR3700v1.

So, I think the changes did not fix this bug.

root@OpenWrt:~# htop

  CPU[|||||||||||||||||||||||||||||||||100.0%]     Tasks: 27, 4 thr; 2 running
  Mem[|||||||||||||||||||||||||       19/60MB]     Load average: 1.09 0.79 0.36
  Swp[                                  0/0MB]     Uptime: 00:05:50

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
 1497 root       20   0  1600   392   228 R 96.2  0.6  4:26.06 /bin/sh /sbin/hotplug-call block
 2301 root       20   0  1352   748   564 R  1.9  1.2  0:00.16 htop
 1575 root       20   0  1668   528   424 S  0.6  0.9  0:01.51 hostapd -P /var/run/wifi-phy0.pid -B
  827 root       20   0   864   392   264 S  0.0  0.6  0:01.15 /sbin/hotplug2 --override --persiste
 2286 root       20   0  1220   524   380 S  0.0  0.9  0:01.25 /usr/sbin/dropbear -F -P /var/run/dr
    1 root       20   0  1524   432   356 S  0.0  0.7  0:04.13 init
  509 root       20   0  1524   232   156 S  0.0  0.4  0:00.00 init
  823 root       20   0  1576   372   304 S  0.0  0.6  0:00.02 /sbin/syslogd -C64
  825 root       20   0  1512   292   224 S  0.0  0.5  0:00.06 /sbin/klogd
  833 root       20   0   864   296   168 S  0.0  0.5  0:00.02 /sbin/hotplug2 --override --persiste
  834 root       20   0  1016   356   228 S  0.0  0.6  0:00.08 /sbin/procd
  843 root        0 -20   876   268   208 S  0.0  0.4  0:00.04 ubusd
  853 root       20   0  1576   732   504 S  0.0  1.2  0:00.28 /sbin/netifd
  972 root       20   0   796   332   268 S  0.0  0.5  0:00.04 odhcp6c -s /lib/netifd/dhcpv6.script
  988 root       20   0  1520   412   336 S  0.0  0.7  0:00.03 udhcpc -p /var/run/udhcpc-eth1.pid -
 1092 root       20   0  1520   324   252 S  0.0  0.5  0:00.00 /sbin/watchdog -t 5 /dev/watchdog
 1290 root       20   0   796   352   280 S  0.0  0.6  0:00.01 /usr/sbin/6relayd -Rserver -Dserver
 1458 root       20   0  1564   468   348 S  0.0  0.8  0:00.01 /bin/sh /sbin/hotplug-call block
 1491 root       20   0  1600   476   312 S  0.0  0.8  0:00.03 /bin/sh /sbin/hotplug-call block

comment:18 Changed 3 years ago by nbd

  • Resolution set to fixed
  • Status changed from new to closed

hotplug2 is gone, and so is the bug.

Add Comment

Modify Ticket

Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.