
Opened 2 years ago

Last modified 2 years ago

#21705 new defect

Segmentation Faults when using Swap on ar71xx (zram)

Reported by: mt@…
Owned by: developers
Priority: normal
Component: kernel
Version: Chaos Calmer 15.05
Keywords: swap, zram

Description

Hi,

When using zram-swap under high memory usage, random applications start to crash. At first I suspected a problem with the zram block device getting corrupted, but I can write and read data on the device just fine: I'm able to fill it with data and read it back without problems. So the issue seems to be related to swapping itself.

This happens on current trunk (4.1.25) as well as on current Chaos Calmer.

Here is a typical result, with lots of segfaults (with kernel.print-fatal-signals=1):

[  223.902219] do_page_fault(): sending SIGSEGV to netifd for invalid write access to 00000000
[  223.910852] epc = 77398a6c in libc.so[7731c000+92000]
[  223.916159] ra  = 77398a6c in libc.so[7731c000+92000]
[  223.921440]
[  224.281923] potentially unexpected fatal signal 11.
[  224.286990] CPU: 0 PID: 1110 Comm: netifd Not tainted 4.1.15 #2
[  224.293176] task: 81932ef8 ti: 816f6000 task.ti: 816f6000
[  224.298750] $ 0   : 00000000 00000000 00000003 00000000
[  224.304450] $ 4   : 773a88ed ff456461 80808080 fefefeff
[  224.309887] $ 8   : f73abb54 00000000 00004000 8eac4000
[  224.315370] $12   : 0000000a 00000000 0000000e 00023ab1
[  224.320808] $16   : 773a88ed 773abb55 7fbe7048 7fbe6f50
[  224.326257] $20   : 00000000 773abb55 00000062 00400019
[  224.331687] $24   : 00000000 7738d8d0
[  224.337143] $28   : 773c6320 7fbe6ec8 7fbe7048 77398a6c
[  224.342821] Hi    : 0000003a
[  224.345799] Lo    : 00000012
[  224.348782] epc   : 77398a6c 0x77398a6c
[  224.352815] ra    : 77398a6c 0x77398a6c
[  224.356776] Status: 0100f413 USER EXL IE
[  224.360951] Cause : 0080000c
[  224.363953] BadVA : 00000000
[  224.366928] PrId  : 0001974c (MIPS 74Kc)
[  230.081537]
[  230.081537] do_page_fault(): sending SIGSEGV to sh for invalid read access from 00000000
[  230.090122] epc = 00000000 in busybox[400000+46000]
[  230.095227] ra  = 00000000 in busybox[400000+46000]
[  230.100289]
[  230.791874] potentially unexpected fatal signal 11.
[  230.796942] CPU: 0 PID: 3982 Comm: sh Not tainted 4.1.15 #2
[  230.802763] task: 8071da18 ti: 81cc0000 task.ti: 81cc0000
[  230.808342] $ 0   : 00000000 00000001 00000f8f 00000000
[  230.814067] $ 4   : ffffffff 7fad75fc 00000000 00000000
[  230.819506] $ 8   : 00000000 80064f20 803c9154 803244d8
[  230.824962] $12   : 0000001a 00000013 0000000e 00000007
[  230.830626] $16   : 00000000 00000000 00000000 00000000
[  230.836100] $20   : 00000000 77fba000 7fad84f4 77fbd490
[  230.841529] $24   : 00000001 77f936c0
[  230.847122] $28   : 00000000 7fad75a8 00000000 00000000
[  230.852606] Hi    : 00000000
[  230.855580] Lo    : 00000007
[  230.858556] epc   : 00000000   (null)
[  230.862363] ra    : 00000000   (null)
[  230.866143] Status: 0100f413 USER EXL IE
[  230.870318] Cause : 00800008
[  230.873317] BadVA : 00000000
[  230.876449] PrId  : 0001974c (MIPS 74Kc)
[  244.728483] potentially unexpected fatal signal 11.
[  244.733635] CPU: 0 PID: 3976 Comm: ubusd Not tainted 4.1.15 #2
[  244.739663] task: 8071f430 ti: 81496000 task.ti: 81496000
[  244.745275] $ 0   : 00000000 00000000 2462ffff 779ba020
[  244.750739] $ 4   : 779ba000 00000001 779ba020 00000000
[  244.756405] $ 8   : 00000000 0000f400 00000011 85000014
[  244.761891] $12   : 00077469 00000000 00000000 6d656f75
[  244.767322] $16   : 779b29b0 779ba000 77a590b0 77a57538
[  244.772961] $20   : 77a54000 77a54000 7fdfaaf4 77a57490
[  244.778398] $24   : 00000000 779d0560
[  244.783873] $28   : 779ba020 7fdfaa30 00000000 7799dbf9
[  244.789312] Hi    : 00399e63
[  244.792307] Lo    : 49be433d
[  244.795285] epc   : 7799dce9 0x7799dce9
[  244.799322] ra    : 7799dbf9 0x7799dbf9
[  244.803335] Status: 0100f413 USER EXL IE
[  244.807510] Cause : 00800010
[  244.810485] BadVA : 24630003
[  244.813650] PrId  : 0001974c (MIPS 74Kc)
[  248.265148]
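When a log like this gets long, tallying the do_page_fault() messages per process makes it easier to see that unrelated programs are being hit. A minimal sketch, assuming the log text arrives on stdin (for example from dmesg or logread) and uses the message format shown above; segv_by_process is a hypothetical helper name, not an OpenWrt tool. It only counts do_page_fault() lines, so a crash that shows up only as a register dump (like ubusd above) is not counted.

```shell
#!/bin/sh
# Hypothetical helper: count "do_page_fault(): sending SIGSEGV to <name>"
# kernel messages per process name. Reads log text on stdin.
segv_by_process() {
    # Extract the process name that follows "sending SIGSEGV to",
    # then count occurrences, most frequent first.
    sed -n 's/.*sending SIGSEGV to \([^ ]*\).*/\1/p' |
        sort | uniq -c | sort -rn
}
```

For example: dmesg | segv_by_process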


Change History (8)

comment:1 Changed 2 years ago by mt@…

To reproduce: enable zram-swap and use a lot of memory; crashes start appearing. Running sysctl -w kernel.print-fatal-signals=1 makes them visible in the kernel log.
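The two reproduction steps can be scripted. A minimal sketch, assuming a root shell on the router and that the target directory is on tmpfs (on OpenWrt, /tmp is tmpfs), so a file written there occupies RAM and pushes other pages out to swap. The function names are hypothetical and not part of OpenWrt.

```shell
#!/bin/sh
# Hypothetical reproduction helpers for this ticket (not part of OpenWrt).

# Ask the kernel to log every fatal signal, as suggested above.
enable_fatal_signal_logging() {
    sysctl -w kernel.print-fatal-signals=1 >/dev/null 2>&1 ||
        echo "could not set kernel.print-fatal-signals (not root?)" >&2
}

# Write KB kilobytes of incompressible random data into DIR.
# Assumption: DIR is on tmpfs, so the data is held in memory.
fill_memory() {
    dir=$1
    kb=$2
    dd if=/dev/urandom of="$dir/pressure.bin" bs=1024 count="$kb" 2>/dev/null
}

# Free the memory again.
release_memory() {
    rm -f "$1/pressure.bin"
}
```

On the router this could be driven as: enable_fatal_signal_logging; fill_memory /tmp 6144; sleep 60; logread | grep SIGSEGV; release_memory /tmp. The 6144 KB size is only an example: it has to exceed MemFree to force pages out to swap, but if it exceeds free RAM plus free swap the OOM killer will step in instead.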

comment:2 Changed 2 years ago by diizzyy@…

How much memory does your device have, and how much are you actually using? While zram does compress, I imagine it cannot do wonders, and you're probably seeing this because it's running out of RAM. That said, it should tell you that you're out of RAM rather than crashing.

comment:3 Changed 2 years ago by mt@…

@diizzyy

It has 32 MB of memory, of which around 28 MB is usable, and zram by default uses 14 MB for swap. This usually compresses to about 1/3, so zram should use at most 4-5 MB of memory.

I thought so too and experimented with limiting the amount of memory zram can use: there is /sys/block/zram0/mem_limit, but setting it had no effect.

I've also reduced the zram swap device to 10 MB and to 4 MB. /proc/meminfo says 8 MB are free, and a 4 MB zram device would use no more than 1-2 MB of memory.
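The memory estimates in this comment follow from dividing the swap size by the compression ratio. A small sketch of that arithmetic, assuming the roughly 3:1 ratio stated above; it ignores the allocator's own metadata overhead, so real usage will be somewhat higher. zram_ram_kb is a hypothetical helper name.

```shell
#!/bin/sh
# Rough estimate of the RAM a full zram swap device consumes:
# compressed size = swap size / compression ratio (integer KB).
# Assumption: allocator metadata overhead is ignored.
zram_ram_kb() {
    swap_kb=$1
    ratio=$2
    echo $(( swap_kb / ratio ))
}

zram_ram_kb $((14 * 1024)) 3   # 14 MB swap: 4778 KB (about 4.7 MB)
zram_ram_kb $((4 * 1024)) 3    # 4 MB swap: 1365 KB (about 1.3 MB)
```

Both results agree with the "4-5 MB" and "1-2 MB" figures, well below the 8 MB reported free.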

However, I see the crashes regardless of the amount of swap or zram. They also happen when just using LuCI under low memory pressure, albeit less often.

I suspect a bug related to swap rather than to zram itself. I've successfully filled a 5 MB zram device with data from /dev/urandom, and reading it back works fine.
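One caveat with a simple write/read-back test is that the read may be served from the page cache rather than from zram. A minimal sketch that writes random data, drops the page cache when run as root, and compares checksums; verify_device is a hypothetical helper name. It overwrites whatever is on the target, so it should only be used on an unused zram device, never one active as swap.

```shell
#!/bin/sh
# Hypothetical write/read-back check for a block device such as /dev/zram0.
# Usage: verify_device DEVICE SIZE_KB   (overwrites data on DEVICE!)
verify_device() {
    dev=$1
    kb=$2
    # Write random data to the device and keep the checksum of what was sent.
    sum_written=$(dd if=/dev/urandom bs=1024 count="$kb" 2>/dev/null |
        tee "$dev" | md5sum | cut -d' ' -f1)
    sync
    # Drop the page cache (root only) so the read below comes from the
    # device itself; silently skipped if not permitted.
    { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
    sum_read=$(dd if="$dev" bs=1024 count="$kb" 2>/dev/null |
        md5sum | cut -d' ' -f1)
    [ "$sum_written" = "$sum_read" ]
}
```

For example, verify_device /dev/zram0 5120 && echo OK repeats the 5 MB test; the device's disksize must be at least that large.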

The crashes are reported from do_page_fault() in the kernel, which suggests that something goes wrong when swapped-out pages are accessed.

There do seem to be known issues: here is an unmerged patch that mentions crashes when using swap: http://patchwork.linux-mips.org/patch/7615/

I'm not a kernel developer by any means, and the patch series did not apply out of the box, so I'm not sure whether it's related.


comment:4 Changed 2 years ago by anonymous

I've also tried building a kernel with CONFIG_HIGHMEM=n, as I saw something in the commit logs that could be related: http://git.openwrt.org/?p=openwrt.git;a=commit;h=230601f4a481927ed8c6aff8df24a0c0af1efc44

zsmalloc uses a highmem allocator by default, but I'm really not sure whether highmem is used at all with 32 MB; it should only matter for routers with 256 MB of memory or more.
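Whether highmem is in play on a given device can be checked from userspace instead of rebuilding the kernel. A small sketch, assuming the standard /proc/meminfo format: the HighTotal line is only present when the kernel was built with CONFIG_HIGHMEM, and a value of 0 kB means no memory is actually in the highmem zone. highmem_kb is a hypothetical helper name.

```shell
#!/bin/sh
# Print the HighTotal value (kB) from a meminfo file, or 0 if the field
# is absent (i.e. the kernel was built without CONFIG_HIGHMEM).
highmem_kb() {
    file=${1:-/proc/meminfo}
    awk '/^HighTotal:/ { print $2; found = 1 }
         END { if (!found) print 0 }' "$file"
}
```

On a 32 MB ar71xx board this would presumably print 0, in which case CONFIG_HIGHMEM=n is unlikely to change anything.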

comment:5 Changed 2 years ago by Nilfred

Same here on DESIGNATED DRIVER (Bleeding Edge, r48648):

echo $((2 * 256 * 4096)) > /sys/block/zram0/disksize
mkswap /dev/zram0
Setting up swapspace version 1, size = 2093056 bytes
swapon -p 5 /dev/zram0
free
             total       used       free     shared    buffers     cached
Mem:         28304      23980       4324        804       2324       5340
-/+ buffers/cache:      16316      11988
Swap:         2044        752       1292
cat /sys/block/zram0/compr_data_size 
183414

That is what I did to get this:

Sun Feb 21 11:33:40 2016 kern.info kernel: [ 3277.133648] zram0: detected capacity change from 0 to 2097152
Sun Feb 21 11:33:40 2016 kern.info kernel: [ 3277.606150] Adding 2044k swap on /dev/zram0.  Priority:5 extents:1 across:2044k SS
Sun Feb 21 11:39:28 2016 kern.info kernel: [ 3625.512938] 
Sun Feb 21 11:39:28 2016 kern.info kernel: [ 3625.512938] do_page_fault(): sending SIGSEGV to odhcpd for invalid read access from 008082a8
Sun Feb 21 11:39:28 2016 kern.info kernel: [ 3625.512963] epc = 004062b5 in odhcpd[400000+b000]
Sun Feb 21 11:39:28 2016 kern.info kernel: [ 3625.514766] ra  = 004062a7 in odhcpd[400000+b000]
Sun Feb 21 11:39:28 2016 kern.info kernel: [ 3625.516718] 
...
Sun Feb 21 11:50:28 2016 kern.info kernel: [ 4284.910718] 
Sun Feb 21 11:50:28 2016 kern.info kernel: [ 4284.910718] do_page_fault(): sending SIGSEGV to sleep for invalid read access from 00449140
Sun Feb 21 11:50:28 2016 kern.info kernel: [ 4284.910745] epc = 00449140 in busybox[459000+1000]
Sun Feb 21 11:50:28 2016 kern.info kernel: [ 4284.912623] ra  = 004039f4 in busybox[400000+49000]
Sun Feb 21 11:50:28 2016 kern.info kernel: [ 4284.914757] 
Sun Feb 21 11:50:31 2016 kern.info kernel: [ 4288.465950] 
Sun Feb 21 11:50:31 2016 kern.info kernel: [ 4288.465950] do_page_fault(): sending SIGSEGV to ntpd for invalid read access from 00000000
Sun Feb 21 11:50:31 2016 kern.info kernel: [ 4288.465976] epc = 00000000 in busybox[400000+49000]
Sun Feb 21 11:50:31 2016 kern.info kernel: [ 4288.467956] ra  = 00000000 in busybox[400000+49000]
Sun Feb 21 11:50:31 2016 kern.info kernel: [ 4288.470084] 
...
Sun Feb 21 11:57:22 2016 kern.warn kernel: [ 4699.657629] zram: 10268 (cat) Attribute compr_data_size (and others) will be removed. See zram documentation.
...
Sun Feb 21 11:59:10 2016 kern.info kernel: [ 4806.768964] 
Sun Feb 21 11:59:10 2016 kern.info kernel: [ 4806.768964] do_page_fault(): sending SIGSEGV to lua for invalid read access from 77cfb278
Sun Feb 21 11:59:10 2016 kern.info kernel: [ 4806.768991] epc = 7717de10 in libc.so[77108000+90000]
Sun Feb 21 11:59:10 2016 kern.info kernel: [ 4806.771143] ra  = 770ec05f in liblua.so.5.1.5[770da000+2c000]
Sun Feb 21 11:59:10 2016 kern.info kernel: [ 4806.774441] 
...
Sun Feb 21 14:34:28 2016 kern.info kernel: [14125.581874] 
Sun Feb 21 14:34:28 2016 kern.info kernel: [14125.581874] do_page_fault(): sending SIGSEGV to uhttpd for invalid write access to 00007488
Sun Feb 21 14:34:28 2016 kern.info kernel: [14125.581901] epc = 77931ba1 in libubox.so[7792e000+16000]
Sun Feb 21 14:34:28 2016 kern.info kernel: [14125.584365] ra  = 77931b7f in libubox.so[7792e000+16000]
Sun Feb 21 14:34:28 2016 kern.info kernel: [14125.586928] 

I was just testing; I didn't install zram-swap. Should I test something else?
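The figures in this session are consistent with each other, and they give a rough compression ratio. A small sketch that recomputes them from the values copied above. Treating used swap as the uncompressed size zram holds is an approximation (zram, for example, does not compress zero-filled pages). ratio and mm_stat_ratio are hypothetical helper names; the latter reads mm_stat, which the zram documentation lists as the replacement for the deprecated per-statistic attributes such as compr_data_size, with orig_data_size and compr_data_size as its first two fields.

```shell
#!/bin/sh
# Cross-check the figures from the session above (values copied from it).
page=4096
disksize=$((2 * 256 * page))   # value written to disksize
echo "disksize:  $disksize bytes"   # 2097152, matches the capacity change log
swap=$((disksize - page))      # mkswap keeps the first page for its header
echo "swap area: $swap bytes = $((swap / 1024)) KiB"   # 2093056 bytes = 2044 KiB

# Approximate compression ratio: swap in use (from free, KiB -> bytes)
# divided by compr_data_size (bytes).
ratio() {
    awk -v orig="$1" -v compr="$2" 'BEGIN { printf "%.1f\n", orig / compr }'
}
ratio $((752 * 1024)) 183414   # about 4.2

# Same ratio from mm_stat: field 1 is orig_data_size, field 2 compr_data_size.
mm_stat_ratio() {
    awk '{ if ($2 > 0) printf "%.1f\n", $1 / $2; else print "n/a" }' \
        "${1:-/sys/block/zram0/mm_stat}"
}
```

A ratio around 4:1 is better than the 1/3 assumed in comment:3, so memory used by zram itself looks unlikely to explain these crashes.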

comment:6 Changed 2 years ago by bittorf@…

This is a known issue with kernel 4.1; please wait until the target is on kernel 4.4. See ticket #21705.
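To tell whether a given build already runs the 4.4 kernel mentioned here, the running version can be compared from the shell. A minimal sketch; kernel_at_least is a hypothetical helper, and a passing check only says the version is 4.4 or newer, not that this particular bug is fixed in that build.

```shell
#!/bin/sh
# Hypothetical check: is the kernel version at least MAJOR.MINOR?
# Usage: kernel_at_least MAJOR MINOR [VERSION]  (VERSION defaults to uname -r)
kernel_at_least() {
    want_major=$1
    want_minor=$2
    ver=${3:-$(uname -r)}
    major=${ver%%.*}               # text before the first dot
    rest=${ver#*.}                 # text after the first dot
    minor=${rest%%[!0-9]*}         # leading digits only, e.g. "1" from "1.15"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

For example, kernel_at_least 4 4 && echo "4.4 or newer" on the router.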

comment:7 Changed 2 years ago by anonymous

This is ticket #27105! :-)

comment:8 Changed 2 years ago by anonymous

...I mean, this is ticket #21705!
