Opened 8 years ago

Closed 6 years ago

Last modified 4 years ago

#6728 closed defect (fixed)

AR71xx/WRT160NL/MIPS Kernel 2.6.32.8 SLUB: Unable to allocate memory on node -1

Reported by: shortcolin
Owned by: juhosg
Priority: normal
Milestone: Barrier Breaker 14.07
Component: kernel
Version: Trunk
Keywords: WRT160NL MIPS
Cc:

Description

Error messages are appearing in the kernel logs in recent builds of OpenWrt for the WRT160NL. I have only noticed them recently; they did not appear with earlier kernels.

This is on a Linksys WRT160NL (MIPS) running kernel 2.6.32.8. I built the code on 22 Feb 2010 from OpenWrt trunk, SVN revision 19815.

I enabled ksyms, so the stack trace below includes symbol names.

The "SLUB: Unable to allocate memory" message was discussed here, along with a patch:

http://osdir.com/ml/linux-kernel/2009-06/msg05521.html

...but I notice that this patch has already been applied in this kernel build (file trunk/build_dir/linux-ar71xx/linux-2.6.32.8/mm/slub.c).

A similar report seems to be here:

http://lkml.indiana.edu/hypermail/linux … 01106.html

The issue has been discussed a couple of times with regard to the NSLU2 ARM build. I'm curious as to why I only recently seem to be seeing it on MIPS.

Thanks,

Colin


skbuff alloc of size 3872 failed
e2fsck: page allocation failure. order:1, mode:0x4020
Call Trace:
[<800682bc>] dump_stack+0x8/0x34
[<800b5350>] __alloc_pages_nodemask+0x508/0x578
[<800d63b8>] __slab_alloc+0x194/0x3e0
[<800d689c>] __kmalloc_track_caller+0x114/0x16c
[<801db214>] __alloc_skb+0x7c/0x150
[<81f12038>] ath_rxbuf_alloc+0x38/0xb8 [ath]
[<80cc55a0>] ath_rx_tasklet+0x228/0x54c [ath9k]
[<80cc477c>] ath9k_tasklet+0x60/0xec [ath9k]
[<80081ff4>] tasklet_action+0x88/0xe4
[<800827e8>] __do_softirq+0xb0/0x148
[<800828c8>] do_softirq+0x48/0x6c
[<8006082c>] ret_from_irq+0x0/0x4
[<80063ffc>] __copy_user+0x38/0x2bc
[<800adc28>] file_read_actor+0xa0/0x120
[<800b0dc0>] generic_file_aio_read+0x3ec/0x70c
[<800d9400>] do_sync_read+0xd4/0x13c
[<800da1d8>] sys_read+0x58/0xa0
[<80062544>] stack_done+0x20/0x3c

Mem-Info:
Normal per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
active_anon:1063 inactive_anon:1660 isolated_anon:0
active_file:389 inactive_file:794 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:188 slab_reclaimable:169 slab_unreclaimable:2054
mapped:242 shmem:6 pagetables:63 bounce:0
Normal free:752kB min:720kB low:900kB high:1080kB active_anon:4252kB inactive_anon:6640kB active_file:1556kB inactive_file:3176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:32512kB mlocked:0kB dirty:0kB writeback:0kB mapped:968kB shmem:24kB slab_reclaimable:676kB slab_unreclaimable:8216kB kernel_stack:328kB pagetables:252kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Normal: 158*4kB 3*8kB 4*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 752kB
1796 total pagecache pages
599 pages in swap cache
Swap cache stats: add 5583, delete 4984, find 1105/1478
Free swap  = 501464kB
Total swap = 506036kB
8192 pages RAM
831 pages reserved
1455 pages shared
5922 pages non-shared
SLUB: Unable to allocate memory on node -1 (gfp=0x20)
  cache: kmalloc-8192, object size: 8192, buffer size: 8192, default order: 3, min order: 1
  node 0: slabs: 0, objs: 0, free: 0
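
The `Normal:` line in the dump above lists the free blocks per buddy order, and its total can be cross-checked by hand. The following is a minimal Python sketch for offline log analysis (the function name `parse_buddy_line` is made up for illustration and is not a kernel or OpenWrt tool). It sums the free list and reports how much of the free memory sits in blocks of at least 8 kB, the size an order-1 request needs with 4 kB pages.

```python
import re

def parse_buddy_line(line):
    """Return {block_size_kb: free_block_count} for a 'Normal: N*SkB ...' line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}

line = ("Normal: 158*4kB 3*8kB 4*16kB 1*32kB 0*64kB 0*128kB 0*256kB "
        "0*512kB 0*1024kB 0*2048kB 0*4096kB = 752kB")

blocks = parse_buddy_line(line)
total_kb = sum(size * count for size, count in blocks.items())
# Free memory held in blocks of at least 8 kB (order 1 and above, 4 kB pages).
order1_plus_kb = sum(size * count for size, count in blocks.items() if size >= 8)

print(total_kb, order1_plus_kb)
```

For the dump above this gives 752 kB in total, matching the figure at the end of the line, of which 120 kB is in blocks of 8 kB or larger.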

cat /proc/cpuinfo
system type        : Atheros AR9130 rev 2
machine            : Linksys WRT160NL
processor        : 0
cpu model        : MIPS 24Kc V7.4
BogoMIPS        : 266.24
wait instruction    : yes
microsecond timers    : yes
tlb_entries        : 16
extra interrupt vector    : yes
hardware watchpoint    : yes, count: 4, address/irw mask: [0x0000, 0x0ff8, 0x0ff8, 0x0ff8]
ASEs implemented    : mips16
shadow register sets    : 1
core            : 0
VCED exceptions        : not available
VCEI exceptions        : not available



cat /proc/meminfo 
MemTotal:          29444 kB
MemFree:           11064 kB
Buffers:            1168 kB
Cached:             2656 kB
SwapCached:          244 kB
Active:             2316 kB
Inactive:           2880 kB
Active(anon):        772 kB
Inactive(anon):      628 kB
Active(file):       1544 kB
Inactive(file):     2252 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        506036 kB
SwapFree:         505172 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          1232 kB
Mapped:              996 kB
Shmem:                28 kB
Slab:               8880 kB
SReclaimable:        508 kB
SUnreclaim:         8372 kB
KernelStack:         376 kB
PageTables:          300 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      520756 kB
Committed_AS:       5748 kB
VmallocTotal:    1048372 kB
VmallocUsed:         868 kB
VmallocChunk:    1034480 kB

Attachments (0)

Change History (25)

comment:1 Changed 8 years ago by shortcolin

This report looks suspiciously similar to this one:

http://old.nabble.com/MIPS-kernel-snapshots---2.6.32-rc6-td26316291.html

It comes from a Debian MIPS discussion of 2.6.32 kernel snapshots.

The router did lock up over the weekend with no network, using the 2.6.32.7 kernel. Unfortunately I didn't have a handy serial cable to debug. This may or may not be related to this report.

comment:2 Changed 8 years ago by Colin Paton <shortcolin>

I'm not sure if this is a satisfactory fix or not, but it stopped my error messages...

The SLUB allocator seems better suited to multiprocessor systems, which I guess the WRT160NL is not. I ran 'make kernel_menuconfig' and changed the kernel config option to use the SLOB allocator, which appears to be running OK.

Colin

comment:3 Changed 8 years ago by juhosg

  • Owner changed from developers to juhosg
  • Status changed from new to assigned

comment:4 Changed 8 years ago by dirtyfreebooter <openwrt@…>

I am using a D-Link DIR-825 B1, and with either Backfire or Trunk from SVN I see the same issues. Both SLUB and SLAB seemed to result in these errors. Using SLOB seems to have stopped them.

CPU Information:

system type		: Atheros AR7161 rev 2
machine			: D-Link DIR-825 rev. B1
processor		: 0
cpu model		: MIPS 24Kc V7.4
BogoMIPS		: 452.19
wait instruction	: yes
microsecond timers	: yes
tlb_entries		: 16
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ff8]
ASEs implemented	: mips16
shadow register sets	: 1
core			: 0
VCED exceptions		: not available
VCEI exceptions		: not available

comment:5 Changed 7 years ago by arokh <trondah@…>

Same issue here, will try SLOB...

comment:6 Changed 7 years ago by anonymous

This happens often with the latest Backfire kernels, 2.6.32.25 and 2.6.32.27. I tried SLOB and still get the same errors:

Mem-Info:
Normal per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
active_anon:299 inactive_anon:680 isolated_anon:0
 active_file:774 inactive_file:1462 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 free:68 slab_reclaimable:0 slab_unreclaimable:0
 mapped:529 shmem:38 pagetables:63 bounce:0
Normal free:272kB min:720kB low:900kB high:1080kB active_anon:1196kB inactive_anon:2720kB active_file:3096kB inactive_file:5848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:32512kB mlocked:0kB dirty:0kB writeback:0kB mapped:2116kB shmem:152kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:360kB pagetables:252kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 272kB
2356 total pagecache pages
82 pages in swap cache
Swap cache stats: add 2051, delete 1969, find 820/1183
Free swap  = 255912kB
Total swap = 257000kB
8192 pages RAM
806 pages reserved
1977 pages shared
6370 pages non-shared
skbuff alloc of size 3872 failed
vsftpd: page allocation failure. order:1, mode:0x4020
Call Trace:[<80068374>] 0x80068374
[<80068374>] 0x80068374
[<800b39fc>] 0x800b39fc
[<800d433c>] 0x800d433c
[<800d4258>] 0x800d4258
[<800d4604>] 0x800d4604
[<801dcd38>] 0x801dcd38
[<814fc038>] 0x814fc038
[<815c5e0c>] 0x815c5e0c
[<801e67e8>] 0x801e67e8
[<815c4c74>] 0x815c4c74
[<8008216c>] 0x8008216c
[<80082960>] 0x80082960
[<801dd6c4>] 0x801dd6c4
[<80082a40>] 0x80082a40
[<80211e20>] 0x80211e20
[<80082cc8>] 0x80082cc8
[<801e7130>] 0x801e7130
[<802121bc>] 0x802121bc
[<802119a4>] 0x802119a4
[<800d433c>] 0x800d433c
[<802268f4>] 0x802268f4
[<8022451c>] 0x8022451c
[<801dcd38>] 0x801dcd38
[<80226d7c>] 0x80226d7c
[<80068aac>] 0x80068aac
[<802298e8>] 0x802298e8
[<80226f34>] 0x80226f34
[<8021acdc>] 0x8021acdc
[<800f762c>] 0x800f762c
[<800f6338>] 0x800f6338
[<800f63c0>] 0x800f63c0
[<800f6450>] 0x800f6450
[<800f6338>] 0x800f6338
[<800f6874>] 0x800f6874
[<800f692c>] 0x800f692c
[<800f69e8>] 0x800f69e8
[<800f6338>] 0x800f6338
[<800f6f34>] 0x800f6f34
[<800f7728>] 0x800f7728
[<800f7728>] 0x800f7728
[<800f78bc>] 0x800f78bc
[<800f6f10>] 0x800f6f10
[<800d6f08>] 0x800d6f08
[<800a80fc>] 0x800a80fc
[<800d7118>] 0x800d7118
[<80062544>] 0x80062544
[<80087b1c>] 0x80087b1c

comment:7 follow-up: Changed 7 years ago by arokh <trondah@…>

I've found a solution: Increase vm.min_free_kbytes until you don't get allocation failures any more. It controls the minimum memory the kernel should keep for emergency reserve allocations. On my 64MB WNDR3700 it seems fine on 4000. The default is 1000.

comment:8 in reply to: ↑ 7 Changed 7 years ago by sniperpr@…

Replying to arokh <trondah@…>:

> I've found a solution: Increase vm.min_free_kbytes until you don't get allocation failures any more. It controls the minimum memory the kernel should keep for emergency reserve allocations. On my 64MB WNDR3700 it seems fine on 4000. The default is 1000.

How do I do that?
Thanks.

comment:9 Changed 7 years ago by arokh <trondah@…>

In /etc/sysctl.conf insert:

vm.min_free_kbytes=4000

Then reboot.
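
For context on what the setting changes: in kernels of this era, the per-zone `min`, `low` and `high` watermarks printed in the Mem-Info dumps are derived from `vm.min_free_kbytes`. The Python sketch below is a simplified model, assuming a single memory zone and 4 kB pages as on these boards (the helper name `zone_watermarks_kb` is hypothetical). A setting of 720 reproduces the `min:720kB low:900kB high:1080kB` figures seen in the dumps in this ticket, and the second call shows what a value of 4000 would give under the same assumptions.

```python
PAGE_KB = 4  # assumption: 4 kB pages, as on the MIPS boards in this ticket

def zone_watermarks_kb(min_free_kbytes):
    """Approximate (min, low, high) watermarks in kB for a single-zone system."""
    pages_min = min_free_kbytes // PAGE_KB
    pages_low = pages_min + pages_min // 4   # low  = min + 25%
    pages_high = pages_min + pages_min // 2  # high = min + 50%
    return tuple(p * PAGE_KB for p in (pages_min, pages_low, pages_high))

print(zone_watermarks_kb(720))
print(zone_watermarks_kb(4000))
```

Newer kernels compute the gaps between the watermarks differently, so treat this only as an illustration for the 2.6.32 kernels discussed here.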

comment:10 Changed 7 years ago by dirtyfreebooter <openwrt-devel@…>

I use these settings, which seem to prevent allocation errors (I run transmission-daemon and afpd), together with a 128MB swap partition on a USB disk. I use a minimum of 2MB; 4MB seems a bit excessive.

sysctl.conf:

vm.min_free_kbytes=2048
vm.swappiness=95
vm.vfs_cache_pressure=999
vm.overcommit_memory=2
vm.overcommit_ratio=50
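
With `vm.overcommit_memory=2` the kernel refuses allocations beyond a commit limit of swap plus `overcommit_ratio` percent of RAM, which is the `CommitLimit` value in /proc/meminfo. As a rough check, the Python sketch below recomputes the `CommitLimit` from the /proc/meminfo output in the ticket description. The function name is illustrative, and the sketch assumes 4 kB pages, no huge pages, and the default ratio of 50.

```python
PAGE_KB = 4  # assumption: 4 kB pages

def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """CommitLimit = swap + RAM * overcommit_ratio / 100, computed in pages."""
    ram_pages = mem_total_kb // PAGE_KB
    swap_pages = swap_total_kb // PAGE_KB
    return (ram_pages * overcommit_ratio // 100 + swap_pages) * PAGE_KB

# MemTotal and SwapTotal from the /proc/meminfo output in the description.
print(commit_limit_kb(29444, 506036, 50))
```

The result, 520756, matches the `CommitLimit: 520756 kB` line reported there.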

comment:11 Changed 7 years ago by arokh <trondah@…>

Actually, 2MB doesn't cut it for me; I still get allocation failures. With 4MB they are gone, and I don't need swap or to mess with the VM in any other way.

Isn't 200 the limit for vfs_cache_pressure? Anyway, I think your performance is going to be very poor if you tell your VM to reclaim cache and swap out everything.

comment:12 Changed 7 years ago by dirtyfreebooter <openwrt-devel@…>

95% of my memory usage is from transmission (i.e. open files). I personally don't mind if transmission gets bumped to swap; to me, transmission comes second to any other daemon or process on the router. So I was trying to set up the VM to give back FS cache sooner.

As far as the 200 limit goes, the way it is now in the kernel, there is no limit, but 200 or 999 will probably result in the same behavior. I accidentally copied the values from an experimental sysctl.conf in which I wanted to try more aggressive values.

Here is my default sysctl.conf:

vm.min_free_kbytes=2048
vm.swappiness=90
vm.vfs_cache_pressure=200
vm.overcommit_memory=2
vm.overcommit_ratio=50

comment:13 Changed 7 years ago by XAK

I can also confirm this issue in r24915 using NetGear WNDR3700 (ar71xx, MIPS).

Slab memory (SUnreclaim in /proc/meminfo) grows constantly, so the router cannot run for more than a day without needing a reboot.
Here is how this issue looks in Munin graphs:

http://dl.dropbox.com/u/2188438/wndr3700-slab-memory-day.png
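
To tell a genuine slab leak from normal fluctuation without a Munin setup, the `SUnreclaim` value can be sampled periodically and compared. This is a minimal Python sketch, assuming the /proc/meminfo text has been captured into strings (for example over SSH). The function names are illustrative only, and the second sample is invented purely to show the calculation.

```python
def meminfo_kb(text, field):
    """Return the value in kB of one field from /proc/meminfo text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)

def slab_growth_kb(samples, field="SUnreclaim"):
    """Difference between the last and first sample of a field, in kB."""
    values = [meminfo_kb(text, field) for text in samples]
    return values[-1] - values[0]

# First sample: values from the ticket description.
first = "Slab:               8880 kB\nSUnreclaim:         8372 kB\n"
# Second sample: hypothetical numbers, not taken from any real device.
later = "Slab:              10100 kB\nSUnreclaim:         9600 kB\n"
print(slab_growth_kb([first, later]))
```

A value that keeps rising across many samples while the workload stays the same would point to a leak rather than to ordinary cache use.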

comment:14 Changed 7 years ago by arokh <trondah@…>

I've completely remedied the situation for me by using this value in /etc/sysctl.conf:

vm.min_free_kbytes=6000

It tries to make sure there is always 6MB of RAM available for emergency allocations. There is no need to adjust cache_pressure or other values.

This only seems to be an issue when using very memory-intensive applications; in my case it's the NZBGet usenet downloader.

comment:15 Changed 7 years ago by anonymous

Nothing changed after switching to the SLOB allocator in the kernel config, but I've managed to create a very minimal firmware installation, with only a few packages, that doesn't leak slab memory. I'll try adding the remaining packages one by one to find which one caused the memory leak.

Here are the new graphs of memory usage:
http://dl.dropbox.com/u/2188438/wndr3700_slab_vs_slob.png

comment:16 Changed 7 years ago by arokh <trondah@…>

Try setting vm.min_free_kbytes=5000 in /etc/sysctl.conf

comment:17 Changed 7 years ago by Denis Gryzlov <gryzlov@…>

I did that (set it to 8MB), but it only prevents the device from collapsing due to a lack of free memory. The slab memory usage keeps growing despite the min_free_kbytes setting.

comment:18 Changed 7 years ago by ddxx0n

There is also the patch from https://patchwork.kernel.org/patch/104271/ which is meant to fix the problem, though vm.min_free_kbytes=4096 does the job too, with transmission on my WNDR3700.

comment:19 follow-up: Changed 7 years ago by Denis Gryzlov <gryzlov@…>

There is plenty of RAM on the WNDR3700, so the allocator was not the problem.
My issue was related to the l7-filters for QoS; after turning them off, those annoying memory leaks are gone:

/ticket/8590.html

comment:20 Changed 7 years ago by arokh

@ddxx0n

That patch is from July 2010; it does not apply to the updated ath9k drivers.

comment:21 in reply to: ↑ 19 Changed 7 years ago by dirtyfreebooter <openwrt-devel@…>

Replying to Denis Gryzlov <gryzlov@…>:

> There is plenty of RAM on the WNDR3700, so the allocator was not the problem.
> My issue was related to the l7-filters for QoS; after turning them off, those annoying memory leaks are gone:
>
> /ticket/8590.html

I don't even use QoS and it still happens to me once in a while, even with vm.min_free_kbytes=4096 (with this setting it took a lot longer, weeks, to trigger).

comment:22 Changed 7 years ago by tuigje

Update:

I haven't seen this one for at least a few weeks now, while running more applications than ever, so it seems to have been solved.

comment:23 Changed 7 years ago by Denis Gryzlov <gryzlov@…>

I also can't reproduce any memory leaks while heavily using OpenWrt (SVN trunk) on a WNDR3700 with QoS enabled.

comment:24 Changed 6 years ago by nbd

  • Resolution set to fixed
  • Status changed from accepted to closed

comment:25 Changed 4 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted
