Opened 8 years ago

Closed 8 years ago

Last modified 4 years ago

#5996 closed defect (fixed)

wifi not work in r18047 brcm47xx

Reported by: oleg100@… Owned by: developers
Priority: normal Milestone: Barrier Breaker 14.07
Component: packages Version: Trunk
Keywords: wifi brcm47xx bug Cc:

Description

Wifi starts. After a ping, the router goes down.

Attachments (2)

dmesg.txt (24.2 KB) - added by oleg100@… 8 years ago.
dmesg output
dmesg_ds.txt (3.5 KB) - added by anonymous 8 years ago.
b43 bug with debug symbols

Change History (40)

Changed 8 years ago by oleg100@…

dmesg output

comment:1 follow-up: Changed 8 years ago by acoul <alex at ozo.com>

can you find latest working trunk?

comment:2 in reply to: ↑ 1 Changed 8 years ago by umiki (at) med.unideb.hu

Replying to acoul <alex at ozo.com>:

can you find latest working trunk?

I believe it has been broken since compat-wireless was updated; at least, the oops only appears in the log after the wireless is activated. I had a working 2.6.30.8 on my wl-500gpV1, so the fault was introduced between the kernel update and the compat-wireless update (I would say by the latter).

comment:3 follow-up: Changed 8 years ago by acoul <alex at ozo.com>

You may run some tests if you feel like it and report back here. You can check the latest compat-wireless and perhaps try a recent 2.6.30.x or 2.6.31.x kernel (either of them should compile fine under brcm47xx).

you will need to adjust the following files:

package/mac80211/Makefile
target/linux/brcm47xx/Makefile

comment:4 Changed 8 years ago by mludi

I have reproduced the same bug with debug symbols. I will attach the resulting kernel messages as the file dmesg_ds.txt.

Changed 8 years ago by anonymous

b43 bug with debug symbols

comment:5 in reply to: ↑ 3 Changed 8 years ago by anonymous

Replying to acoul <alex at ozo.com>:

You may run some tests if you feel like it and report back here. You can check the latest compat-wireless and perhaps try a recent 2.6.30.x or 2.6.31.x kernel (either of them should compile fine under brcm47xx).

you will need to adjust the following files:

package/mac80211/Makefile
target/linux/brcm47xx/Makefile

Unfortunately 2.6.31.4 with 2009-10-16 is not better either.

comment:6 follow-up: Changed 8 years ago by mludi

I forgot to say, I am running 2.5.31.8. I am in the process of building the wireless-compat package as suggested, however the patches do not apply cleanly, so it may take some time to test.

comment:7 Changed 8 years ago by mludi

Sorry for the mistake, it is also the 2.6.31.4 kernel.

comment:8 in reply to: ↑ 6 Changed 8 years ago by anonymous

Replying to mludi:

I forgot to say, I am running 2.5.31.8. I am in the process of building the wireless-compat package as suggested, however the patches do not apply cleanly, so it may take some time to test.

Compat-wireless 2009-10-16 will compile if you just delete the last two patches (405 and 406 I think). But it doesn't work any better.

comment:9 follow-ups: Changed 8 years ago by mludi

Well, I fixed these two patches:

009-remove_mac80211_module_dependence.patch
010-b43_config.patch

and left the other ones out since some of them resulted in rejs and it seems
that they are not necessary for me (broadcom card).

Recompiling compat-wireless (stable for 2.6.31.4) and using the same kernel version fixed the
problem for me.

Apart from the problem, I would be grateful if someone could point me to some documentation on how to get more tracing information from the kernel in this case. In the trace I attached yesterday, the upper part (without debugging symbols enabled) is a lot bigger than the second part, where I enabled debugging symbols. This seems inconsistent to me. Are there other kernel/compile options which allow further tracing of the problem?

comment:10 in reply to: ↑ 9 Changed 8 years ago by anonymous

Replying to mludi:

Well, I fixed these two patches:

009-remove_mac80211_module_dependence.patch
010-b43_config.patch

and left the other ones out since some of them resulted in rejs and it seems
that they are not necessary for me (broadcom card).

Recompiling compat-wireless (stable for 2.6.31.4) and using the same kernel version fixed the
problem for me.

Apart from the problem, I would be grateful if someone could point me to some documentation on how to get more tracing information from the kernel in this case. In the trace I attached yesterday, the upper part (without debugging symbols enabled) is a lot bigger than the second part, where I enabled debugging symbols. This seems inconsistent to me. Are there other kernel/compile options which allow further tracing of the problem?

What I see, without looking too much in it, seems to be the scheduling while atomic part. It seems to be originating from the irq handler(?), so my guess would be something introduced in the threaded irq support? Or maybe not. Would be nice to post this issue to some relevant compat-wireless mailing list.

comment:11 Changed 8 years ago by umiki (at) med.unideb.hu

OK, this is how the whole thing goes wrong:

Oct 18 15:51:51 asus authpriv.notice dropbear[1908]: password auth succeeded for 'root' from 192.168.1.199:58232                                                      
Oct 18 15:51:54 asus daemon.info dnsmasq-dhcp[1918]: DHCPREQUEST(br-lan) 192.168.1.134 00:13:02:43:b1:61                                                              
Oct 18 15:51:54 asus daemon.info dnsmasq-dhcp[1918]: DHCPACK(br-lan) 192.168.1.134 00:13:02:43:b1:61 umiki-laptop                                                     
Oct 18 15:53:54 asus user.debug kernel: b43-phy0 debug: Updated beacon template at 0x68                                                                               
Oct 18 15:53:54 asus user.err kernel: BUG: scheduling while atomic: irq/6-b43/1109/0x00000100                                                                         
Oct 18 15:53:54 asus user.warn kernel: Modules linked in: sch_red sch_sfq sch_hfsc cls_fw imq usb_storage nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp xt_HL xt_hl xt_MARK ipt_ECN xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_                                                                                                                                
Oct 18 15:53:54 asus user.warn kernel: Call Trace:                                                                                                                                                   
Oct 18 15:53:54 asus user.warn kernel: [<8000a0dc>] dump_stack+0x8/0x34                                                                                                                              
Oct 18 15:53:54 asus user.warn kernel: [<8000a7b8>] schedule+0x78/0x684                                                                                                                              
Oct 18 15:53:54 asus user.warn kernel: [<8000b990>] __mutex_lock_slowpath+0x180/0x1d0                                                                                                                
Oct 18 15:53:54 asus user.warn kernel: [<808c14fc>] b43_shm_read32+0x688/0x8ec [b43]                                                                                                                 
Oct 18 15:54:11 asus user.err kernel: BUG: scheduling while atomic: swapper/0/0x00000100                                                                                                             
Oct 18 15:54:11 asus user.warn kernel: Modules linked in: sch_red sch_sfq sch_hfsc cls_fw imq usb_storage nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp xt_HL xt_hl xt_MARK ipt_ECN xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_                                                                                                                                
Oct 18 15:54:11 asus user.warn kernel: Cpu 0                                                                                                                                                         
Oct 18 15:54:11 asus user.warn kernel: $ 0   : 00000000 1000d801 80001660 02000000                                                                                                                   
Oct 18 15:54:11 asus user.warn kernel: $ 4   : 80297a18 81c6ca28 1000d800 ffff00fe                                                                                                                   
Oct 18 15:54:11 asus user.warn kernel: $ 8   : 00000000 0000d800 00000000 80a82000                                                                                                                   
Oct 18 15:54:11 asus user.warn kernel: $12   : 4adb3a22 00000000 ffffffff 80bb89a0                                                                                                                   
Oct 18 15:54:11 asus user.warn kernel: $16   : 802e0000 802ca5dc 802d9acc 00000010                                                                                                                   
Oct 18 15:54:11 asus user.warn kernel: $20   : 00004000 00000200 00000005 00000000                                                                                                                   
Oct 18 15:54:11 asus user.warn kernel: $24   : 00000000 2aca4bc8                                                                                                                                     
Oct 18 15:54:11 asus user.warn kernel: $28   : 80294000 80295ed8 00000000 8000f528                                                                                                                   
Oct 18 15:54:11 asus user.warn kernel: Hi    : 00000033                                                                                                                                              
Oct 18 15:54:11 asus user.warn kernel: Lo    : b6310d00                                                                                                                                              
Oct 18 15:54:11 asus user.warn kernel: epc   : 8000f528 cpu_idle+0x24/0x44                                                                                                                           
Oct 18 15:54:11 asus user.warn kernel:     Tainted: P                                                                                                                                                
Oct 18 15:54:11 asus user.warn kernel: ra    : 8000f528 cpu_idle+0x24/0x44                                                                                                                           
Oct 18 15:54:11 asus user.warn kernel: Status: 1000d803    KERNEL EXL IE                                                                                                                             
Oct 18 15:54:11 asus user.warn kernel: Cause : 00808000                                                                                                                                              
Oct 18 15:54:11 asus user.warn kernel: PrId  : 00029006 (Broadcom BCM3302)                                                                                                                           
Oct 18 15:54:11 asus user.err kernel: bad: scheduling from the idle thread!                                                                                                                          
Oct 18 15:54:11 asus user.warn kernel: Call Trace:                                                                                                                                                   
Oct 18 15:54:11 asus user.warn kernel: [<8000a0dc>] dump_stack+0x8/0x34                                                                                                                              
Oct 18 15:54:11 asus user.warn kernel: [<8001daec>] dequeue_task_idle+0x3c/0x68                                                                                                                      
Oct 18 15:54:11 asus user.warn kernel: [<8001cd34>] dequeue_task+0xf8/0x10c                                                                                                                          
Oct 18 15:54:11 asus user.warn kernel: [<8000a90c>] schedule+0x1cc/0x684                                                                                                                             
Oct 18 15:54:11 asus user.warn kernel: [<8000b990>] __mutex_lock_slowpath+0x180/0x1d0                                                                                                                
Oct 18 15:54:11 asus user.warn kernel: [<808c14fc>] b43_shm_read32+0x688/0x8ec [b43] 
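For readers trying to interpret the "BUG: scheduling while atomic" lines in the log above, here is a minimal Python sketch that splits such a line into task name, PID and preempt counter. The function name `decode_bug_line` and the two bit masks are illustrative assumptions: the masks follow the preempt_count layout commonly described for 2.6.3x kernels (bits 0-7 preemption depth, bits 8-15 softirq count) and should be checked against the actual kernel source before drawing conclusions.

```python
import re

# Assumed preempt_count layout for 2.6.3x kernels (not taken from this ticket):
# bits 0-7 hold the preemption depth, bits 8-15 the softirq count.
PREEMPT_MASK = 0x000000FF
SOFTIRQ_MASK = 0x0000FF00

# The message has the shape "<task name>/<pid>/0x<preempt_count>".
# The task name itself may contain slashes (e.g. "irq/6-b43").
BUG_RE = re.compile(r"BUG: scheduling while atomic: (.+)/(\d+)/0x([0-9a-fA-F]+)")


def decode_bug_line(line):
    """Split a 'scheduling while atomic' log line into task, pid and counters."""
    match = BUG_RE.search(line)
    if match is None:
        return None
    comm, pid, count_hex = match.groups()
    count = int(count_hex, 16)
    return {
        "task": comm,
        "pid": int(pid),
        "preempt_depth": count & PREEMPT_MASK,
        "softirq_count": (count & SOFTIRQ_MASK) >> 8,
    }
```

Under the assumed layout, the value 0x00000100 seen here decodes to a softirq count of 1 and a preemption depth of 0, which would be consistent with the task trying to sleep (on the mutex in the trace) while in softirq context or with bottom halves disabled.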

comment:12 in reply to: ↑ 9 Changed 8 years ago by martin@…

Replying to mludi:

Well, I fixed these two patches:
Recompiling compat-wireless (stable for 2.6.31.4) and using the same kernel version fixed the
problem for me.

Is there a package I could install on top of a current trunk snapshot, available from downloads.openwrt.org? Would appreciate if there is a simple installable that could help (or critical files I could replace manually)

comment:13 Changed 8 years ago by luoluo

I met the same problem

comment:14 Changed 8 years ago by anonymous

I confirm that compat-wireless 2009/10 introduces this bug. compat-wireless 2009/08 doesn't have this issue.
Reverting to rev @17986 just for /trunk/package/mac80211 directory, leaving everything else at head rev fixes the bug.

comment:15 Changed 8 years ago by mgrant@…

I can confirm that this bug also exists on the Buffalo WHR-HP-G54 with revision 18274.

comment:16 Changed 8 years ago by anonymous

I tried checking out this version of mac80211 as follows:
svn co svn://svn.openwrt.org/openwrt/trunk/package/mac80211@17986

The wifi still stops working after a very short time. I manage to ping the buffalo 4 times before it stops.

root@OpenWrt:/# logread -f
Jan  1 00:02:19 OpenWrt user.notice root: removing lan (br-lan) from firewall zone lan
Jan  1 00:02:20 OpenWrt user.info kernel: br-lan: port 1(eth0.0) entering disabled state
Jan  1 00:02:20 OpenWrt user.info kernel: device eth0 left promiscuous mode
Jan  1 00:02:20 OpenWrt user.info kernel: device eth0.0 left promiscuous mode
Jan  1 00:02:20 OpenWrt user.info kernel: br-lan: port 1(eth0.0) entering disabled state
0.openwrt.pool.ntp.org: Unknown host
1.openwrt.pool.ntp.org: Unknown host
2.openwrt.pool.ntp.org: Unknown host
3.openwrt.pool.ntp.org: Unknown host
Jan  1 00:02:27 OpenWrt user.info kernel: device eth0.0 entered promiscuous mode
Jan  1 00:02:27 OpenWrt user.info kernel: device eth0 entered promiscuous mode
Jan  1 00:02:27 OpenWrt user.info kernel: br-lan: port 1(eth0.0) entering forwarding state
udhcpc (v1.14.4) started


Jan  1 00:02:31 OpenWrt user.notice root: adding lan (br-lan) to firewall zone lan
Jan  1 00:02:33 OpenWrt user.info kernel: b43 ssb0:3: firmware: requesting b43/ucode5.fw
Sending discover...
Jan  1 00:02:34 OpenWrt user.info kernel: b43 ssb0:3: firmware: requesting b43/pcm5.fw
Jan  1 00:02:35 OpenWrt user.info kernel: b43 ssb0:3: firmware: requesting b43/b0g0initvals5.fw
Jan  1 00:02:35 OpenWrt user.info kernel: b43 ssb0:3: firmware: requesting b43/b0g0bsinitvals5.fw
Jan  1 00:02:35 OpenWrt user.info kernel: b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
Jan  1 00:02:35 OpenWrt user.info kernel: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Configuration file: /var/run/hostapd-wlan0.conf
Jan  1 00:02:36 OpenWrt user.info kernel: device wlan0 entered promiscuous mode
Jan  1 00:02:36 OpenWrt user.info kernel: br-lan: port 2(wlan0) entering disabled state
Using interface wlan0 with hwaddr 00:1d:73:de:01:fb and ssid 'OpenWrt'
Sending discover...
Jan  1 00:02:37 OpenWrt user.debug kernel: eth0.0: no IPv6 routers present
Jan  1 00:02:37 OpenWrt user.info kernel: b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
Jan  1 00:02:37 OpenWrt user.info kernel: br-lan: port 2(wlan0) entering forwarding state
Jan  1 00:02:37 OpenWrt user.debug kernel: br-lan: no IPv6 routers present
Sending discover...
Jan  1 00:02:40 OpenWrt user.debug kernel: eth0.1: no IPv6 routers present
Jan  1 00:02:48 OpenWrt user.debug kernel: wlan0: no IPv6 routers present
Jan  1 00:02:50 OpenWrt daemon.info hostapd: wlan0: STA 00:25:d3:14:b9:f8 IEEE 802.11: authenticated
Jan  1 00:02:50 OpenWrt daemon.info hostapd: wlan0: STA 00:25:d3:14:b9:f8 IEEE 802.11: associated (aid 1)
Jan  1 00:02:50 OpenWrt daemon.info hostapd: wlan0: STA 00:25:d3:14:b9:f8 RADIUS: starting accounting session 0000009D-00000000
Jan  1 00:02:53 OpenWrt daemon.info dnsmasq-dhcp[820]: DHCPDISCOVER(br-lan) 10.0.0.194 00:25:d3:14:b9:f8
Jan  1 00:02:53 OpenWrt daemon.info dnsmasq-dhcp[820]: DHCPOFFER(br-lan) 192.168.1.241 00:25:d3:14:b9:f8
Jan  1 00:02:53 OpenWrt daemon.info dnsmasq-dhcp[820]: DHCPREQUEST(br-lan) 192.168.1.241 00:25:d3:14:b9:f8
Jan  1 00:02:53 OpenWrt daemon.info dnsmasq-dhcp[820]: DHCPACK(br-lan) 192.168.1.241 00:25:d3:14:b9:f8 squeek

Jan  1 00:03:24 OpenWrt user.err kernel: BUG: scheduling while atomic: swapper/0/0x00000100
Jan  1 00:03:24 OpenWrt user.warn kernel: Modules linked in: nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppo
Jan  1 00:03:24 OpenWrt user.warn kernel: Cpu 0
Jan  1 00:03:24 OpenWrt user.warn kernel: $ 0   : 00000000 1000b801 00000000 80266008
Jan  1 00:03:24 OpenWrt user.warn kernel: $ 4   : 80001660 8035d108 1000b800 ffff00fe
Jan  1 00:03:24 OpenWrt user.warn kernel: $ 8   : 00000000 0000b800 00000000 8036a000
Jan  1 00:03:24 OpenWrt user.warn kernel: $12   : 000000cc 7f8e6050 00000000 00000000
Jan  1 00:03:24 OpenWrt user.warn kernel: $16   : 802b0000 80299fec 802a9aec 00000010
Jan  1 00:03:24 OpenWrt user.warn kernel: $20   : 00004000 00000200 0aaa0555 00000001
Jan  1 00:03:24 OpenWrt user.warn kernel: $24   : 00000000 2ab19040             
Jan  1 00:03:24 OpenWrt user.warn kernel: $28   : 80266000 80267f88 00000000 8000f06c
Jan  1 00:03:24 OpenWrt user.warn kernel: Lo    : 9bf53000
Jan  1 00:03:24 OpenWrt user.warn kernel: epc   : 80001680 0x80001680
Jan  1 00:03:24 OpenWrt user.warn kernel:     Not tainted
Jan  1 00:03:24 OpenWrt user.warn kernel: ra    : 8000f06c 0x8000f06c
Jan  1 00:03:24 OpenWrt user.warn kernel: Status: 1000b803    KERNEL EXL IE
Jan  1 00:03:24 OpenWrt user.warn kernel: Cause : 00808000
Jan  1 00:03:24 OpenWrt user.warn kernel: PrId  : 00029008 (Broadcom BCM3302)
Jan  1 00:03:24 OpenWrt user.err kernel: bad: scheduling from the idle thread!
Jan  1 00:03:24 OpenWrt user.warn kernel: Call Trace:[<800240e8>] 0x800240e8
Jan  1 00:03:24 OpenWrt user.warn kernel: [<800098dc>] 0x800098dc
Jan  1 00:03:24 OpenWrt user.warn kernel: [<800098dc>] 0x800098dc
Jan  1 00:03:24 OpenWrt user.warn kernel: [<8001cf00>] 0x8001cf00
Jan  1 00:03:24 OpenWrt user.warn kernel: [<8002f2cc>] 0x8002f2cc
Jan  1 00:03:25 OpenWrt user.warn kernel: [<80029ed4>] 0x80029ed4
Jan  1 00:03:25 OpenWrt user.warn kernel: [<80029fe0>] 0x80029fe0
Jan  1 00:03:25 OpenWrt user.warn kernel: [<8011ad28>] 0x8011ad28
Jan  1 00:03:25 OpenWrt user.warn kernel: [<8000da44>] 0x8000da44
Jan  1 00:03:25 OpenWrt user.warn kernel: [<80001444>] 0x80001444
Jan  1 00:03:25 OpenWrt user.warn kernel: [<80001660>] 0x80001660
Jan  1 00:03:25 OpenWrt user.warn kernel: [<8000f06c>] 0x8000f06c
Jan  1 00:03:25 OpenWrt user.warn kernel: [<80001680>] 0x80001680
Jan  1 00:03:25 OpenWrt user.warn kernel: [<8027fa4c>] 0x8027fa4c
Jan  1 00:03:25 OpenWrt user.warn kernel: [<8027f370>] 0x8027f370
29ed4>] 0x80029ed4
Jan  1 00:03:26 OpenWrt user.warn kernel: [<80029fe0>] 0x80029fe0
Jan  1 00:03:26 OpenWrt user.warn kernel: [<8011ad28>] 0x8011ad28
Jan  1 00:03:26 OpenWrt user.warn kernel: [<8000da44>] 0x8000da44
Jan  1 00:03:26 OpenWrt user.warn kernel: [<80001444>] 0x80001444
Jan  1 00:03:26 OpenWrt user.warn kernel: [<80001660>] 0x80001660
Jan  1 00:03:26 OpenWrt user.warn kernel: [<8000f06c>] 0x8000f06c
Jan  1 00:03:26 OpenWrt user.warn kernel: [<80001680>] 0x80001680
Jan  1 00:03:26 OpenWrt user.warn kernel: [<8027fa4c>] 0x8027fa4c
Jan  1 00:03:26 OpenWrt user.warn kernel: [<8027f370>] 0x8027f370
29ed4>] 0x80029ed4
...

comment:17 follow-up: Changed 8 years ago by nbd

Please try with r18294 or later

comment:18 in reply to: ↑ 17 Changed 8 years ago by umiki (at) med.unideb.hu

Replying to nbd:

Please try with r18294 or later

It is no better. Really annoying, renders the router useless, if you want to connect to it wirelessly.

If you have any idea, what I could do to help you to solve it (I mean simple things, I am no programmer), I would be glad to help.

comment:19 Changed 8 years ago by siocran

yes, i confirm that bug.

comment:20 follow-up: Changed 8 years ago by m.storchak@…

r18327 works for me. Asus WL500gP works well under

ping -f -s 8 ip.of.the.router

load via wifi part of br-lan.
No more "bad: scheduling from the idle thread!" and "BUG: scheduling while atomic: swapper/0/0x00000100" or any other bug/oops/etc reports

comment:21 Changed 8 years ago by nbd

If it still doesn't work for any one of you, please enable kernel symbol table information under 'global settings', rebuild and add a few more kernel stack traces here.
It would be useful if you could run 'logread -f > log.txt &' before bringing up wifi and then copy the output here. It's important that, if there are multiple stack traces, you include at least the first one and maybe the second one; the others might just be follow-ups. I'll look into it when I have time, or forward it to the linux wireless list if necessary.
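Since the first stack trace is the one that matters, a small helper can pull it out of a long capture. The following is only a sketch: the function name `first_bug_report` is hypothetical, and it assumes that each report starts with a line containing "BUG:" and that the relevant lines carry the "kernel:" tag, as in the logs posted in this ticket.

```python
def first_bug_report(lines):
    """Return the kernel lines of the first 'BUG:' report in a log capture.

    Collection starts at the first line containing 'BUG:' and stops when a
    second 'BUG:' line is seen, so repeated follow-up reports are left out.
    """
    report = []
    for line in lines:
        if "BUG:" in line:
            if report:
                break  # a second report starts here; keep only the first
            report.append(line)
        elif report and "kernel:" in line:
            # Keep only kernel messages; daemon lines in between are skipped.
            report.append(line)
    return report
```

With the capture suggested above, this could be used as `first_bug_report(open("log.txt"))`, printing the returned lines before pasting them into the ticket.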

comment:22 in reply to: ↑ 20 Changed 8 years ago by umiki (at) med.unideb.hu

Replying to m.storchak@…:

r18327 works for me. Asus WL500gP works well under

ping -f -s 8 ip.of.the.router

load via wifi part of br-lan.
No more "bad: scheduling from the idle thread!" and "BUG: scheduling while atomic: swapper/0/0x00000100" or any other bug/oops/etc reports

I can not agree with you. By coincidence I have the same build on my router and I lose the connection pretty soon. Try a more demanding application (for instance, run a measurement on speedtest.net and see what happens).

My router is: wl-500gPv1
build: r18327 / gcc 4.3.4 / eglibc 2.8
((default gcc 4.3.3+cs does not boot for me (tried distclean), I use eglibc, as uclibc does not have posix_fallocate, useful with rtorrent to prevent fragmentation))

To nbd: Thanks for looking into this. I have included such a log a few comments ago, which included the first event (BUG: scheduling while atomic: irq/6-b43/1109/0x00000100), the rest just keeps repeating indefinitely (BUG: scheduling while atomic: swapper/0/0x00000100). Also that one includes (some) symbol information as well.

The wireless wasn't stable anyways even before this went wrong: if there was activity on the router itself (for instance rtorrent), the wireless died soon under load. This was annoying as it rendered copying over the samba share of the router impossible (and made running rtorrent on the router pointless). Could the two things be connected?

comment:23 Changed 8 years ago by nbd

I committed something that might fix this issue, please try r18338

comment:24 follow-up: Changed 8 years ago by m.storchak@…

Replying to umiki (at) med.unideb.hu:

I can not agree with you. By coincidence I have the same build on my router and I lose the connection pretty soon. Try a more demanding application (for instance, run a measurement on speedtest.net and see what happens).

My router is: wl-500gPv1
build: r18327 / gcc 4.3.4 / eglibc 2.8
((default gcc 4.3.3+cs does not boot for me (tried distclean), I use eglibc, as uclibc does not have posix_fallocate, useful with rtorrent to prevent fragmentation))

I use uclibc and hope fragmentation will not hit me too much. Maybe I am wrong.

I use two instances of ping -f (flooding) - with 8 and 1472 payload size. Here are some stats:

root@vortex-box:~# ip -s l l dev wlan0 ; sleep 10; ip -s l l dev wlan0
7: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 00:1b:fc:57:ac:e7 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    1794452219 11828409 0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    2007351090 11828316 0       0       0       0      
7: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 00:1b:fc:57:ac:e7 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    1799033361 11834300 0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    2012038270 11834207 0       0       0       0      
root@vortex-box:~# echo $(( (11834300-11828409)/10 ))
589

So, about 600 pkts/sec for 2 hours and running. Ping indicates no loss and dmesg is clean.
Maybe the reason is that I use a tickless kernel (selected in make kernel_menuconfig).
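The packet-rate arithmetic above can also be scripted so that others can repeat the measurement on their own routers. This is a minimal Python sketch, assuming the `ip -s link` output format shown in this comment (a "RX:" or "TX:" header line followed by a line of counters); the function names `rx_tx_packets` and `packets_per_second` are made up for illustration.

```python
def rx_tx_packets(ip_output):
    """Return (rx_packets, tx_packets) from one `ip -s link` interface dump."""
    lines = ip_output.splitlines()
    counters = {}
    # Walk over consecutive line pairs: a header line and the values below it.
    for header, values in zip(lines, lines[1:]):
        fields = header.split()
        if fields and fields[0] in ("RX:", "TX:"):
            # The values line holds: bytes packets errors dropped ...
            counters[fields[0]] = int(values.split()[1])
    return counters["RX:"], counters["TX:"]


def packets_per_second(before, after, seconds):
    """Integer packet rate between two counter readings taken `seconds` apart."""
    return (after - before) // seconds
```

Fed with the two RX packet counters from the snapshots above (11828409 and 11834300, taken 10 seconds apart), `packets_per_second` gives 589, matching the shell arithmetic.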

It seems that the last update of compat-wireless helped in my case.
Here is the output of top. This is typical for such flood-pinging:

Mem: 28036K used, 1264K free, 0K shrd, 832K buff, 13892K cached
CPU:   0% usr  16% sys   0% nic  38% idle   0% io   3% irq  40% sirq
Load average: 1.27 1.08 1.00 2/48 2374
  PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
 1449     2 root     RW<      0   0%  51% [compirq/6-b43]
  698     2 root     SW<      0   0%   5% [phy0]

The wireless wasn't stable anyways even before this went wrong: if there was activity on the router itself (for instance rtorrent), the wireless died soon under load. This was annoying as it rendered copying over the samba share of the router impossible (and made running rtorrent on the router pointless). Could the two things be connected?

I tried to download some files via ftp (from usb-hdd, via vsftpd with sendfile disabled, wired interface, "wireless" ping is still active). Here is final report of wget: "Downloaded: 32 files, 156M in 1m 21s (1,92 MB/s)".

The only thing that is NOT active is NAT (nothing goes through the router; pings and ftp are directed to the router itself).

Hope this will help spot and squash this bug. Also, I can post configs of kernel and build system if it is necessary.

comment:25 in reply to: ↑ 24 ; follow-up: Changed 8 years ago by anonymous

Replying to m.storchak@…:

Thanks for the elaborate answer, I will try the things that you have mentioned (the tickless setting for instance) and report back. I don't think that uclibc vs eglibc should make a difference, but I'll also try that one, if all else fails (have to rebuild everything for that, which takes a lot of time), perhaps gcc version can make a difference here. I also wasn't quite precise saying flood pinging is not enough load, what I meant was that I thought that higher throughput might trigger the error, but your answer excludes that one too. I would really be happy to see wireless that solid.

To nbd: Thanks, I will definitely try that, and report back.

comment:26 in reply to: ↑ 25 ; follow-ups: Changed 8 years ago by Maxim Storchak <m.storchak@…>

Replying to anonymous:

Replying to m.storchak@…:

perhaps gcc version can make a difference here.

I used default gcc (4.3.3+cs)

I also wasn't quite precise saying flood pinging is not enough load, what I meant was that I thought that higher throughput might trigger the error, but your answer excludes that one too. I would really be happy to see wireless that solid.

Good luck!

I'll try to switch tickless off and see if it really matters.

comment:27 in reply to: ↑ 26 Changed 8 years ago by anonymous

Replying to Maxim Storchak <m.storchak@…>:

After very short testing this seems to be stable; I am quite happy with it so far. I will test copying from samba later, and I hope that it will be OK as well.
I think that both the patch r18327 from nbd and the tickless setup might be important:
*The oops occurred (as seen in the log I attached) directly after a template update, which might be a coincidence, but that patch seems to be related to that issue.
*Since tickless, I noticed this in the logs:

Nov  7 23:41:51 asus daemon.notice miniupnpd[1779]: HTTP listening on port 5000
Nov  7 23:41:55 asus user.err kernel: NOHZ: local_softirq_pending 08
Nov  7 23:41:55 asus user.err kernel: NOHZ: local_softirq_pending 08
Nov  7 23:41:55 asus user.err kernel: NOHZ: local_softirq_pending 08
Nov  7 23:41:55 asus user.err kernel: NOHZ: local_softirq_pending 08
Nov  7 23:41:55 asus user.err kernel: NOHZ: local_softirq_pending 08
Nov  7 23:41:55 asus user.err kernel: NOHZ: local_softirq_pending 08
Nov  7 23:41:55 asus user.err kernel: NOHZ: local_softirq_pending 08
Nov  7 23:41:55 asus user.err kernel: NOHZ: local_softirq_pending 08
Nov  7 23:41:55 asus user.err kernel: NOHZ: local_softirq_pending 08
Nov  7 23:41:55 asus user.err kernel: NOHZ: local_softirq_pending 08
Nov  7 23:41:58 asus daemon.info hostapd: wlan0: STA 00:13:02:43:b1:61 IEEE 802.11: authenticated
Nov  7 23:41:58 asus daemon.info hostapd: wlan0: STA 00:13:02:43:b1:61 IEEE 802.11: associated (aid 1)
Nov  7 23:41:58 asus daemon.info hostapd: wlan0: STA 00:13:02:43:b1:61 RADIUS: starting accounting session 00000031-00000000
Nov  7 23:41:58 asus daemon.info hostapd: wlan0: STA 00:13:02:43:b1:61 WPA: pairwise key handshake completed (RSN)

which might indicate that b43 also has some other issues with irqs, but maybe the tickless setup handles this better (however, these messages have not repeated since, nor is there evidence that they originated from b43).

So thank you guys!
(I will do some further testing and see what happens.)

comment:28 Changed 8 years ago by nbd

  • Resolution set to fixed
  • Status changed from new to closed

comment:29 follow-up: Changed 8 years ago by anonymous

If the fix is to trade into the code a patch with "FIXME" line with obvious possible future bug, I don't believe it is really *fixed*, especially if no other bug remains open for that thing?

comment:30 in reply to: ↑ 26 ; follow-up: Changed 8 years ago by umiki (at) med.unideb.hu

First of all, thanks for solving this.

Replying to Maxim Storchak <m.storchak@…>:

Good news is, that it is not crashing under normal usage.
OFF:
Bad news is that it shows its old behaviour for me -- it is crashing when copying from a samba share, etc. Your config seems not to have this problem, could you please help me by posting the relevant config files?

comment:31 in reply to: ↑ 29 Changed 8 years ago by nbd

Replying to anonymous:

If the fix is to trade into the code a patch with "FIXME" line with obvious possible future bug, I don't believe it is really *fixed*, especially if no other bug remains open for that thing?

The potential race opened by this bug is unlikely to ever trigger (IMHO it's relevant only during module unload). There's no reason to keep a ticket for this open in the OpenWrt Trac. If either locking in b43 gets reworked or the constraints of the callback change, it will be fixed upstream, and the FIXME will go away. If not, I don't think this will be a problem for OpenWrt users.

comment:32 in reply to: ↑ 30 ; follow-up: Changed 8 years ago by nbd

Replying to umiki (at) med.unideb.hu:

Bad news is that it shows its old behaviour for me -- it is crashing when copying from a samba share, etc. Your config seems not to have this problem, could you please help me by posting the relevant config files?

Any details on those crashes?

comment:33 in reply to: ↑ 32 ; follow-up: Changed 8 years ago by anonymous

Replying to nbd:

Replying to umiki (at) med.unideb.hu:

Bad news is that it shows its old behaviour for me -- it is crashing when copying from a samba share, etc. Your config seems not to have this problem, could you please help me by posting the relevant config files?

Any details on those crashes?

Unfortunately it isn't exactly a classic crash, as there is nothing in the logs (earlier releases used to have the infamous "local deauth" message before becoming unresponsive). The wireless just plainly stops responding to clients. The wireless led keeps on blinking on the router as if it were working normally, but the clients cannot connect to it (my laptop has an Intel wireless card, my mobile has something else; they see it, but cannot connect). Of course this could also be hostapd or something else as well. I can try to catch the moment with wireshark if it helps.

comment:34 in reply to: ↑ 33 Changed 8 years ago by thomas@…

Replying to anonymous:

Replying to nbd:

Any details on those crashes?

Unfortunately it isn't exactly a classic crash, as there is nothing in the logs (earlier releases used to show the infamous "local deauth" message before becoming unresponsive). The wireless just plainly stops responding to clients. The wireless LED on the router keeps blinking as if it were working normally, but the clients cannot connect to it (my laptop has an Intel wireless card, my mobile has something else; they see it, but cannot connect). Of course this could also be hostapd or something else. I can try to catch the moment with wireshark if it helps.

Same here, but without the patch (I built just a few hours before the patch got released, haven't rebuilt with it yet). No messages in dmesg - just when I start some torrents, I get this exact behaviour: My client stops working, router is fine - running wifi on the router restores wireless.

comment:35 Changed 8 years ago by m.storchak@…

Running rtorrent on the router kills WiFi. Restarting hostapd revives it for a short time, then wifi dies again. No dmesg messages, nothing special from hostapd:

...skipped...
AP-STA-CONNECTED 00:1b:77:20:c2:e2
wlan0: STA 00:1b:77:20:c2:e2 IEEE 802.1X: authorizing port
wlan0: STA 00:1b:77:20:c2:e2 RADIUS: starting accounting session 4AFC5F1B-00000000
wlan0: STA 00:1b:77:20:c2:e2 IEEE 802.1X: authenticated - EAP type: 0 (Unknown)
RSN: added PMKSA cache entry for 00:1b:77:20:c2:e2
RSN: added PMKID - hexdump(len=16): c8 97 44 85 3d 7f 0f fe f1 ef c6 d1 94 ee 71 e8
wlan0: STA 00:1b:77:20:c2:e2 WPA: Added PMKSA cache entry (IEEE 802.1X)
IEEE 802.1X: 00:1b:77:20:c2:e2 - (EAP) retransWhile --> 0
IEEE 802.1X: 00:1b:77:20:c2:e2 - aWhile --> 0

2-3 minutes later wifi dies

STA 00:1b:77:20:c2:e2 sent probe request for broadcast SSID
STA 00:1b:77:20:c2:e2 sent probe request for broadcast SSID
STA 00:1b:77:20:c2:e2 sent probe request for our SSID
STA 00:1b:77:20:c2:e2 sent probe request for broadcast SSID
STA 00:1b:77:20:c2:e2 sent probe request for our SSID
STA 00:1b:77:20:c2:e2 sent probe request for broadcast SSID
STA 00:1b:77:20:c2:e2 sent probe request for our SSID
MGMT               
mgmt::auth         
authentication: STA=00:1b:77:20:c2:e2 auth_alg=0 auth_transaction=1 status_code=0 wep=0
wlan0: STA 00:1b:77:20:c2:e2 IEEE 802.11: authentication OK (open system)
wlan0: STA 00:1b:77:20:c2:e2 WPA: event 0 notification
nl_set_encr: ifindex=7 alg=0 addr=0x475140 key_idx=0 set_tx=1 seq_len=0 key_len=0
   addr=00:1b:77:20:c2:e2
wlan0: STA 00:1b:77:20:c2:e2 MLME: MLME-AUTHENTICATE.indication(00:1b:77:20:c2:e2, OPEN_SYSTEM)
wlan0: STA 00:1b:77:20:c2:e2 MLME: MLME-DELETEKEYS.request(00:1b:77:20:c2:e2)
nl_set_encr: ifindex=7 alg=0 addr=0x475140 key_idx=0 set_tx=1 seq_len=0 key_len=0
   addr=00:1b:77:20:c2:e2
authentication reply: STA=00:1b:77:20:c2:e2 auth_alg=0 auth_transaction=2 resp=0 (IE len=0)
MGMT               
mgmt::auth
....

Ctrl-C Pressed

Signal 2 received - terminating
wlan0: STA 00:1b:77:20:c2:e2 MLME: MLME-DEAUTHENTICATE.indication(00:1b:77:20:c2:e2, 1)
wlan0: STA 00:1b:77:20:c2:e2 MLME: MLME-DELETEKEYS.request(00:1b:77:20:c2:e2)
nl_set_encr: ifindex=7 alg=0 addr=0x475140 key_idx=0 set_tx=1 seq_len=0 key_len=0
   addr=00:1b:77:20:c2:e2
Removing station 00:1b:77:20:c2:e2
Failed to remove interface (ifidx=0).

Dmesg shows firmware load on every start of hostapd

b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10)
br-lan: port 2(wlan0) entering forwarding state

but lines found on the first start of hostapd are absent

b43 ssb1:0: firmware: requesting b43/ucode5.fw
b43 ssb1:0: firmware: requesting b43/pcm5.fw
b43 ssb1:0: firmware: requesting b43/b0g0initvals5.fw
b43 ssb1:0: firmware: requesting b43/b0g0bsinitvals5.fw

They appear in dmesg only once.

Also, I got the "scheduling while atomic" error once (tickless is on). Now I'm trying to reproduce it with kernel symbols enabled.

comment:36 Changed 8 years ago by anonymous

Same here. Wifi works for a 2 Mbps internet connection (most of the time), but trying to transfer a 400 MB file over Skype from one wifi client to another crashes wifi. No messages, nothing. A reboot helps.

comment:37 Changed 8 years ago by sergk@…

Same bug, but I think wifi hangs only when it is used simultaneously with USB storage, for example when I download something from a USB HDD connected to the router to my netbook. When I transfer data from the desktop to the netbook, wifi works normally. Maybe the problem occurs because b43 and usb2 share one interrupt:

6: 571642 MIPS ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, b43

Is there a way to reroute b43 to another interrupt?
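
For anyone who wants to check whether their own board shows the same sharing, here is a minimal sketch that lists the IRQ lines with more than one handler. It assumes the /proc/interrupts layout matches the line quoted above (IRQ number, counter column(s), a single-word controller name such as MIPS, then comma-separated handlers); the function name shared_irqs is made up for illustration, and other platforms may format the file differently.

```python
def shared_irqs(text):
    """Return {irq: [handler, ...]} for IRQ lines that list more than one handler.

    Assumes each line looks like:
        6: 571642 MIPS ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, b43
    """
    shared = {}
    for line in text.splitlines():
        irq, sep, rest = line.partition(":")
        # Skip the CPU header and non-numeric rows such as "ERR:".
        if not sep or not irq.strip().isdigit():
            continue
        fields = rest.split()
        # Skip the per-CPU counter columns (one or more numbers).
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1
        # fields[i] is assumed to be the controller name; the rest are handlers.
        names = " ".join(fields[i + 1:]).split(",")
        handlers = [n.strip() for n in names if n.strip()]
        if len(handlers) > 1:
            shared[int(irq.strip())] = handlers
    return shared


if __name__ == "__main__":
    # On the router itself this would read the live table.
    with open("/proc/interrupts") as f:
        for irq, handlers in sorted(shared_irqs(f.read()).items()):
            print(irq, ", ".join(handlers))
```

This only reports the sharing; it does not show that the shared interrupt is the cause of the hang.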

comment:38 Changed 4 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted
