Opened 3 years ago

Closed 2 years ago

#18486 closed defect (fixed)

network raid 5 unstable in 3.14

Reported by: Culex
Owned by: developers
Priority: high
Milestone:
Component: packages
Version: Trunk
Keywords:
Cc:

Description

Within 5 minutes of mounting on r43457 on a WNDR3800, loop3 is dropped from the array:

md/raid:md0: Disk failure on loop3, disabling device.

mdadm --manage /dev/md0 --re-add /dev/loop3
mdadm: Cannot open /dev/loop3: Device or resource busy

# losetup -d /dev/loop3

# losetup -a

/dev/loop1: 0 /media/live1/node1.img
/dev/loop2: 0 /media/live2/node2.img
/dev/loop3: 0 /media/live3/node3.img

# lsof | grep "loop3"
loop3 2577 root cwd DIR 0,15 0 160 /
loop3 2577 root rtd DIR 0,15 0 160 /
loop3 2577 root txt unknown /proc/2577/exe

# ps | grep "2577"

2577 root 0 SW< [loop3]
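The lsof and ps output above only shows the kernel's own [loop3] thread, so it does not say what is keeping the device busy. One thing that might be worth checking, as a sketch only: the kernel normally lists block-device holders under /sys/block/<dev>/holders, and a failed md member usually stays attached to the array until it is removed. The helper name `holders_of` and its optional second argument (an alternate sysfs root, there only so the function can be tried against a fake tree) are made up for this example, and whether this is what actually happened on r43457 is an assumption.

```shell
# List whatever still holds a block device, e.g. "md0" for an md member.
# holders_of and the optional sysroot argument are hypothetical helpers,
# not part of mdadm or losetup.
holders_of() {
    dev=$1
    sysroot=${2:-/sys}
    dir="$sysroot/block/$dev/holders"
    # No holders directory means the device name is wrong or sysfs is absent.
    [ -d "$dir" ] || return 1
    ls "$dir"
}
```

Usage would be `holders_of loop3`. If that prints md0, running `mdadm --manage /dev/md0 --remove /dev/loop3` before the `--re-add` may avoid the "Device or resource busy" error without a reboot, though that was not tested on this build.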

# reboot
# mdadm --manage /dev/md0 --re-add /dev/loop3
mdadm: re-added /dev/loop3
root@OpenWrt:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop3[3] loop1[0] loop2[1]
      3878157312 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................] recovery = 0.0% (2596/1939078656) finish=24783.6min speed=1298K/sec
      bitmap: 5/15 pages [20KB], 65536KB chunk

Any time the router reboots, loop3 gets dropped. After re-adding it, recovery restarts at 0.0%, which I thought bitmaps were supposed to prevent?

I have reverted back to r43205 and loop3 isn't getting dropped anymore.
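For anyone trying to catch the drop as it happens, here is a minimal sketch that flags degraded arrays from /proc/mdstat text. It assumes the member map has the same shape as the [UU_] shown above and that awk is available on the router (BusyBox awk should be enough, but that is untested here). The function name `md_degraded` is invented for this example.

```shell
# Print the name of every md array whose member map contains "_".
# Reads /proc/mdstat-formatted text on stdin.
# md_degraded is a hypothetical helper name, not an mdadm command.
md_degraded() {
    awk '
        # Remember the array name from lines like "md0 : active raid5 ...".
        /^md[0-9]+ :/ { name = $1 }
        # The member map looks like [UU_]; an underscore marks a missing member.
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
            name = ""
        }
    '
}
```

Usage would be `md_degraded < /proc/mdstat`, for instance from a cron job that logs whenever the output is non-empty.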

Change History (15)

comment:1 Changed 3 years ago by anonymous

anything in the kernel log?

comment:2 Changed 3 years ago by nbd

  • Resolution set to no_response
  • Status changed from new to closed

comment:3 Changed 3 years ago by Culex

It just says that loop3 has failed. Nothing else. Reverting to 3.10.8 allowed the raid to rebuild, and it has been stable for over 12 days. In r43457 it would drop within minutes with the error mdadm: Cannot open /dev/loop3: Device or resource busy.

Sorry for the delayed response; I wanted to make sure it was working just fine.

comment:4 Changed 3 years ago by Culex

  • Resolution no_response deleted
  • Status changed from closed to reopened

comment:5 Changed 3 years ago by nbd

please try current trunk with a clean kernel tree - maybe the fixes to the unaligned access patch help.

comment:6 Changed 3 years ago by Culex

Yeah, it'll take a little while to wipe everything and then go through the modifications in the code by hand, because there are now about 12 packages that are specifically set to =n in make menuconfig but are still getting built and failing. Maybe it has something to do with luci-addons.

comment:7 Changed 3 years ago by nbd

any news?

comment:8 Changed 3 years ago by nbd

  • Resolution set to no_response
  • Status changed from reopened to closed

comment:9 Changed 2 years ago by FireCulex@…

  • Resolution no_response deleted
  • Status changed from closed to reopened

My account no longer responds to my email nor to my password.

Anyhow, as of OpenWrt Dxxx Dxxx r46809, the raid dropped and lost the superblock within a few hours, and now has to be rebuilt. It gets about 2% of the way through the rebuild and then the raid stops working: no error messages, nothing. Anything running on the raid ends up in state D or Z in the process list. I also notice that top shows a loadavg of about 30-40.
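To capture which processes are stuck when this happens, a rough sketch along these lines could be run while the raid is hung. It reads the Name: and State: lines from /proc/<pid>/status and prints every process in state D (uninterruptible sleep) or Z (zombie). The function name `list_stuck` and its optional argument (an alternate proc root, only there so it can be tried against a fake directory) are made up for this example.

```shell
# Print "pid state name" for every process in state D or Z.
# list_stuck and the optional procdir argument are hypothetical helpers.
list_stuck() {
    procdir=${1:-/proc}
    for f in "$procdir"/[0-9]*/status; do
        # Processes can exit while we iterate, so skip unreadable entries.
        [ -r "$f" ] || continue
        awk -v pid="$(basename "$(dirname "$f")")" '
            /^Name:/  { name = $2 }
            /^State:/ { state = $2 }
            END { if (state == "D" || state == "Z") print pid, state, name }
        ' "$f"
    done
}
```

Running `list_stuck` a few times during a hang and attaching the output, together with the kernel log, would show whether the loop or md kernel threads themselves are the ones stuck in D.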

Looks like I'm stuck with r43205, the last known working raid build.

comment:10 Changed 2 years ago by anonymous

Wed Sep 9 23:19:39 2015 user.emerg : losetup: /media/live1/node1.img: Resource busy

comment:11 Changed 2 years ago by Matthew M. Dean

umount /media/live1
mount 192.168.1.84 /media/live1
losetup /dev/loop1 /media/live1/node1.img
losetup: /media/live1/node1.img: failed to set up loop device: Resource busy

Can't even mount the raid now. Reverting to r43205. Worked.

comment:12 Changed 2 years ago by fireculex@…

drives assembled fine, 7 days for a rebuild. nice.

comment:13 Changed 2 years ago by fireculex@…

The rebuild took 3-4 days; it might not have had to rebuild the entire drive.

comment:14 Changed 2 years ago by Matthew M. Dean

I guess it got fixed somewhere between r46809 and r48016. Appears to be running fine now.

comment:15 Changed 2 years ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed
