Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#13420 closed defect (fixed)

OpenWRT does not recognize partitions over a position on big disks

Reported by: luizluca@… Owned by: developers
Priority: high Milestone: Barrier Breaker 14.07
Component: kernel Version: Attitude Adjustment 12.09 Beta
Keywords: Cc:

Description

Hello,

I'm using my router as a NAS by plugging a 1.5TB USB disk into its USB port. However, dmesg warns that its partition table is partially beyond the EOD. See the log:

[  173.870000] sd 1:0:0:0: [sdb] No Caching mode page present
[  173.870000] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[  180.330000]  sdb: sdb1 sdb2 < sdb5 >
[  180.340000] sdb: partition table partially beyond EOD, enabling native capacity
[  180.350000] sd 1:0:0:0: [sdb] No Caching mode page present
[  180.360000] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[  180.370000]  sdb: sdb1 sdb2 < sdb5 >
[  180.370000] sdb: partition table partially beyond EOD, truncated
[  180.380000] sd 1:0:0:0: [sdb] No Caching mode page present
[  180.390000] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[  180.400000] sd 1:0:0:0: [sdb] Attached SCSI disk

And /proc/partitions shows:

   8       16 1464452096 sdb
   8       17   62926573 sdb1
   8       18          1 sdb2
   8       21 1073744406 sdb5

On Ubuntu 12.10, the partitions are detected correctly:

   8       80 1464452096 sdf
   8       81   62926573 sdf1
   8       82          1 sdf2
   8       85 1073744406 sdf5
   8       86    4192933 sdf6

With no dmesg warnings. I also tested with more partitions but nothing after the big sda5 is shown.

I tracked the msg to block/partition-generic.c:452 but now I need to check where the state->access_beyond_eod comes from.

Maybe this is a 32-bit problem?

Attachments (0)

Change History (6)

comment:1 Changed 5 years ago by luizluca@…

I looked at the kernel source for this message and did not find anything obvious.

The message comes from this test:

block/partition-generic.c:452
if (state->access_beyond_eod) {

and access_beyond_eod is only set in:

block/partitions/check.h:31
static inline void *read_part_sector(struct parsed_partitions *state,
                     sector_t n, Sector *p)
{
    if (n >= get_capacity(state->bdev->bd_disk)) {
        state->access_beyond_eod = true;
        return NULL;
    }
    return read_dev_sector(state->bdev, n, p);
}

As I get everything from the first partition to the first extended partition, it might be something inside the extended parsing. This is done here:

block/partitions/msdos.c:108
static void parse_extended(struct parsed_partitions *state,
               sector_t first_sector, sector_t first_size)

That calls the function that might set the EOD flag, in a loop for each partition:

this_sector = first_sector;
while (1) {
    (...)
    data = read_part_sector(state, this_sector, &sect);
    (...)
    p = (struct partition *) (data + 0x1be);
    (...)
    this_sector = first_sector + start_sect(p) * sector_size;
}
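
The way this loop interacts with the check in read_part_sector can be sketched in user space. This is only a minimal sketch, not the kernel code: sector64_t, next_ebr_sector and beyond_eod are hypothetical names, and sector64_t assumes a 64-bit sector_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's sector_t (assumed 64-bit). */
typedef uint64_t sector64_t;

/* Absolute sector of the next EBR. The start field of the link entry
 * is relative to the start of the extended partition (first_sector). */
static inline sector64_t next_ebr_sector(sector64_t first_sector,
                                         sector64_t link_start,
                                         sector64_t sector_size)
{
    return first_sector + link_start * sector_size;
}

/* Same comparison as in read_part_sector: a sector at or past the
 * disk capacity is treated as beyond EOD. */
static inline bool beyond_eod(sector64_t n, sector64_t capacity)
{
    return n >= capacity;
}
```

For this disk the capacity would be 2928904192 sectors, assuming the 1464452096 blocks in /proc/partitions are 1 KiB each and sectors are 512 bytes. Any next-EBR value at or above that number sets the EOD flag and stops the walk.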

As p is just the partition data, there is little chance of a problem there. However, start_sect has some interesting comments:

/*
 * Many architectures don't like unaligned accesses, while
 * the nr_sects and start_sect partition table entries are
 * at a 2 (mod 4) address.
 */
(...)
static inline sector_t start_sect(struct partition *p)
{
    return (sector_t)get_unaligned_le32(&p->start_sect);
}

An endian problem?

The other possible source is get_capacity(state->bdev->bd_disk), but dmesg, /proc/partitions and /sys all report the capacity correctly. Since the first pass through the extended partition works, the culprit is more likely something else, such as start_sect.

I don't know if this is enough, but here are the first 512 bytes of my extended partition:

0000000 448d ca5b e7ee 86a3 5c05 cc67 2c6d 48d4
*
00001b0 448d ca5b e7ee 86a3 5c05 cc67 2c6d 00fe
00001c0 ffff 83fe ffff 3f00 0000 2c14 0080 00fe
00001d0 ffff 05fe ffff 6b14 0080 7b15 0000 0000
00001e0 0000 0000 0000 0000 0000 0000 0000 0000
00001f0 0000 0000 0000 0000 0000 0000 0000 55aa
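
The two 16-byte entries at offset 0x1be of this dump can be decoded by hand or with a few lines of C. The following is a minimal sketch, assuming the 16-bit words in the dump are shown in on-disk byte order (consistent with the 55aa signature at the end). The names le32_at, ebr_entries, entry_start and entry_size are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value one byte at a time, so the result
 * depends neither on host endianness nor on pointer alignment. */
static inline uint32_t le32_at(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Bytes 0x1be..0x1dd of the dump: two 16-byte partition entries. */
static const unsigned char ebr_entries[32] = {
    0x00, 0xfe, 0xff, 0xff, 0x83, 0xfe, 0xff, 0xff,
    0x3f, 0x00, 0x00, 0x00, 0x2c, 0x14, 0x00, 0x80,
    0x00, 0xfe, 0xff, 0xff, 0x05, 0xfe, 0xff, 0xff,
    0x6b, 0x14, 0x00, 0x80, 0x7b, 0x15, 0x00, 0x00,
};

/* Within an entry, the start LBA is at byte 8 and the sector count
 * at byte 12, both little-endian. */
static inline uint32_t entry_start(int i)
{
    return le32_at(ebr_entries + 16 * i + 8);
}

static inline uint32_t entry_size(int i)
{
    return le32_at(ebr_entries + 16 * i + 12);
}
```

Decoded this way, the first entry starts at relative sector 63 with 2147488812 sectors, and the second entry (the link to the next EBR) has a relative start of 2147488875. Both values have bit 31 set, so they only fit in an unsigned 32-bit type.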

comment:2 Changed 5 years ago by luizluca@…

Using Wikipedia, I wrote my own MBR/EBR parser. Everything seems to be fine on the disk. It is just the OpenWrt kernel that misreads the second logical partition.

As a workaround until this problem is fixed, I created a small out-of-order partition, sda3, between sda1 and sda2. However, the problem was already present before that. I also replaced my old sdb6 with a small 1MB partition.

Here is the output of my script. I guess that anyone, with the help of Wikipedia, can read this output:

Checking /dev/sdb...
MBR: FAB800108ED0BC00B0B800008ED88EC0FBBE007CBF0006B90002F3A4EA21060000BEBE073804750B83C61081FEFE0775F3EB16B402B001BB007CB2808A74018B4C02CD13EA007C0000EBFE0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000E303040000000001010007FEFFFF3F000000C167600700FEFFFF0FFEFFFF1A5E8007E62913A700FEFFFF82FEFFFF0068600700F01F000000000000000000000000000000000055AA
Checking /dev/sdb1...present
Primary Entry: 0001010007FEFFFF3F000000C1676007
Start(CHS): (010100)
End  (CHS): (FEFFFF)
Code      : 07
Start(LBA): 63 (3F000000)
Size(sect): 123758529 sectors, 63364 Mbytes (C1676007)

Checking /dev/sdb2...present
Primary Entry: 00FEFFFF0FFEFFFF1A5E8007E62913A7
Start(CHS): (FEFFFF)
End  (CHS): (FEFFFF)
Code      : 0F
Start(LBA): 125853210 (1A5E8007)
Size(sect): 2803050982 sectors, 1435162 Mbytes (E62913A7)

Extended partition detected! Reading logical partitions...

Checking /dev/sdb5...present
EBR: 448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D00FEFFFF83FEFFFF3F0000002C14008000FEFFFF05FEFFFF6B1400807B150000000000000000000000000000000000000000000000000000000000000000000055AA
Primary Entry: 00FEFFFF83FEFFFF3F0000002C140080
Active? no
Start(CHS): (FEFFFF)
End  (CHS): (FEFFFF)
Code      : 83
Start(LBA): 63 (3F000000)
Size(sect): 2147488812 sectors, 1099514 Mbytes (2C140080)

Secondary Entry: 00FEFFFF05FEFFFF6B1400807B150000
Next ERB(LBA): 2147488875 (6B140080)
Size(sect): 5499 sectors, 2 Mbytes (7B150000)

Checking /dev/sdb6...present
EBR: 448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D48D4448DCA5BE7EE86A35C05CC672C6D00FEFFFF83FEFFFF7B0D00000008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000055AA
Primary Entry: 00FEFFFF83FEFFFF7B0D000000080000
Active? no
Start(CHS): (FEFFFF)
End  (CHS): (FEFFFF)
Code      : 83
Start(LBA): 3451 (7B0D0000)
Size(sect): 2048 sectors, 1 Mbytes (00080000)

Secondary Entry: 00000000000000000000000000000000
This is the last logical partition!

Checking /dev/sdb3...present
Primary Entry: 00FEFFFF82FEFFFF0068600700F01F00
Start(CHS): (FEFFFF)
End  (CHS): (FEFFFF)
Code      : 82
Start(LBA): 123758592 (00686007)
Size(sect): 2093056 sectors, 1071 Mbytes (00F01F00)

Checking /dev/sdb4...not present

Now I need some way to print these values from inside the kernel... I currently do not have the serial port available, so playing with the kernel is not very safe. Also, anyone with an external disk at hand can test OpenWrt with an extended partition containing two logical partitions.

comment:3 Changed 5 years ago by luizluca@…

I added some printk calls to the kernel and managed to find the problematic point. It was almost where I supposed. It is this line:

/block/partitions/msdos.c:

this_sector = first_sector + start_sect(p) * sector_size;

All variables are sector_t, which is a u64. However, look at their values (printed with %llu):

[   98.230000] parse_extended: sector_size = 1
[   98.230000] parse_extended: start_sect(p) = 2147488875
[   98.230000] parse_extended: start_sect(p)*sector_size = 2147488875
[   98.240000] parse_extended: first_sector = 125853210
[   98.250000] parse_extended: first_sector + start_sect(p) * sector_size = this_sector = 18446744071687926405

The correct result would be 2273342085, not 18446744071687926405. Comparing both in hex, the lower 32 bits are correct but the upper 32 bits of this_sector are all ones:

2273342085 = 0x0000000087807285
18446744071687926405 = 0xFFFFFFFF87807285
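
The two values differ only in the upper 32 bits, which is the pattern a sign extension of a 32-bit value with bit 31 set would produce. A minimal sketch of that arithmetic follows, under the assumption that the sum was at some point held as a 32-bit quantity and then widened incorrectly; nothing at this point in the ticket confirms the mechanism, and widen_zero and widen_sign are hypothetical helper names.

```c
#include <stdint.h>

/* Zero-extend: what widening an unsigned 32-bit value should yield. */
static inline uint64_t widen_zero(uint32_t v)
{
    return (uint64_t)v;
}

/* Sign-extend: copy bit 31 into the upper 32 bits, as happens when
 * the value is mistakenly treated as a signed 32-bit integer. */
static inline uint64_t widen_sign(uint32_t v)
{
    return (v & 0x80000000u) ? ((uint64_t)v | 0xFFFFFFFF00000000ULL)
                             : (uint64_t)v;
}
```

With v = 125853210 + 2147488875 = 2273342085 (0x87807285), widen_zero gives the expected 2273342085, while widen_sign gives 18446744071687926405, the value printed by the kernel.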

What happened? Maybe this has something to do with target/linux/ar71xx/patches-3.3/902-unaligned_access_hacks.patch?

BTW, I tested this problem with ar71xx in TL-WR1043ND v1 and TL-WR2543ND v1.

comment:4 Changed 5 years ago by luizluca@…

As discussed on the devel list, this bug is caused by some undefined behavior in the get_unaligned_le32 function. This patch fixes the problem:

http://patchwork.openwrt.org/patch/3588/

comment:5 Changed 5 years ago by jogo

  • Resolution set to fixed
  • Status changed from new to closed

The problem was actually a compiler bug that had already been fixed in newer versions. The fix was backported to all affected gcc versions in r36486 and r36500.

comment:6 Changed 4 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted
