| linux.debian.user debian-user@lists.debian.org. |
#1 |
Hi,
I'm getting massive filesystem corruption on an md RAID comprising 4 SATA disks. I tried ext3, xfs and reiserfs on RAID level 5 as well as ext3 on RAID level 1 (using only 2 disks); all can be crashed reliably by running bonnie++ for just a few minutes. In the case of ext3, I usually get dmesg output like this:

[...]
md0: rw=1, want=1482184800, limit=490223232
attempt to access beyond end of device
md0: rw=1, want=1482184800, limit=490223232
attempt to access beyond end of device
md0: rw=1, want=1482184800, limit=490223232
Buffer I/O error on device md0, logical block 185273099
lost page write due to I/O error on md0
Aborting journal on device md0.
EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
EXT3-fs error (device md0) in ext3_new_blocks: Journal has aborted
ext3_abort called.
EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

The filesystems are impossible to repair afterwards; e2fsck in particular will run for ages and eventually segfault.

By contrast, ext3 directly on the physical disk partition works fine and withstood days of continuous bonnieing.

This is with Etch, kernel 2.6.18-4-686-bigmem. FWIW, the machine used to run Sarge with a 2.4 kernel, where the RAID worked fine.

Now, it seems quite unlikely that RAID is completely broken in 2.6, so I suppose it might be related to the hardware: it's a Pentium 4 @ 2.8 GHz, 1.5 GiB RAM, and the SATA controller is a Promise S150 SX4 using the sata_sx4 kernel module.

Any ideas on this?

-- 
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
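For readers trying to interpret the numbers in the dmesg excerpt, here is a rough sketch of the arithmetic. It assumes that `want` and `limit` are counted in 512-byte sectors, which is how the kernel reports them in this message; the helper names are made up for illustration. Under that assumption the array is about 234 GiB, the failing request ends at about 707 GiB, and the figures are consistent with a 4 KiB filesystem block size.

```python
import re

SECTOR_BYTES = 512  # assumption: want/limit are in 512-byte sectors

# Two lines copied from the dmesg output in the post.
LOG = """\
md0: rw=1, want=1482184800, limit=490223232
Buffer I/O error on device md0, logical block 185273099
"""


def parse_overrun(log):
    """Return (want, limit, logical_block) as integers from the log text."""
    request = re.search(r"want=(\d+), limit=(\d+)", log)
    block = re.search(r"logical block (\d+)", log)
    return int(request.group(1)), int(request.group(2)), int(block.group(1))


def sectors_to_gib(sectors):
    """Convert a sector count to GiB."""
    return sectors * SECTOR_BYTES / 2**30


want, limit, block = parse_overrun(LOG)

# If the request covers exactly the reported block, its end sector is
# (block + 1) * sectors_per_block, so the ratio gives the block size.
sectors_per_block = want // (block + 1)

print(f"device size : {sectors_to_gib(limit):.1f} GiB")
print(f"request end : {sectors_to_gib(want):.1f} GiB")
print(f"block size  : {sectors_per_block * SECTOR_BYTES} bytes")
print(f"overshoot   : {want / limit:.2f}x the device size")
```

The request lands roughly three times past the end of md0, so the block number itself is out of range for the device.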
#2 |
Quoting Sebastian Flothow <flothow@gip.com>:
> Hi,
>
> I'm getting massive filesystem corruption on an md RAID comprising 4
> SATA disks. I tried ext3, xfs and reiserfs on RAID level 5 as well as
> ext3 on RAID level 1 (using only 2 disks); all can be crashed reliably
> by running bonnie++ for just a few minutes. In the case of ext3, I
> usually get dmesg output like this:
>
> [...]
> md0: rw=1, want=1482184800, limit=490223232
> attempt to access beyond end of device
> md0: rw=1, want=1482184800, limit=490223232
> attempt to access beyond end of device
> md0: rw=1, want=1482184800, limit=490223232
> Buffer I/O error on device md0, logical block 185273099
> lost page write due to I/O error on md0
> Aborting journal on device md0.
> EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
> EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
> EXT3-fs error (device md0) in ext3_new_blocks: Journal has aborted
> ext3_abort called.
> EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
>
> The filesystems are impossible to repair afterwards, e2fsck in
> particular will run for ages, and eventually segfault.
>
> By contrast, ext3 directly on the physical disk partition works fine and
> withstood days of continuous bonnieing.
>
> This is with Etch, kernel 2.6.18-4-686-bigmem. FWIW, the machine used to
> run Sarge with a 2.4 kernel, where the RAID worked fine.
>
> Now, it seems quite unlikely that RAID is completely broken in 2.6, so I
> suppose it might be related to the hardware: it's a Pentium 4 @ 2.8 GHz,
> 1.5 GiB RAM, the SATA Controller is a Promise S150 SX4 using the
> sata_sx4 kernel module.
>

Definitely sounds like hardware is failing. You could try installing smartmontools onto your system and use it to scan your drives. It might tell you if you have some bad sectors, or some other failing component.

Also, try not using the bigmem kernel. AFAIK, it's designed for 32-bit systems with RAM exceeding 4 gigs. (Although I would guess that shouldn't make a difference.)
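As a rough illustration of the smartmontools suggestion, the sketch below runs `smartctl` against a list of drives and collects the reports. The device names in the usage comment are assumptions (the thread never names the member disks), the function names are hypothetical, and the script needs root privileges and an installed `smartctl` to do anything useful.

```python
import shutil
import subprocess


def smartctl_command(device):
    """Build the argument list for a SMART health and attribute report."""
    # -H prints the overall health self-assessment, -A the attribute table.
    return ["smartctl", "-H", "-A", device]


def check_drives(devices):
    """Run smartctl on each device and return {device: report text}."""
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl not found; install smartmontools first")
    reports = {}
    for device in devices:
        result = subprocess.run(
            smartctl_command(device), capture_output=True, text=True
        )
        reports[device] = result.stdout
    return reports


# Example use (device names are placeholders, adjust to the actual disks):
#   for device, report in check_drives(["/dev/sda", "/dev/sdb"]).items():
#       print(device)
#       print(report)
```

A clean SMART report only speaks for the disks themselves; it says nothing about the controller, cabling, or driver.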
#3 |
michael@estone.ca wrote:
> Definitely sounds like hardware is failing.
> You could try installing smartmontools onto your system and use it
> to scan your drives. It might tell you if you have some bad sectors, or
> some other failing component.

The hardware is fine - I checked the SMART status, did a full read/write test with badblocks on md0, and in fact the very same hardware and RAID setup worked fine for the past year, using a 2.4 kernel, with high filesystem load every day.

It's just when I put a filesystem on top of an md device that things break - my assumption is that there is a race in the kernel involving sata_sx4 and the md modules. Given that the Promise SX4 is not really a shining piece of hardware, and not that popular, I wouldn't be surprised if the driver is a bit flaky too.

Anyway, we've decided to replace it with a real RAID controller; that should sort things out.