linux.debian.user debian-user@lists.debian.org.

Filesystem corruption on md (Software) RAID

6 August 2007, 18:00   #1
Sebastian Flothow
Filesystem corruption on md (Software) RAID

Hi,

I'm getting massive filesystem corruption on an md RAID comprising 4
SATA disks. I tried ext3, xfs and reiserfs on RAID level 5 as well as
ext3 on RAID level 1 (using only 2 disks); all can be crashed reliably
by running bonnie++ for just a few minutes. In the case of ext3, I
usually get dmesg output like this:

[...]
md0: rw=1, want=1482184800, limit=490223232
attempt to access beyond end of device
md0: rw=1, want=1482184800, limit=490223232
attempt to access beyond end of device
md0: rw=1, want=1482184800, limit=490223232
Buffer I/O error on device md0, logical block 185273099
lost page write due to I/O error on md0
Aborting journal on device md0.
EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
EXT3-fs error (device md0) in ext3_new_blocks: Journal has aborted
ext3_abort called.
EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
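
The numbers in this log can be cross-checked with a little arithmetic. In the 2.6 block layer, `want` is the end sector of the rejected request (start sector plus request length) and `limit` is the device size, both in 512-byte sectors; rw=1 marks a write. The sketch below assumes 4 KiB filesystem blocks, which the numbers bear out: one 4 KiB block (8 sectors) at logical block 185273099 ends exactly at sector 1482184800, roughly three times the size of md0. One reading is that the filesystem was handed a bogus block number, rather than a disk reporting a media error, but that is an interpretation, not a diagnosis. `parse_eod` and `want_for_block` are hypothetical helper names.

```python
import re

SECTOR_SIZE = 512  # the block layer reports want/limit in 512-byte sectors

# Matches lines such as "md0: rw=1, want=1482184800, limit=490223232"
EOD_RE = re.compile(
    r"(?P<dev>\S+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def parse_eod(line):
    """Hypothetical helper: return (device, rw, want, limit), or None if no match."""
    m = EOD_RE.search(line)
    if m is None:
        return None
    return (m.group("dev"), int(m.group("rw")),
            int(m.group("want")), int(m.group("limit")))


def want_for_block(block, block_size=4096):
    """Hypothetical helper: end sector of a request covering exactly one
    filesystem block, i.e. start_sector + sectors_in_request.
    Assumes a 4 KiB block size unless told otherwise."""
    per_block = block_size // SECTOR_SIZE
    return block * per_block + per_block


if __name__ == "__main__":
    dev, rw, want, limit = parse_eod("md0: rw=1, want=1482184800, limit=490223232")
    print(f"{dev}: device size {limit * SECTOR_SIZE / 2**30:.1f} GiB")
    print(f"request ends at {want / limit:.2f}x the device size")
    # Does 'want' match a single 4 KiB write to logical block 185273099?
    print(want_for_block(185273099) == want)
```

Run as a script, this reports a device of about 233.8 GiB, a request ending about 3.02 times past its size, and `True` for the block-number check.
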

The filesystems are impossible to repair afterwards; e2fsck in
particular runs for ages and eventually segfaults.

By contrast, ext3 directly on the physical disk partition works fine and
withstood days of continuous bonnie++ runs.

This is with Etch, kernel 2.6.18-4-686-bigmem. FWIW, the machine used to
run Sarge with a 2.4 kernel, where the RAID worked fine.

Now, it seems quite unlikely that RAID is completely broken in 2.6, so I
suppose it might be related to the hardware: it's a Pentium 4 @ 2.8 GHz,
1.5 GiB RAM, the SATA Controller is a Promise S150 SX4 using the
sata_sx4 kernel module.

Any ideas on this?


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

6 August 2007, 18:40   #2
michael@estone.ca
Re: Filesystem corruption on md (Software) RAID

Quoting Sebastian Flothow <flothow@gip.com>:

> I'm getting massive filesystem corruption on an md RAID comprising 4
> SATA disks. [...]
>
> Now, it seems quite unlikely that RAID is completely broken in 2.6, so I
> suppose it might be related to the hardware: it's a Pentium 4 @ 2.8 GHz,
> 1.5 GiB RAM, the SATA Controller is a Promise S150 SX4 using the
> sata_sx4 kernel module.
>

Definitely sounds like hardware is failing.
You could try installing smartmontools on your system and using it
to scan your drives. It might tell you if you have some bad sectors, or some
other failing component.
Also, try not using the bigmem kernel. AFAIK, it's designed for 32-bit
systems with more than 4 GB of RAM, though I'm not certain of that (and I
would guess it shouldn't make a difference).

10 August 2007, 17:00   #3
Sebastian Flothow
Re: Filesystem corruption on md (Software) RAID

michael@estone.ca wrote:
> Definitely sounds like hardware is failing.
> You could try installing smartmontools onto your system and use it
> to scan your drives. It might tell you if you have some bad sectors, or
> some other failing component.


The hardware is fine: I checked the SMART status, did a full read/write
test with badblocks on md0, and in fact the very same hardware and RAID
setup had worked fine for the past year on a 2.4 kernel, under heavy
filesystem load every day.
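
For readers unfamiliar with it, badblocks' destructive write-mode test (`-w`) fills every block with a pattern (0xaa, 0x55, 0xff, 0x00 in turn), reads everything back and compares. Below is a rough model of that loop, written against an ordinary scratch file rather than a block device; `write_verify` is a hypothetical name, and the real tool's block sizes, request batching and non-destructive mode are not modeled. It shows what such a test exercises: each pass is a plain sequential write followed by a sequential read, a very different load from bonnie++. So a clean run on md0 does not, by itself, rule out a problem that only appears under concurrent filesystem I/O.

```python
import os
import tempfile

# Patterns used by badblocks' write-mode test, in this order
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def write_verify(path, size, block_size=4096, patterns=PATTERNS):
    """Hypothetical sketch: destructively write each pattern over `path`,
    read it back, and return the sorted block numbers that did not match.

    Works on an ordinary file and goes through the page cache (no O_DIRECT),
    so it exercises far less of the I/O path than badblocks on a device.
    """
    bad = set()
    nblocks = size // block_size
    for pattern in patterns:
        buf = bytes([pattern]) * block_size
        # Write pass: fill every block with the current pattern
        with open(path, "r+b") as f:
            for _ in range(nblocks):
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())
        # Read pass: compare every block against the pattern
        with open(path, "rb") as f:
            for blk in range(nblocks):
                if f.read(block_size) != buf:
                    bad.add(blk)
    return sorted(bad)


if __name__ == "__main__":
    size = 1 << 20  # 1 MiB scratch file; never point this at real data
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\0" * size)
    os.close(fd)
    try:
        print(write_verify(path, size))  # empty list if every block read back correctly
    finally:
        os.remove(path)
```
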

It's only when I put a filesystem on top of an md device that things
break. My assumption is that there is a race in the kernel involving
sata_sx4 and the md modules. Given that the Promise SX4 is not exactly a
shining piece of hardware, and not that popular either, I wouldn't be
surprised if the driver is a bit flaky too.

Anyway, we've decided to replace it with a real RAID controller, which
should sort things out.


All times are GMT +1.

