comp.unix.shell Using and programming the Unix shell.

Get the md5sum of every 64MB block in a large file using bash.

07/05/2008, 20:18   #1
sillyhat@yahoo.com

Hello, can someone please help?

I have an 8GB file and need the md5sum of every 64MB block in the
file.

I'm looking for some ideas on how to write a script to do this using
bash - not interested in perl or other language solutions.

The size of my disk is 10GB with a smallish linux system and 256MB
free disk space.

Thanks for all constructive posts.

Hal
07/05/2008, 22:18   #2
mo

On Wed, 07 May 2008 16:18:27 -0300, <sillyhat@yahoo.com> wrote:

> Hello, can someone please .
>
> I have an 8GB file and need the md5sum of every 64MB block in the
> file.
>
> I'm looking for some ideas on how to write a script to do this using
> bash - not interested in perl or other language solutions.
>
> The size of my disk is 10GB with a smallish linux system and 256MB
> free disk space.
>
> Thanks for all constructive posts.
>
> Hal


If time is not a problem you can try:

MD5=; x=0
while [ "$MD5" != d41d8cd98f00b204e9800998ecf8427e ]; do
  # read exactly one 64MB block, starting x blocks into the file
  MD5=$(dd if=file bs=64M skip=$x count=1 2>/dev/null | md5sum | cut -d ' ' -f 1)
  x=$((x+1))
  echo "$x $MD5"   # block 1 was read with skip=0
done >blocks

Not tested!
See also the split command.

The drawback of the idea above is that dd is restarted and the file
reopened for every block, about this many times for 8GB:
$ echo $((8000/64))
125
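
On the split suggestion: GNU split has a --filter option that pipes each chunk to a command instead of writing it to disk, which avoids both the temp files and the repeated dd calls. A minimal sketch, assuming a GNU coreutils split recent enough to have --filter (it did not exist when this thread was written, and non-GNU split lacks it); the function name block_md5s is made up for illustration:

```shell
# Print one md5sum per fixed-size block of a file, without temp files.
# Assumes GNU split with --filter support.
# block_md5s is a hypothetical helper name.
block_md5s() {
    file=$1
    size=${2:-64M}    # block size, anything that split -b accepts
    # split runs the filter once per chunk, feeding the chunk on its stdin
    split -b "$size" --filter='md5sum | cut -d " " -f 1' -- "$file"
}
```

Called as `block_md5s bigfile 64M`, this should print the checksums in block order, one per line.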
07/05/2008, 22:35   #3
Luuk

sillyhat@yahoo.com wrote:
> Hello, can someone please .
>
> I have an 8GB file and need the md5sum of every 64MB block in the
> file.
>
> I'm looking for some ideas on how to write a script to do this using
> bash - not interested in perl or other language solutions.
>
> The size of my disk is 10GB with a smallish linux system and 256MB
> free disk space.
>
> Thanks for all constructive posts.
>
> Hal


something like:

for ((x=0; x<128; x++))
do
    # copy block number x (64MB) into the temp file
    dd if=largefile of=tmp bs=64M skip=$x count=1 2>/dev/null
    echo $x
    md5sum tmp
done


This reads through your file in 128 steps (8GB / 64MB), each time
creating a temporary file 'tmp' whose md5sum is then computed.

--
Luuk
08/05/2008, 09:18   #4
Dave B

On Wednesday 7 May 2008 23:35, Luuk wrote:

> something like:
>
> [1] for ((x=0; x<128; x++)) ;
> [2] do
> [3] dd if=largefile ibs=64M obs=64M skip=$x skip=$x of=tmp;
> [4] echo $x;
> [5] md5sum tmp;
> [6] done
>
>
> reading in 128 steps, through your file, creating a (temp)file 'tmp' of
> which the md5sum is created.


You don't need the temp file, since you can pipe the output of dd directly
to md5sum. Just omit the "of=" part in the dd command, and dd will write to
stdout.
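
A sketch of what that could look like, with dd writing each block to stdout and md5sum reading it from the pipe. The function name block_md5s_dd and its three arguments are made up for illustration, and count=1 is added so that each dd call stops after one block:

```shell
# md5sum of each block, piping dd straight into md5sum (no temp file).
# block_md5s_dd is a hypothetical helper; arguments: file, block size, block count.
block_md5s_dd() {
    file=$1 bs=$2 blocks=$3
    x=0
    while [ "$x" -lt "$blocks" ]; do
        printf '%s ' "$x"
        # without of=, dd writes the block to stdout; count=1 limits it to one block
        dd if="$file" bs="$bs" skip="$x" count=1 2>/dev/null | md5sum
        x=$((x + 1))
    done
}
```

For the 8GB file this would be called as `block_md5s_dd largefile 64M 128`.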

--
D.
08/05/2008, 11:55   #5
Stephane CHAZELAS

2008-05-7, 12:18(-07), sillyhat@yahoo.com:
> Hello, can someone please .
>
> I have an 8GB file and need the md5sum of every 64MB block in the
> file.
>
> I'm looking for some ideas on how to write a script to do this using
> bash - not interested in perl or other language solutions.
>
> The size of my disk is 10GB with a smallish linux system and 256MB
> free disk space.

[...]

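You shouldn't need to use any disk space:
# dd's report on stderr goes to fd 3, which is captured in $details;
# md5sum's output goes to fd 4, which is the real stdout.
# The loop ends when dd can no longer read one whole block.
while
  {
    details=$(
      {
        LC_ALL=C dd bs="$((64*1024*1024))" count=1 2>&3 | md5sum >&4
      } 3>&1
    )
  } 4>&1 &&
  case $details in (*"1+0 records in"*) ;; (*) false;; esac
do :
done < your-big-file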
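
To see how this loop behaves at the end of the input, the same file descriptor juggling can be tried on a small file with a small block size. A sketch that wraps the loop in a function (md5_stream is a made-up name, and the block size argument is an addition for the demo); it assumes dd reports "1+0 records in" for a full block, and that stdin is a regular file so that a read never returns a short block before the end:

```shell
# Same technique, wrapped in a function with a configurable block size.
# md5_stream is a hypothetical name; the input is read from stdin.
md5_stream() {
    bs=$1
    while
        {
            details=$(
                {
                    LC_ALL=C dd bs="$bs" count=1 2>&3 | md5sum >&4
                } 3>&1
            )
        } 4>&1 &&
        case $details in (*"1+0 records in"*) ;; (*) false;; esac
    do :
    done
}

# 2500 bytes with 1024-byte blocks: two full blocks and one partial block,
# so three md5sum lines are printed and the loop stops after the short read.
tmp=$(mktemp)
head -c 2500 /dev/zero > "$tmp"
md5_stream 1024 < "$tmp"
rm -f "$tmp"
```

When the file size is an exact multiple of the block size, the last dd call reads nothing, so one extra line with the md5sum of empty input (d41d8cd98f00b204e9800998ecf8427e) is printed before the loop stops.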



--
Stéphane
09/05/2008, 03:38   #6
mo

On Wed, 07 May 2008 18:18:02 -0300, mo <invalid@mail.address> wrote:

> On Wed, 07 May 2008 16:18:27 -0300, <sillyhat@yahoo.com> wrote:
>
>> Hello, can someone please .
>>
>> I have an 8GB file and need the md5sum of every 64MB block in the
>> file.
>>
>> I'm looking for some ideas on how to write a script to do this using
>> bash - not interested in perl or other language solutions.
>>
>> The size of my disk is 10GB with a smallish linux system and 256MB
>> free disk space.
>>
>> Thanks for all constructive posts.
>>
>> Hal

>
> If time is not a problem you can try:
>
> MD5=;x=0;while [ "$MD5" != d41d8cd98f00b204e9800998ecf8427e ];do
> MD5=`dd status=noxfer if=file bs=64M skip=$x|md5sum|cut -d' ' -f 1`
> x=$[x+1]
> echo $x $MD5 # for block 1, skipped 0
> done >blocks
>
> Don't tested!
> See also command split.
>
> The problem with the idea above is the restarting file read
> after each block:
> $ echo $[8000/64]
> 125



Updating my previous code with Chazelas' smart idea of feeding the file
through the loop's stdin, and now using head instead of dd:

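md5(){ [ "$1" ]||{ echo "md5 <block_size_in_bytes>" >&2;return 1;}
x=0 MD5= pMD5=
# stop once the previous read was already empty (end of input)
while MD5=`head -c"$1"|md5sum|cut -d ' ' -f 1`&&[ "$pMD5" != d41d8cd98f00b204e9800998ecf8427e ];do
pMD5=$MD5
x=$((x+1))
echo "$MD5 $x"
done
}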

### Usage:
$ md5
md5 <block_size_in_bytes>
$

$ md5 512 <x.log
ae85a3ff6457e95c723c7d90232cf738 1
b09cdeb518910d3b8b8fb09bd8a57488 2
4fd51c79224201930e294d71feb47c7a 3
d41d8cd98f00b204e9800998ecf8427e 4
$

$ cat x.log|md5 1024
dd66ec03b7c59ed2828090dae4421818 1
4fd51c79224201930e294d71feb47c7a 2
d41d8cd98f00b204e9800998ecf8427e 3
$

With this new code the last line is always the md5sum of an empty string.
10/05/2008, 19:28   #7
sillyhat@yahoo.com

Thanks for the responses. Good work!

I also put together this attempt which seems to do the trick as well.
It's based on previous responses and some ideas found in the article
'Mincing Your Data into Arbitrary Chunks (in bash)' from the book
Linux Server Hacks. I found the article online.

If there are some glaring errors, please let me know.

Hal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#!/bin/sh

#IN="/dev/hda1"
IN="bigfile"

if [ "${IN:0:5}" == "/dev/" ] ; then
echo "filename starts '/dev/...' so assume its a disk or
partition."
INX=`echo $IN | sed 's:/:\\\\/:g'`
SIZE=`fdisk -l $IN | awk '/'$INX'\:/ {print $5}'`
else
echo "filename does not start '/dev/..' so assume its a normal
file."
SIZE=`ls -l $IN | awk '{print $5}'`
fi

echo "SIZE="$SIZE

OUT="out"
B="$((64*1024*1024))"

total=0
while [ $total -lt $SIZE ]; do
dd bs="$B" count=1 2> /dev/null | md5sum
total=$((total + B))
done < $IN
10/05/2008, 20:11   #8
Stephane CHAZELAS

2008-05-10, 11:28(-07), sillyhat@yahoo.com:
[...]
> #!/bin/sh
>
> #IN="/dev/hda1"
> IN="bigfile"
>
> if [ "${IN:0:5}" == "/dev/" ] ; then


That's not standard sh syntax: ${IN:0:5} and "==" are ksh93
syntax, also recognised by bash (but not by GNU "[", BTW).

case $IN in
(/dev/*) ...

> echo "filename starts '/dev/...' so assume its a disk or
> partition."


But to check whether it's a block device, you can simply do:

[ -b "$IN" ]

> INX=`echo $IN | sed 's:/:\\\\/:g'`
> SIZE=`fdisk -l $IN | awk '/'$INX'\:/ {print $5}'`


which works for disks but not for other block devices.

And instead of escaping IN and using awk /.../, you could have
done:

awk -v disk="$IN" 'index($0, disk ":") {print $5}'

Which is a substring search instead of a pattern search.
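
With index(), no escaping of "/" or "." in the device name is needed, because the name is never interpreted as a pattern. A small sketch run against a made-up fdisk-style line (the sample text and the size_of_disk name are illustrative assumptions, not real fdisk output):

```shell
# Extract the byte count from an fdisk-style "Disk <dev>: ..." line using a
# plain substring match. size_of_disk and the sample line are hypothetical.
size_of_disk() {
    # $1 = device name; the fdisk-like text is read from stdin
    awk -v disk="$1" 'index($0, disk ":") {print $5}'
}

sample='Disk /dev/hda1: 10.7 GB, 10737418240 bytes'
printf '%s\n' "$sample" | size_of_disk /dev/hda1    # prints 10737418240
```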

See also

fdisk -l -- "$IN" | sed -n "s#.* $IN:.* \([0-9]*\) bytes#\1#p"

The util-linux tools have fdisk but also the blockdev command:

SIZE=$(blockdev --getsize64 "$IN")

> else
> echo "filename does not start '/dev/..' so assume its a normal
> file."
> SIZE=`ls -l $IN | awk '{print $5}'`


SIZE=$(wc -c < "$IN")

You may want to check that it's a regular file as well:

[ -f "$IN" ]

With ls, you'd need the "-L" option because for symlinks, you
want the size of the file pointed to, not of the symlink which
is irrelevant.
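
Put together, the size lookup could be sketched as below. The function name input_size is made up for illustration, and blockdev is assumed to be installed and to have read permission on the device:

```shell
# Pick the size lookup by file type. input_size is a hypothetical helper.
# blockdev comes from util-linux and needs read access to the device.
input_size() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"
    elif [ -f "$1" ]; then
        # redirecting into wc keeps the file name out of the output and
        # follows symlinks to the real file
        wc -c < "$1"
    else
        echo "input_size: $1 is neither a block device nor a regular file" >&2
        return 1
    fi
}
```

Usage would be `SIZE=$(input_size "$IN") || exit 1`.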

> fi
>
> echo "SIZE="$SIZE


Funny that you put quotes where they were not necessary and not
where they were. In shells, double quotes should be put around
variable expansions:

echo "SIZE=$SIZE"

Without quotes, variable expansions are subject to word
splitting and filename generation.
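
A short illustration of the difference, using a made-up value with several spaces in it (a value containing "*" would additionally be replaced by the file names in the current directory when left unquoted):

```shell
# Effect of quoting on a value that contains runs of spaces.
# The value is a made-up example, with the default IFS assumed.
size='8589934592   bytes'
echo "SIZE="$size     # word splitting collapses the spaces: SIZE=8589934592 bytes
echo "SIZE=$size"     # spaces preserved: SIZE=8589934592   bytes
```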

>
> OUT="out"
> B="$((64*1024*1024))"
>
> total=0
> while [ $total -lt $SIZE ]; do
> dd bs="$B" count=1 2> /dev/null | md5sum
> total=$((total + B))


total=$(($total + $B)) is preferred (though not necessary in the most
recent implementations of sh)

> done < $IN


The solution that I had given that parses the stderr output of
dd allows you not to have to find out the size beforehand. It
stops when dd can't read a whole input block.

--
Stéphane