comp.unix.shell: Using and programming the Unix shell.

#1
Hello, can someone please .

I have an 8GB file and need the md5sum of every 64MB block in the file.

I'm looking for some ideas on how to write a script to do this using bash - not interested in perl or other language solutions.

The size of my disk is 10GB with a smallish linux system and 256MB free disk space.

Thanks for all constructive posts.

Hal

#2
On Wed, 07 May 2008 16:18:27 -0300, <sillyhat@yahoo.com> wrote:
> Hello, can someone please .
>
> I have an 8GB file and need the md5sum of every 64MB block in the
> file.
>
> I'm looking for some ideas on how to write a script to do this using
> bash - not interested in perl or other language solutions.
>
> The size of my disk is 10GB with a smallish linux system and 256MB
> free disk space.
>
> Thanks for all constructive posts.
>
> Hal

If time is not a problem you can try:

    MD5=; x=0
    # d41d8cd98f00b204e9800998ecf8427e is the md5 of empty input,
    # i.e. what you get once skip= has moved past the end of the file
    while [ "$MD5" != d41d8cd98f00b204e9800998ecf8427e ]; do
      # count=1 limits dd to block number $x only
      MD5=`dd status=noxfer if=file bs=64M skip=$x count=1 | md5sum | cut -d ' ' -f 1`
      x=$((x+1))
      echo $x $MD5   # for block 1, skipped 0
    done >blocks

Not tested!
See also the split command.

The problem with the idea above is that the file read is restarted for each block:

    $ echo $((8000/64))
    125
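
The 125 above comes from dividing 8000 MB by 64 MB. As a minimal sketch of the same arithmetic done in bytes, assuming "8GB" means 8 GiB and "64MB" means 64 MiB, and rounding up so that a short final block is still counted (`block_count` is a hypothetical helper name, not something from this thread):

```bash
# block_count SIZE_BYTES BLOCK_BYTES  (hypothetical helper)
# Prints how many blocks of BLOCK_BYTES are needed to cover SIZE_BYTES.
block_count() {
    local size=$1 bs=$2
    # Adding bs - 1 before dividing rounds the result up.
    echo $(( (size + bs - 1) / bs ))
}

block_count $((8 * 1024 * 1024 * 1024)) $((64 * 1024 * 1024))   # prints 128
```

With binary units the loop would run 128 times rather than 125; either way dd is started once per block.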

#3
sillyhat@yahoo.com wrote:
> Hello, can someone please .
>
> I have an 8GB file and need the md5sum of every 64MB block in the
> file.
>
> I'm looking for some ideas on how to write a script to do this using
> bash - not interested in perl or other language solutions.
>
> The size of my disk is 10GB with a smallish linux system and 256MB
> free disk space.
>
> Thanks for all constructive posts.
>
> Hal

something like:

    for ((x=0; x<128; x++))
    do
      # copy only block $x (count=1) into the temp file
      dd if=largefile ibs=64M obs=64M skip=$x count=1 of=tmp
      echo $x
      md5sum tmp
    done

This reads through your file in 128 steps, each time creating a (temp) file 'tmp' whose md5sum is then computed.

-- 
Luuk

#4
On Wednesday 7 May 2008 23:35, Luuk wrote:
> something like:
>
> for ((x=0; x<128; x++))
> do
>   dd if=largefile ibs=64M obs=64M skip=$x count=1 of=tmp
>   echo $x
>   md5sum tmp
> done
>
> reading in 128 steps, through your file, creating a (temp)file 'tmp' of
> which the md5sum is created.

You don't need the temp file, since you can pipe the output of dd directly to md5sum. Just omit the "of=" part in the dd command, and dd will write to stdout.

-- 
D.
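
A minimal sketch of that suggestion, for illustration only: the same per-block dd call with "of=" dropped and its output piped into md5sum. The function name `md5_blocks`, its arguments, and the use of wc -c to get the size are assumptions, not part of the posts above.

```bash
# md5_blocks FILE BLOCK_BYTES  (hypothetical helper)
# Prints "<block index> <md5>" for every block of a regular file.
md5_blocks() {
    local file=$1 bs=$2 size blocks x=0
    size=$(wc -c < "$file") || return 1
    blocks=$(( (size + bs - 1) / bs ))   # round up: include a short last block
    while [ "$x" -lt "$blocks" ]; do
        # No of=, so dd writes block $x to stdout and md5sum reads it from the pipe.
        printf '%s %s\n' "$x" \
            "$(dd if="$file" bs="$bs" skip="$x" count=1 2>/dev/null | md5sum | cut -d ' ' -f 1)"
        x=$((x + 1))
    done
}

# Example call (file name is a placeholder):
# md5_blocks largefile $((64 * 1024 * 1024))
```

Nothing is written to disk, which matters with only 256MB free, but dd is still started once per block.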

#5
2008-05-7, 12:18(-07), sillyhat@yahoo.com:
> Hello, can someone please .
>
> I have an 8GB file and need the md5sum of every 64MB block in the
> file.
>
> I'm looking for some ideas on how to write a script to do this using
> bash - not interested in perl or other language solutions.
>
> The size of my disk is 10GB with a smallish linux system and 256MB
> free disk space.
[...]

You shouldn't need to use any disk space:

    # dd's stderr report is captured in $details (via fd 3),
    # while md5sum's output goes to the real stdout (via fd 4)
    while
      {
        details=$(
          {
            LC_ALL=C dd bs="$((64*1024*1024))" count=1 2>&3 |
              md5sum >&4
          } 3>&1
        )
      } 4>&1 &&
        case $details in
          (*"1+0 records in"*) ;;
          (*) false;;
        esac
    do
      :
    done < your-big-file

-- 
Stéphane
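
The descriptor juggling in that loop may be easier to follow when the body is pulled out into a function. The following is only a sketch of the same technique; the name `hash_one_block` and its argument are hypothetical, and it assumes a dd that reports "1+0 records in" on stderr for a full block, as GNU dd does under LC_ALL=C.

```bash
# hash_one_block BLOCK_BYTES  (hypothetical helper)
# Reads one block from stdin, prints its md5 on stdout, and returns
# success only if dd reported a full block (so more data may follow).
hash_one_block() {
    local bs=$1 report
    {
        # Inside $( ... ), fd 1 is the capture pipe. "3>&1" makes fd 3 point
        # at that pipe, so dd's stderr (2>&3) ends up in $report.
        report=$(
            { LC_ALL=C dd bs="$bs" count=1 2>&3 | md5sum >&4; } 3>&1
        )
    } 4>&1   # fd 4 is a copy of the caller's stdout, used by md5sum above
    case $report in
        (*"1+0 records in"*) return 0 ;;   # a whole block was read
        (*) return 1 ;;                    # short or empty read: stop
    esac
}

# Example call (file name is a placeholder):
# while hash_one_block $((64 * 1024 * 1024)); do :; done < your-big-file
```

Because every dd inherits the same open file, each call continues where the previous one stopped, so the file is opened once and read once. If the file size is an exact multiple of the block size, the final call hashes an empty read before the loop ends.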

#6
On Wed, 07 May 2008 18:18:02 -0300, mo <invalid@mail.address> wrote:
> On Wed, 07 May 2008 16:18:27 -0300, <sillyhat@yahoo.com> wrote:
>
>> Hello, can someone please .
>>
>> I have an 8GB file and need the md5sum of every 64MB block in the
>> file.
>>
>> I'm looking for some ideas on how to write a script to do this using
>> bash - not interested in perl or other language solutions.
>>
>> The size of my disk is 10GB with a smallish linux system and 256MB
>> free disk space.
>>
>> Thanks for all constructive posts.
>>
>> Hal
>
> If time is not a problem you can try:
>
> MD5=; x=0
> while [ "$MD5" != d41d8cd98f00b204e9800998ecf8427e ]; do
>   MD5=`dd status=noxfer if=file bs=64M skip=$x count=1 | md5sum | cut -d ' ' -f 1`
>   x=$((x+1))
>   echo $x $MD5   # for block 1, skipped 0
> done >blocks
>
> Not tested!
> See also the split command.
>
> The problem with the idea above is that the file read is restarted
> for each block:
> $ echo $((8000/64))
> 125

Updating my previous code with CHAZELAS' smart idea for feeding the loop, and now using head instead of dd:

    md5(){
      [ "$1" ]||{ echo "md5 <block_size_in_bytes>" >&2;return 1;}
      x=0 MD5= pMD5=
      # stops when two consecutive blocks give the same digest, which
      # normally happens once head returns nothing at end of input
      while MD5=`head -c$1|md5sum|cut -d ' ' -f 1`&&[ "$pMD5" != "$MD5" ];do
        pMD5=$MD5
        x=$((x+1))
        echo "$MD5 $x"
      done
    }

### Usage:

    $ md5
    md5 <block_size_in_bytes>
    $
    $ md5 512 <x.log
    ae85a3ff6457e95c723c7d90232cf738 1
    b09cdeb518910d3b8b8fb09bd8a57488 2
    4fd51c79224201930e294d71feb47c7a 3
    d41d8cd98f00b204e9800998ecf8427e 4
    $
    $ cat x.log|md5 1024
    dd66ec03b7c59ed2828090dae4421818 1
    4fd51c79224201930e294d71feb47c7a 2
    d41d8cd98f00b204e9800998ecf8427e 3
    $

With this new code the last line is always the md5 of a null string.

#7
Thanks for the responses. Good work!

I also put together this attempt, which seems to do the trick as well. It's based on previous responses and some ideas found in the article 'Mincing Your Data into Arbitrary Chunks (in bash)' from the book Linux Server Hacks. I found the article online.

If there are some glaring errors, please let me know.

Hal

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    #!/bin/sh

    #IN="/dev/hda1"
    IN="bigfile"

    if [ "${IN:0:5}" == "/dev/" ] ; then
      echo "filename starts '/dev/...' so assume its a disk or partition."
      INX=`echo $IN | sed 's:/:\\\\/:g'`
      SIZE=`fdisk -l $IN | awk '/'$INX'\:/ {print $5}'`
    else
      echo "filename does not start '/dev/..' so assume its a normal file."
      SIZE=`ls -l $IN | awk '{print $5}'`
    fi

    echo "SIZE="$SIZE

    OUT="out"
    B="$((64*1024*1024))"

    total=0
    while [ $total -lt $SIZE ]; do
      dd bs="$B" count=1 2> /dev/null | md5sum
      total=$((total + B))
    done < $IN

#8
2008-05-10, 11:28(-07), sillyhat@yahoo.com:
[...]
> #!/bin/sh
>
> #IN="/dev/hda1"
> IN="bigfile"
>
> if [ "${IN:0:5}" == "/dev/" ] ; then

That's not standard sh syntax: ${IN:0:5} and "==" are ksh93 syntax, also recognised by bash (and not by GNU "[", BTW).

    case $IN in
      (/dev/*) ...

> echo "filename starts '/dev/...' so assume its a disk or
> partition."

But to check whether it's a block device, you can simply do:

    [ -b "$IN" ]

> INX=`echo $IN | sed 's:/:\\\\/:g'`
> SIZE=`fdisk -l $IN | awk '/'$INX'\:/ {print $5}'`

which works for disks but not for other block devices. And instead of escaping IN and using awk /.../, you could have done:

    awk -v disk="$IN" 'index($0, disk ":") {print $5}'

which is a substring search instead of a pattern search. See also

    fdisk -l -- "$IN" | sed -n "s#.* $IN:.* \([0-9]*\) bytes#\1#p"

The util-linux tools have fdisk but also the blockdev command:

    SIZE=$(blockdev --getsize64 "$IN")

> else
> echo "filename does not start '/dev/..' so assume its a normal
> file."
> SIZE=`ls -l $IN | awk '{print $5}'`

    SIZE=$(wc -c < "$IN")

You may want to check that it's a regular file as well:

    [ -f "$IN" ]

With ls, you'd need the "-L" option because for symlinks, you want the size of the file pointed to, not of the symlink, which is irrelevant.

> fi
>
> echo "SIZE="$SIZE

Funny that you put quotes where they were not necessary and not where they were. In shells, double quotes are to be put around variables:

    echo "SIZE=$SIZE"

Without quotes, variable expansions are subject to word splitting and filename generation.

>
> OUT="out"
> B="$((64*1024*1024))"
>
> total=0
> while [ $total -lt $SIZE ]; do
> dd bs="$B" count=1 2> /dev/null | md5sum
> total=$((total + B))

    total=$(($total + $B))

is preferred (not necessary in most recent implementations of sh though)

> done < $IN

The solution that I had given, which parses the stderr output of dd, saves you from having to find out the size beforehand. It stops when dd can't read a whole input block.

-- 
Stéphane
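
Put together, the review comments above might lead to something like the following. This is a sketch only, not the reviewer's own script: the function names `input_size` and `md5_by_size` are hypothetical, `blockdev --getsize64` assumes util-linux (and usually root), and the SIZE line is sent to stderr here so that stdout carries only the checksums.

```bash
# input_size PATH  (hypothetical helper): size in bytes of a block device or regular file
input_size() {
    local src=$1
    if [ -b "$src" ]; then
        blockdev --getsize64 "$src"
    elif [ -f "$src" ]; then
        wc -c < "$src"
    else
        printf '%s: not a block device or regular file\n' "$src" >&2
        return 1
    fi
}

# md5_by_size PATH [BLOCK_BYTES]  (hypothetical helper): one md5sum line per block
md5_by_size() {
    local src=$1 bs=${2:-$((64 * 1024 * 1024))} size total=0
    size=$(input_size "$src") || return 1
    echo "SIZE=$size" >&2
    while [ "$total" -lt "$size" ]; do
        # Each dd continues from where the previous one left off on the shared stdin.
        dd bs="$bs" count=1 2>/dev/null | md5sum
        total=$((total + bs))
    done < "$src"
}

# Example call (file name is a placeholder):
# md5_by_size bigfile
```

All expansions are quoted, and the /dev/ prefix test is replaced by the -b and -f file tests.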