comp.unix.shell Using and programming the Unix shell.

Howto read file line-by-line in bash

20/03/2008, 10:38   #1
Viatly

The content of test.data is

-bash-3.2# cat test.data
line1
line2
line3
It is important that there is no trailing EOL at the end of the file.
I read test.data with the following script:

-bash-3.2# cat test.sh
#!/bin/bash
while read line
do
    echo "$line"
done < "test.data"

-bash-3.2# ./test.sh
line1
line2

That is, "line3" is lost.
Questions:
1. What is a nice way to fix this code?
2. The code of the script is presented as a standard way of reading
a text file line by line, because it is recommended by respected
resources (e.g. http://bash-hackers.org/wiki/doku.php/tests/bashfaq).
Provided the code is OK, does it mean that a typical text file in Unix/
Linux should have an EOL at the end?

20/03/2008, 10:49   #2
Stephane CHAZELAS

2008-03-20, 02:38(-07), Viatly:
> The content of test.data is
>
> -bash-3.2# cat test.data
> line1
> line2
> line3
> It is important that there is no trailing EOL at the end of file.
> I read test.data with the following script:
>
> -bash-3.2# cat test.sh
> #!/bin/bash
> while read line
> do
> echo "$line"
> done < "test.data"
>
> -bash-3.2# ./test.sh
> line1
> line2
>
> That is, "line3" is lost.
> Questions:
> 1. What is a nice way to fix this code?


The nicest way is to avoid while read loops in shells.

> 2. The code of the script is presented as a standard way of reading
> a text file line by line, because it is recommended by respected
> resources (e.g. http://bash-hackers.org/wiki/doku.php/tests/bashfaq).
> Provided the code is OK, does it mean that a typical text file in Unix/
> Linux should have an EOL at the end?


"read" returns false if a full line is not read, but $line will
contain those extra characters after the last NL character:

while IFS= read -r line; do
    printf '%s\n' "$line"
done < test.data
printf %s "$line"

Will do it, but

cat test.data

will do the same (and work even better if for instance test.data
contains NUL bytes).

Note that a file that doesn't end in a NL character is not a
text file as per the POSIX definition of a text file. (that
means for instance that the behavior of a text utility
processing it is unspecified most of the time).
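
A minimal sketch of the behaviour described above, assuming bash. The helper name show_read_status is made up for illustration and is not from the thread. It counts the lines for which "read" succeeded, then reports whatever was left in the variable after the final, failing "read".

```bash
#!/bin/bash
# Hypothetical helper (not from the thread): report how many complete
# lines read saw, plus any unterminated text left in the variable.
show_read_status() {
    local line count=0
    while IFS= read -r line; do
        count=$((count + 1))
    done < "$1"
    # read has now returned non-zero; $line holds the bytes that
    # followed the last NL character, if there were any.
    printf 'complete lines: %s\n' "$count"
    printf 'leftover: [%s]\n' "$line"
}

# Usage, with a file that has no trailing newline:
#   printf 'line1\nline2\nline3' > test.data
#   show_read_status test.data
```

With the three-line test.data from the question, this should report two complete lines and "line3" as the leftover.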

--
Stéphane

20/03/2008, 11:04   #3
Viatly

On 20 Mar, 10:49, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
> 2008-03-20, 02:38(-07), Viatly:
>
> > The content of test.data is

>
> > -bash-3.2# cat test.data
> > line1
> > line2
> > line3
> > It is important that there is no trailing EOL at the end of file.
> > I read test.data with the following script:

>
> > -bash-3.2# cat test.sh
> > #!/bin/bash
> > while read line
> > do
> > echo "$line"
> > done < "test.data"

>
> > -bash-3.2# ./test.sh
> > line1
> > line2

>
> > That is, "line3" is lost.
> > Questions:
> > 1. What is a nice way to fix this code?

>
> The nicest way is to avoid while read loops in shells.


Why? In fact what I need is
while read line
do
# do some processing
done < "test.data"

>
> > 2. The code of the script is presented as a standard way of reading
> > a text file line by line, because it is recommended by respected
> > resources (e.g. http://bash-hackers.org/wiki/doku.php/tests/bashfaq).
> > Provided the code is OK, does it mean that a typical text file in Unix/
> > Linux should have an EOL at the end?

>
> "read" returns false if a full line is not read, but $line will
> contain those extra characters after the last NL character
>
> while IFS= read -r line; do
> printf '%s\n' "$line"
> done < test.data
> printf %s "$line"
>
> Will do it, but
>
> cat test.data


Do you mean:

for line in `cat test.data`;
do
echo $line;
done

In this case, if a line contains words separated by whitespace,
that whitespace will be used as a separator, which I do not want.

>
> Note that a file that doesn't end in a NL character is not a
> text file as per the POSIX definition of a text file. (that
> means for instance that the behavior of a text utility
> processing it is unspecified most of the time).
>


This is a good argument. So the problem is not in the code, but rather
in the ill-formed text file. Right?

> --
> Stéphane



20/03/2008, 11:18   #4
Stephane CHAZELAS

2008-03-20, 03:04(-07), Viatly:
[...]
> Why? In fact what I need is
> while read line
> do
> # do some processing
> done < "test.data"


What kind of processing? If it's text processing, just use a
text processing tool that will take care of looping through all
the lines as most text utilities do.
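
As one illustration of letting a text tool do the looping, here is a sketch using awk. The choice of awk and the helper name number_lines are assumptions made for the example, not something prescribed in this thread. Keep in mind that what a text utility does with an unterminated last line is not guaranteed.

```bash
#!/bin/bash
# Hypothetical helper: prefix every line of a file with its line number.
# awk iterates over the lines itself, so no shell loop is needed.
number_lines() {
    awk '{ printf "%d: %s\n", NR, $0 }' "$1"
}

# Usage:
#   number_lines test.data
```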

If you need some specific command to be run for every line, see
also the "xargs" utility.

In any case "read line" involves a very special behavior of
"read".

If you want to *only* read the line, it's IFS= read -r line.
Without IFS= or -r, you get extra processing which you generally
don't want.
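
To make the "extra processing" concrete, here is a small sketch assuming bash. The function names read_plain and read_raw are invented for the comparison. Both read one line from stdin and print it between angle brackets.

```bash
#!/bin/bash
# Plain read: leading/trailing IFS whitespace is stripped and
# backslashes are treated as escape characters.
read_plain() {
    local line
    read line
    printf '<%s>\n' "$line"
}

# IFS= read -r: the line is stored exactly as it was read.
read_raw() {
    local line
    IFS= read -r line
    printf '<%s>\n' "$line"
}

# Usage (the input is: two spaces, a, backslash, t, b, two spaces):
#   printf '  a\\tb  \n' | read_plain   # prints <atb>
#   printf '  a\\tb  \n' | read_raw     # prints <  a\tb  >
```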

[...]
> for line in `cat test.data`;
> do
> echo $line;
> done
>
> In this case, if a line contains words separated by whitespace,
> that whitespace will be used as a separator, which I do not want.


No, that's even worse.

If you're really going to use a loop, then it's:

while IFS= read -r line <&3; do
    some-processing "$line" # don't forget the quotes
done 3< data.file
# for the extra chars after the last line
[ -n "$line" ] && some-extra-processing "$line"

Using fd 3 instead of 0 allows your "some-processing" to have
access to the original stdin.
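
A sketch of why the separate descriptor matters, assuming bash. The function name ask_per_line is hypothetical. The loop takes its lines from fd 3, so a command in the body that reads stdin (here a second "read") gets the caller's stdin rather than eating lines of the data file.

```bash
#!/bin/bash
# Hypothetical helper: for each line of the data file ($1), read one
# line from the original stdin and print the pair.
ask_per_line() {
    local line reply
    while IFS= read -r line <&3; do
        # This read uses fd 0 (the caller's stdin), not the data file.
        IFS= read -r reply
        printf '%s -> %s\n' "$line" "$reply"
    done 3< "$1"
}

# Usage:
#   printf 'x\ny\n' | ask_per_line data.file
```

Had the loop been fed with "done < data.file" instead, the inner "read" would have consumed every other line of the file.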

Or:

while IFS= read <&3 -r line || [ -n "$line" ]; do
    some-processing "$line" # don't forget the quotes
done 3< data.file

(but note that in that case "read" will be called an extra time
which may cause problems if "data.file" is some special kind of
file)
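
The second form can be tried out with a sketch like the following, assuming bash. The function name print_lines is made up, and printf stands in for "some-processing".

```bash
#!/bin/bash
# Hypothetical helper: print every line of a file in brackets,
# including a last line that has no terminating newline.
print_lines() {
    local line
    while IFS= read -r line <&3 || [ -n "$line" ]; do
        printf '[%s]\n' "$line"
    done 3< "$1"
}

# Usage:
#   printf 'line1\nline2\nline3' > data.file
#   print_lines data.file
```

For a file that does end in a newline, the last "read" fails with an empty $line, so no extra iteration is made.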

>> Note that a file that doesn't end in a NL character is not a
>> text file as per the POSIX definition of a text file. (that
>> means for instance that the behavior of a text utility
>> processing it is unspecified most of the time).
>>

>
> This is a good argument. So the problem is not in the code, but rather
> in the ill-formed text file. Right?

[...]

I'd say yes.

--
Stéphane

20/03/2008, 11:52   #5
Viatly

On 20 Mar, 11:18, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
> [...]

Thanks a lot!

20/03/2008, 17:45   #6
Cyrus Kriticos

Viatly wrote:
> It is important that there is no trailing EOL at the end of file.


OK, add an EOL.

> I read test.data with the following script:
>
> #!/bin/bash
> while read line
> do
> echo "$line"
> done < "test.data"


done < <(cat "test.data"; echo)
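
Note that "cat file; echo" always appends a newline, so a file that already ends in one would yield an extra empty line in the loop. A possible variant, sketched under the assumption of bash and with invented helper names (ends_with_newline, cat_with_final_newline), adds the newline only when it is missing.

```bash
#!/bin/bash
# Hypothetical helper: true if the file is non-empty and its last byte
# is a newline. $(...) strips a trailing newline, so an empty result
# means the last byte was NL. Assumes the file does not end in NUL.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Hypothetical helper: output the file, adding a final newline only
# when the file is non-empty and lacks one.
cat_with_final_newline() {
    cat "$1"
    if [ -s "$1" ] && ! ends_with_newline "$1"; then
        echo
    fi
}

# Usage:
#   while IFS= read -r line; do
#       echo "$line"
#   done < <(cat_with_final_newline test.data)
```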

--
Best regards | Monica Lewinsky's X-Boyfriend's
Cyrus | Wife for President

20/03/2008, 18:10   #7
Old Man

Viatly wrote:
> [...]


First, your script will do the job perfectly well.

The read is terminating, possibly, because there is a character in the
input file that causes the read to think the end of the file has been
reached. Realize that UNIX does not have an EOF character; instead, it
has a total byte count. When the total bytes that are recorded in the
inode are read, then the file is deemed to be at the end of the file.
For your script to end at line 2 instead of line 3 means that something
caused it to think end of file.

Inspect your input data in the file, test.data. I suspect that there is
a control character or something as simple as a control-c, carriage
return, or the like. To see if the input file has these values:

strings test.data   # Shows only runs of printable characters.
od -cx test.data    # Shows every byte value, to pinpoint the bad one.
cat -vte test.data  # Shows printable characters as they are and
                    # non-printing ones in another form, e.g. a
                    # TAB as ^I.

If the data was transferred from Windows to UNIX, these types of
non-visible characters are common.
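
In the same spirit as the inspection commands above, here is a sketch that looks only at the last byte of the file, which is the byte at issue in the original question. The helper name last_byte_octal is an assumption made for this example.

```bash
#!/bin/bash
# Hypothetical helper: print the last byte of a file as three octal
# digits (012 is a newline). Prints nothing for an empty file.
last_byte_octal() {
    tail -c 1 "$1" | od -An -to1 | tr -d ' \n'
}

# Usage:
#   last_byte_octal test.data
```

Anything other than 012 means the file does not end with a newline.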

I hope that this helped.

Old Man

20/03/2008, 18:15   #8
Old Man

Old Man wrote:
> [...]



I omitted another tool that you need to make this evaluation.

The command, "man ascii", shows the valid and visible characters, as
well as the non-visible ones with their hex and other equivalents.
All characters from hex 00 to 1F are non-visible; these are the ones
that the od and cat commands will help you to see. If you have a
problem value in test.data, it is likely in that range.

Old Man

20/03/2008, 19:19   #9
Cyrus Kriticos



Old Man wrote:
> [...]


How-to-make-your-own-file-without-newline-at-the-end:

$ echo "line1" > file
$ echo "line2" >> file
$ echo -n "line3" >> file

and then

$ while read line; do echo $line; done < file
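
Since "echo -n" is not handled the same way by every echo implementation, the same test file can also be built with printf. This is a sketch with invented helper names (make_unterminated, newline_count), assuming bash.

```bash
#!/bin/bash
# Hypothetical helper: write three lines, the last one unterminated.
# printf adds no newline unless the format string contains one.
make_unterminated() {
    printf 'line1\nline2\nline3' > "$1"
}

# Hypothetical helper: number of newline characters in a file.
# wc -l counts newlines, so an unterminated last line is not counted.
newline_count() {
    echo "$(( $(wc -l < "$1") ))"
}

# Usage:
#   make_unterminated file
#   newline_count file    # prints 2
```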

--
Best regards | Monica Lewinsky's X-Boyfriend's
Cyrus | Wife for President

25/04/2008, 16:03   #10
Guru

On Mar 20, 6:04 am, Viatly <postoronnim...@mail.ru> wrote:
> [...]
> Do you mean:
>
> for line in `cat test.data`;
> do
> echo $line;
> done
>
> In this case, if a line contains words separated by whitespace,
> that whitespace will be used as a separator, which I do not want.
> [...]

Hi Viatly,

Try using this:

OLDIFS=$IFS
IFS="|"
for line in `cat test.data`;
do
echo $line;
done
IFS=$OLDIFS

This is simple and crisp.

Rgds
Gaurav S

25/04/2008, 17:26   #11
Stephane CHAZELAS

2008-04-25, 08:03(-07), Guru:
[...]
> Try using this..
>
> OLDIFS=$IFS
> IFS="|"
> for line in `cat test.data`;
> do
> echo $line;
> done
> IFS=$OLDIFS
>
> This is simple and crisp.

[...]

And you overlooked three more problems: empty lines are discarded;
a line containing "*" gets expanded (try it); and if IFS was
previously unset, it becomes set to the empty string, which has a
totally different meaning.

Using loops in shells always leads to this kind of corner-case
problem. Shells are not meant to be used like that; avoid loops
to process text.
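
The first two problems can be reproduced with a sketch like this one, assuming bash. The function names for_loop_lines and while_loop_lines are invented for the comparison. The first splits on newlines instead of "|" to give the for loop its best chance, and "local IFS" is used so that the caller's IFS (set or unset) is left alone.

```bash
#!/bin/bash
# Hypothetical helper: the for-loop approach, splitting on newlines.
# Empty lines vanish and glob characters such as * are expanded
# against the current directory.
for_loop_lines() {
    local IFS=$'\n' line
    for line in $(cat "$1"); do
        printf '[%s]\n' "$line"
    done
}

# Hypothetical helper: the while/read approach from earlier in the
# thread, which keeps empty lines and leaves * untouched.
while_loop_lines() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        printf '[%s]\n' "$line"
    done < "$1"
}

# Usage, with a file containing "x", an empty line and "*":
#   printf 'x\n\n*\n' > test.data
#   for_loop_lines test.data
#   while_loop_lines test.data
```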

--
Stéphane