comp.unix.shell: Using and programming the Unix shell.
#1
The content of test.data is

    -bash-3.2# cat test.data
    line1
    line2
    line3

It is important that there is no trailing EOL at the end of the file.
I read test.data with the following script:

    -bash-3.2# cat test.sh
    #!/bin/bash
    while read line
    do
        echo "$line"
    done < "test.data"

    -bash-3.2# ./test.sh
    line1
    line2

That is, "line3" is lost.

Questions:
1. What is a nice way to fix this code?
2. The code of the script claims to be a standard way of reading a
   text file line by line, because it is recommended by respected
   resources (e.g. http://bash-hackers.org/wiki/doku.php/tests/bashfaq).
   Provided the code is OK, does it mean that a typical text file in
   Unix/Linux should have an EOL at the end?
#2
2008-03-20, 02:38(-07), Viatly:
[...]
> That is, "line3" is lost.
> Questions:
> 1. What is a nice way to fix this code?

The nicest way is to avoid while read loops in shells.

> 2. The code of the script claims to be a standard way of reading a
> text file line by line, because it is recommended by respected
> resources (e.g. http://bash-hackers.org/wiki/doku.php/tests/bashfaq).
> Provided the code is OK, does it mean that a typical text file in
> Unix/Linux should have an EOL at the end?

"read" returns false if a full line is not read, but $line will
contain those extra characters after the last NL character, so

    while IFS= read -r line; do
      printf '%s\n' "$line"
    done < test.data
    printf %s "$line"

will do it, but

    cat test.data

will do the same (and work even better if, for instance, test.data
contains NUL bytes).

Note that a file that doesn't end in a NL character is not a
text file as per the POSIX definition of a text file (which
means, for instance, that the behavior of a text utility
processing it is unspecified most of the time).

--
Stéphane
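A minimal sketch of the point about read's exit status, assuming a POSIX-style shell with mktemp available. The scratch file and the function name copy_lines are made up for illustration. It builds a three-line file with no final newline, copies it with the loop-plus-trailing-printf pattern from the post above, and compares the copy with the original byte for byte:

```shell
# Hypothetical scratch file; the name only serves the demonstration.
tmp=$(mktemp)
printf 'line1\nline2\nline3' > "$tmp"   # note: no trailing newline

# copy_lines FILE: hypothetical helper that re-emits FILE line by line.
copy_lines() {
    while IFS= read -r line; do
        printf '%s\n' "$line"
    done < "$1"
    # read has failed at end of file, but $line still holds whatever
    # followed the last newline (empty if the file ended in a newline).
    printf %s "$line"
}

# cmp exits 0 only if both inputs are identical.
copy_lines "$tmp" | cmp - "$tmp" && echo "identical"
```

As the post says, cat does the same job more simply; the loop is only worth it when each line needs separate handling.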
#3
On 20 Mar, 10:49, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
> 2008-03-20, 02:38(-07), Viatly:
[...]
> > 1. What is a nice way to fix this code?
>
> The nicest way is to avoid while read loops in shells.

Why? In fact, what I need is

    while read line
    do
        # do some processing
    done < "test.data"

[...]
> "read" returns false if a full line is not read, but $line will
> contain those extra characters after the last NL character
>
> while IFS= read -r line; do
>   printf '%s\n' "$line"
> done < test.data
> printf %s "$line"
>
> Will do it, but
>
> cat test.data

Do you mean:

    for line in `cat test.data`;
    do
        echo $line;
    done

In this case, if a line contains words separated by whitespace,
the whitespace will be used as a separator, which I do not need.

> Note that a file that doesn't end in a NL character is not a
> text file as per the POSIX definition of a text file. (that
> means for instance that the behavior of a text utility
> processing it is unspecified most of the time).

This is a good argument. So the problem is not in the code, but rather
in an ill-formed text file. Right?
#4
2008-03-20, 03:04(-07), Viatly:
[...]
> Why? In fact what I need is
> while read line
> do
> # do some processing
> done < "test.data"

What kind of processing? If it's text processing, just use a
text processing tool that will take care of looping through all
the lines, as most text utilities do.

If you need some specific command to be run for every line, see
also the "xargs" utility.

In any case, "read line" involves a very special behavior of
"read".

If you want to *only* read the line, it's IFS= read -r line.
Without IFS= or -r, you get extra processing (stripping of leading
and trailing blanks, and backslash handling) which you generally
don't want.

[...]
> for line in `cat test.data`;
> do
> echo $line;
> done
>
> In this case if the line contains words separated by a whitespace,
> this whitespace will be used as a separator. Which I do not need.

No, that's even worse.

If you're really going to use a loop, then it's:

    while IFS= read -r line <&3; do
      some-processing "$line" # don't forget the quotes
    done 3< data.file
    [ -n "$line" ] && some-extra-processing "$line" # for the extra
                                                    # chars after
                                                    # the last line

Using fd 3 instead of 0 allows your "some-processing" to have
access to the original stdin.

Or:

    while IFS= read <&3 -r line || [ -n "$line" ]; do
      some-processing "$line" # don't forget the quotes
    done 3< data.file

(but note that in that case "read" will be called an extra time,
which may cause problems if "data.file" is some special kind of
file)

>> Note that a file that doesn't end in a NL character is not a
>> text file as per the POSIX definition of a text file. (that
>> means for instance that the behavior of a text utility
>> processing it is unspecified most of the time).
>
> This is good argument. So, the problem is not in code, but rather in
> ill-formed text file. Right?
[...]

I'd say yes.

--
Stéphane
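The second loop from the post above can be tried on sample data as follows. This is only a sketch: process_file and the scratch file are hypothetical names, and printf stands in for whatever "some-processing" would be. The sample input has an empty line, a line with leading blanks, and an unterminated last line:

```shell
# process_file FILE: hypothetical wrapper around the
# "read || [ -n ... ]" loop; prints each line between brackets.
process_file() {
    # fd 3 carries the file, so the loop body keeps the original stdin.
    while IFS= read -r line <&3 || [ -n "$line" ]; do
        printf '[%s]\n' "$line"
    done 3< "$1"
}

demo=$(mktemp)
# Four lines; the last one has no trailing newline.
printf 'one two\n\n  padded\nlast' > "$demo"

process_file "$demo"
```

With IFS= and -r the empty line and the leading blanks come through untouched, and the || test picks up the final unterminated line.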
#5
On 20 Mar, 11:18, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
[...]
> > This is good argument. So, the problem is not in code, but rather in
> > ill-formed text file. Right?
>
> I'd say yes.

Thanks a lot!
#6
Viatly wrote:
> It is important that there is no trailing EOL at the end of file.

OK, add an EOL.

> I read test.data with the following script:
>
> #!/bin/bash
> while read line
> do
> echo "$line"
> done < "test.data"

    done < <(cat "test.data"; echo)

--
Best regards | Monica Lewinsky's X-Boyfriend's
Cyrus        | Wife for President
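One thing to keep in mind with the line above: it always appends a newline, so a file that already ends in one gains an empty extra line. A possible variation, sketched here with a made-up helper name (with_final_newline) and a scratch file, adds the newline only when it is missing. It is written with a pipe so that it also runs in shells without process substitution:

```shell
# with_final_newline FILE: hypothetical helper; writes FILE to stdout and
# adds a newline only if the last byte is not already one.
with_final_newline() {
    cat "$1"
    # $(...) strips a trailing newline, so the result is non-empty only
    # when the file ends in some other byte.
    if [ -n "$(tail -c 1 "$1")" ]; then
        echo
    fi
}

data=$(mktemp)
printf 'line1\nline2\nline3' > "$data"   # no trailing newline

with_final_newline "$data" | while IFS= read -r line; do
    printf 'got: %s\n' "$line"
done
```

In most shells the loop on the right of the pipe runs in a subshell, so variables set inside it are lost afterwards; in bash, done < <(with_final_newline "$data") avoids that.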
#7
Viatly wrote:
[...]
> -bash-3.2# ./test.sh
> line1
> line2
>
> That is, "line3" is lost.
[...]

First, your script will do the job perfectly.

The read is terminating, possibly, because there is a character in the
input file that causes the read to think the end of the file has been
reached. Realize that UNIX does not have an EOF character; instead, it
has a total byte count. When the total bytes that are recorded in the
inode have been read, the file is deemed to be at end of file.
For your script to end at line 2 instead of line 3 means that something
caused it to think it had reached end of file.

Inspect your input data in the file, test.data. I suspect that there is
a control character or something as simple as a control-c, carriage
return, or the like. To see if the input file has these values:

    strings test.data   # Only shows valid printable characters.
    od -cx test.data    # Shows all byte values, to find the bad ones.
    cat -vte test.data  # Shows printable characters as themselves and
                        # non-printing characters in another form,
                        # e.g. a TAB as ^I.

If the data was transferred from Windows to UNIX, these types of
non-visible characters are common.

I hope that this helped.

Old Man
#8
Old Man wrote:
[...]
> To see if the input file has these values:
>
> strings test.data
> od -cx test.data
> cat -vte test.data
[...]

I omitted another tool that you need to make this evaluation.

The command "man ascii" shows the valid and visible characters, as
well as the non-visible ones, with their hex and other equivalents.
All characters from hex 00 to 1F are non-visible; these are the ones
that the od and cat commands will help you to see. If you have a
problem value in test.data, it is likely in that range.

Old Man
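For the specific condition the original poster describes (no newline after the last line), the od tool mentioned above can be pointed at just the final byte. A small sketch, assuming tail -c and od -An -c behave as on common Unix systems; last_byte and the scratch file are made-up names:

```shell
sample=$(mktemp)
printf 'line1\nline2\nline3' > "$sample"   # no trailing newline

# last_byte FILE: hypothetical helper; prints od's rendering of the final
# byte of FILE. A newline shows up as the two characters \n.
last_byte() {
    tail -c 1 "$1" | od -An -c | tr -d ' \n'
}

last_byte "$sample"; echo
```

For this sample the helper prints 3, the last character of "line3"; for a newline-terminated file it would print \n instead.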
#9
Old Man wrote:
[...]
> If you have a problem value in test.data, it is likely in that range.

How to make your own file without a newline at the end:

    $ echo "line1" > file
    $ echo "line2" >> file
    $ echo -n "line3" >> file

and then

    $ while read line; do echo $line; done < file

--
Best regards | Monica Lewinsky's X-Boyfriend's
Cyrus        | Wife for President
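Since echo -n is not handled the same way by every shell's echo, the same test file can also be built with printf. The sketch below does that and then runs the loop from the original question; the scratch file and the name naive_loop are assumptions made for the example:

```shell
file=$(mktemp)
# Three lines, the last one deliberately left without a newline.
printf 'line1\nline2\nline3' > "$file"

# naive_loop FILE: hypothetical name for the loop from the question.
naive_loop() {
    while read line; do
        echo "$line"
    done < "$1"
}

naive_loop "$file"
```

Only line1 and line2 are printed: the final read picks up "line3" but returns a non-zero status, so the loop body is skipped for it.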
#10
On Mar 20, 6:04 am, Viatly <postoronnim...@mail.ru> wrote:
[...]
> Do you mean:
>
> for line in `cat test.data`;
> do
> echo $line;
> done
>
> In this case if the line contains words separated by a whitespace,
> this whitespace will be used as a separator. Which I do not need.
[...]

Hi Viatly,

Try using this:

    OLDIFS=$IFS
    IFS="|"
    for line in `cat test.data`;
    do
        echo $line;
    done
    IFS=$OLDIFS

This is simple and crisp.

Regards,
Gaurav S
#11
2008-04-25, 08:03(-07), Guru:
[...]
> Try using this..
>
> OLDIFS=$IFS
> IFS="|"
> for line in `cat test.data`;
> do
> echo $line;
> done
> IFS=$OLDIFS
>
> This is simple and crisp.
[...]

And you overlooked three more problems: empty lines are
discarded, try it with a line containing "*", and if IFS was
previously unset, it becomes set to the empty string, which has a
totally different meaning.

Using loops in shells always leads to this kind of corner-case
problem. Shells are not meant to be used like that; avoid loops
to process text.

--
Stéphane
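The last of those three problems is easy to miss, so here is a small sketch of it. The helper count_fields and the sample string are invented for the demonstration; the save/restore lines are the ones from the quoted snippet:

```shell
# count_fields ARGS...: hypothetical helper that prints how many
# arguments it received.
count_fields() { echo "$#"; }

unset IFS            # start from an unset IFS (default splitting applies)
words='a b c'

before=$(count_fields $words)   # unquoted on purpose: splits into 3 fields

OLDIFS=$IFS          # IFS is unset, so OLDIFS becomes the empty string
IFS="|"
IFS=$OLDIFS          # IFS is now set but empty: no field splitting at all

after=$(count_fields $words)    # still unquoted, but now a single field

echo "before=$before after=$after"
```

This prints before=3 after=1. Making the IFS change inside a subshell, ( IFS="|"; ... ), is one way to avoid the save and restore step altogether.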