comp.unix.shell: Using and programming the Unix shell.
#1
Hi all,
I'm writing a small shell program that takes a file as input and prints the result to another file. I'm looking for a command that can extract the data between two asterisks. The following file shows what I mean:

-----------------------------------
in this file *I want to print only this* and *this*
Linux is the *best operating* system!
end of *this* file
-----------------------------------------

Clearly, each line of the file must be searched for the character *. When one is found, data is written to the other file until the second asterisk is reached; then writing stops and the search passes to the next line, and so on.

#2
On Monday 28 April 2008 15:07, Nezhate wrote:
> [...]
> I'm searching for command that can extract data which is between two
> asterisks.
> [...]

A similar problem was discussed on comp.lang.awk some days ago. To summarize, you basically want this:

  awk -v RS='*' '!(NR%2)' yourfile

--
All the commands are tested with bash and GNU tools, so they may use nonstandard features. I try to mention when something is nonstandard (if I'm aware of that), but I may miss something. Corrections are welcome.
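A hedged walk-through of why that one-liner works, using the sample text from the question (the variable names `sample` and `extracted` are made up for this sketch). With RS set to `*`, awk reads asterisk-delimited records, so the text between a pair of asterisks lands in the even-numbered records, and `!(NR%2)` selects exactly those:

```shell
# The sample text from the original question, held in a variable for the sketch.
sample='in this file *I want to print only this* and *this*
Linux is the *best operating* system!
end of *this* file'

# RS='*' makes every asterisk a record separator, so records alternate
# between "outside" text (odd NR) and "between asterisks" text (even NR).
# The pattern !(NR%2) is true for even NR; with no action, awk prints the record.
extracted=$(printf '%s\n' "$sample" | awk -v RS='*' '!(NR%2)')

printf '%s\n' "$extracted"
```

This relies on the asterisks being balanced across the whole input, since the records are counted over the file and not per line.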

#3
2008-04-28, 06:07(-07), Nezhate:
> [...]
> I'm searching for command that can extract data which is between two
> asterisks.
> [...]

awk -F'[*]' 'NF>2 {for (i = 1; i < int((NF+1)/2); i++) print $(i*2)}'

or:

perl -lne 'print for /\*(.*?)\*/g'

or

sed -n 's/[^*]*\*\([^*]*\)\*/\1\
/g; s/\(.*\)\n.*/\1/p'

--
Stéphane

#4
Shell solves:

$ cat s
Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done

$ cat file
in this file *I want to print only this* and *this*
Linux is the *best operating* system!
end of *this* file

$ ./s<file>fileout

$ cat fileout
I want to print only this
this
best operating
this

Nezhate wrote:
> [...]
> I'm searching for command that can extract data which is between two
> asterisks.
> [...]

#5
On Mon, 28 Apr 2008 06:07:31 -0700, Nezhate wrote:
> [...]
> It's clear that a search for character * must be done in each line
> contained in file , when reached, data is written in the another file
> until it reaches the second asterisk, then it stops to write data and
> pass to the next line and so on.

What needs to happen with malformed input? Specifically, what needs to happen with a line that has an odd number of *'s?
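To make that question concrete, here is a small sketch comparing two of the awk commands posted earlier in the thread on input whose first line has three asterisks. The test input and the variable names are invented for illustration. The per-line `-F'[*]'` variant drops the unmatched trailing asterisk, while the whole-file `RS='*'` variant loses track of which chunks are "inside" from that point on:

```shell
# Hypothetical malformed input: the first line has an odd number of asterisks.
input='a *b* c *d
e *f* g'

# Per-line approach: fields are split on "*" within each line, and only
# complete pairs are printed, so the dangling "*d" is ignored.
per_line=$(printf '%s\n' "$input" |
  awk -F'[*]' 'NF>2 {for (i = 1; i < int((NF+1)/2); i++) print $(i*2)}')

# Whole-file approach: records are counted across line boundaries, so the
# unmatched asterisk shifts the odd/even pairing for everything after it.
whole_file=$(printf '%s\n' "$input" | awk -v RS='*' '!(NR%2)')

printf 'per line:\n%s\n\nwhole file:\n%s\n' "$per_line" "$whole_file"
```

Which behaviour is wanted depends on the answer to the question above; neither command reports the malformed line.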

#6
On Apr 29, 5:32 am, mop2 <mop2bky4mz5tyjwa8ersp7hrg5u...@gmail.com>
wrote:
> Shell solves:
>
> $ cat s
> Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done
> [...]

mop2: Thanks. Can you explain to me what the next line does? (I'm a newbie to shell programming.)

Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done

#7
2008-04-28, 23:08(-07), Nezhate:
[...]
> Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done

(Note that it is non-standard and the behavior varies amongst the shells that support read -d (zsh, ksh93 and bash).)

It is actually very complicated.

The "read -d\*" is a command that returns true (with a zero exit status) if it finds an unescaped "*" in its standard input.

It will store in the $REPLY variable the sequence of characters read up to but not including that unescaped "*", but not before having done a few transformations on it:

- except for bash, the leading and trailing blank characters (space, tab or newline) will be removed as long as those blank characters also happen to be present (once and only once for zsh) in the $IFS special parameter or if $IFS is unset
- except for bash again, the escaped "*"s will be removed.
- The other "\x" escaped x characters will be changed to "x".

[ $Y ] is also very complicated.

It calls the "[" command with a number of arguments resulting from the expansion of $Y and "]".

As $Y is not quoted, in all shells but zsh when not in sh/ksh emulation, the expansion involves a very complex process. The content of $Y is first split according to the list of characters contained in the $IFS special parameter (that part of the process is generally called "word splitting"). The rules for that vary from shell to shell, but with the default value of $IFS, $Y will be considered as a list of blank-separated words (so, $Y will be split according to blanks and leading and trailing blanks will be removed).

Then (again except with zsh), for those words, the shell will attempt to consider each of them as a wildcard pattern and expand them to a list of matching file names (relative to the current directory) (that's the process generally called "filename generation" or "globbing").

A last thing, and that also applies to zsh: if the Y parameter is empty, $Y will expand to no argument at all as opposed to one empty argument (that's a process sometimes called "empties removal").

Here, as it happens, the Y variable can only contain either nothing or "1", so unless $IFS contains the character "1", $Y will expand into either no argument at all or one argument being "1".

The "[" command when called with only the 2 arguments "[" and "]" is a command that returns "false" as a special case (no test expression provided). "[" "1" "]" returns true on the grounds that "1" is not an empty string.

I think the OP's idea was to test whether the $Y string was empty or not. So it was a very convoluted and dangerous way to write [ -n "$Y" ] or [ "$Y" != "" ].

echo "$REPLY"

again, is a command whose behavior varies a lot between shells and even for the same shell depending on the environment or the way the shell was compiled.

That command will display the content of the $REPLY variable on stdout followed by a newline character unless (depending on the shell/echo implementation) $REPLY is one of "-", "-e", "-E", "-n", "-ne", "-nn"... or contains backslash characters, in which case all sorts of things may happen.

So, that command is something that may do what you want in an inefficient way probably for most inputs but will give unexpected results (varying from shell to shell) for some other inputs and is to my mind an improper usage of the shell.

--
Stéphane
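Two of the pitfalls described above can be reproduced with a short sketch. The value `1 = 2` is a hypothetical one chosen to trigger word splitting (in the posted loop, Y only ever holds nothing or "1"), and the echo part assumes bash with its default echo settings:

```shell
# Pitfall 1: an unquoted $Y is word-split before "[" sees it.
Y='1 = 2'    # non-empty, but contains blanks (hypothetical value)
if [ $Y ]; then unquoted=true; else unquoted=false; fi     # runs: [ 1 = 2 ]
if [ -n "$Y" ]; then quoted=true; else quoted=false; fi    # runs: [ -n '1 = 2' ]
printf 'unquoted test: %s, quoted test: %s\n' "$unquoted" "$quoted"

# Pitfall 2: echo may treat its argument as an option; printf '%s\n' does not.
REPLY='-n'
with_echo=$(bash -c 'echo "$1"' _ "$REPLY")    # bash echo takes -n as an option
with_printf=$(printf '%s\n' "$REPLY")
printf 'echo gave: <%s>, printf gave: <%s>\n' "$with_echo" "$with_printf"
```

The quoted test and the printf call are the forms that treat the variable content purely as data.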

#8
On Apr 29, 11:25 am, Stephane CHAZELAS <this.addr...@is.invalid>
wrote:
> 2008-04-28, 23:08(-07), Nezhate:
> [...]

Many thanks to Stephane Chazelas for the explanation!

#9
Thanks CHAZELAS!
Very good explanation.

I like reading his posts; they are excellent for learning about the peculiarities of most Unixes.
My focus will never be portability, and my environments are always under my control.
My universe is very limited and pure shell is always my starting point.
I don't see problems with escaped "*" in standard input in that case, for bash at least.
For small files I think the shell will be more efficient than a call to an external tool that isn't in the cache.

#10
On 4/29/2008 6:45 AM, mop2 wrote:
> [...]
> My universe is very limited and pure shell is always my start point.

Serious question - to me, the shell is an environment from which to call appropriate tools in a specific order to get a job done, so what does "pure shell" mean to you?

> I don't see problems with escaped "*" in standard input in that case,
> for bash at least.
> For small files I think the shell will be more efficient than the use
> of a call to an external tool that isn't in the cache.

Whether that's true or not, for small files efficiency doesn't matter, so is there any other reason to prefer:

  Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done < file

over:

  awk -v RS='*' '!(NR%2)' file

Regards,

  Ed.

#11
2008-04-29, 04:45(-07), mop2:
> [...]
> I don't see problems with escaped "*" in standard input in that case,
> for bash at least.

~/install$ printf '%s\n' 'foo\*bar\*baz' | bash -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
1: <foo*bar*baz
>
~/install$ printf '%s\n' 'foo\*bar*baz' | bash -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
0: <foo*bar>
~/install$ printf '%s\n' 'foo\*bar*baz' | ksh -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
0: <foobar>
~/install$ printf '%s\n' 'foo\*bar*baz' | zsh -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
0: <foobar>

In short, you need the "-r" option, and you need to remove white spaces from $IFS. Once you've done that and replaced "echo" with "print", your code will become illegible.

> For small files I think the shell will be more efficient than the use
> of a call to an external tool that isn't in the cache.

In which case, you'll gain a few microseconds. For large files you may end up wasting several seconds or minutes.

And there's also the time spent deciphering the code and the time spent debugging it, and the time rewriting it when porting to a system that doesn't have the same shell or not the same version.

The "pure shell" thing is nonsense to my mind. A shell is *the* tool designed to run commands, that's what it's been made for. Trying to have it not execute commands is a bit like trying to have rm not remove files.

--
Stéphane
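A minimal sketch of the loop with those fixes applied, assuming bash is installed. The function name `between_stars` and the variable names are made up here, and `printf` stands in for `echo` as a substitution made for this sketch, not something posted in the thread:

```shell
# Hypothetical helper: prints every chunk found between a pair of asterisks
# on standard input, one chunk per output line.
between_stars() {
  bash -c '
    inside=
    # IFS= keeps leading/trailing blanks; -r keeps backslashes literal;
    # -d "*" makes read stop at each asterisk instead of at each newline.
    while IFS= read -r -d "*" chunk; do
      if [ -n "$inside" ]; then
        printf "%s\n" "$chunk"   # this chunk followed an opening asterisk
        inside=
      else
        inside=1                 # this chunk preceded an opening asterisk
      fi
    done
  '
}

result=$(printf '%s\n' 'foo\*bar\*baz' 'x *  keep blanks  * y' | between_stars)
printf '%s\n' "$result"
```

Whether this is still readable enough to prefer over the awk one-liner is the judgment call the thread is debating.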

#12
Hi Ed:

Q1
For me "pure shell" is the use of the shell exclusively, without external programs.

Q2
Using the small file posted as example:
Shell bash:
$ time { Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done < file;}
real 0m0.001s
user 0m0.004s
sys 0m0.000s

I don't use awk myself.
The first call:
$ time awk -v RS='*' '!(NR%2)' file
real 0m0.051s
user 0m0.000s
sys 0m0.000s
The next:
$ time awk -v RS='*' '!(NR%2)' file
real 0m0.006s
user 0m0.000s
sys 0m0.004s

This is how I see things.
The relevance of all this is a matter of viewpoint and is very personal.

Ed Morton wrote:
> [...]

#13
On 4/29/2008 7:35 AM, mop2 wrote:
> Ed Morton wrote:
> [...]
> Hi Ed:
>
> Q1
> For me "pure shell" is the use of the shell exclusively, without
> external programs.

So, if you need to find out how many characters are in a file, you'd do something other than "wc -c"? I don't mean to preach, it's just that I find trying to avoid external commands less easy to understand than trying to avoid cars in favor of horse-and-cart. Perhaps it's a new paradigm - Amish Programming ;-).

> Q2
> Using the small file posted as example:
> [...]

My point was that efficiency isn't a concern for small files since, as you show above, the script runs in the blink of an eye either way. I was just wondering if there was any reason other than efficiency to avoid external commands.

> I see the things in this way.
> The relevance of all this is a question of view point and is very
> personal.

OK, thanks for explaining. Obviously, you don't have to justify your view point to me - I was just curious...

  Ed.

#14
2008-04-29, 05:35(-07), mop2:
[...]
> Q2
> Using the small file posted as example:
> Shell bash:
> $ time { Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done < file;}
> real 0m0.001s
> user 0m0.004s
> sys 0m0.000s
>
> I don't use awk for myself.
> The first call:
> $ time awk -v RS='*' '!(NR%2)' file
> real 0m0.051s
> user 0m0.000s
> sys 0m0.000s
[...]

Note that the above shows either that those timings cannot be trusted or that the awk solution uses less CPU time (0ms!) than the shell-only solution (4ms)!

$ yes 'foo * bar * baz' | head -100000 | time bash -c 'Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done' > /dev/null

real 0m7.33s
user 0m6.15s
sys 0m1.17s
$ yes 'foo * bar * baz' | head -100000 | time bash -c "awk -v RS='*' '!(NR%2)'" > /dev/null

real 0m0.15s
user 0m0.15s
sys 0m0.01s

--
Stéphane
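Before reading much into timings like these, it can help to confirm that both commands actually produce the same output on the benchmark input. The following is a sketch under some assumptions: bash is installed, the input is generated with awk instead of `yes | head` to sidestep broken-pipe exit codes, and the size of 1000 lines is an arbitrary choice to keep the check fast:

```shell
# Build a small input: 1000 copies of the benchmark line.
input=$(awk 'BEGIN { for (i = 0; i < 1000; i++) print "foo * bar * baz" }')

# Output of the shell loop as posted in the thread (run under bash).
loop_out=$(printf '%s\n' "$input" |
  bash -c 'Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done')

# Output of the awk one-liner on the same input.
awk_out=$(printf '%s\n' "$input" | awk -v RS='*' '!(NR%2)')

if [ "$loop_out" = "$awk_out" ]; then same=yes; else same=no; fi
count=$(printf '%s\n' "$awk_out" | wc -l | tr -d ' ')
printf 'same output: %s (%s lines)\n' "$same" "$count"
```

The two agree here because this input contains no backslashes or option-like chunks; the edge cases discussed earlier in the thread are exactly where they would diverge.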

#15
Stephane CHAZELAS wrote:
> [...]
> In short, you need the "-r" option, and you need to remove white
> spaces from $IFS. Once you've done that and replaced "echo" with
> "print", your code will become illegible.

Thanks, that is true, the "-r" option is needed for escaped "*".
With the correction in my proposed code:

bash$ printf '%s\n' 'foo\*bar\*baz'|\
{ Y=;while read -r -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done;}
bar\
bash$

I don't say my solution is the best; it is convenient for me. For others it may be an impractical or limited solution.
I am more at ease with bash than with ksh, ..., sed, awk, perl, etc.
So, for me, programming in bash is faster and easier.

> [...]
> And there's also the time spent deciphering the code and the
> time spent debugging it, and the time rewriting it when porting
> to a system that doesn't have the same shell or not the same
> version.

Yes, but here my preference and experience prevail over other options.

#16
On 29 Apr., 15:19, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
> 2008-04-29, 05:35(-07), mop2:
> > real 0m0.001s
> > user 0m0.004s
> > sys 0m0.000s
>
> Note that the above shows either that those timings cannot be
> trusted or that the awk solution uses less CPU time (0ms!) than
> the shell-only solution (4ms)!

In the past decades I've always thought (and haven't ever observed it differently) that the 'real' value is at least as large as max('user','sys') or differs at best only in the least significant digit if comparing it to 'user'+'sys'. And the man pages seem to confirm that view. How can 'real' be 1ms if 'user' is around 4ms?

Janis

#17
2008-04-29, 06:38(-07), Janis:
> [...]
> In the past decades I've always thought (and haven't ever observed
> it differently) that the 'real' value is at least as large as
> max('user','sys') or differs at best only in the least significant
> digit if comparing it to 'user'+'sys'. And the man pages seem to
> confirm that view. How can 'real' be 1ms if 'user' is around 4ms?
[...]

"real" is <end-time> - <start-time>, which on a system running more than one process and one or several CPUs has little correlation with the number of CPU cycles that are needed to execute the corresponding code. You have to consider the time used up by other processes, the time waiting for resources, and the fact that several processors might run concurrently to perform the task.

All you are guaranteed is that:

  real >= (user + sys) / ncpus

--
Stéphane
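To look at the three figures separately, one option is bash's `TIMEFORMAT` variable, which controls what the `time` keyword prints. This is only a sketch assuming bash is available; the awk busy loop is an arbitrary workload invented for the example, and the variable names are hypothetical:

```shell
# LC_ALL=C keeps the decimal separator a dot; TIMEFORMAT is bash-specific.
# %R, %U and %S are elapsed, user CPU and system CPU seconds respectively.
timings=$(LC_ALL=C bash -c '
  TIMEFORMAT="%R %U %S"
  { time awk "BEGIN { for (i = 0; i < 200000; i++) s += i }"; } 2>&1
')

# Sum the user and sys fields to get total CPU time for the command.
cpu=$(printf '%s\n' "$timings" | LC_ALL=C awk '{ printf "%.3f", $2 + $3 }')

printf 'real/user/sys: %s   user+sys: %s\n' "$timings" "$cpu"
```

On a single short run the numbers sit close to the clock resolution, which is consistent with the warning above that such small timings should not be over-interpreted.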

#18
Ed Morton wrote:
> [...]
> So, if you need to find out how many characters are in a file, you'd do
> something other than "wc -c"? I don't mean to preach, it's just that I find
> trying to avoid external commands less easy to understand than trying to avoid
> cars in favor of horse-and-cart. Perhaps it's a new paradigm - Amish Programming
> ;-).

Probably not, but for the length of a variable's content, perhaps...
I am not a fundamentalist. This is my view today for that case. Tomorrow it may be different, because I'm learning a bit every day.

> My point was that efficiency isn't a concern for small files since, as you show
> above, the script runs in the blink of an eye either way, I was just wondering
> if there was any reason other than efficiency to avoid external commands.

I prefer to know a lot about one thing rather than a little about many things (sorry, I don't know how to say this in English).

> OK, thanks for explaining. Obviously, you don't have to justify your view point
> to me - I was just curious...

I also like to hear details from others, to try to understand why they hold their opinions.

#19
On 29 Apr., 15:50, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
> [...]
> All you are guaranteed is that:
>
> real >= (user + sys) / ncpus

Ah, thanks. So user and sys are actually the respective accumulated CPU seconds of all involved CPUs. (Wasn't aware of that; I guess it's time to switch to a state-of-the-art multi-core platform.)

Janis

#20
On 4/29/2008 8:50 AM, Stephane CHAZELAS wrote:
> [...]
> All you are guaranteed is that:
>
> real >= (user + sys) / ncpus

Just curious - is there a way to specify that a given process must be run on just one processor? Seems like the "time" output for "user" and "sys" might be more useful in that case if you want to compare apples...

  Ed.

#21
2008-04-29, 09:41(-05), Ed Morton:
[...]
> Just curious - is there a way to specify that a given process must be run on
> just one processor? Seems like the "time" output for "user" and "sys" might be
> more useful in that case if you want to compare apples...
[...]

In both cases (while read loop vs awk), there was only one process running at a time, so it shouldn't make a big difference.

In any case, the "real" timing has little significance with respect to measuring performance, as it may take into account the time spent running other processes. The user+sys figure is significant in that it's the quantity of CPU cycles needed to perform the task (note that the amount of work that has to be done may vary from one run to the next, in that you may or may not have to move pages of code or data around between cache, memory, permanent/network... storage).

It shouldn't change significantly whether you assign all the threads to the same processor or to several. Of course though, that doesn't take into account the time waiting for IO, for instance when loading the executables/libraries/data into memory.

--
Stéphane

#22
On 4/29/2008 9:51 AM, Stephane CHAZELAS wrote:
> [...]
> It shouldn't change significantly if you assign all the
> threads to a same processor or several. Of course though, that
> doesn't take into account the time waiting for IO like for
> instance when loading the executables/libraries/data into
> memory.

Oh, I thought there might be some inter-processor communication and scheduling performance impact in the multi-processor case that would have a non-negligible impact on the user+sys counts.

  Ed

#23
Stephane CHAZELAS wrote:
> [...]
> $ yes 'foo * bar * baz' | head -100000 | time bash -c 'Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done' > /dev/null
>
> real 0m7.33s
> user 0m6.15s
> sys 0m1.17s
> $ yes 'foo * bar * baz' | head -100000 | time bash -c "awk -v RS='*' '!(NR%2)'" > /dev/null
>
> real 0m0.15s
> user 0m0.15s
> sys 0m0.01s

Yes, for 100k lines, as I said, my code isn't an efficient option.
But for me, for occasional use, it is more efficient than awk, considering CODER time.
I know nothing about awk and I don't have large amounts of data to process.
Learning English is much more important for me, for example.

As an additional reference, for 10, 100 and 100k lines with Stephane's two commands (bash/awk):

bash$ cat s
# function "t" is used because of problems here with the "time" command
t(){ [ $T ]&&echo `date +%s.%N`-$T|bc&&T=||T=`date +%s.%N`;}
T=
for L in 10 100 100000;do
echo LINES=$L
t
yes 'foo * bar * baz' | head -$L | bash -c 'Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done' > /dev/null
t
t
yes 'foo * bar * baz' | head -$L | bash -c "awk -v RS='*' '!(NR%2)'" > /dev/null
t
done

bash$ . ./s
LINES=10
.023761324
.024364479
LINES=100
.031842460
.024581416
LINES=100000
8.672900116
.226956947

With both programs in cache, bash is faster only for a few lines, as expected.