comp.unix.shell Using and programming the Unix shell.

commands to manipulate files

Posted 28/04/2008, 14:07   #1
Nezhate
commands to manipulate files

Hi all,
I'm writing a small shell program that takes a file as input and prints
the result to another file.
I'm looking for a command that can extract the data between two
asterisks.
The following file shows what I mean:

-----------------------------------
in this file *I want to print only this* and *this*
Linux is the *best operating* system!
end of *this* file
-----------------------------------------

Clearly, each line of the file has to be searched for the character *.
When one is found, data is written to the other file until the second
asterisk is reached; writing then stops, and processing moves on to the
next line, and so on.

Posted 28/04/2008, 14:33   #2
pk
Re: commands to manipulate files

On Monday 28 April 2008 15:07, Nezhate wrote:

> Hi all,
> I'm writing a small shell program that takes a file as input and print
> result in another file.
> I'm searching for command that can extract data which is between two
> asterisks.
> The next file shows what I mean:
>
> -----------------------------------
> in this file *I want to print only this* and *this*
> Linux is the *best operating* system!
> end of *this* file
> -----------------------------------------
>
> It's clear that a search for character * must be done in each line
> contained in file , when reached, data is written in the another file
> until it reaches the second asterisk, then it stops to write data and
> pass to the next line and so on.


A similar problem was discussed on comp.lang.awk some days ago.
To summarize, you basically want this:

awk -v RS='*' '!(NR%2)' yourfile
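
For readers following along, here is a small sketch of why that one-liner works. Setting RS to '*' makes awk read asterisk-delimited records instead of lines, so the text inside each pair of asterisks always lands in an even-numbered record, and `!(NR%2)` is true for exactly those records. The wrapper name `between_stars` is made up for this illustration and is not part of the original post.

```shell
# Illustrative wrapper; the name between_stars is an assumption.
between_stars() {
    # RS='*' splits the input into records at every asterisk.
    # NR%2 is 0 for records 2, 4, 6, ... which hold the text inside each pair.
    awk -v RS='*' '!(NR%2)' "$@"
}

# The sample file from the question, fed on standard input.
printf '%s\n' 'in this file *I want to print only this* and *this*' \
              'Linux is the *best operating* system!' \
              'end of *this* file' | between_stars
```

Because records are not tied to lines here, a pair of asterisks that spans a line break would be printed with the embedded newline.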

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.

Posted 28/04/2008, 14:37   #3
Stephane CHAZELAS
Re: commands to manipulate files

2008-04-28, 06:07(-07), Nezhate:
> Hi all,
> I'm writing a small shell program that takes a file as input and print
> result in another file.
> I'm searching for command that can extract data which is between two
> asterisks.
> The next file shows what I mean:
>
> -----------------------------------
> in this file *I want to print only this* and *this*
> Linux is the *best operating* system!
> end of *this* file
> -----------------------------------------
>
> It's clear that a search for character * must be done in each line
> contained in file , when reached, data is written in the another file
> until it reaches the second asterisk, then it stops to write data and
> pass to the next line and so on.


awk -F'[*]' 'NF>2 {for (i = 1; i < int((NF+1)/2); i++) print $(i*2)}'

or:

perl -lne 'print for /\*(.*?)\*/g'

or

sed -n 's/[^*]*\*\([^*]*\)\*/\1\
/g; s/\(.*\)\n.*/\1/p'
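
As a sketch of how the awk variant above behaves: it works line by line, and with `*` as the field separator the text between each pair of asterisks ends up in the even-numbered fields. The wrapper name `stars_per_line` is hypothetical and only serves this illustration.

```shell
# Illustrative wrapper; the name stars_per_line is an assumption.
stars_per_line() {
    # -F'[*]' splits each line at every asterisk, so $2, $4, ... hold the
    # text between pairs. NF>2 skips lines with fewer than two asterisks,
    # and the loop bound int((NF+1)/2) ignores a trailing unpaired asterisk.
    awk -F'[*]' 'NF>2 {for (i = 1; i < int((NF+1)/2); i++) print $(i*2)}' "$@"
}

printf '%s\n' 'in this file *I want to print only this* and *this*' | stars_per_line
```

Unlike a record-based approach, this never pairs an asterisk on one line with an asterisk on another.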


--
Stéphane

Posted 29/04/2008, 02:32   #4
mop2
Re: commands to manipulate files

The shell can solve it:

$ cat s
Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done

$ cat file
in this file *I want to print only this* and *this*
Linux is the *best operating* system!
end of *this* file

$ ./s<file>fileout

$ cat fileout
I want to print only this
this
best operating
this




Posted 29/04/2008, 05:15   #5
Dan Stromberg
Re: commands to manipulate files

On Mon, 28 Apr 2008 06:07:31 -0700, Nezhate wrote:

> Hi all,
> I'm writing a small shell program that takes a file as input and print
> result in another file.
> I'm searching for command that can extract data which is between two
> asterisks.
> The next file shows what I mean:
>
> -----------------------------------
> in this file *I want to print only this* and *this* Linux is the *best
> operating* system! end of *this* file
> -----------------------------------------
>
> It's clear that a search for character * must be done in each line
> contained in file , when reached, data is written in the another file
> until it reaches the second asterisk, then it stops to write data and
> pass to the next line and so on.


What needs to happen with malformed input? Specifically, what needs to
happen with a line that has an odd number of *'s?
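
For what it's worth, the awk commands posted earlier in the thread differ on exactly this point, and the thread does not settle which behaviour is wanted. A quick way to compare them on a line with three asterisks is sketched below; the sample text and the function names `record_based` and `line_based` are made up for illustration.

```shell
# Sample input: the first line has an odd number of asterisks.
odd_line='a *b* c *d'

# Record-based: an unmatched asterisk pairs up with whatever follows it,
# even across line boundaries.
record_based() { awk -v RS='*' '!(NR%2)'; }

# Line-based: the unmatched asterisk and the text after it are ignored.
line_based() {
    awk -F'[*]' 'NF>2 {for (i = 1; i < int((NF+1)/2); i++) print $(i*2)}'
}

echo 'record-based:'
printf '%s\n' "$odd_line" 'next line' | record_based
echo 'line-based:'
printf '%s\n' "$odd_line" 'next line' | line_based
```

The record-based command prints "b" and then everything after the third asterisk, including the following line; the line-based command prints only "b".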


Posted 29/04/2008, 07:08   #6
Nezhate
Re: commands to manipulate files

On Apr 29, 5:32 am, mop2 <mop2bky4mz5tyjwa8ersp7hrg5u...@gmail.com>
wrote:
> Shell solves:
>
> $ cat s
> Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done
>
> $ cat file
> in this file *I want to print only this* and *this*
> Linux is the *best operating* system!
> end of *this* file
>
> $ ./s<file>fileout
>
> $ cat fileout
> I want to print only this
> this
> best operating
> this
>


mop2: Thanks for your reply, but can you explain to me what the following
line does? (I'm a newbie to shell programming.)
Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done

Posted 29/04/2008, 08:25   #7
Stephane CHAZELAS
Re: commands to manipulate files

2008-04-28, 23:08(-07), Nezhate:
[...]
> mop2: Thanks for your . but can you explain me what the next line
> do (I'm newbie to shell programming)I understand that ?
> Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done


(note that it is non-standard and the behavior varies amongst
the shells that support read -d (zsh, ksh93 and bash)).

It is actually very complicated.

The "read -d\*" is a command that returns true (with a zero exit
status) if it finds an unescaped "*" in its standard input.

It will store in the $REPLY variable the sequence of characters
read up to but not including that unescaped "*", but not before
having done a few transformations on it:

- except for bash, the leading and trailing blank characters
(space, tab or newline) will be removed as long as those blank
characters also happen to be present (once and only once for
zsh) in the $IFS special parameter or if $IFS is unset
- except for bash again, the escaped "*"s will be removed.
- The other "\x" escaped x characters will be changed to "x".

[ $Y ] is also very complicated.

It calls the "[" command with a number of arguments resulting
from the expansion of $Y and "]".

As $Y is not quoted, in all shells but zsh when not in sh/ksh
emulation, the expansion involves a very complex process. The
content of the $Y is first split according to the list of
characters contained in the $IFS special parameter (that part of
the process is generally called "word splitting"). The rules for
that vary from shell to shell, but with the default value of
$IFS, $Y will be considered as a list of blank separated words
(so, $Y will be split according to blanks and leading and
trailing blanks will be removed)

Then (again except with zsh), for those words, the shell will
attempt to consider each of them as a wildcard pattern and
expand them to a list of matching file names (relative to the
current directory) (that's the process generally called
"filename generation" or "globbing").

A last thing and that also applies to zsh, if the Y parameter is
empty, $Y will expand to no argument at all as opposed to one
empty argument (that's a process sometimes called "empties
removal").

Here, as it happens, the Y variable can only contain either
nothing or "1", so unless $IFS contains "1", $Y
will expand into either no argument at all or one argument being
"1".

The "[" command when called with the only 2 arguments "[" and
"]" is a command that returns "false" as a special case (no test
expression provided). "[" "1" "]" returns true on the ground
that "1" is not an empty string.

I think the OP's idea was to test whether the $Y string was
empty or not. So it was a very convoluted and dangerous way to
write [ -n "$Y" ] or [ "$Y" != "" ].
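
A minimal sketch of the difference, assuming a shell that word-splits unquoted expansions (bash, dash, ksh; not zsh in its native mode). The function names `is_set_unquoted` and `is_set_quoted` are invented for this illustration.

```shell
# is_set_unquoted mimics the original test; is_set_quoted is the safer form.
is_set_unquoted() { [ $1 ]; }       # $1 undergoes word splitting and globbing
is_set_quoted()   { [ -n "$1" ]; }  # the test always sees exactly one operand

for value in '' '1' 'a = b'; do
    if is_set_unquoted "$value" 2>/dev/null; then u=true; else u=false; fi
    if is_set_quoted "$value"; then q=true; else q=false; fi
    printf '<%s> unquoted=%s quoted=%s\n' "$value" "$u" "$q"
done
```

For the value 'a = b' the unquoted form turns into the comparison [ a = b ] and reports false, although the string is clearly not empty; the quoted form reports true.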

echo "$REPLY"

again, is a command whose behavior varies a lot between shells,
and even for the same shell depending on the environment or the
way the shell was compiled.

That command will display the content of the $REPLY variable on
stdout followed by a newline character unless (depending on the
shell/echo implementation) $REPLY is one of "-", "-e", "-E",
"-n", "-ne", "-nn"... or contains backslash characters in which
case all sorts of things may happen.
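
A common way around this is printf with a fixed format string, which treats its data purely as data. A small sketch follows; the helper name `print_line` is hypothetical.

```shell
# print_line is a hypothetical helper: with a fixed '%s\n' format, printf
# never interprets the data as an option or as an escape sequence.
print_line() { printf '%s\n' "$1"; }

print_line '-n'     # prints -n followed by a newline
print_line 'a\tb'   # prints a\tb literally, backslash included
```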

So, that command is something that may do what you want in an
inefficient way probably for most inputs but will give
unexpected results (varying from shell to shell) for some other
inputs and is to my mind an improper usage of the shell.

--
Stéphane

Posted 29/04/2008, 08:45   #8
Nezhate
Re: commands to manipulate files

Many thanks to Stephane Chazelas for the explanation!

Posted 29/04/2008, 12:45   #9
mop2
Re: commands to manipulate files

Thanks CHAZELAS!
Very good explanation.

I like reading his posts; they are excellent for learning about the
peculiarities of most Unixes.
My focus will never be portability, and my environments are always
under my control.
My universe is very limited, and pure shell is always my starting point.
I don't see problems with an escaped "*" in standard input in that case,
for bash at least.
For small files I think the shell will be more efficient than a call
to an external tool that isn't in the cache.

Posted 29/04/2008, 13:06   #10
Ed Morton
Re: commands to manipulate files



On 4/29/2008 6:45 AM, mop2 wrote:
> Thanks CHAZELAS!
> Very good explanation.
>
> I like read his posts, excelent for learning about most unixes
> peculiarities.
> My focus will never be portability and my environments are always
> under my control.
> My universe is very limited and pure shell is always my start point.


Serious question - to me, the shell is an environment from which to call
appropriate tools in a specific order to get a job done, so what does "pure
shell" mean to you?

> I don't see problems with escaped "*" in standard input in that case,
> for bash at lest.
> For small files I think the shell will be more efficient than the use
> of a call to an external tool that isn't in the cache.


Whether that's true or not, for small files efficiency doesn't matter so is
there any other reason to prefer:

Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done < file

over:

awk -v RS='*' '!(NR%2)' file

Regards,

Ed.


Posted 29/04/2008, 13:13   #11
Stephane CHAZELAS
Re: commands to manipulate files

2008-04-29, 04:45(-07), mop2:
> Thanks CHAZELAS!
> Very good explanation.
>
> I like read his posts, excelent for learning about most unixes
> peculiarities.
> My focus will never be portability and my environments are always
> under my control.
> My universe is very limited and pure shell is always my start point.
> I don't see problems with escaped "*" in standard input in that case,
> for bash at lest.


~/install$ printf '%s\n' 'foo\*bar\*baz' | bash -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
1: <foo*bar*baz
>

~/install$ printf '%s\n' 'foo\*bar*baz' | bash -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
0: <foo*bar>
~/install$ printf '%s\n' 'foo\*bar*baz' | ksh -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
0: <foobar>
~/install$ printf '%s\n' 'foo\*bar*baz' | zsh -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
0: <foobar>

In short, you need the "-r" option, and you need to remove white
spaces from $IFS. Once you've done that and replaced "echo" with
"print", your code will become illegible.
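
For illustration, a spelled-out version of the loop with those fixes applied might look like the sketch below. It requires bash (read -d and local are not POSIX), it uses printf rather than print because bash has no print builtin, and the function and variable names are assumptions made for this example, not code from the thread.

```shell
# Requires bash. extract_between_stars is an illustrative name.
extract_between_stars() {
    local inside= chunk
    # IFS= keeps leading and trailing whitespace, -r keeps backslashes
    # literal, -d '*' makes read stop at each asterisk instead of newline.
    while IFS= read -r -d '*' chunk; do
        if [ -n "$inside" ]; then
            printf '%s\n' "$chunk"   # this chunk ended at a closing asterisk
            inside=
        else
            inside=1                 # this chunk ended at an opening asterisk
        fi
    done
}

printf '%s\n' 'a *  padded  * b' | extract_between_stars
```

The text after the last asterisk is never printed, because read returns a non-zero status when it reaches end of input without finding the delimiter.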

> For small files I think the shell will be more efficient than the use
> of a call to an external tool that isn't in the cache.


In which case, you'll gain a few microseconds. For large files
you may end up wasting several seconds or minutes.

And there's also the time spent deciphering the code and the
time spent debugging it, and the time rewriting it when porting
to a system that doesn't have the same shell or not the same
version.

The "pure shell" thing is nonsense to my mind. A shell is
*the* tool designed to run commands, that's what it's been made
for. Trying to have it not execute commands is a bit like trying
to have rm not remove files.

--
Stéphane

Posted 29/04/2008, 13:35   #12
mop2
Re: commands to manipulate files

Hi Ed:

Q1
For me "pure shell" is the use of the shell exclusively, without
external programs.

Q2
Using the small file posted as an example:
Shell bash:
$ time { Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done < file;}
real 0m0.001s
user 0m0.004s
sys 0m0.000s

I don't use awk myself.
The first call:
$ time awk -v RS='*' '!(NR%2)' file
real 0m0.051s
user 0m0.000s
sys 0m0.000s
The next:
$ time awk -v RS='*' '!(NR%2)' file
real 0m0.006s
user 0m0.000s
sys 0m0.004s

That is how I see things.
The relevance of all this is a matter of viewpoint and is very
personal.



Posted 29/04/2008, 13:59   #13
Ed Morton
Re: commands to manipulate files

On 4/29/2008 7:35 AM, mop2 wrote:
> Ed Morton wrote:
>
>>On 4/29/2008 6:45 AM, mop2 wrote:
>>
>>>Thanks CHAZELAS!
>>>Very good explanation.
>>>
>>>I like read his posts, excelent for learning about most unixes
>>>peculiarities.
>>>My focus will never be portability and my environments are always
>>>under my control.
>>>My universe is very limited and pure shell is always my start point.

>>
>>Serious question - to me, the shell is an environment from which to call
>>appropriate tools in a specific order to get a job done, so what does "pure
>>shell" mean to you?
>>
>>
>>>I don't see problems with escaped "*" in standard input in that case,
>>>for bash at lest.
>>>For small files I think the shell will be more efficient than the use
>>>of a call to an external tool that isn't in the cache.

>>
>>Whether that's true or not, for small files efficiency doesn't matter so is
>>there any other reason to prefer:
>>
>> Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done < file
>>
>>over:
>>
>> awk -v RS='*' '!(NR%2)' file
>>
>>Regards,
>>
>> Ed.

>
> Hi Ed:
>
> Q1
> For me "pure shell" is the use of the shell exclusively, without
> external programs.


So, if you need to find out how many characters are in a file, you'd do
something other than "wc -c"? I don't mean to preach, it's just that I find
trying to avoid external commands less easy to understand than trying to avoid
cars in favor of horse-and-cart. Perhaps it's a new paradigm - Amish Programming
;-).

> Q2
> Using the small file posted as exampe:
> Shell bash:
> $ time { Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done <
> file;}
> real 0m0.001s
> user 0m0.004s
> sys 0m0.000s
>
> I don't use awk for myself.
> The first call:
> $ time awk -v RS='*' '!(NR%2)' file
> real 0m0.051s
> user 0m0.000s
> sys 0m0.000s
> The next:
> $ time awk -v RS='*' '!(NR%2)' file
> real 0m0.006s
> user 0m0.000s
> sys 0m0.004s


My point was that efficiency isn't a concern for small files since, as you show
above, the script runs in the blink of an eye either way. I was just wondering
if there was any reason other than efficiency to avoid external commands.

> I see the things in this way.
> The relevance of all this is a question of view point and is very
> personal.
>


OK, thanks for explaining. Obviously, you don't have to justify your viewpoint
to me - I was just curious...

Ed.


Posted 29/04/2008, 14:19   #14
Stephane CHAZELAS
Re: commands to manipulate files

2008-04-29, 05:35(-07), mop2:
[...]
> Q2
> Using the small file posted as exampe:
> Shell bash:
> $ time { Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done <
> file;}
> real 0m0.001s
> user 0m0.004s
> sys 0m0.000s
>
> I don't use awk for myself.
> The first call:
> $ time awk -v RS='*' '!(NR%2)' file
> real 0m0.051s
> user 0m0.000s
> sys 0m0.000s

[...]

Note that the above shows either that those timings cannot be
trusted or that the awk solution uses less CPU time (0ms!) than
the shell-only solution (4ms)!

$ yes 'foo * bar * baz' | head -100000 | time bash -c 'Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done' > /dev/null

real 0m7.33s
user 0m6.15s
sys 0m1.17s
$ yes 'foo * bar * baz' | head -100000 | time bash -c "awk -v RS='*' '!(NR%2)'" > /dev/null

real 0m0.15s
user 0m0.15s
sys 0m0.01s


--
Stéphane

Posted 29/04/2008, 14:38   #15
mop2
Re: commands to manipulate files



Stephane CHAZELAS wrote:
> 2008-04-29, 04:45(-07), mop2:
> > Thanks CHAZELAS!
> > Very good explanation.
> >
> > I like read his posts, excelent for learning about most unixes
> > peculiarities.
> > My focus will never be portability and my environments are always
> > under my control.
> > My universe is very limited and pure shell is always my start point.
> > I don't see problems with escaped "*" in standard input in that case,
> > for bash at lest.

>
> ~/install$ printf '%s\n' 'foo\*bar\*baz' | bash -c 'read -d\*; printf "%s:<%s>\n" "$?" "$REPLY"'
> 1: <foo*bar*baz
> >

> ~/install$ printf '%s\n' 'foo\*bar*baz' | bash -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
> 0: <foo*bar>
> ~/install$ printf '%s\n' 'foo\*bar*baz' | ksh -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
> 0: <foobar>
> ~/install$ printf '%s\n' 'foo\*bar*baz' | zsh -c 'read -d\*; printf "%s: <%s>\n" "$?" "$REPLY"'
> 0: <foobar>
>
> In short, you need the "-r" option, and you need to remove white
> spaces from $IFS. One you've done that and replaced "echo" with
> "print", your code will become illegible.


Thanks, that is true, the "-r" option is needed for escaped "*".
With the correction in my proposed code:

bash$ printf '%s\n' 'foo\*bar\*baz'|\
{ Y=;while read -r -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done;}
bar\
bash$

I'm not saying my solution is the best; it is convenient for me.
For others it may be just an impractical or limited solution.
I am more at ease with bash than with ksh, ..., sed, awk, perl, etc.
So, for me, programming in bash is faster and easier.

>
> > For small files I think the shell will be more efficient than the use
> > of a call to an external tool that isn't in the cache.

>
> In which case, you'll gain a few microseconds. For large files
> you may end up wasting several seconds or minutes.
>
> And there's also the time spent deciphering the code and the
> time spent debugging it, and the time rewriting it when porting
> to a system that doesn't have the same shell or not the same
> version.

Yes, but here my preference and experience can outweigh the
other options.
>
> The "pure shell" thing is a nonsense to my mind. A shell is
> *the* tool designed to run commands, that's what it's been made
> for. Trying to have it not execute commands is a bit like trying
> to have rm not remove files.
>
> --
> Stéphane


Posted 29/04/2008, 14:38   #16
Janis
Re: commands to manipulate files

On 29 Apr., 15:19, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
> 2008-04-29, 05:35(-07), mop2:
> > real 0m0.001s
> > user 0m0.004s
> > sys 0m0.000s

>
> Note that the above shows either that those timings cannot be
> trusted or that the awk solution uses less CPU time (0ms!) than
> the shell-only solution (4ms)!


In the past decades I've always thought (and haven't ever observed
it differently) that the 'real' value is at least as large as
max('user','sys') or differs at best only in the least significant
digit if comparing it to 'user'+'sys'. And the man pages seem to
confirm that view. How can 'real' be 1ms if 'user' is around 4ms?

Janis

Posted 29/04/2008, 14:50   #17
Stephane CHAZELAS
Re: commands to manipulate files

2008-04-29, 06:38(-07), Janis:
> On 29 Apr., 15:19, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
>> 2008-04-29, 05:35(-07), mop2:
>> > real 0m0.001s
>> > user 0m0.004s
>> > sys 0m0.000s

>>
>> Note that the above shows either that those timings cannot be
>> trusted or that the awk solution uses less CPU time (0ms!) than
>> the shell-only solution (4ms)!

>
> In the past decades I've always thought (and haven't ever observed
> it differently) that the 'real' value is at least as large as
> max('user','sys') or differs at best only in the least significant
> digit if comparing it to 'user'+'sys'. And the man pages seem to
> confirm that view. How can 'real' be 1ms if 'user' is around 4ms?

[...]

"real" is <end-time> - <start-time>, which on a system running
more than one process and one or several CPUs has little
correlation with the number of CPU cycles that are needed to
execute the corresponding code. You have to consider the time
used up by other processes, the time waiting for resources, and
the fact that several processors might run concurrently to
perform the task.

All you are guaranteed is that:

real >= (user + sys) / ncpus
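
One way to see where that bound comes from (a sketch; the symbols t_i and n are introduced here for illustration): let t_i be the CPU time the command consumed on CPU i during the measured interval. No CPU can be busy for longer than the wall-clock interval itself, so each t_i is at most "real", and summing over all CPUs gives:

```latex
\text{user} + \text{sys} \;=\; \sum_{i=1}^{n} t_i \;\le\; n \cdot \text{real}
\quad\Longrightarrow\quad
\text{real} \;\ge\; \frac{\text{user} + \text{sys}}{n},
\qquad n = \text{ncpus}.
```

Reported values are also limited by the resolution of the system's time accounting, so figures of only a few milliseconds should be read with that in mind.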

--
Stéphane

Posted 29/04/2008, 15:07   #18
mop2
Re: commands to manipulate files



Ed Morton wrote:
> On 4/29/2008 7:35 AM, mop2 wrote:
> > Ed Morton wrote:
> >
> >>On 4/29/2008 6:45 AM, mop2 wrote:
> >>
> >>>Thanks CHAZELAS!
> >>>Very good explanation.
> >>>
> >>>I like read his posts, excelent for learning about most unixes
> >>>peculiarities.
> >>>My focus will never be portability and my environments are always
> >>>under my control.
> >>>My universe is very limited and pure shell is always my start point.
> >>
> >>Serious question - to me, the shell is an environment from which to call
> >>appropriate tools in a specific order to get a job done, so what does "pure
> >>shell" mean to you?
> >>
> >>
> >>>I don't see problems with escaped "*" in standard input in that case,
> >>>for bash at lest.
> >>>For small files I think the shell will be more efficient than the use
> >>>of a call to an external tool that isn't in the cache.
> >>
> >>Whether that's true or not, for small files efficiency doesn't matter so is
> >>there any other reason to prefer:
> >>
> >> Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done < file
> >>
> >>over:
> >>
> >> awk -v RS='*' '!(NR%2)' file
> >>
> >>Regards,
> >>
> >> Ed.

> >
> > Hi Ed:
> >
> > Q1
> > For me "pure shell" is the use of the shell exclusively, without
> > external programs.

>
> So, if you need to find out how many characters are in a file, you'd do
> something other than "wc -c"? I don't mean to preach, it's just that I find
> trying to avoid external commands less easy to understand than trying to avoid
> cars in favor of horse-and-cart. Perhaps it's a new paradigm - Amish Programming
> ;-).

Probably not, but for the length of a variable's content, perhaps...
I am not a fundamentalist. This is my view today for that case.
Tomorrow it may be different, because I'm learning a bit every day.
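
For the length of a variable's content, the shell's own ${#var} parameter expansion does the job without starting another process. A small sketch comparing it with wc -c follows; the helper names are made up for this illustration.

```shell
# Length computed by the shell itself (POSIX parameter expansion).
length_builtin() { printf '%s\n' "${#1}"; }

# Same question answered by an external tool. wc -c counts bytes, and some
# implementations pad the number with blanks, hence the tr.
length_external() { printf '%s' "$1" | wc -c | tr -d ' '; }

length_builtin 'best operating'
length_external 'best operating'
```

Both print 14 for this ASCII string. With multibyte characters the two can differ, since ${#var} counts characters in shells that are locale-aware while wc -c counts bytes.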

>
> > Q2
> > Using the small file posted as exampe:
> > Shell bash:
> > $ time { Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done <
> > file;}
> > real 0m0.001s
> > user 0m0.004s
> > sys 0m0.000s
> >
> > I don't use awk for myself.
> > The first call:
> > $ time awk -v RS='*' '!(NR%2)' file
> > real 0m0.051s
> > user 0m0.000s
> > sys 0m0.000s
> > The next:
> > $ time awk -v RS='*' '!(NR%2)' file
> > real 0m0.006s
> > user 0m0.000s
> > sys 0m0.004s

>
> My point was that efficiency isn't a concern for small files since, as you show
> above, the script runs in the blink of an eye either way, I was just wondering
> if there was any reason other than efficiency to avoid external commands.

I prefer to know a lot about one thing rather than a little about many
things (sorry, I don't know how to say this in English).
>
> > I see the things in this way.
> > The relevance of all this is a question of view point and is very
> > personal.
> >

>
> OK, thanks for explaining. Obviously, you don't have to justify your view point
> to me - I was just curious...

I also like to hear details from others, to try to understand why they
hold their opinions.

>
> Ed.


Posted 29/04/2008, 15:07   #19
Janis
Re: commands to manipulate files

On 29 Apr., 15:50, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
>
> "real" is <end-time> - <start-time>, which on a system running
> more than one process and one or several CPU has little
> correlation with the number of CPU cycles that are needed to
> execute the corresponding code. You have to consider the time
> used up by other processes, the time waiting for resources, and
> the fact that several processors might run concurrently to
> perform the task.
>
> All you are guaranteed is that:
>
> real >= (user + sys) / ncpus


Ah, thanks. So user and sys are actually the respective accumulated
CPU seconds of all involved CPUs. (Wasn't aware of that; I guess
it's time to switch to a state-of-the-art multi-core platform.)

Janis

Posted 29/04/2008, 15:41   #20
Ed Morton
Re: commands to manipulate files

On 4/29/2008 8:50 AM, Stephane CHAZELAS wrote:
> 2008-04-29, 06:38(-07), Janis:
>
>>On 29 Apr., 15:19, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
>>
>>>2008-04-29, 05:35(-07), mop2:
>>>
>>>>real 0m0.001s
>>>>user 0m0.004s
>>>>sys 0m0.000s
>>>
>>>Note that the above shows either that those timings cannot be
>>>trusted or that the awk solution uses less CPU time (0ms!) than
>>>the shell-only solution (4ms)!

>>
>>In the past decades I've always thought (and haven't ever observed
>>it differently) that the 'real' value is at least as large as
>>max('user','sys') or differs at best only in the least significant
>>digit if comparing it to 'user'+'sys'. And the man pages seem to
>>confirm that view. How can 'real' be 1ms if 'user' is around 4ms?

>
> [...]
>
> "real" is <end-time> - <start-time>, which on a system running
> more than one process and one or several CPU has little
> correlation with the number of CPU cycles that are needed to
> execute the corresponding code. You have to consider the time
> used up by other processes, the time waiting for resources, and
> the fact that several processors might run concurrently to
> perform the task.
>
> All you are guaranteed is that:
>
> real >= (user + sys) / ncpus
>


Just curious - is there a way to specify that a given process must be run on
just one processor? Seems like the "time" output for "user" and "sys" might be
more useful in that case if you want to compare apples...

Ed.

29/04/2008, 15h51   #21
Stephane CHAZELAS
Re: commands to manipulate files

2008-04-29, 09:41(-05), Ed Morton:
[...]
> Just curious - is there a way to specify that a given process must be run on
> just one processor? Seems like the "time" output for "user" and "sys" might be
> more useful in that case if you want to compare apples...

[...]

In both cases (while read loop vs awk), there was only one
process running at a time, so it shouldn't make a big
difference.

In any case, the "real" timing has little significance with respect
to measuring performance, as it may include the time spent running
other processes. The user+sys figure is significant in that it's the
amount of CPU time needed to perform the task (note that the amount
of work to be done may vary from one run to the next, in that you
may or may not have to move pages of code or data around between
cache, memory, and permanent/network... storage).

It shouldn't change significantly whether you assign all the
threads to the same processor or to several. Of course, that
doesn't take into account the time spent waiting for I/O, for
instance when loading the executables/libraries/data into
memory.
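
On Linux, one way to try the single-processor case is the taskset utility from util-linux, which restricts a command to a given set of CPUs. A minimal sketch, assuming taskset is installed and CPU 0 is usable; run_pinned is a hypothetical wrapper name, and it falls back to an unpinned run otherwise.

```shell
# run_pinned CMD [ARGS...]   (hypothetical wrapper)
# Run CMD restricted to CPU 0 when taskset works here, unpinned otherwise.
run_pinned() {
  if command -v taskset >/dev/null 2>&1 && taskset -c 0 true 2>/dev/null; then
    taskset -c 0 "$@"
  else
    "$@"    # fallback: taskset missing or CPU 0 not available to us
  fi
}
```

The wrapper can then be put in front of either command being compared, so that the user and sys figures of a pinned and an unpinned run can be set side by side.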

--
Stéphane

29/04/2008, 15h56   #22
Ed Morton
Re: commands to manipulate files



On 4/29/2008 9:51 AM, Stephane CHAZELAS wrote:
> 2008-04-29, 09:41(-05), Ed Morton:
> [...]
>
>>Just curious - is there a way to specify that a given process must be run on
>>just one processor? Seems like the "time" output for "user" and "sys" might be
>>more useful in that case if you want to compare apples...

>
> [...]
>
> In both cases (while read loop vs awk), there was only one
> process running at a time, so it shouldn't make a big
> difference.
>
> In any case, the "real" timing has little significance wrt to
> measuring performance as it may take into account the time spent
> to run other processes. The user+sys is significant in that it's
> the quantity of CPU cycles that are needed to perform the task
> (note that the amount of work that has to be done may vary from
> one run to the next, in that you may or may not have to move
> pages of code or data around in between cache, memory,
> permanent/network... storage).
>
> It shouldn't change significantly if you assign all the
> threads to a same processor or several. Of course though, that
> doesn't take into account the time waiting for IO like for
> instance when loading the executables/libraries/data into
> memory.
>


Oh, I thought there might be some inter-processor communication and scheduling
overhead in the multi-processor case that would have a non-negligible
impact on the user+sys counts.

Ed

29/04/2008, 16h14   #23
mop2
Re: commands to manipulate files



Stephane CHAZELAS wrote:
> 2008-04-29, 05:35(-07), mop2:
> [...]
> > Q2
> > Using the small file posted as example:
> > Shell bash:
> > $ time { Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done < file;}
> > real 0m0.001s
> > user 0m0.004s
> > sys 0m0.000s
> >
> > I don't use awk for myself.
> > The first call:
> > $ time awk -v RS='*' '!(NR%2)' file
> > real 0m0.051s
> > user 0m0.000s
> > sys 0m0.000s

> [...]
>
> Note that the above shows either that those timings cannot be
> trusted or that the awk solution uses less CPU time (0ms!) than
> the shell-only solution (4ms)!
>
> $ yes 'foo * bar * baz' | head -100000 | time bash -c 'Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done' > /dev/null
>
> real 0m7.33s
> user 0m6.15s
> sys 0m1.17s
> $ yes 'foo * bar * baz' | head -100000 | time bash -c "awk -v RS='*' '!(NR%2)'" > /dev/null
>
> real 0m0.15s
> user 0m0.15s
> sys 0m0.01s
>

Yes, as I said, for 100k lines my code isn't an efficient option.
But for occasional use it is, for me, more efficient than awk when
you count CODER time.
I know nothing about awk and I don't have large amounts of data to
process.
Learning English is much more important for me, for example.

As an additional reference, here are timings for 10, 100 and 100k
lines with Stephane's two commands (bash/awk):

bash$ cat s
# function "t" is used because of problems here with the "time" command:
# the first call stores a start timestamp, the next one prints the elapsed seconds
t(){ [ $T ]&&echo `date +%s.%N`-$T|bc&&T=||T=`date +%s.%N`;}
T=

for L in 10 100 100000;do echo LINES=$L
t
yes 'foo * bar * baz' | head -$L | bash -c 'Y=;while read -d\* ;do [ $Y ]&&echo "$REPLY"&&Y=||Y=1;done' > /dev/null
t
t
yes 'foo * bar * baz' | head -$L | bash -c "awk -v RS='*' '!(NR%2)'" > /dev/null
t
done

bash$ . ./s
LINES=10
.023761324
.024364479
LINES=100
.031842460
.024581416
LINES=100000
8.672900116
.226956947

With both programs in cache, bash is faster only for a few lines, as
expected.
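
The start/stop toggle in "t" can also be written as a wrapper that times a single command. A rough sketch, assuming GNU date (for %N) and using awk instead of bc for the subtraction; elapsed is a hypothetical name. Like "t", it measures wall-clock time only, so it includes process start-up and anything else the machine is doing at the time.

```shell
# elapsed CMD [ARGS...]   (hypothetical helper; assumes GNU date for %N)
# Print the wall-clock seconds taken by CMD; CMD's own stdout is discarded.
elapsed() {
  _start=$(date +%s.%N)
  "$@" >/dev/null
  _end=$(date +%s.%N)
  awk -v s="$_start" -v e="$_end" 'BEGIN { printf "%.6f\n", e - s }'
}
```

For the comparison above, each pipeline would be wrapped in its own bash -c '...' string and passed to elapsed, after one untimed run to get both programs into cache.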

>
> --
> Stéphane