comp.unix.shell Using and programming the Unix shell.

How awk `split' works?

8 May 2008, 11:49   #1
PRC

The results of running the following script
gawk 'BEGIN {
n = split("(a,b,c,d)", a, /[(,)]/);
printf("n=%d\n", n);
for(i=1; i<=n; i++)
printf(" -%s-\n",a[i]);
}'
is
n=6
--
-a-
-b-
-c-
-d-
--

instead of
n=4
-a-
-b-
-c-
-d-
which is what I expected.

I have no idea how these results come about.
8 May 2008, 11:53   #2
Dave B

On Thursday 8 May 2008 12:49, PRC wrote:

> The results of running the following script
> gawk 'BEGIN {
> n = split("(a,b,c,d)", a, /[(,)]/);
> printf("n=%d\n", n);
> for(i=1; i<=n; i++)
> printf(" -%s-\n",a[i]);
> }'
> is
> n=6
> --
> -a-
> -b-
> -c-
> -d-
> --
>
> instead of
> n=4
> -a-
> -b-
> -c-
> -d-
> which is expected.
>
> I have no ideas how these results come out.


You are telling awk to use either '(', ',' or ')' as field separator for
splitting.

Given your string '(a,b,c,d)' awk sees six fields:

- one empty field before the '('
- a
- b
- c
- d
- one empty field after the ')'

Try this:

$ echo '(a,b,c,d)' | awk -F '[(,)]' '{print NF}'
6
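
To see where the two extra elements sit, it can help to print each element together with its index. This is only a sketch of the original script: the array name `parts` and the bracketed output format are illustrative choices, and the result is captured in a shell variable `out` purely so that it can be inspected afterwards.

```shell
# Print every element that split() produces, with its index,
# so that the empty first and last elements are visible.
out=$(awk 'BEGIN {
    n = split("(a,b,c,d)", parts, /[(,)]/)
    printf "n=%d\n", n
    for (i = 1; i <= n; i++)
        printf "%d: [%s]\n", i, parts[i]
}')
printf '%s\n' "$out"
```

Elements 1 and 6 print as `[]`: they are the empty strings before the '(' and after the ')'.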

--
D.
8 May 2008, 12:13   #3
PRC

I see.
But if FS is a space, awk skips empty fields. Why does awk behave
differently when FS is a space and when FS is a regular
expression?

8 May 2008, 12:21   #4
Dave B

On Thursday 8 May 2008 13:13, PRC wrote:

> I see
> But if FS is space, awk will skip empty fields. Why does awk work in
> different ways for cases where FS is space and where FS is regular
> expression?


Because space is explicitly defined to be a special case.

Compare:

$ echo ' a b ' | awk '{print NF}'
2
$ echo ',,a,,b,,' | awk -F, '{print NF}'
7

From the standard:

The following describes FS behavior:

1. If FS is a null string, the behavior is unspecified.

2. If FS is a single character:

   1. If FS is <space>, skip leading and trailing <blank>s; fields
      shall be delimited by sets of one or more <blank>s.

   2. Otherwise, if FS is any other character c, fields shall be
      delimited by each single occurrence of c.

3. Otherwise, the string value of FS shall be considered to be an
   extended regular expression. Each occurrence of a sequence matching
   the extended regular expression shall delimit fields.


And splitting in split() works the same way.
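
A small sketch of those three cases applied to split() itself, assuming a POSIX-conforming awk. The array names `x`, `y` and `z` and the shell variable `out` are only there for illustration.

```shell
# One split() call per case of the FS rules quoted above.
out=$(awk 'BEGIN {
    # Single space: leading and trailing blanks are skipped,
    # and runs of blanks count as one separator.
    print split("  a  b  ", x, " ")
    # Any other single character: each occurrence delimits a field.
    print split(",,a,,b,,", y, ",")
    # Anything longer is an extended regular expression.
    print split(",,a,,b,,", z, ",+")
}')
printf '%s\n' "$out"
```

This should print 2, 7 and 4, matching the NF values the same separators give when used with -F.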

--
D.

8 May 2008, 12:55   #5
Janis

On 8 May, 13:13, PRC <panruoc...@gmail.com> wrote:

[Please don't top-post.]

> I see
> But if FS is space, awk will skip empty fields. Why does awk work in
> different ways for cases where FS is space and where FS is regular
> expression?


In your example you didn't use FS. In general, awk implements a
few special cases in the semantics of FS/RS for spaces and null
strings, I suppose to combine a concise interface with powerful
features. Besides that, your program behaves exactly the same
way if you specify a space as the regexp...

n = split(" a b c d ", a, / /);


In your application, since you know the data and delimiters,
just change your loop

for(i=2; i<n; i++)
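
Put together with the original script, the changed loop could look like the sketch below. It assumes the input always starts with '(' and ends with ')', so that elements 1 and n are the two empty strings; the shell variable `out` is just for illustration.

```shell
# Same split() as in the original post, but the loop skips
# the empty first and last elements.
out=$(awk 'BEGIN {
    n = split("(a,b,c,d)", a, /[(,)]/)
    # Elements 1 and n are the empty strings outside the parentheses,
    # so iterate over 2 .. n-1 only.
    for (i = 2; i < n; i++)
        printf "-%s-\n", a[i]
}')
printf '%s\n' "$out"
```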


Janis

8 May 2008, 12:56   #6
Ed Morton

On 5/8/2008 6:21 AM, Dave B wrote:
> On Thursday 8 May 2008 13:13, PRC wrote:
>
>
>>I see
>>But if FS is space, awk will skip empty fields. Why does awk work in
>>different ways for cases where FS is space and where FS is regular
>>expression?

>
>
> Because space is explicitly defined to be a special case.
>
> Compare:
>
> $ echo ' a b ' | awk '{print NF}'
> 2
> $ echo ',,a,,b,,' | awk -F, '{print NF}'
> 7
>
> From the standard:
>
> The following describes FS behavior:
>
> 1. If FS is a null string, the behavior is unspecified.
>
> 2. If FS is a single character:
>
> 1. If FS is <space>, skip leading and trailing <blank>s; fields
> shall be delimited by sets of one or more <blank>s.
>
> 2. Otherwise, if FS is any other character c, fields shall be
> delimited by each single occurrence of c.
>
> 3. Otherwise, the string value of FS shall be considered to be an
> extended regular expression. Each occurrence of a sequence matching
> the extended regular expression shall delimit fields.
>
>
> And splitting in split() works the same way.
>


and if you want to literally use a single blank character as the field
separator, specify it as '[ ]':

$ echo '  a  b  ' | awk '{print NF}'
2
$ echo '  a  b  ' | awk -F'[ ]' '{print NF}'
7

and if you want to use repetitions of a given character (or RE), specify it as
'<pattern>+':

$ echo ',,a,,b,,' | awk -F, '{print NF}'
7
$ echo ',,a,,b,,' | awk -F',+' '{print NF}'
4

and if you want it treated the same as the default FS, you need to strip away
any leading and trailing occurrences of the FS:

$ echo ',,a,,b,,' | awk -F',+' '{gsub("^"FS"|"FS"$",""); print NF}'
2

So, the default FS behavior is a shorthand that lets us write:

$ echo '  a  b  ' | awk '{print NF}'
2

instead of:

$ echo '  a  b  ' | awk -F'[[:blank:]]+' '{gsub("^"FS"|"FS"$",""); print NF}'
2
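
The same trim-then-split idea can be wrapped in a function for use with split(). This is only a sketch: `split_trimmed` is a made-up name, not a built-in, and it assumes `sep` is a regular expression rather than the special single-space case. The shell variable `out` is just for illustration.

```shell
out=$(awk '
# Hypothetical helper: remove any leading and trailing match of the
# regex sep from s, then split what is left on sep. The result has
# no empty elements at either end.
function split_trimmed(s, arr, sep) {
    gsub("^(" sep ")|(" sep ")$", "", s)
    return split(s, arr, sep)
}
BEGIN {
    print split_trimmed(",,a,,b,,", f, ",+")
    print split_trimmed("(a,b,c,d)", g, "[(,)]")
}')
printf '%s\n' "$out"
```

With these inputs the counts should be 2 and 4, the latter being what the original poster expected.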

Ed.
