comp.unix.shell: Using and programming the Unix shell.

#1
The results of running the following script

    gawk 'BEGIN {
      n = split("(a,b,c,d)", a, /[(,)]/);
      printf("n=%d\n", n);
      for(i=1; i<=n; i++)
        printf(" -%s-\n",a[i]);
    }'

are

    n=6
    --
    -a-
    -b-
    -c-
    -d-
    --

instead of

    n=4
    -a-
    -b-
    -c-
    -d-

which is what I expected. I have no idea how these results come about.

#2
On Thursday 8 May 2008 12:49, PRC wrote:

> The results of running the following script
> [...]
> I have no ideas how these results come out.

You are telling awk to use either '(', ',' or ')' as field separator for
splitting.

Given your string '(a,b,c,d)' awk sees six fields:

- one empty field before the '('
- a
- b
- c
- d
- one empty field after the ')'

Try this:

$ echo '(a,b,c,d)' | awk -F '[(,)]' '{print NF}'
6

--
D.
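A minimal sketch of one way to get the four fields PRC expected, assuming the input always has exactly one '(' at the start and one ')' at the end: strip those two characters first, then split on the comma alone, which (as a single non-space character) delimits at every occurrence. The function name `split_no_parens` is hypothetical, and the sketch uses plain `awk` rather than `gawk`.

```shell
#!/bin/sh
# split_no_parens: hypothetical helper. Removes a leading "(" and a
# trailing ")" from $1, then splits what is left on commas.
split_no_parens() {
  awk -v s="$1" 'BEGIN {
    gsub(/^[(]|[)]$/, "", s)   # drop the outer parentheses only
    n = split(s, a, ",")       # single non-space char: each comma delimits
    printf("n=%d\n", n)
    for (i = 1; i <= n; i++)
      printf(" -%s-\n", a[i])
  }'
}

split_no_parens '(a,b,c,d)'
```

With the outer parentheses gone there is no delimiter at either end of the string, so no empty first or last field is produced and `n` comes out as 4.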

#3
I see.
But if FS is a space, awk skips empty fields. Why does awk behave
differently when FS is a space and when FS is a regular expression?

#4
On Thursday 8 May 2008 13:13, PRC wrote:

> I see
> But if FS is space, awk will skip empty fields. Why does awk work in
> different ways for cases where FS is space and where FS is regular
> expression?

Because space is explicitly defined to be a special case.

Compare:

$ echo '  a  b  ' | awk '{print NF}'
2
$ echo ',,a,,b,,' | awk -F, '{print NF}'
7

From the POSIX standard:

The following describes FS behavior:

1. If FS is a null string, the behavior is unspecified.

2. If FS is a single character:

   1. If FS is <space>, skip leading and trailing <blank>s; fields
      shall be delimited by sets of one or more <blank>s.

   2. Otherwise, if FS is any other character c, fields shall be
      delimited by each single occurrence of c.

3. Otherwise, the string value of FS shall be considered to be an
   extended regular expression. Each occurrence of a sequence matching
   the extended regular expression shall delimit fields.

And splitting in split() works the same way.

--
D.
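A small sketch that applies the same rules to split() directly. The separator is passed as a string, so the single-space case, the single-character case and the extended-regular-expression case from the list above are each exercised. `nfields` is a hypothetical helper name.

```shell
#!/bin/sh
# nfields STRING SEP: hypothetical helper. Prints how many elements
# split() produces when the string SEP is used as the separator.
nfields() {
  awk -v s="$1" -v sep="$2" 'BEGIN { print split(s, parts, sep) }'
}

nfields '  a  b  ' ' '     # <space>: runs of blanks, ends skipped -> 2
nfields '  a  b  ' '[ ]'   # ERE: every single blank delimits      -> 7
nfields ',,a,,b,,' ','     # other single char: every comma        -> 7
nfields ',,a,,b,,' ',+'    # ERE: each run of commas delimits      -> 4
```

Only the lone " " gets the trimming behavior; '[ ]' and ',+' are longer than one character, so they fall under rule 3 and leading or trailing delimiters produce empty fields.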

#5
On 8 May, 13:13, PRC <panruoc...@gmail.com> wrote:

[Please don't top-post.]

> I see
> But if FS is space, awk will skip empty fields. Why does awk work in
> different ways for cases where FS is space and where FS is regular
> expression?

In your example you didn't use FS.

Generally, there are some special cases built into the semantics of
FS/RS for spaces and null strings, I suppose to get the best of both a
concise awk interface and powerful features.

Besides that, your program behaves exactly the same way if you specify a
space as a regexp:

    n = split(" a b c d ", a, / /);

In your application, since you know the data and the delimiters, just
change your loop to

    for(i=2; i<n; i++)

Janis
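A sketch of Janis's loop change applied to the original split() call. As Janis notes, it assumes the data always starts with '(' and ends with ')', so a[1] and a[n] are the two empty fields. `inner_fields` is a hypothetical wrapper name, and plain `awk` stands in for `gawk`.

```shell
#!/bin/sh
# inner_fields: hypothetical wrapper. Splits $1 exactly as in the
# original post, then prints only the fields between the end ones.
inner_fields() {
  awk -v s="$1" 'BEGIN {
    n = split(s, a, /[(,)]/)
    # a[1] is the empty field before "(" and a[n] the one after ")",
    # so only a[2] .. a[n-1] are printed.
    for (i = 2; i < n; i++)
      printf(" -%s-\n", a[i])
  }'
}

inner_fields '(a,b,c,d)'
```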

#6
On 5/8/2008 6:21 AM, Dave B wrote:

> On Thursday 8 May 2008 13:13, PRC wrote:
> [...]
> Because space is explicitly defined to be a special case.
> [...]
> And splitting in split() works the same way.

And if you want to literally use a single blank character as the field
separator, specify it as '[ ]':

$ echo '  a  b  ' | awk '{print NF}'
2
$ echo '  a  b  ' | awk -F'[ ]' '{print NF}'
7

And if you want to use repetitions of a given character (or RE) as the
separator, specify it as '<pattern>+':

$ echo ',,a,,b,,' | awk -F, '{print NF}'
7
$ echo ',,a,,b,,' | awk -F',+' '{print NF}'
4

And if you want it treated the same as the default FS, you need to strip
away any leading and trailing occurrences of the FS:

$ echo ',,a,,b,,' | awk -F',+' '{gsub("^"FS"|"FS"$",""); print NF}'
2

So, the default FS behavior is a shorthand that lets us write:

$ echo '  a  b  ' | awk '{print NF}'
2

instead of:

$ echo '  a  b  ' | awk -F'[[:blank:]]+' '{gsub("^"FS"|"FS"$",""); print NF}'
2

Ed.
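Ed's trimming trick can be wrapped in a reusable function. This is a sketch, not a drop-in replacement for the default FS: `nf_trimmed` is a hypothetical name, and it additionally wraps FS in parentheses inside the gsub() pattern so that a separator containing '|' still has both ends anchored. The usage lines use '[ ]+' (spaces only) rather than Ed's '[[:blank:]]+' to keep the example minimal.

```shell
#!/bin/sh
# nf_trimmed SEP_ERE: hypothetical helper. Reads lines on stdin, strips
# leading and trailing runs of the separator, then prints NF.
nf_trimmed() {
  awk -F"$1" '{
    # Parenthesize FS so an alternation inside it stays anchored.
    gsub("^(" FS ")|(" FS ")$", "")   # changing $0 re-splits it with FS
    print NF
  }'
}

printf ',,a,,b,,\n' | nf_trimmed ',+'
printf '  a  b  \n' | nf_trimmed '[ ]+'
```

Both calls should print 2, matching what the default FS gives for the blank-separated line.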