#1
Hello folks,

I have a problem with "case".

I am trying to parse lots of .txt files into a .csv file.

The txt file looks like this:

2006-05-17 00:01:08 [INFO] "Name: joe"
2006-05-17 00:01:08 [INFO] "Sex: m"
2006-05-17 00:01:08 [INFO] "Co. Nr.: ece222"

The csv file:

name sex age co. Nr.
joe m 21 ece222
may f 20 csi091
....

for FILENAME in `find ./ -name *.txt`
do

    unset tmpline # output buffer
    IFS='
'
    while read new_line
    do
        KEY_WORD=`awk '{print $4}' | sed -e s/\"// | sed -e s/\://`
        case "$KEYWORD" in
            "Name" ) echo "hello" ;;
            "Sex" ) echo "sex" ;;
            * ) echo "byebye" ;;
        esac

    done < $FILENAME
done

But it does not work. The only output is "byebye".
Does anyone have an idea?

THANKS!
Dong

#2
donghuang.de@googlemail.com wrote:
> hello folks,
>
> I get a problem according to the "case"

What problem exactly?

>
> i am trying to the parse lots of .txt files into a .csv file.
>
> the txt file looks like this
>
> 2006-05-17 00:01:08 [INFO] "Name: joe"
> 2006-05-17 00:01:08 [INFO] "Sex: m"
> 2006-05-17 00:01:08 [INFO] "Co. Nr.: ece222"
>
> the csv file
> name sex age co. Nr.
> joe m 21 ece222
> may f 20 csi091
> ...

That is no CSV file. Where are the "21" and the "20" coming from?

>
> for FILENAME in `find ./ -name *.txt`
> do
>
> unset tmpline # output buffer

Where is 'tmpline' used?

> IFS='
> '
> while read new_line

Where is 'new_line' used?

> do
> KEY_WORD=`awk '{print $4}' | sed -e s/\"// | sed -e s/\://`
> case "$KEYWORD" in
> "Name" ) echo "hello" ;;
> "Sex" ) echo "sex" ;;
> * ) echo "byebye" ;;
> esac
>
> done < $FILENAME
> done

At least describe exactly what you want to achieve instead of just
posting horrifying code.

>
> but it doesnot work. it comes only "byebye" as output.
> anyone who has an idea?

Without knowing exactly what you are trying to do, I can only guess.
Try...

awk 'BEGIN{FS="\""; print "name\tsex\tage\tco. Nr."}
{ split($2,arr,": ")
  printf("%s%c", arr[2], (arr[1] != "Co. Nr.")?"\t":"\n")
}'

Janis

>
> THANKS!
> Dong
>
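As a rough illustration of the guess above, the same awk program can be run against the three sample lines from the original post. The variable names sample and out are made up for this sketch. Since the sample contains no age line, the header ends up one column wider than the data row.

```shell
#!/bin/sh
# Sample input copied from the original post (no "Age" line in it).
sample='2006-05-17 00:01:08 [INFO] "Name: joe"
2006-05-17 00:01:08 [INFO] "Sex: m"
2006-05-17 00:01:08 [INFO] "Co. Nr.: ece222"'

# With FS set to a double quote, $2 is the text between the quotes,
# e.g. "Name: joe". split() then separates keyword and value.
out=$(printf '%s\n' "$sample" |
awk 'BEGIN{FS="\""; print "name\tsex\tage\tco. Nr."}
{ split($2,arr,": ")
  # a tab after every value, a newline after the "Co. Nr." value
  printf("%s%c", arr[2], (arr[1] != "Co. Nr.")?"\t":"\n")
}')

printf '%s\n' "$out"
```

The row is only terminated when a "Co. Nr." line is seen, so this relies on that keyword being the last one of each record.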

#3
donghuang.de@googlemail.com wrote:
> hello folks,
>
> I get a problem according to the "case"
>
> i am trying to the parse lots of .txt files into a .csv file.
>
> the txt file looks like this
>
> 2006-05-17 00:01:08 [INFO] "Name: joe"
> 2006-05-17 00:01:08 [INFO] "Sex: m"
> 2006-05-17 00:01:08 [INFO] "Co. Nr.: ece222"
>
> the csv file
> name sex age co. Nr.
> joe m 21 ece222
> may f 20 csi091
> ...
>
> for FILENAME in `find ./ -name *.txt`
> do
>
> unset tmpline # output buffer
> IFS='
> '
> while read new_line
> do
> KEY_WORD=`awk '{print $4}' | sed -e s/\"// | sed -e s/\://`
> case "$KEYWORD" in
> "Name" ) echo "hello" ;;
> "Sex" ) echo "sex" ;;
> * ) echo "byebye" ;;
> esac
>
> done < $FILENAME
> done
>
> but it doesnot work. it comes only "byebye" as output.
> anyone who has an idea?
>

Your main problem is that awk tries to read many lines from stdin,
i.e. it competes with the read command.
You certainly want to feed awk from the current line:

KEYWORD=`echo "$new_line" | awk '{print $4}' | sed -e s/\"// -e s/\://`

--
Michael Tosch @ hp : com
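A minimal sketch of the loop with that change applied, fed from a here-document so it runs without any files (the sample lines are taken from the original post; the result variable is made up for this sketch). Note also that the posted loop assigns KEY_WORD but tests $KEYWORD, so the two names have to agree before "case" can match anything.

```shell
#!/bin/sh
result=""
while IFS= read -r new_line
do
    # Give awk only the current line, so it cannot consume the rest
    # of the loop's stdin. Field 4 looks like '"Name:' at this point.
    KEYWORD=$(printf '%s\n' "$new_line" |
        awk '{print $4}' | sed -e 's/"//' -e 's/://')
    case "$KEYWORD" in
        Name) result="$result hello" ;;
        Sex)  result="$result sex" ;;
        *)    result="$result byebye" ;;
    esac
done <<'EOF'
2006-05-17 00:01:08 [INFO] "Name: joe"
2006-05-17 00:01:08 [INFO] "Sex: m"
2006-05-17 00:01:08 [INFO] "Co. Nr.: ece222"
EOF

echo "$result"
```

The third line still falls through to the default branch, because splitting on whitespace leaves only "Co." in field 4. Keywords that contain spaces need a different way of cutting the line.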

#4
donghuang.de@googlemail.com writes:
> hello folks,
>
> I get a problem according to the "case"
>
> i am trying to the parse lots of .txt files into a .csv file.
>
> the txt file looks like this
>
> 2006-05-17 00:01:08 [INFO] "Name: joe"
> 2006-05-17 00:01:08 [INFO] "Sex: m"
> 2006-05-17 00:01:08 [INFO] "Co. Nr.: ece222"
>
> the csv file
> name sex age co. Nr.
> joe m 21 ece222
> may f 20 csi091
> ...
>
> for FILENAME in `find ./ -name *.txt`

That's a very bad way to read file names.

> do
>
> unset tmpline # output buffer
> IFS='
> '

You don't need to modify IFS here.

> while read new_line
> do
> KEY_WORD=`awk '{print $4}' | sed -e s/\"// | sed -e s/\://`
> case "$KEYWORD" in
> "Name" ) echo "hello" ;;
> "Sex" ) echo "sex" ;;
> * ) echo "byebye" ;;
> esac
>
> done < $FILENAME
> done

#v+
COUNTER=0
find . -name '*.txt' -exec cat -- {} \; | while IFS= read -r value; do
    value=${value#*\"}
    value=${value%\"*}
    field=${value%%:*}
    value=${value#*: }
    # now you can do whatever you want with $field and $value
    case "$field" in
    (Name) NAME=$value ;;
    (Sex) SEX=$value ;;
    ("Co. Nr.")
        if [ -n "$NAME" ] && [ -n "$SEX" ]; then
            COUNTER=$(( $COUNTER + 1 ))
            printf '%-10s %-3s %3s %3d %-6s\n' "$NAME" "$SEX" \
                '' $COUNTER "$value"
            unset NAME SEX
        else
            : # misformatted
        fi
    esac
done
unset COUNTER
#v-

--
Best regards,                                         _     _
 .o. | Liege of Serenly Enlightened Majesty of      o' \,=./ `o
 ..o | Computer Science,  Michal "mina86" Nazarewicz   (o o)
 ooo +--<mina86*tlen.pl>---<jid:mina86*chrome.pl>--ooO--(_)--Ooo--
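The four parameter expansions at the top of that loop do all the parsing without starting awk or sed. A small sketch of what each step leaves behind, using one sample line from the original post (the variable name line is made up here):

```shell
#!/bin/sh
line='2006-05-17 00:01:08 [INFO] "Co. Nr.: ece222"'

value=${line#*\"}     # strip the shortest prefix ending in a quote
value=${value%\"*}    # strip the shortest suffix starting with a quote
field=${value%%:*}    # strip the longest suffix starting with a colon
value=${value#*: }    # strip the shortest prefix ending in ": "

printf '%s=%s\n' "$field" "$value"
```

Because the split happens at the colon rather than at whitespace, a keyword such as "Co. Nr." stays in one piece.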

#5
Hi Michael,

Thanks for your reply.
I haven't tried your code yet; I used this:

KEY_WORD=`awk '/Name:/{print $6} \
/Sex:/{print $6} \
/Age:/{print $5}' $FILENAME`

It works, with the limitation that the keywords appear in the .csv file
in the same order as they come in the .txt:

2006-05-17 00:01:08 [INFO] "Name: joe"
2006-05-17 00:01:08 [INFO] "Sex: m"
2006-05-17 00:01:08 [INFO] "Co. Nr.: ece222"

the csv file
name sex co. Nr. age
joe m ece222 21
may f csi091 20

But it is not possible to get this:
name sex age co. Nr.
joe m 21 ece222
may f 20 csi091

In my first approach I used a tmp file in which the items for each
key_word were stored, like

joe
m
21
ece222
may
f
20

and then processed the tmp file into a .csv file.
But it was very slow; it took twice as long.
So I had the idea of using awk to increase the efficiency.

Back to your solution.
One problem I may face is the print $4,
because sometimes I need to print $5 or $6, e.g. for "co. Nr.".

Another issue:
for the last item in each .csv line I am using

# APPRO #
if (grep -q 'Approved' "$FILENAME"); then
    KEY_WORD=${KEY_WORD}"TRUE"
else
    KEY_WORD=${KEY_WORD}"FALSE"
fi

The grep reads the file one more time, which takes longer.

On Nov 3, 1:03 am, Michael Tosch <eed...@NO.eed.SPAM.ericsson.PLS.se>
wrote:
> Your main problem is that awk tries to read many lines from stdin,
> i.e. competes with the read command.
> You certainly want to feed awk from the current line:
>
> KEYWORD=`echo "$new_line" | awk '{print $4}' | sed -e s/\"// -e s/\://`
>
> --
> Michael Tosch @ hp : com

#6
david huan wrote:
> hi michael,
>
> thanks for your reply.
> i don't try your code yet, i used this
>
> KEY_WORD=`awk '/Name:/{print $6} \
> /Sex:/{print $6} \
> /Age:/{print $5}' $FILENAME`
>
> it works out with limitation that the order of keyword in .csv file is
> the same as they come in the .txt
>
> 2006-05-17 00:01:08 [INFO] "Name: joe"
> 2006-05-17 00:01:08 [INFO] "Sex: m"
> 2006-05-17 00:01:08 [INFO] "Co. Nr.: ece222"
>
> the csv file
> name sex co. Nr. age
> joe m ece222 21
> may f csi091 20
>
> but it is not possible like this:
> name sex age co. Nr.
> joe m 21 ece222
> may f 20 csi091
>
> the first approach for this code I used a tmp file into which the items
> of each key_word stored. like
>
> joe
> m
> 21
> ece222
> may
> f
> 20
>
> and then process the tmp file into a .csv file.
> but it was very slow. double time
> So it came the idea use awk to increase the efficiency.

You can avoid the tmp file. Rather than
.... > tmpfile
use
.... | awk '

>
> back to your solution .
> one question i may face is that the print $4
> coz sometimes i need print out $ 5 or $ 6 like for "co. Nr."
>

If the field delimiters are different, sed is often better than awk.
Delete everything up to the leftmost quote and from the leftmost colon on:

echo "$new_line" | sed -e 's/[^"]*"//; s/:.*//'

It is certainly faster to pipe everything through sed, which must then
deliver all the needed information:

cat *.txt |
sed -e 's/[^"]*"//; s/: /:/; s/"[^"]*//' |
{
IFS=":"
while read KEYWORD VALUE
do
  case "$KEYWORD" in
  ...
done
}

> Another issue:
> for the last item in each .csv line i am using
>
> # APPRO #
> if (grep -q 'Approved' "$FILENAME"); then
> KEY_WORD=${KEY_WORD}"TRUE"
> else
> KEY_WORD=${KEY_WORD}"FALSE"
> fi
>

You can certainly avoid the double read.
Please give an example of where 'Approved' is located
and how it should appear in the output file.

--
Michael Tosch @ hp : com
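One way the sed pipeline above might be filled in is sketched below. Several things here are assumptions rather than anything stated in the thread: the "Age" line and its format, the idea that "Age" is the last line of each record, and the names parse_records, CONR, sample and out. Storing each value in its own variable means the output columns can be printed in any order, independent of the order in the .txt file.

```shell
#!/bin/sh
# Hypothetical sample input; the "Age" line is an assumption.
sample='2006-05-17 00:01:08 [INFO] "Name: joe"
2006-05-17 00:01:08 [INFO] "Sex: m"
2006-05-17 00:01:08 [INFO] "Co. Nr.: ece222"
2006-05-17 00:01:08 [INFO] "Age: 21"'

parse_records() {
    # sed reduces each line to KEYWORD:VALUE, read splits at the colon.
    sed -e 's/[^"]*"//; s/: /:/; s/"[^"]*//' |
    while IFS=: read -r KEYWORD VALUE
    do
        case "$KEYWORD" in
            Name)      NAME=$VALUE ;;
            Sex)       SEX=$VALUE ;;
            "Co. Nr.") CONR=$VALUE ;;
            Age)
                # Assumed to close a record: print in the wanted order.
                printf '%s\t%s\t%s\t%s\n' "$NAME" "$SEX" "$VALUE" "$CONR"
                NAME= SEX= CONR=
                ;;
        esac
    done
}

out=$(printf '%s\n' "$sample" | parse_records)
printf '%s\n' "$out"
```

With real data the function would read from something like "cat *.txt | parse_records". The 'Approved' flag is left out, since the thread does not say where that word appears in the files.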