comp.unix.shell: Using and programming the Unix shell.

#1
I have a shell script that invokes an awk command on a file of the
following format:

(Date, Time)
2007-01-01, 00:00:00,121
2007-01-01, 00:00:00,311
2007-01-01, 00:00:00,432
....
....
2007-01-01, 00:01:10,778
2007-01-01, 00:01:10,981
2007-01-01, 00:01:11,121
....
....

The script I have basically parses the file and generates a comma
separated output of

Date, Time, Count
2007-01-01, 00:00:00, 3
2007-01-01, 00:01:10, 2
2007-01-01, 00:01:11, 1

The command looks like this:

    awk '{++hr[$1,substr($2,1,9)]}END{for(item in hr)printf("%s%s\n",item, hr[item])}' ${logfile} > ${logfile}.csv

At the end of the process I have an output of 86400 lines, too big for
Excel. I want to truncate the output to only output the max value in
any given minute and thus reduce the output to 1440 lines.

I have the following script that I'm struggling with...

    #!/usr/bin/ksh
    awk 'BEGIN{
        for (i=0;i<23;i++) {
            for (j=0;j<=59;j++) {
                for (k=0;k<=59;k++) {
                    p=sprintf("%02d:%02d:%02d", i, j, k);
                    hr[p] = 0;
                }
            }
        }
    }
    {
        ++hr[$1,substr($2,1,9)]
    }END{
        for (i=0;i<23;i++) {
            for (j=0;j<=59;j++) {
                h=sprintf("%02d:%02d", i, j);
                hh[h] = 0;
                for (k=0;k<=59;k++) {
                    p=sprintf("%02d:%02d:%02d", i, j, k);
                    if(k=0)
                        hh[h]=hr[p];
                    else{
                        pp=sprintf("%02d:%02d:%02d", i, j, k-1);
                        print hr[pp];
                        if(hh[pp]>hh[h])
                            hh[h]=hr[pp];
                    }
                }
            }
        }
        for(item in hr){
            printf("%s%s\n",item, hr[item]);
        }
    }' $1

Can anyone tell me what I'm doing wrong? I want the following output:

2007-01-01, 00:00, 4
2007-01-01, 00:01, 5
2007-01-01, 00:02, 4
....
....

Thanks

#2
AyOut wrote:
> I have a shell script that invokes an awk command on a file of the
> following format:
> (Date, Time)
> 2007-01-01, 00:00:00,121
> 2007-01-01, 00:00:00,311
> 2007-01-01, 00:00:00,432
> ...
> ...
> 2007-01-01, 00:01:10,778
> 2007-01-01, 00:01:10,981
> 2007-01-01, 00:01:11,121
> ...
> ...
>

    cut -d: -f1-2 FILE | uniq -c | awk '{print $2 " " $3 " " $1}'

> I want the following output:
>
> 2007-01-01, 00:00, 4
> 2007-01-01, 00:01, 5
> 2007-01-01, 00:02, 4
> ...
> ...

--
Best regards | "The only way to really learn scripting is to write
Cyrus        | scripts." -- Advanced Bash-Scripting Guide

#3
AyOut wrote:
> I have a shell script that invokes an awk command on a file of the
> following format:
> (Date, Time)
> 2007-01-01, 00:00:00,121
> 2007-01-01, 00:00:00,311
> 2007-01-01, 00:00:00,432
> ...
> ...
> 2007-01-01, 00:01:10,778
> 2007-01-01, 00:01:10,981
> 2007-01-01, 00:01:11,121
> ...
> ...
>

    cut -d: -f1-2 FILE | uniq -c | awk '{print $2 " " $3 ", " $1}'

> I want the following output:
>
> 2007-01-01, 00:00, 4
> 2007-01-01, 00:01, 5
> 2007-01-01, 00:02, 4
> ...
> ...

--
Best regards | "The only way to really learn scripting is to write
Cyrus        | scripts." -- Advanced Bash-Scripting Guide
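
For readers who want to see what the cut/uniq pipeline above produces, here is a minimal sketch run against the six sample lines from the original post. The `sample` variable and the `count_per_minute` wrapper are illustrative names, not part of the thread. It assumes the log is in chronological order, since `uniq -c` only merges adjacent identical lines.

```shell
# Hypothetical sample data standing in for the real log file.
sample='2007-01-01, 00:00:00,121
2007-01-01, 00:00:00,311
2007-01-01, 00:00:00,432
2007-01-01, 00:01:10,778
2007-01-01, 00:01:10,981
2007-01-01, 00:01:11,121'

# Illustrative wrapper: keep everything up to the minute (the first two
# colon-separated fields), count adjacent duplicates, then move the
# count to the end of each line.
count_per_minute() {
    cut -d: -f1-2 | uniq -c | awk '{print $2 " " $3 ", " $1}'
}

printf '%s\n' "$sample" | count_per_minute
```

On this sample it prints one line per minute with the total number of entries in that minute (3 for 00:00 and 3 for 00:01), so it answers "how many hits per minute" rather than "what was the busiest second in each minute".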

#4
AyOut wrote:
> I have a shell script that invokes an awk command on a file of the
> following format:
> (Date, Time)
> 2007-01-01, 00:00:00,121
> 2007-01-01, 00:00:00,311
> 2007-01-01, 00:00:00,432
> ...
> ...
> 2007-01-01, 00:01:10,778
> 2007-01-01, 00:01:10,981
> 2007-01-01, 00:01:11,121
> ...
> ...
>
> The script I have basically parses the file and generates a comma
> separated output of
> Date, Time, Count
> 2007-01-01, 00:00:00, 3
> 2007-01-01, 00:01:10, 2
> 2007-01-01, 00:01:11, 1
>
> The command looks like this:
>
> awk '{++hr[$1,substr($2,1,9)]}END{for(item in hr)printf("%s%s\n",item,
> hr[item])}' ${logfile} > ${logfile}.csv
>
> At the end of the process I have an output of 86400 lines, too big for
> Excel. I want to truncate the output to only output the max value in
> any given minute and thus reduce the output to 1440 lines.
>

Try with substr($2,1,5)] instead of substr($2,1,9)].

Hermann

#5
On Aug 30, 2:10 pm, Hermann Peifer <pei...@gmx.eu> wrote:
> AyOut wrote:
> > I have a shell script that invokes an awk command on a file of the
> > following format:
> > (Date, Time)
> > 2007-01-01, 00:00:00,121
> > 2007-01-01, 00:00:00,311
> > 2007-01-01, 00:00:00,432
> > ...
> > ...
> > 2007-01-01, 00:01:10,778
> > 2007-01-01, 00:01:10,981
> > 2007-01-01, 00:01:11,121
> > ...
> > ...
>
> > The script I have basically parses the file and generates a comma
> > separated output of
> > Date, Time, Count
> > 2007-01-01, 00:00:00, 3
> > 2007-01-01, 00:01:10, 2
> > 2007-01-01, 00:01:11, 1
>
> > The command looks like this:
>
> > awk '{++hr[$1,substr($2,1,9)]}END{for(item in hr)printf("%s%s\n",item,
> > hr[item])}' ${logfile} > ${logfile}.csv
>
> > At the end of the process I have an output of 86400 lines, too big for
> > Excel. I want to truncate the output to only output the max value in
> > any given minute and thus reduce the output to 1440 lines.
>
> Try with substr($2,1,5)] instead of substr($2,1,9)].
>
> Hermann

I thought about doing this, but that's only going to give me TPS
granularity at a minute level. Doing $2,1,9 allows me to capture the
hits per second. I.e.:

(counts)
22
23
20
...
...
22
24
21

The max second value for the above given minute would be 24.
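
To make the distinction concrete, one possible two-stage approach is to count hits per second first and only then keep the largest count seen in each minute. The sketch below is an illustration under assumptions, not something posted in the thread: `sample` and `max_per_minute_pipeline` are made-up names, and the input is assumed to be in chronological order because `uniq -c` only merges adjacent lines.

```shell
# Hypothetical sample data standing in for the real log file.
sample='2007-01-01, 00:00:00,121
2007-01-01, 00:00:00,311
2007-01-01, 00:00:00,432
2007-01-01, 00:01:10,778
2007-01-01, 00:01:10,981
2007-01-01, 00:01:11,121'

# Illustrative helper (name is an assumption).
# Stage 1: drop the milliseconds and count hits per second.
# Stage 2: for each minute, remember the largest per-second count.
max_per_minute_pipeline() {
    cut -d, -f1-2 | uniq -c |
    awk '{
        count  = $1 + 0                    # hits in this second
        minute = $2 " " substr($3, 1, 5)   # date plus HH:MM
        if (count > max[minute]) max[minute] = count
    }
    END { for (m in max) printf "%s, %s\n", m, max[m] }' | sort
}

printf '%s\n' "$sample" | max_per_minute_pipeline
```

On the sample, the per-second counts are 3, 2 and 1, so the reported maxima are 3 for minute 00:00 and 2 for minute 00:01. The trailing `sort` is there because awk's `for (m in max)` visits keys in an unspecified order.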

#6
AyOut wrote:
> On Aug 30, 2:10 pm, Hermann Peifer <pei...@gmx.eu> wrote:
>> AyOut wrote:
>>> I have a shell script that invokes an awk command on a file of the
>>> following format:
>>> (Date, Time)
>>> 2007-01-01, 00:00:00,121
>>> 2007-01-01, 00:00:00,311
>>> 2007-01-01, 00:00:00,432
>>> ...
>>> ...
>>> 2007-01-01, 00:01:10,778
>>> 2007-01-01, 00:01:10,981
>>> 2007-01-01, 00:01:11,121
>>> ...
>>> ...
>>> The script I have basically parses the file and generates a comma
>>> separated output of
>>> Date, Time, Count
>>> 2007-01-01, 00:00:00, 3
>>> 2007-01-01, 00:01:10, 2
>>> 2007-01-01, 00:01:11, 1
>>> The command looks like this:
>>> awk '{++hr[$1,substr($2,1,9)]}END{for(item in hr)printf("%s%s\n",item,
>>> hr[item])}' ${logfile} > ${logfile}.csv
>>> At the end of the process I have an output of 86400 lines, too big for
>>> Excel. I want to truncate the output to only output the max value in
>>> any given minute and thus reduce the output to 1440 lines.
>> Try with substr($2,1,5)] instead of substr($2,1,9)].
>>
>> Hermann
>
>
> I thought about doing this, but that's only going to give me TPS
> granularity at a minute level. Doing $2,1,9 allows me to capture the
> hits per second. I.e.:
> (counts)
> 22
> 23
> 20
> ..
> ..
> 22
> 24
> 21
>
> The max second value for the above given minute would be 24.
>

I was reading too quickly and mistook "max value per minute" for
the total count per minute.

For the max number of hits per second on the minute level, you could
try this:

    $ awk '{
        second=substr($0,1,20)
        minute=substr($0,1,17)
        ++hr[second]
        if (hr[second]>max[minute]) max[minute]=hr[second]
    }END{for(item in max)printf "%s, %s\n",item,max[item]}' ...

Hermann
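
As a quick check of the one-pass awk approach above, here is a small self-contained sketch. The `sample` data and the `max_hits_per_minute` wrapper name are assumptions added for illustration, and the `sort` at the end is an addition too, since awk's `for (item in max)` does not guarantee any particular order. The `substr` offsets rely on the fixed-width `YYYY-MM-DD, HH:MM:SS,mmm` layout shown in the original post.

```shell
# Hypothetical sample data standing in for the real log file.
sample='2007-01-01, 00:00:00,121
2007-01-01, 00:00:00,311
2007-01-01, 00:00:00,432
2007-01-01, 00:00:01,005
2007-01-01, 00:01:10,778
2007-01-01, 00:01:10,981
2007-01-01, 00:01:11,121'

# Illustrative wrapper (name is an assumption). Reads the files given
# as arguments, or standard input when called without arguments.
max_hits_per_minute() {
    awk '{
        second = substr($0, 1, 20)   # "2007-01-01, 00:00:00"
        minute = substr($0, 1, 17)   # "2007-01-01, 00:00"
        ++hr[second]                 # running hit count for this second
        # Track the highest per-second count seen within this minute.
        if (hr[second] > max[minute]) max[minute] = hr[second]
    }
    END { for (item in max) printf "%s, %s\n", item, max[item] }' "$@" | sort
}

printf '%s\n' "$sample" | max_hits_per_minute
```

In the sample, minute 00:00 has seconds with 3 hits and 1 hit, and minute 00:01 has seconds with 2 hits and 1 hit, so the reported maxima are 3 and 2. Because the counts live in associative arrays, this version does not depend on the log being sorted.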