comp.unix.shell Using and programming the Unix shell.

#1

Guys,

I would like to describe the current scenario. I have two types of files, primary and secondary. There is only one primary file and around a hundred secondary files.

Primary.txt contains two columns, i.e. name and number of rows:
===========================================================
Currency_exchange|25000
Sales|21000
instruments|120000
===========================================================

Secondary1.txt contains two columns, i.e. name and number of rows:
===========================================================
Currency_exchange|21000
Sales|21000
instruments|120000
===========================================================

Secondary2.txt contains two columns, i.e. name and number of rows:
===========================================================
Currency_exchange|23100
Sales|21000
instruments|120000
===========================================================

There are 100 more secondary files like Secondary3.txt, Secondary4.txt ... Secondary100.txt. The first column (name) contains the same values in all files, but the second column (number of rows) may contain different values.

Now, I want to compare each secondary file (i.e. Secondary1.txt, Secondary2.txt and so on) with Primary.txt and copy the rows whose number of rows does not match into another file. In other words, I want to find out where the number of rows in the secondary files differs from the primary (Primary.txt).

What is the best way to do this? I would be heartily thankful for any assistance with this.

Thanks in advance
SS

#2

On Sat, 29 Dec 2007 17:55:07 -0800, sonal10july wrote:
> Guys,
> I would like to describe the current scenario.
> I got two type of files Primary and secondary. There is only one primary
> file and around hundred secondary files.
>
> Primary.txt Contains two columns i.e name and number of rows
> ===========================================================
> Currency_exchange|25000
> Sales|21000
> instruments|120000
>
> ===========================================================
>
>
> Secondary1.txt Contains two columns i.e name and number of rows
>
> ===========================================================
> Currency_exchange|21000
> Sales|21000
> instruments|120000
>
> ===========================================================
>
> Secondary2.txt Contains two columns i.e name and number of rows
>
> ===========================================================
> Currency_exchange|23100
> Sales|21000
> instruments|120000
>
> ===========================================================

Good so far, you have shown us some typical input files.

> There are 100 more secondary files like
> Secondary3.txt,Secondary4.txt.....Secondary100.txt . First column( name)
> contains the same value among all files but second column (number of
> rows) may contain different values.

Useful information - again helpful.

> Now, I want to compare each secondary file (i.e
> Secondary1.txt,Secondary1.txt ....so on) with Primary.txt and copy those
> rows in another file where number of rows are not matching. In other
> words I want to figure out where the number of rows in secondary
> files(i.e Secondary1.txt,Secondary1.txt ....so on) are not matching with
> primary (primary.txt)

At this point your request becomes less helpful. You didn't show us the required output. For instance, you say "copy those rows in another file": do you want a single "another file", or one file for each secondary? Do you want some information on which secondary the mismatched row came from?

awk -F'|' 'NR==FNR {v[$1]=$2;}
v[$1]!=$2 {print FILENAME,$0}' Primary.txt Secondary*.txt > out

may do what you want.
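For anyone who wants to try the suggestion without touching real data, here is a minimal sketch that recreates the three sample files from the first post in a scratch directory and runs the awk command on them. The scratch directory and the use of `mktemp -d` are assumptions made for the demonstration; the file name is written as Primary.txt, matching the listing in the question.

```shell
# Work in a throwaway directory so nothing real is overwritten (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the first post.
printf '%s\n' 'Currency_exchange|25000' 'Sales|21000' 'instruments|120000' > Primary.txt
printf '%s\n' 'Currency_exchange|21000' 'Sales|21000' 'instruments|120000' > Secondary1.txt
printf '%s\n' 'Currency_exchange|23100' 'Sales|21000' 'instruments|120000' > Secondary2.txt

# Rule 1 (first file only): remember the count for each name.
# Rule 2 (every line): print the line if its count differs from the remembered one.
awk -F'|' 'NR==FNR {v[$1]=$2;}
v[$1]!=$2 {print FILENAME,$0}' Primary.txt Secondary*.txt > out

cat out
```

With an awk that implements FNR, this should list only the Currency_exchange rows of Secondary1.txt and Secondary2.txt, each prefixed by the file name and a space.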

#3

In article <c32538df-82de-4d2b-9ed9-5b67070d1d12@y5g2000hsf.googlegroups.com>, sonal10july@gmail.com wrote:

> Guys,
>
> I would like to describe the current scenario.
> I got two type of files Primary and secondary. There is only one
> primary file and around hundred secondary files.
>
> Primary.txt Contains two columns i.e name and number of rows
> ===========================================================
> Currency_exchange|25000
> Sales|21000
> instruments|120000
>
> ===========================================================
>
>
> Secondary1.txt Contains two columns i.e name and number of rows
>
> ===========================================================
> Currency_exchange|21000
> Sales|21000
> instruments|120000
>
> ===========================================================
>
> Secondary2.txt Contains two columns i.e name and number of rows
>
> ===========================================================
> Currency_exchange|23100
> Sales|21000
> instruments|120000
>
> ===========================================================
>
> There are 100 more secondary files like
> Secondary3.txt,Secondary4.txt.....Secondary100.txt .
> First column( name) contains the same value among all files but second
> column (number of rows) may contain different values.
>
> Now, I want to compare each secondary file (i.e
> Secondary1.txt,Secondary1.txt ....so on) with Primary.txt and copy
> those rows in another file where number of rows are not matching.
> In other words I want to figure out where the number of rows in
> secondary files(i.e Secondary1.txt,Secondary1.txt ....so on) are not
> matching with primary (primary.txt)
>
> What is the best way to do this ? I will heartly thankful to all for
> any assistance regarding this.
>
> Thanks in advance
>
> SS

This seems like a good starting point:

for file in Secondary*.txt
do
    diff Primary.txt "$file"
done

Depending on your specific needs, you may want to use options to diff and/or pipe the output to something to grab the parts you want.

-- 
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
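As one way of "piping the output to something", here is a rough sketch that keeps only the lines diff reports from the secondary file and prefixes them with the file name. It assumes diff's default output format, where lines taken from the second file start with "> ", and it assumes the names appear in the same order in every file, as in the sample data. The scratch directory and the output file name `out` are illustrative choices.

```shell
# Work in a throwaway directory with the sample data from the question (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Currency_exchange|25000' 'Sales|21000' 'instruments|120000' > Primary.txt
printf '%s\n' 'Currency_exchange|21000' 'Sales|21000' 'instruments|120000' > Secondary1.txt
printf '%s\n' 'Currency_exchange|23100' 'Sales|21000' 'instruments|120000' > Secondary2.txt

for file in Secondary*.txt
do
    # In diff's default format, lines from the second file are marked "> ".
    # substr($0, 3) drops that two-character marker before printing.
    diff Primary.txt "$file" |
        awk -v f="$file" '/^> / {print f "|" substr($0, 3)}'
done > out

cat out
```

Because diff compares by line position rather than by name, this only gives sensible results while every secondary file lists the same names in the same order as Primary.txt.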

#4

Thanks for your quick reply.
Here is the answer to your question:
"Do you want a single "another file", or one file for each secondary"

Yes, I want to create a single output file, and I also want to know which secondary file each mismatched row came from.

The following will be my output:

Secondary1.txt|Currency_exchange|21000
Secondary2.txt|Currency_exchange|23100

#5

On Sat, 29 Dec 2007 19:30:49 -0800, sonal10july wrote:
> Thanks for your quick reply .
> There is a answer to your question :
> "Do you want a single "another file", or one file for each secondary"
>
> Yes. I want to create a single output file and also want to know which
> secondary the mismatched row came from?
>
>
> Following will be my output
>
> Secondary1.txt|Currency_exchange|21000
> Secondary2.txt|Currency_exchange|23100

Did you try the two lines I suggested? They will do what you ask for, except that there will be a space after the filename rather than a "|".

awk -F'|' 'NR==FNR {v[$1]=$2;}
v[$1]!=$2 {print FILENAME "|" $0}' Primary.txt Secondary*.txt > out

is a fix for this problem. Make sure you are using an 'sh' family shell (sh, ksh, bash, zsh) when you type this, rather than a csh family shell (csh, tcsh) or something even more exotic (rc, scsh, es, ...).

#6

I copied the above command into a file and ran the script, but it shows all the records from all files. I'm not very familiar with the awk command, so here are the steps I performed.

1. Copied the above command into a file called 'main_script.sh'

#########################################################
$ cat main_script.sh
#!/bin/ksh
awk -F'|' 'NR==FNR {v[$1]=$2;}
v[$1]!=$2 {print FILENAME "|" $0}' Primary.txt Secondary*
#########################################################

2. Ran the script.

#########################################################
$ sh main_script.sh
Primary.txt|Currency_exchange|25000
Primary.txt|Sales|21000
Primary.txt|instruments|120000
Secondary1.txt|Currency_exchange|25000
Secondary1.txt|Sales|20000
Secondary1.txt|instruments|120000
Secondary2.txt|Currency_exchange|25000
Secondary2.txt|Sales|20000
Secondary2.txt|instruments|110000
Secondary3.txt|Currency_exchange|25000
Secondary3.txt|Sales|6600
Secondary3.txt|instruments|9000
#########################################################

Basically it printed the whole contents of all four files. Thanks in advance for your help.

I'm using the Korn shell.
$ echo $SHELL
/bin/ksh

Best Regards
SS

#7

On Sun, 30 Dec 2007 12:22:56 -0800, sonal10july wrote:
> I copied the above command in a file and ran the script but it's
> showing all the records from all files. I'm not very much familier with
> awk command .So, following are the steps I performed.
>
> 1. Copied the above command in a file called 'main_script.sh'
>
> #########################################################
> $cat main_script.sh
> #!/bin/ksh
> awk -F'|' 'NR==FNR {v[$1]=$2;}
> v[$1]!=$2 {print FILENAME "|" $0}' Primary.txt Secondary*
>
> #########################################################
>
> 2. Ran the script.
>
> #########################################################
> $ sh main_script.sh
>
> Primary.txt|Currency_exchange|25000
> Primary.txt|Sales|21000
> Primary.txt|instruments|120000
> Secondary1.txt|Currency_exchange|25000
> Secondary1.txt|Sales|20000
> Secondary1.txt|instruments|120000
> Secondary2.txt|Currency_exchange|25000
> Secondary2.txt|Sales|20000
> Secondary2.txt|instruments|110000
> Secondary3.txt|Currency_exchange|25000
> Secondary3.txt|Sales|6600
> Secondary3.txt|instruments|9000
> #########################################################
>
> Basically It printed the whole contents from all four files. Thanks in
> advance for your help.
>
> I'm using korn shell.
> $ echo $SHELL
> /bin/ksh
>
>
> Best Regards
> SS

OK, something is very wrong.

The -F'|' sets the field delimiter to be a vertical bar, which is the correct value for the data you have shown us.

The "NR==FNR" is an awk idiom, which is true for the first file, and false for the second and later files. So "NR==FNR {v[$1]=$2}" says "while reading the first file, save in the array 'v' the value of the second field in the element indexed by the first field".

The second line "v[$1]!=$2" says "If the value stored in the 'v' array for the first field is not the same as the second field, then do the action", and the action is "{print FILENAME "|" $0}" which is "print out the filename, a vertical bar, and the line from the file".

The second line's condition, by definition, must be false for the first file, as the first line has just set the matching element of the 'v' array. So nothing from Primary.txt should ever be printed.

When I copy your files I get the following output:

Secondary1.txt|Currency_exchange|21000
Secondary2.txt|Currency_exchange|23100

Can you send me your files by email (the email address of this post is valid)?

You might try changing the program to

#!/bin/ksh
awk -F'|' 'NR==FNR {v[$1]=$2; print "Setting v[" $1 "] to <" $2 ">"}
v[$1]!=$2 {print FILENAME "|" $0 "(" $1 "," $2 ")"}' Primary.txt Secondary*

as a debugging aid, and letting us see the output (either here or via email).

Icarus
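To make the "NR==FNR" idiom concrete, here is a small sketch that prints both counters for every line of two tiny files. The file names first.txt and second.txt and their contents are made up for the illustration. NR keeps counting across all input files, while FNR restarts at 1 for each new file, so the two are equal only while the first file is being read.

```shell
# Two tiny hypothetical input files in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'a|1' 'b|2' > first.txt
printf '%s\n' 'a|1' 'b|3' > second.txt

# Show the file name, both counters, and whether NR==FNR holds for each line.
awk '{print FILENAME, "NR=" NR, "FNR=" FNR, (NR==FNR ? "first file" : "later file")}' \
    first.txt second.txt > out

cat out
```

On an awk that implements FNR, the two lines of first.txt are reported as "first file" and the two lines of second.txt as "later file".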

#8

Here are the full details, with the content of each file:

lyca /home/sukumar/testing:cat Primary.txt
Currency_exchange|25000
Sales|21000
instruments|120000

lyca /home/sukumar/testing:cat Secondary1.txt
Currency_exchange|25000
Sales|20000
instruments|120000

lyca /home/sukumar/testing:cat Secondary2.txt
Currency_exchange|25000
Sales|21000
instruments|120000

lyca /home/sukumar/testing:cat Secondary3.txt
Currency_exchange|25000
Sales|6600
instruments|9000

lyca /home/sukumar/testing:cat main_script.sh
#!/bin/ksh
awk -F'|' 'NR==FNR {v[$1]=$2; print "Setting v[" $1 "] to <" $2 ">"}
v[$1]!=$2 {print FILENAME "|" $0 "(" $1 "," $2 ")"}' Primary.txt Secondary*

lyca /home/sukumar/testing:ksh main_script.sh
Primary.txt|Currency_exchange|25000(Currency_exchange,25000)
Primary.txt|Sales|21000(Sales,21000)
Primary.txt|instruments|120000(instruments,120000)
Secondary1.txt|Currency_exchange|25000(Currency_exchange,25000)
Secondary1.txt|Sales|20000(Sales,20000)
Secondary1.txt|instruments|120000(instruments,120000)
Secondary2.txt|Currency_exchange|25000(Currency_exchange,25000)
Secondary2.txt|Sales|21000(Sales,21000)
Secondary2.txt|instruments|120000(instruments,120000)
Secondary3.txt|Currency_exchange|25000(Currency_exchange,25000)
Secondary3.txt|Sales|6600(Sales,6600)
Secondary3.txt|instruments|9000(instruments,9000)

Can you please run the command for this input? My output should be:

#############################################################
Secondary1.txt|Sales|20000(Sales,20000)
Secondary3.txt|Sales|6600(Sales,6600)
Secondary3.txt|instruments|9000(instruments,9000)
#############################################################

Thanks for your help.

Best Regards
SS
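The debug run above never prints a "Setting v[...]" line, which suggests that the NR==FNR rule did not fire at all on this system. One possible explanation, which nothing in the thread confirms, is an older awk in which FNR is not a built-in variable, so the comparison with NR is never true. Under that assumption, a sketch that avoids FNR altogether is to recognise the primary file by its name instead. The scratch directory is an illustrative choice; the data is the data posted in this message.

```shell
# Recreate the four files from this post in a throwaway directory (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Currency_exchange|25000' 'Sales|21000' 'instruments|120000' > Primary.txt
printf '%s\n' 'Currency_exchange|25000' 'Sales|20000' 'instruments|120000' > Secondary1.txt
printf '%s\n' 'Currency_exchange|25000' 'Sales|21000' 'instruments|120000' > Secondary2.txt
printf '%s\n' 'Currency_exchange|25000' 'Sales|6600' 'instruments|9000' > Secondary3.txt

# Rule 1: lines of Primary.txt are only stored, then skipped with "next".
# Rule 2: lines of every other file are printed when the count differs.
awk -F'|' 'FILENAME == "Primary.txt" {v[$1]=$2; next}
v[$1] != $2 {print FILENAME "|" $0}' Primary.txt Secondary*.txt > out

cat out
```

On this data the sketch should report the Sales row of Secondary1.txt and the Sales and instruments rows of Secondary3.txt. If the system has a newer awk installed under another name, trying the original command with that awk would be another way to test the assumption.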