comp.unix.shell: Using and programming the Unix shell.

#1
I have FileA with about 600,000 rows.
FileB contains all the line numbers I need to delete, one number per line: 87 rows this time (could be 200 rows tomorrow).

How do I use sed to delete each of those lines from FileA to create FileC?

The manual command is simple and works perfectly:

    sed -e '5865d
    7754d
    12406d
    .
    .
    .
    488596d
    490322d
    492259d
    493646d' FileA >> FileC

but I want to automate the process, and sed gives me fits.

#2
On Mar 20, 10:56am, pk <p...@pk.pk> wrote:
> pk wrote:
>
> > sed 's/.*/&d/g' FileB > delete.sed
>
> Or also
>
> sed 's/$/d/g' FileB > delete.sed

DAMN ... that was FAST. Thank you so much,

    sed 's/$/d/g' FileB > delete.sed

worked like a champ.

#3
LionelAndJen@gmail.com wrote:
> I have file A with about 600 000 rows
> File B contains all the line numbers I need to delete, one number per
> line, 87 rows this time (could be 200 rows tomorrow)
>
> how do I use sed to delete each of the lines from File A to create file
> C
>
> the manual command is simple and works perfectly:
>
> sed -e '5865d
> 7754d
> 12406d
> .
> .
> .
> 488596d
> 490322d
> 492259d
> 493646d' FileA >> FileC
>
> but I want to automate the process and sed gives me fits.

Assuming FileB has the format

    123
    145
    233
    2689
    ....

you can generate a sed command file starting from FileB (using sed, of course!)

    sed 's/.*/&d/g' FileB > delete.sed

and then

    sed -f delete.sed FileA >> FileC

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.
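
For anyone who wants to try the two steps on throwaway data first, here is a minimal sketch. The six-line FileA and the three line numbers in FileB are made up for illustration, and the script works in a scratch directory created with mktemp (widely available, though not required by POSIX). The trailing g flag is omitted because each line needs only one substitution, and FileC is written with > rather than >> so that a rerun does not append to an old result.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is overwritten.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample data: six lines, three of which are to be deleted.
printf '%s\n' one two three four five six > FileA
printf '%s\n' 2 4 5 > FileB

# Step 1: turn every line number N in FileB into the sed command "Nd".
sed 's/.*/&d/' FileB > delete.sed

# Step 2: run the generated script against FileA.
sed -f delete.sed FileA > FileC

cat FileC
```

With this sample data, delete.sed holds 2d, 4d and 5d, and FileC is left with the lines one, three and six.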

#4
pk wrote:
> sed 's/.*/&d/g' FileB > delete.sed

Or also

    sed 's/$/d/g' FileB > delete.sed

#5
pk <pk@pk.pk> writes:
> pk wrote:
>
> > sed 's/.*/&d/g' FileB > delete.sed
>
> Or also
>
> sed 's/$/d/g' FileB > delete.sed

or

    sed 's/$/d/' FileB > delete.sed
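
One thing to watch with any of these variants: a blank line in FileB becomes the bare command d, which deletes every line of FileA. The sketch below is one possible guard, not something proposed in the thread. It emits a delete command only for lines that hold a single unsigned integer, ignoring surrounding whitespace. The sample data is made up.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is overwritten.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample data; FileB has a blank line and stray spaces.
printf '%s\n' one two three four five six > FileA
printf '%s\n' '2' '' ' 4 ' '5' > FileB

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, so anything that is not a plain number is dropped.
sed -n 's/^[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1d/p' FileB > delete.sed

sed -f delete.sed FileA > FileC

cat delete.sed
cat FileC
```

Here delete.sed ends up with 2d, 4d and 5d, and FileC with one, three and six, even though FileB was untidy.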

#6
On 3/20/2008 10:28 AM, LionelAndJen@gmail.com wrote:
> I have file A with about 600 000 rows
> File B contains all the line numbers I need to delete, one number per
> line, 87 rows this time (could be 200 rows tomorrow)
>
> how do I use sed to delete each of the lines from File A to create file
> C
>
> the manual command is simple and works perfectly:
>
> sed -e '5865d
> 7754d
> 12406d
> .
> .
> .
> 488596d
> 490322d
> 492259d
> 493646d' FileA >> FileC
>
> but I want to automate the process and sed gives me fits.

That's not a good job for sed, but it's trivial in awk:

    awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC

Ed.
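
The one-liner above is dense, so here is the same program spread out with comments and run on made-up sample data. It is only a sketch for experimenting; the file contents and the scratch directory are assumptions, not part of the original post.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is overwritten.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample data.
printf '%s\n' one two three four five six > FileA
printf '%s\n' 2 4 5 > FileB

awk '
    # While awk reads the first file (FileB), NR and FNR are equal:
    # record each line number as an array key, then move to the next line.
    NR == FNR { skip[$0]; next }

    # While awk reads the second file (FileA), print only the lines whose
    # number (FNR) is not a key of skip.
    !(FNR in skip)
' FileB FileA > FileC

cat FileC
```

Two caveats are worth knowing. The keys are compared as strings, so FileB needs to hold plain numbers such as 2, not " 2" or "02". And if FileB is empty, NR==FNR stays true while FileA is being read, so nothing is printed at all.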

#7
Ed Morton wrote:
> awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC

This is one I've been wondering about for a long time. If FileA and FileB are very large, isn't the (FNR in skip) check inefficient? I mean, that seems to imply a walk over the entire array to see whether the element exists each time the condition is checked. (I may be wrong of course, probably due to my ignorance about the inner workings of awk.) Wouldn't something like this

    awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC

be more efficient?

Thanks
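
A side effect that bears on this comparison: in awk, merely referencing an array element that does not exist creates it, whereas the in operator only tests membership. The sketch below counts the keys left in skip after each variant has read both files. The sample data is made up, and the keep variable exists only so that each variant evaluates its condition without printing anything.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is overwritten.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample data: six lines, two line numbers to skip.
printf '%s\n' a b c d e f > FileA
printf '%s\n' 2 4 > FileB

# Variant 1: membership test with "in"; skip keeps only the FileB keys.
in_keys=$(awk '
    NR == FNR { skip[$0]; next }
    { keep = !(FNR in skip) }
    END { n = 0; for (k in skip) n++; print n }
' FileB FileA)

# Variant 2: indexing skip[FNR] creates a key for every line of FileA.
idx_keys=$(awk '
    NR == FNR { skip[$0]++; next }
    { keep = (skip[FNR] == 0) }
    END { n = 0; for (k in skip) n++; print n }
' FileB FileA)

echo "keys after (FNR in skip): $in_keys"
echo "keys after skip[FNR]==0:  $idx_keys"
```

On this data the first variant ends with 2 keys and the second with 6, one per line of FileA. With a 600,000-line FileA the indexed form would therefore grow the array to 600,000 entries, which costs memory as well as time.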

#8
On 3/21/2008 3:40 AM, pk wrote:
> Ed Morton wrote:
>
> > awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC
>
> This is one I've been wondering about for a long time. If FileA and FileB are very
> large, isn't the (FNR in skip) check inefficient? I mean, that seems to
> imply a walk over the entire array to see whether the element exists each
> time the condition is checked. (I may be wrong of course, probably due to
> my ignorance about the inner workings of awk.) Wouldn't something like this
>
> awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC
>
> be more efficient?

Could be, though I expect the "in" operator is using hashing, so it'd be close, as you're trading a hash lookup for an arithmetic increment plus an index plus a comparison. Here's the result of running both scripts twice, deleting every odd-numbered line in a 1-million-line file using GNU awk 3.1.6:

    $ wc -l FileB FileA
      500000 FileB
     1000000 FileA
     1500000 total

    $ time awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC

    real    0m29.016s
    user    0m28.546s
    sys     0m0.328s

    $ time awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC

    real    0m29.558s
    user    0m29.015s
    sys     0m0.436s

    $ time awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC

    real    0m28.915s
    user    0m28.484s
    sys     0m0.483s

    $ time awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC

    real    0m29.502s
    user    0m29.186s
    sys     0m0.327s

Regards,

Ed.
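
To repeat this comparison on another machine, something along the following lines may help. It is a rough sketch, not the script used for the timings above: the file contents are generated rather than real, the size comes from a hypothetical N environment variable (default 100000), and elapsed time is measured with date +%s, which is widely supported but only has one-second resolution. For finer numbers, prefix each awk command with the shell's time instead.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is overwritten.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Number of lines in FileA; set N=1000000 to match the size used above.
n=${N:-100000}

# FileA: n generated lines. FileB: every odd line number.
awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) print "line " i }' > FileA
awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i += 2) print i }' > FileB

start=$(date +%s)
awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC.in
mid=$(date +%s)
awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC.idx
end=$(date +%s)

echo "(FNR in skip): $((mid - start))s"
echo "skip[FNR]==0:  $((end - mid))s"

# Both variants should produce the same file.
if cmp -s FileC.in FileC.idx; then
    echo "outputs identical"
else
    echo "outputs differ"
fi
```

Whatever the timings turn out to be, the final cmp check confirms that the two variants delete the same lines.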

#9
Ed Morton wrote:
> Here's the result of running both scripts twice deleting every
> odd-numbered line in a 1-million line file using GNU awk 3.1.6:

Yeah, I also consistently see the same results with GNU awk 3.1.5, so, contrary to what I thought, the (FNR in skip) test seems to be slightly more efficient.

Thanks!