comp.unix.shell: Using and programming the Unix shell.

#1
Hi all,

Sorry to post a (rather) trivial question, but my brain can't find a high(er) gear at the moment.

I have a file like:

#v+
173&&&HILLEGOM (wit bord)&
174&11.1&200.5&KRS N208 Rechtdoor&Leidsestraat
175&0.1&200.6&Einde weg RE&Hyacintenlaan
176&1.3&201.9&Einde weg RE&Veenburgerweg
177&0.5&202.4&1ste weg LI ri Vogelenzang&3de Loosterweg
178&1.0&203.4&ROT LI&Beeklaan, N442
179&&&DE ZILK&
180&2.0&205.4&Einde weg LI&N206
181&0.8&206.2&1ste afslag RE&Vogelaardreef
182&0.7&206.9&VRW RE ri langevelderslag en de weg blijven volgen&Vogelaardreef
183&5.4&212.3&Bij Cafe-restaurant "De Witte Raaf" RE ri Noordwijk 2&Duinweg
184&&&NOORDWIJK&
#v-

(It's a route description for our motor club, by the way.)

This file is going to be imported into a LaTeX file, to serve as longtable data.

Anyway, the first figure on each line is a reference counter, the all-caps words are place-sign names, and I would like to remove the counter on all lines with the all-caps names, but not on other lines.

So e.g.
173&&&HILLEGOM (wit bord)&
should become:
&&&HILLEGOM (wit bord)&

I've been over awk and sed and grep and whatnot, but alas I can't get anything working.

Thanks in advance.
Theo
-- 
theo at van-werkhoven.nl ICQ:277217131 SuSE Linux
linuxcounter.org: 99872 Jabber:muadib at jabber.xs4all.nl AMD XP3000+ 1024MB
"I _have_ nothing against Microsoft, I have something against the excesses *of* Microsoft"

#2
* Theo v. Werkhoven [2007.04.30 21:11]:
> So e.g.
> 173&&&HILLEGOM (wit bord)&
> should become:
> &&&HILLEGOM (wit bord)&

sed 's/^[0-9]*//' /tmp/foo

-- 
JR

#3
Theo v. Werkhoven wrote:
> [..]
> Anyway, the first figure of each line is a reference counter, the all cap words
> are place-sign names, and I would like to remove the counter on all lines with
> the all-caps names, but not on other lines.
> So e.g.
> 173&&&HILLEGOM (wit bord)&
> should become:
> &&&HILLEGOM (wit bord)&

How is "all cap words" defined? The fourth field (assuming & as the separator) also contains lower-case letters (within the brackets).

Here's a solution that requires the whole fourth field to be upper-case letters...

awk -F\& -v OFS=\& '$4~/^[A-Z]+$/{$1=""}1'

If you just want a fourth field that starts with _two_ upper-case letters as the criterion...

awk -F\& -v OFS=\& '$4~/^[A-Z][A-Z]/{$1=""}1'

If neither is what you want, please clarify.

Janis
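
To see how the two awk criteria above differ on the sample data, here is a minimal Python sketch of the same field test. The helper name `strip_counter` and the constants `STRICT` and `LOOSE` are made up for illustration; the patterns are copied from the two awk commands.

```python
import re

SAMPLE = [
    "173&&&HILLEGOM (wit bord)&",
    "178&1.0&203.4&ROT LI&Beeklaan, N442",
    "180&2.0&205.4&Einde weg LI&N206",
    "184&&&NOORDWIJK&",
]

STRICT = r"^[A-Z]+$"    # whole fourth field consists of upper-case letters
LOOSE = r"^[A-Z][A-Z]"  # fourth field starts with two upper-case letters


def strip_counter(line, pattern):
    """Blank the first &-separated field when the fourth field matches pattern."""
    fields = line.split("&")
    if len(fields) >= 4 and re.search(pattern, fields[3]):
        fields[0] = ""
    return "&".join(fields)


for line in SAMPLE:
    print(strip_counter(line, STRICT), "|", strip_counter(line, LOOSE))
```

On these four lines the strict pattern only strips the NOORDWIJK line (spaces and "(wit bord)" make the others fail), while the loose pattern also strips the counter from the "ROT LI" instruction line.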

#4
Theo v. Werkhoven wrote:
> [..]
> So e.g.
> 173&&&HILLEGOM (wit bord)&
> should become:
> &&&HILLEGOM (wit bord)&
>
> I've been over awk and sed and grep and whatnot, but I can't get something
> working alas.

$ echo "#v+
173&&&HILLEGOM (wit bord)&
174&11.1&200.5&KRS N208 Rechtdoor&Leidsestraat
175&0.1&200.6&Einde weg RE&Hyacintenlaan
176&1.3&201.9&Einde weg RE&Veenburgerweg
177&0.5&202.4&1ste weg LI ri Vogelenzang&3de Loosterweg
178&1.0&203.4&ROT LI&Beeklaan, N442
179&&&DE ZILK&
180&2.0&205.4&Einde weg LI&N206
181&0.8&206.2&1ste afslag RE&Vogelaardreef
182&0.7&206.9&VRW RE ri langevelderslag en de weg blijven volgen&Vogelaardreef
183&5.4&212.3&Bij Cafe-restaurant "De Witte Raaf" RE ri Noordwijk 2&Duinweg
184&&&NOORDWIJK&
#v-
" | perl -pe's/^\d+(?=&&&[[:upper:]]+\b)//'
#v+
&&&HILLEGOM (wit bord)&
174&11.1&200.5&KRS N208 Rechtdoor&Leidsestraat
175&0.1&200.6&Einde weg RE&Hyacintenlaan
176&1.3&201.9&Einde weg RE&Veenburgerweg
177&0.5&202.4&1ste weg LI ri Vogelenzang&3de Loosterweg
178&1.0&203.4&ROT LI&Beeklaan, N442
&&&DE ZILK&
180&2.0&205.4&Einde weg LI&N206
181&0.8&206.2&1ste afslag RE&Vogelaardreef
182&0.7&206.9&VRW RE ri langevelderslag en de weg blijven volgen&Vogelaardreef
183&5.4&212.3&Bij Cafe-restaurant De Witte Raaf RE ri Noordwijk 2&Duinweg
&&&NOORDWIJK&
#v-

John
-- 
Perl isn't a toolbox, but a small machine shop where you can
special-order certain sorts of tools at low cost and in short order.
-- Larry Wall
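
For readers less used to lookaheads, the same substitution can be sketched in Python. This is only an illustration of the Perl one-liner above, with two assumptions: `[A-Z]` stands in for `[[:upper:]]` (Python's `re` module has no POSIX classes), and the names `PLACE_LINE` and `strip_place_counter` are invented here.

```python
import re

# Digits at the start of the line, but only when they are followed by
# "&&&" and a run of capitals ending at a word boundary. The lookahead
# (?=...) checks that context without consuming it, so only the counter
# itself is removed.
PLACE_LINE = re.compile(r"^\d+(?=&&&[A-Z]+\b)")


def strip_place_counter(line):
    """Return line with its counter removed if it is a place-name line."""
    return PLACE_LINE.sub("", line)


print(strip_place_counter("179&&&DE ZILK&"))
print(strip_place_counter("178&1.0&203.4&ROT LI&Beeklaan, N442"))
```

The first call drops the counter; the second leaves the line alone, because the counter is followed by a single "&" and a figure rather than "&&&".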

#5
Theo v. Werkhoven wrote:
> [..]
> So e.g.
> 173&&&HILLEGOM (wit bord)&
> should become:
> &&&HILLEGOM (wit bord)&

ruby -pe 'sub(/\d+(?=&&&[[:upper:]]+\b)/,"")'

#6
* Jean-Rene David [2007.04.30 21:39]:
> * Theo v. Werkhoven [2007.04.30 21:11]:
>> So e.g.
>> 173&&&HILLEGOM (wit bord)&
>> should become:
>> &&&HILLEGOM (wit bord)&
>
> sed 's/^[0-9]*//' /tmp/foo

Sorry, I misread the problem. Never mind that.

-- 
JR

#7
The carbonbased lifeform John W. Krahn inspired comp.unix.shell with:
> $ echo "#v+
> 173&&&HILLEGOM (wit bord)&
> [..]
> 184&&&NOORDWIJK&
> #v-
> " | perl -pe's/^\d+(?=&&&[[:upper:]]+\b)//'

Thanks John, that's exactly what I had in mind.
Now that I read it, it's obvious of course, but unfortunately I'm not very good at regexps.

Cheers,
Theo

#8
The carbonbased lifeform Janis Papanagnou inspired comp.unix.shell with:
> Theo v. Werkhoven wrote:
> [..]
>> 173&&&HILLEGOM (wit bord)&
>> should become:
>> &&&HILLEGOM (wit bord)&
>
> How is that "all cap words" defined? The fourth field (assumed & as separator)
> does also contain lower case letters (within the brackets).

I know, I wasn't really clear. For a non-Dutch person it's not immediately obvious that the words following ^\d+&&& are names of places. Those are the only lines I'm interested in.

> If neither is what you want, please clarify.

See John's answer.

Thanks,
Theo

#9
The carbonbased lifeform William James inspired comp.unix.shell with:
> ruby -pe 'sub(/\d+(?=&&&[[:upper:]]+\b)/,"")'

Thanks.
Theo

#10
Theo v. Werkhoven wrote:
> The carbonbased lifeform Janis Papanagnou inspired comp.unix.shell with:
> [..]
>> How is that "all cap words" defined? The fourth field (assumed & as separator)
>> does also contain lower case letters (within the brackets).
>
> I know, I wasn't really clear. For a non-Dutch person it's not immediately
> obvious that the words following ^\d+&&& are names of places. Those
> are the only lines I'm interested in.
>
>> If neither is what you want, please clarify.
>
> See John's answer.

Does his answer explain the requirement YOU want?

#11
> Thanks John, that's exactly what I had in mind.
> Now that I read it, it's obvious of course, but unfortunately I'm not very
> good at regexps.

If you have Python, here's a more readable alternative:

for line in open("file"):
    line = line.strip()
    # A place-name line contains "&&&" and at least one upper-case letter.
    if "&&&" in line and any(ch.isupper() for ch in line):
        # Drop the counter: keep everything from the first "&" onwards.
        print(line[line.index("&"):])
    else:
        print(line)

#12
The carbonbased lifeform mik3l3374@gmail.com inspired comp.unix.shell with:
>> Thanks John, that's exactly what I had in mind.
>> Now that I read it, it's obvious of course, but unfortunately I'm not very
>> good at regexps.
>
> If you have Python, here's a more readable alternative:
>
> for line in open("file"):
> [..]

Thanks, I will probably write a Python script to do the complete translation from the CSV file to a LaTeX longtable file, so this comes in handy.

Theo
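
As a rough idea of what such a script might look like, here is a minimal sketch. Everything in it is an assumption rather than Theo's actual script: the function name `to_longtable`, the column spec `rrrll`, and the rule that a place-name row is one whose second and third fields are empty. It does not escape LaTeX special characters in the data.

```python
def to_longtable(lines, colspec="rrrll"):
    """Wrap &-separated rows in a longtable environment.

    The column spec is a guess (counter, two figures, instruction, road).
    """
    out = [r"\begin{longtable}{%s}" % colspec]
    for line in lines:
        fields = line.rstrip("\n").split("&")
        # Assumed rule: place-name rows have empty 2nd and 3rd fields,
        # and those rows lose their counter.
        if len(fields) >= 3 and fields[1] == "" and fields[2] == "":
            fields[0] = ""
        # The data already uses "&" as separator, so only the LaTeX row
        # terminator needs to be appended.
        out.append("&".join(fields) + r" \\")
    out.append(r"\end{longtable}")
    return "\n".join(out)


print(to_longtable(["183&5.4&212.3&Einde weg RE&Duinweg\n", "184&&&NOORDWIJK&\n"]))
```

Since the input already uses "&" between fields, each row only needs its trailing `\\`; headers, captions and page-break handling are left out.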

#13
Theo v. Werkhoven wrote:
> [..]
> Anyway, the first figure of each line is a reference counter, the all cap words
> are place-sign names, and I would like to remove the counter on all lines with
> the all-caps names, but not on other lines.
> So e.g.
> 173&&&HILLEGOM (wit bord)&
> should become:
> &&&HILLEGOM (wit bord)&

Why is "HILLEGOM (wit bord)" in the 4th &-separated field considered "all cap" but "ROT LI" isn't:

173&&&HILLEGOM (wit bord)&
178&1.0&203.4&ROT LI&Beeklaan, N442

> I've been over awk and sed and grep and whatnot, but I can't get something
> working alas.

I THINK what you really want is to operate on the lines where one of the fields is empty:

$ cat file
173&&&HILLEGOM (wit bord)&
174&11.1&200.5&KRS N208 Rechtdoor&Leidsestraat
175&0.1&200.6&Einde weg RE&Hyacintenlaan
176&1.3&201.9&Einde weg RE&Veenburgerweg
177&0.5&202.4&1ste weg LI ri Vogelenzang&3de Loosterweg
178&1.0&203.4&ROT LI&Beeklaan, N442
179&&&DE ZILK&
180&2.0&205.4&Einde weg LI&N206
181&0.8&206.2&1ste afslag RE&Vogelaardreef
182&0.7&206.9&VRW RE ri langevelderslag en de weg blijven volgen&Vogelaardreef
183&5.4&212.3&Bij Cafe-restaurant "De Witte Raaf" RE ri Noordwijk 2&Duinweg
184&&&NOORDWIJK&

$ awk '/&&/{sub(/[^&]*/,"")}1' file
&&&HILLEGOM (wit bord)&
174&11.1&200.5&KRS N208 Rechtdoor&Leidsestraat
175&0.1&200.6&Einde weg RE&Hyacintenlaan
176&1.3&201.9&Einde weg RE&Veenburgerweg
177&0.5&202.4&1ste weg LI ri Vogelenzang&3de Loosterweg
178&1.0&203.4&ROT LI&Beeklaan, N442
&&&DE ZILK&
180&2.0&205.4&Einde weg LI&N206
181&0.8&206.2&1ste afslag RE&Vogelaardreef
182&0.7&206.9&VRW RE ri langevelderslag en de weg blijven volgen&Vogelaardreef
183&5.4&212.3&Bij Cafe-restaurant "De Witte Raaf" RE ri Noordwijk 2&Duinweg
&&&NOORDWIJK&

Regards,

Ed.
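
The awk command above does two things: it selects lines containing "&&", then deletes the leading run of non-"&" characters. A small Python sketch of the same two steps, with the function name `strip_if_empty_field` invented for illustration:

```python
import re


def strip_if_empty_field(line):
    """Mimic awk '/&&/{sub(/[^&]*/,"")}1' for a single line."""
    if "&&" in line:
        # Delete the leading run of non-"&" characters (the counter).
        return re.sub(r"^[^&]*", "", line, count=1)
    return line


print(strip_if_empty_field("173&&&HILLEGOM (wit bord)&"))
print(strip_if_empty_field("174&11.1&200.5&KRS N208 Rechtdoor&Leidsestraat"))
```

One property worth knowing: "&&" anywhere on the line triggers the rule, so a hypothetical instruction line with an empty later field, such as `190&1.0&2.0&&Foo`, would lose its counter too. The sample data has no such line.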

#14
The carbonbased lifeform Janis Papanagnou inspired comp.unix.shell with:
> Theo v. Werkhoven wrote:
> [..]
>> See John's answer.
>
> Does his answer explain the requirement YOU want?

It does.

Theo

#15
The carbonbased lifeform Ed Morton inspired comp.unix.shell with:
> Theo v. Werkhoven wrote:
> [..]
>> 173&&&HILLEGOM (wit bord)&
>> should become:
>> &&&HILLEGOM (wit bord)&
>
> Why is "HILLEGOM (wit bord)" in the 4th &-separated field considered
> "all cap" but "ROT LI" isn't:

You're right, they're both all caps of course, and your conclusion below is what I meant. In the result, the lines without figures between the 1st and 2nd '&', and between the 2nd and 3rd '&', should have the first number stripped.

> I THINK what you really want is to operate on the lines where one of the
> fields is empty:
>
> $ awk '/&&/{sub(/[^&]*/,"")}1' file
> &&&HILLEGOM (wit bord)&
> [..]
> &&&NOORDWIJK&

It looks like your answer does what I need as well. Thanks.

Theo
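
The requirement as restated here (no figures between the 1st and 2nd '&', nor between the 2nd and 3rd) can also be checked field by field instead of with a pattern. A minimal Python sketch, where the function name `strip_counter_if_no_figures` is made up for illustration:

```python
def strip_counter_if_no_figures(line):
    """Blank the counter when the 2nd and 3rd &-separated fields are empty."""
    fields = line.split("&")
    # fields[1] and fields[2] are the texts between the 1st/2nd and
    # 2nd/3rd "&"; on place-name lines both are empty.
    if len(fields) >= 3 and fields[1] == "" and fields[2] == "":
        fields[0] = ""
    return "&".join(fields)


for sample in ("179&&&DE ZILK&", "178&1.0&203.4&ROT LI&Beeklaan, N442"):
    print(strip_counter_if_no_figures(sample))
```

Testing the two fields explicitly ties the rule to their position, so an empty field elsewhere on a line does not trigger it.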