comp.lang.ruby
Newbie: what's Ruby idiom for word-by-word input?

17/09/2007, 20:49   #9
William James

On Sep 17, 1:00 pm, Alex Shulgin <alex.shul...@gmail.com> wrote:
> On Sep 17, 6:19 pm, William James <w_a_x_...@yahoo.com> wrote:
>
> > Awk is a very popular tool for text processing, but there is no
> > way to make it treat a sequence of whitespace characters as a
> > record-separator. So in awk, as in Ruby, text is almost always
> > read a line at a time.
>
> I thought Ruby is not just a text processing tool, but a general
> purpose programming language.


You thought correctly. But when you talk about reading a word at
a time from a text file, you're talking about text processing.
The point is that languages (including Ruby) that were designed
to be very good at processing text usually read a line at a time,
not a word at a time. (A language that is very good at processing
text can still be a general purpose language.) Reading a word at
a time seems to me to be odd and unnecessary, and I do a lot of
text processing. However, here's one way to do it. (It would be
a lot more efficient to read by lines.)

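class IO
  # Returns the next whitespace-delimited word, or nil at end of input.
  def get_word
    word = nil
    while c = self.read(1)
      if c =~ /\s/
        break if word      # whitespace after a word ends the word
      else
        word ||= ""
        word << c
      end
    end
    word
  end
end

File.open('data') { |file|
  while w = file.get_word
    p w
  end
}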
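
For comparison, the line-based approach mentioned above could look roughly like this. It is only a sketch: `each_word_by_line` is a made-up helper name, and a StringIO stands in for the 'data' file so the snippet runs on its own.

```ruby
require 'stringio'

# Hypothetical helper (not from the thread): yields whitespace-separated
# words by reading one line at a time and splitting each line.
def each_word_by_line(io)
  io.each_line do |line|
    # String#split with no argument splits on runs of whitespace
    line.split.each { |word| yield word }
  end
end

# StringIO is an assumption standing in for File.open('data')
io = StringIO.new("one  two\n\nthree\tfour\n")
words = []
each_word_by_line(io) { |w| words << w }
p words
```

Any object that responds to `each_line` (a File, ARGF, a StringIO) should work as the `io` argument.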

17/09/2007, 22:13   #10
Robert Klemme

On 17.09.2007 21:49, William James wrote:
> However, here's one way to do it. (It would be
> a lot more efficient to read by lines.)
> [...]

I'd probably encapsulate the word reading in a module so the
implementation can be reused and exchanged if necessary:

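module WordIO
  # Yields each word; relies on the including object's #each_line.
  def each_word(&b)
    each_line do |line|
      line.scan(/\w+/, &b)
    end
  end
end

class IO
  include WordIO

  # File.open is spelled out: a bare `open` inside this class method
  # would be IO.open, which expects a file descriptor, not a path.
  def self.readwords(file)
    words = []
    File.open(file) { |io| io.each_word { |wd| words << wd } }
    words
  end
end

ARGF.extend WordIO

# additional goody
class String
  include WordIO
end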

:-)

Kind regards

robert

18/09/2007, 01:30   #11
William James

On Sep 17, 4:13 pm, Robert Klemme <shortcut...@googlemail.com> wrote:
> I'd probably encapsulate the word reading in a module so the
> implementation can be reused and exchanged if necessary:
> [...]

Very sophisticated.

Since the o.p. wants whitespace as the word-separator,
the reg.exp. should be changed to /\S+/.

But, dang it all, I'm gonna say you're cheating because
you're still reading lines behind the scenes!
Reading lines and breaking them into words is a lot
easier than reading characters and constructing words.
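
To make the difference between the two patterns concrete, here is a small illustration. The sample string is made up for the purpose.

```ruby
# Sample text chosen (as an assumption) to contain punctuation
text = "don't stop-now, ok?"

# \w+ matches runs of letters, digits and underscores only,
# so punctuation splits words apart and is dropped.
by_word_chars = text.scan(/\w+/)

# \S+ matches runs of anything that is not whitespace,
# so punctuation stays attached to its word.
by_non_space = text.scan(/\S+/)

p by_word_chars  # ["don", "t", "stop", "now", "ok"]
p by_non_space   # ["don't", "stop-now,", "ok?"]
```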

18/09/2007, 07:22   #12
Bertram Scharpf

Hi,

On Tuesday, 18 Sep 2007, 06:15:05 +0900, Robert Klemme wrote:
> module WordIO
>   def each_word(&b)
>     each do |line|
>       line.scan(/\w+/, &b)

Loath to criticize it, but

irb(main):001:0> "tränenüberströmt".scan /\w+/
=> ["tr", "nen", "berstr", "mt"]
irb(main):002:0>

Sigh!

Bertram


-- 
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de

18/09/2007, 08:06   #13
Robert Klemme

2007/9/18, William James <w_a_x_man@yahoo.com>:
> Very sophisticated.
>
> Since the o.p. wants whitespace as the word-separator,
> the reg.exp. should be changed to /\S+/.


See also Bertram's remark. Btw, that's probably also the reason why
this is not in the standard library: there is probably no
one-size-fits-all definition of "word". We have seen at least two so
far and I reckon there are more. :-)

> But, dang it all, I'm gonna say you're cheating because
> you're still reading lines behind the scenes!


;-) But I said the implementation can be exchanged.

> Reading lines and breaking them into words is a lot
> easier than reading characters and constructing words.


Correct. But just a bit:

module WordIO
  def wchar?(c)
    /\A\w\z/ =~ c.chr
  end

  def each_word
    word = nil
    while ( c = getc )
      if wchar? c
        (word ||= "") << c
      else
        yield word if word
        word = nil
      end
    end
    yield word if word   # emit a final word that runs up to end of input
    self
  end
end

Kind regards

robert
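
A possible way to try the character-based reader without a file is to extend a StringIO. This is a sketch under assumptions: the module is renamed `CharWordIO` here so it does not clash with the definitions above, and it assumes Ruby 1.9 or later, where `getc` returns a one-character String.

```ruby
require 'stringio'

# Hypothetical variant of the getc-based reader shown above
module CharWordIO
  def each_word
    word = nil
    while (c = getc)
      if c =~ /\A\w\z/          # c is a one-character String here
        (word ||= "".dup) << c  # dup gives a mutable string
      else
        yield word if word
        word = nil
      end
    end
    yield word if word          # last word when input lacks a trailing separator
    self
  end
end

# The input deliberately does not end with whitespace
io = StringIO.new("alpha beta\ngamma").extend(CharWordIO)
words = []
io.each_word { |w| words << w }
p words
```

Because only `getc` is required, the same module could in principle be mixed into any IO-like object.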

18/09/2007, 13:04   #14
James Edward Gray II

On Sep 18, 2007, at 1:22 AM, Bertram Scharpf wrote:

> Hi,
>
> On Tuesday, 18 Sep 2007, 06:15:05 +0900, Robert Klemme wrote:
>> module WordIO
>>   def each_word(&b)
>>     each do |line|
>>       line.scan(/\w+/, &b)
>
> Loath to criticize it, but
>
> irb(main):001:0> "tränenüberströmt".scan /\w+/
> => ["tr", "nen", "berstr", "mt"]
> irb(main):002:0>
>
> Sigh!

$ irb -Ku
>> "tränenüberströmt".scan /\w+/
=> ["tränenüberströmt"]

James Edward Gray II
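
On later Rubies (1.9 and up, as far as I know) strings carry their own encoding and `\w` matches only ASCII word characters, so Bertram's result shows up there too. A Unicode-aware pattern is one way around it. The sketch below assumes a UTF-8 string, written with `\u` escapes so it does not depend on the source file's encoding.

```ruby
# "tränenüberströmt", spelled with escapes so the literal is UTF-8
word = "tr\u00e4nen\u00fcberstr\u00f6mt"

# \w is ASCII-only, so the umlauts act as separators
ascii_only = word.scan(/\w+/)

# \p{L} matches any Unicode letter
unicode = word.scan(/\p{L}+/)

# POSIX bracket classes are Unicode-aware as well
posix = word.scan(/[[:alpha:]]+/)

p ascii_only
p unicode
p posix
```

Which pattern is right still depends on what counts as a "word" for the task at hand. For example, `\p{L}` excludes digits and underscores, which `\w` accepts.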

18/09/2007, 19:13   #15
Alex Shulgin

On Sep 18, 3:30 am, William James <w_a_x_...@yahoo.com> wrote:
>
> But, dang it all, I'm gonna say you're cheating because
> you're still reading lines behind the scenes!
> Reading lines and breaking them into words is a lot
> easier than reading characters and constructing words.


Yeah, that is my point. I only see a way to do this efficiently (w/o
reading the whole lines) by writing the routine in C and then using it
in Ruby.

Anyway, I probably won't bother, since there is no real problem--just
curiosity of mine. ;-)


Thanks all for discussing,
Alex

18/09/2007, 22:51   #16
Robert Klemme

On 18.09.2007 20:13, Alex Shulgin wrote:
> On Sep 18, 3:30 am, William James <w_a_x_...@yahoo.com> wrote:
>> But, dang it all, I'm gonna say you're cheating because
>> you're still reading lines behind the scenes!
>> Reading lines and breaking them into words is a lot
>> easier than reading characters and constructing words.

>
> Yeah, that is my point. I only see a way to do this efficiently (w/o
> reading the whole lines) by writing the routine in C and then using it
> in Ruby.


Why do you think Ruby solutions are inefficient? If you fear that
reading individual characters is slow in Ruby: even if you use #getc
Ruby will do buffered IO (I'm not sure about $stdin though).

> Anyway, I probably won't bother, since there is no real problem--just
> curiosity of mine. ;-)


If you are curious, why not just take the suggested implementations and
benchmark them? Benchmarking is actually pretty easy in Ruby because
the Benchmark module is already there (plus some more advanced variants).
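
A rough sketch of such a comparison, using the standard Benchmark module. The helper names `words_by_line` and `words_by_char` and the synthetic input are made up for illustration, and a StringIO is used instead of a real file, so this says nothing about disk or buffering behaviour.

```ruby
require 'benchmark'
require 'stringio'

# Synthetic input; the size is an arbitrary assumption
text = "lorem ipsum dolor sit amet\n" * 20_000

# Counts words by reading lines and scanning each one
def words_by_line(io)
  count = 0
  io.each_line { |line| line.scan(/\S+/) { count += 1 } }
  count
end

# Counts words by reading one character at a time
def words_by_char(io)
  count = 0
  in_word = false
  while (c = io.getc)
    if c =~ /\s/
      in_word = false
    elsif !in_word
      in_word = true   # first character of a new word
      count += 1
    end
  end
  count
end

line_count = char_count = nil
Benchmark.bm(8) do |bm|
  bm.report("by line:") { line_count = words_by_line(StringIO.new(text)) }
  bm.report("by char:") { char_count = words_by_char(StringIO.new(text)) }
end
```

Both helpers should report the same word count. The timings will vary with the machine and Ruby version, so they are best read as relative numbers.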

> Thanks all for discussing,


Thank you for bringing up interesting subjects!

Kind regards

robert