html parser with regex, how to solve?

06/01/2008, 00:07   #1
Luiz Vitor Martinez Cardoso

[Note: parts of this message were removed to make it a legal post.]

Hi,

I'm trying to develop a simple application in Ruby (once this works I
will move on to Rails). I need to fetch the source of a URL and find
this string in it:

<h3 class="zmp">$299.99</h3>

But I need to match not just one price such as 149.00, but any possible
amount. A friend suggested this:

<h3 class="zmp">*$\d+\.\d{2}.*</h3>

I think this works, but I need one more thing. Here is my code:

#!/usr/bin/ruby

require 'hpricot'
require 'open-uri'

@content = Hpricot(open("http://www.newegg.com/Product/Product.aspx?Item=N82E16855101066"))

Now, how can I search for <h3 class="zmp">*$\d+\.\d{2}.*</h3>?

@content.search("<h3 class="zmp">*$\d+\.\d{2}.*</h3>") is broken ;(

How can I solve this?


Thanks for your attention,
Luiz Vitor Martinez Cardoso.



--
Regards,
Luiz Vitor Martinez Cardoso [Grabber].
(11) 8187-8662

rubz.org - engineer student at maua.br
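
Editorial note: the search call in the question likely fails for several reasons. The nested double quotes end the Ruby string early, the unescaped $ is an end-of-line anchor rather than a literal dollar sign, and Hpricot's search takes a CSS selector or XPath expression, not a regex. If a regex on the raw source is still wanted, a minimal sketch might look like the following. The sample markup, the constant PRICE_RE and the helper extract_prices are illustrative names, not part of the original post.

```ruby
# Hypothetical sample standing in for the downloaded page source.
html = '<p>Sale</p><h3 class="zmp">$299.99</h3><h3 class="zmp">$149.00</h3>'

# \$ matches a literal dollar sign (a bare $ would be an end-of-line anchor).
# The capture group keeps only the amount, without the dollar sign.
PRICE_RE = /<h3 class="zmp">\s*\$(\d+\.\d{2})\s*<\/h3>/

# Returns every captured amount as a string, in document order.
def extract_prices(source)
  source.scan(PRICE_RE).flatten
end

p extract_prices(html)
```

This only holds while the markup stays exactly in this shape; an extra attribute or different quoting on the h3 tag would make the pattern miss.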

06/01/2008, 01:10   #2
s.ross

Don't use the regex. Let hpricot do what it's good at:

$ irb
>> require 'rubygems'
>> require 'hpricot'
>> html = '<h3 class="zmp">149.00</h3>'
>> doc = Hpricot.parse(html)
>> ele = doc.search('h3.zmp')
>> puts ele.text
149.00
=> nil

In your code, your @content will be searchable the same way. Hpricot
will give you a collection of all h3's with class 'zmp'.

http://code.whytheluckystiff.net/doc/hpricot/

Hope this helps.
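
Editorial note: the same idea, selecting elements and reading their text instead of pattern-matching the source, can be sketched without Hpricot. The example below swaps in REXML, which is normally bundled with Ruby, and an XPath query. It assumes a small, well-formed sample document (REXML is an XML parser and may reject real-world HTML), and the sample markup is invented for illustration.

```ruby
require 'rexml/document'

# Hypothetical, well-formed sample markup standing in for the fetched page.
SAMPLE = <<~HTML
  <div>
    <h3 class="zmp">$299.99</h3>
    <h3 class="other">ignore me</h3>
    <h3 class="zmp">$149.00</h3>
  </div>
HTML

doc = REXML::Document.new(SAMPLE)

# Select every h3 whose class attribute is exactly "zmp" and collect its text.
prices = REXML::XPath.match(doc, "//h3[@class='zmp']").map(&:text)

puts prices
```

As with the Hpricot call above, the result is a collection, so a page with several matching h3 elements yields several prices.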


06/01/2008, 01:34   #3
Luiz Vitor Martinez Cardoso

Thanks a lot! This really works.

Now I have a new (very simple) problem: the output is $1999,00. How can
I remove the $? I need to convert this to a float.

Regards,
Luiz Vitor Martinez Cardoso.

06/01/2008, 02:15   #4
Joe

try this:

ele.text.sub('$', '')

Joe
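
Editorial note: removing the $ leaves a string such as "1999,00", and String#to_f stops parsing at the comma, so any cents after it would be silently dropped. A minimal sketch of the full conversion follows. The helper name price_to_float is hypothetical, and it assumes the text has no thousands separators and that a comma, if present, is the decimal mark.

```ruby
# Hypothetical helper: turns a price string such as "$1999,00" or "$299.99"
# into a Float. Assumes no thousands separators and that a comma, if present,
# is the decimal mark.
def price_to_float(text)
  # A String pattern passed to sub is matched literally, so '$' is not an anchor.
  cleaned = text.strip.sub('$', '').tr(',', '.')
  # Kernel#Float raises ArgumentError on malformed input instead of returning 0.0.
  Float(cleaned)
end

puts price_to_float('$1999,00')
puts price_to_float('$299.99')
```

Using Float rather than to_f is a design choice here: a scraped value that is not a number raises an error instead of quietly becoming 0.0.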

06/01/2008, 02:32   #5
Luiz Vitor Martinez Cardoso

Thanks! I got it working.

Regards,
Luiz Vitor Martinez Cardoso.
