Forum: comp.lang.ruby
using HPricot to parse a fiddly table

Post #1, 6 January 2008, 19:13 (GMT+1)
Adam Dullenty

Hi there,

I'm fairly new to Ruby; previously I was an average programmer in Java,
so it's all a bit foreign to me, especially XPath and CSS. I would be
grateful if someone could give me a hand with a problem I'm having. I
have a table whose fields I'm trying to extract in a particular way.
The table is in the form:

<table>
  <tr>
    <td>...stuff I don't want...</td>
  </tr>
  <tr>
    <td>
      <table>
        <!-- rows I want -->
        <tr>
          <td>
            <table>
              <tr>
                <td>Field 1</td>
                <td>Field 2</td>
              </tr>
            </table>
          </td>
          <td>Field 3</td>
          <td>Field 4, Field 5</td>
        </tr>
        <!-- end of rows I want -->
      </table>
    </td>
  </tr>
</table>

I have managed to get Hpricot to parse the page and return the HTML for
that table, but I'm struggling to get it into an array of the form
["Field 1", "Field 2", "Field 3", "Field 4", "Field 5"] for each row. I
had hoped there would be some kind of built-in method for extracting
data from a table, but I can't find one.

Thanks again, look forward to a reply
Adam
--
Posted via http://www.ruby-forum.com/.

Post #2, 6 January 2008, 20:39 (GMT+1)
s.ross

For the innermost table, try:

eles = doc.search('table table table td')

For the enclosing table:

eles = doc.search('table table td')

I don't suppose the semantics can be improved any -- like class names
or ids?
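
A minimal sketch of what those two descendant selectors pick out, for anyone following along. Hpricot may not be installed everywhere, so this swaps in REXML (bundled with Ruby) and approximates the CSS selectors with XPath; it is not the Hpricot call itself. The `html` string is a hypothetical, well-formed stand-in for the table in the first post, and all variable names are made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical, well-formed stand-in for the table in the original post.
html = <<~XML
  <table>
    <tr><td>stuff I don't want</td></tr>
    <tr><td>
      <table>
        <tr>
          <td><table><tr><td>Field 1</td><td>Field 2</td></tr></table></td>
          <td>Field 3</td>
          <td>Field 4, Field 5</td>
        </tr>
      </table>
    </td></tr>
  </table>
XML

doc = REXML::Document.new(html)

# Rough XPath counterpart of 'table table table td':
# td cells that sit inside at least three nested tables.
innermost = REXML::XPath.match(doc, '//table//table//table//td')
innermost_texts = innermost.map { |td| td.text.to_s.strip }

# Rough XPath counterpart of 'table table td':
# td cells inside at least two nested tables. The innermost cells match
# here as well, so duplicates are dropped defensively with uniq.
enclosing = REXML::XPath.match(doc, '//table//table//td').uniq
enclosing_texts = enclosing
  .map { |td| td.text.to_s.strip }  # direct text only; nil for the wrapper cell
  .reject(&:empty?)                 # skip the cell that only wraps the nested table

puts innermost_texts.inspect
puts enclosing_texts.inspect
```

Note that the broader selector also returns the innermost cells, which is why a flat search alone does not give one array per row.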


Post #3, 7 January 2008, 00:49 (GMT+1)
Adam Dullenty

Steve Ross wrote:

> I don't suppose the semantics can be improved any -- like class names
> or ids?


Thanks for your reply. Afraid not: there are no handy names or ids. I think
I was already doing what you posted, in a slightly different form, as
"elements2 = (elements/"table//table//td")". Since my last post, though,
I've managed to sort it out with a lot of array manipulation.

Thanks for the thought :-)
Adam


--
Posted via http://www.ruby-forum.com/.
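
The array manipulation itself was not posted, so the following is only a guess at its general shape, not Adam's actual code. It assumes the cell texts have already been collected per row (the `rows` data and the `row_fields` helper are hypothetical) and shows one way to turn them into the ["Field 1", ..., "Field 5"] form asked for in the first post.

```ruby
# Hypothetical cell texts, one inner array per wanted row. The empty string
# stands for the cell that only wraps the nested table.
rows = [
  ["", "Field 1", "Field 2", "Field 3", "Field 4, Field 5"],
  ["", "A", "B", "C", "D, E"]
]

# Split comma-joined cells, trim whitespace, and drop empty entries.
def row_fields(cells)
  cells
    .flat_map { |text| text.split(',') }  # "Field 4, Field 5" becomes two entries
    .map(&:strip)
    .reject(&:empty?)
end

fields_per_row = rows.map { |cells| row_fields(cells) }

puts fields_per_row.first.inspect
# => ["Field 1", "Field 2", "Field 3", "Field 4", "Field 5"]
```

Splitting on every comma assumes no field legitimately contains one; real data may need a stricter rule.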
