FasterCSV heavy loads?

02/04/2008, 09:08   #1
Michael Linfield
FasterCSV heavy loads?

Recently I've attempted to push a huge csv into arrays via code that
looks along the lines of this:

require 'fastercsv'

csvFile = FasterCSV.read('data.csv', :headers => true)

array = []

csvFile.each do |row|
  array << row['column_name']
end

The problem arises when the CSV file is some 2 million lines or more.
Normally I would report how long it took, but I decided to call it
quits after 9 hours of waiting, lol. Any ideas on how to handle columns
in CSV documents the same way FasterCSV does?

(And yes, theoretically I could split the 80 MB CSV into twenty 4 MB files, but
what's the accomplishment in that?)

Thanks,

- Mac
--
Posted via http://www.ruby-forum.com/.

02/04/2008, 13:26   #2
Mike Woodhouse
Re: FasterCSV heavy loads?

On Apr 2, 9:08 am, Michael Linfield <globyy3...@hotmail.com> wrote:
> Recently I've attempted to push a huge csv into arrays via code that
> looks along the lines of this:
>
> csvFile = FasterCSV.read('data.csv', :headers => true)
>
> array = []
>
> csvFile.each do |row|
> array << row['column_name']
> end
>
> The problem arises when the csv file is someodd 2 million lines or more.


How many fields in a row? You're appending values to an array that many
times (2 million or more), which I suspect is where your performance
problem lies.

You could probably check by

csvFile = FasterCSV.read('data.csv', :headers => true)
count = 0
csvFile.each do |row|

end


02/04/2008, 13:29   #3
Mike Woodhouse
Re: FasterCSV heavy loads?

On Apr 2, 1:26 pm, Mike Woodhouse <mikewoodho...@gmail.com> wrote:
> On Apr 2, 9:08 am, Michael Linfield <globyy3...@hotmail.com> wrote:
>
> > Recently I've attempted to push a huge csv into arrays via code that
> > looks along the lines of this:
>
> > csvFile = FasterCSV.read('data.csv', :headers => true)
>
> > array = []
>
> > csvFile.each do |row|
> > array << row['column_name']
> > end
>
> > The problem arises when the csv file is someodd 2 million lines or more.
>
> How many fields in a row? You're appending that many times (2 million
> or more) values to an array, which I suspect is where your performance
> problem lies.
>
> You could probably check by
>
> csvFile = FasterCSV.read('data.csv', :headers => true)
> count = 0
> csvFile.each do |row|
>
> end


Hmph. I must have hit some unknown "send" key combination...

I meant to say, before I interrupted myself:

csvFile = FasterCSV.read('data.csv', :headers => true)
count = 0
csvFile.each do |row|
  count += 1
end

...which replaces the array append with a lightweight operation. (I
don't know if Ruby is "smart" and likely to skip the iteration with an
empty block - probably not, but adding 1 shouldn't impose a heavy
load)

Mike
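Mike's check can be extended into a rough timing harness that separates parsing from iteration. The sketch below assumes Ruby 1.9 or later, where FasterCSV was adopted as the standard `CSV` library with essentially the same interface. The generated file, its column names, and the row count are made up for illustration, and no timings are claimed here.

```ruby
require 'csv'
require 'benchmark'
require 'tempfile'

# Hypothetical test data: a header row plus 3 integer columns.
ROWS = 10_000

file = Tempfile.new(['data', '.csv'])
file.puts 'a,b,c'
ROWS.times { |i| file.puts "#{i},#{i * 2},#{i * 3}" }
file.flush

table = nil
count = 0
column = []

Benchmark.bm(8) do |bm|
  # Parse the whole file into memory, as CSV.read / FasterCSV.read do.
  bm.report('read')   { table = CSV.read(file.path, headers: true) }
  # Walk the parsed rows doing only a cheap operation.
  bm.report('count')  { table.each { count += 1 } }
  # Walk the parsed rows appending one field per row to an array.
  bm.report('append') { table.each { |row| column << row['a'] } }
end

file.close! # close and delete the temporary file
```

If "read" dominates, the parse is the bottleneck rather than the array appends.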



02/04/2008, 14:16   #4
James Gray
Re: FasterCSV heavy loads?

On Apr 2, 2008, at 3:08 AM, Michael Linfield wrote:

> Recently I've attempted to push a huge csv into arrays via code that
> looks along the lines of this:
>
> csvFile = FasterCSV.read('data.csv', :headers => true)
>
> array = []
>
> csvFile.each do |row|
> array << row['column_name']
> end


That code is pretty inefficient, since it reads the entire file into
memory only to walk over it row by row. Let's just read it row by
row, instead.

require 'fastercsv'  # also defines FCSV as a shorthand for FasterCSV

column = [ ]
FCSV.foreach('data.csv', :headers => true) do |row|
  column << row['column_name']
end
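On Ruby 1.9 or later, where FasterCSV became the standard `CSV` library, the same row-by-row approach might look like the sketch below. `read_column` is a hypothetical helper name, and the temporary file stands in for data.csv. Only one row is parsed at a time, although the returned array still holds every value of the column.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: collect one column by streaming the file row by row.
# CSV.foreach parses a single row per iteration instead of loading the
# whole document into memory first.
def read_column(path, name)
  column = []
  CSV.foreach(path, headers: true) do |row|
    column << row[name]
  end
  column
end

# Hypothetical sample file standing in for data.csv.
file = Tempfile.new(['data', '.csv'])
file.write("column_name,other\n1,a\n2,b\n3,c\n")
file.flush

values = read_column(file.path, 'column_name')
file.close!
p values # => ["1", "2", "3"]
```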

> The problem arises when the csv file is someodd 2 million lines or more.
> Normally I would comment about how long it took but I decided to call it
> quits after 9 hours of waiting lol.


FasterCSV could be choking on the CSV data, if it's not valid. It
sometimes has to read to the end of the document to know that, which
could take a long while with that much data.
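One way to test the invalid-data theory is to feed the parser a small sample that contains a suspect quote. This sketch assumes Ruby 1.9+'s standard `CSV` library (FasterCSV's successor), which raises `CSV::MalformedCSVError` for a quoted field that is never closed. The sample string is hypothetical.

```ruby
require 'csv'

# Hypothetical sample: the quote that opens the second field on line 2 is
# never closed, so the parser keeps consuming the following lines as part
# of that one field until it runs out of input.
data = "a,b,c\n1,\"2,3\n4,5,6\n7,8,9\n"

error = nil
begin
  CSV.parse(data, headers: true) { |_row| }
rescue CSV::MalformedCSVError => e
  error = e
  puts "Malformed CSV: #{e.message}"
end
```

The error surfaces only after the parser runs out of input while looking for the closing quote. On a file with millions of lines, that behaviour would match a long read before any failure.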

James Edward Gray II

02/04/2008, 18:54   #5
Michael Linfield
Re: FasterCSV heavy loads?

Mike Woodhouse wrote:
> On Apr 2, 1:26 pm, Mike Woodhouse <mikewoodho...@gmail.com> wrote:
>> > array << row['column_name']
>>
>> csvFile = FasterCSV.read('data.csv', :headers => true)
>> count = 0
>> csvFile.each do |row|
>>
>> end
>
> Hmph. I must have hit some unknown "send" key combination...
>
> I meant to say, before I interrupted myself:
>
> csvFile = FasterCSV.read('data.csv', :headers => true)
> count = 0
> csvFile.each do |row|
> count += 1
> end
>
> ...which replaces the array append with a lightweight operation. (I
> don't know if Ruby is "smart" and likely to skip the iteration with an
> empty block - probably not, but adding 1 shouldn't impose a heavy
> load)
>
> Mike

James Gray wrote:
> Let's just read it row by row, instead.
>
> column = [ ]
> FCSV.foreach('data.csv', :headers => true) do |row|
> column << row['column_name']
> end


Well, first of all, I had already counted the rows in the CSV via

file = File.readlines('data.csv')
file.length

The output is ~2,500,300.
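As an aside, `File.readlines` loads all ~2.5 million lines into memory just to count them. A streaming count is sketched below. `count_lines` is a hypothetical helper. It counts physical lines with the header included, and that total can differ from the number of CSV records if any quoted field contains a newline.

```ruby
require 'tempfile'

# Hypothetical helper: count lines by streaming. File.foreach reads one
# line at a time and, called without a block, returns an Enumerator, so
# #count never holds the whole file in memory (unlike File.readlines).
def count_lines(path)
  File.foreach(path).count
end

# Hypothetical sample standing in for data.csv.
file = Tempfile.new(['data', '.csv'])
file.write("a,b,c\n1,2,3\n4,5,6\n")
file.flush

n = count_lines(file.path)
file.close!
puts n # => 3 (the header line is included)
```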

To answer your other question, Mike: there are 3 columns, all
integers.

Thanks for that snippet, James. It's likely more efficient and may
cut the time in half (really nice); however, since I gave up after 9
hours, I don't know what half even is! The data from each column is
being written into its own array, i.e. column1Array = [],
column2Array = [], etc.
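Since all three columns hold integers, a single streaming pass can fill all three arrays and convert the values along the way. This sketch assumes Ruby 1.9+'s standard `CSV` library (FasterCSV's successor) and a header row, as in the original code. `read_columns` and `col1` to `col3` are hypothetical names standing in for `column1Array` and the others.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: split a 3-column integer CSV into one array per
# column in a single streaming pass. converters: :integer turns "42" into
# 42 as each row is read; header names are left untouched.
def read_columns(path)
  col1, col2, col3 = [], [], []
  CSV.foreach(path, headers: true, converters: :integer) do |row|
    a, b, c = row.fields
    col1 << a
    col2 << b
    col3 << c
  end
  [col1, col2, col3]
end

# Hypothetical sample standing in for data.csv.
file = Tempfile.new(['data', '.csv'])
file.write("x,y,z\n1,20,300\n4,50,600\n")
file.flush

col1, col2, col3 = read_columns(file.path)
file.close!
p col1 # => [1, 4]
```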

If the numbers in each column had a fixed length, I could just use
regexps to pull them out; however, the numbers are mostly random in length.
I'll give your snippet a shot, James, and let you know how the results
turn out. Until then, any additional thoughts are much appreciated.
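Splitting on the delimiter doesn't depend on field width, so a fixed length isn't needed to pull the numbers out. The sketch below replaces full CSV parsing with a plain `String#split`. That substitution is safe only if every field is a bare integer, with no quotes and no embedded commas. `read_int_columns` is a hypothetical name.

```ruby
require 'tempfile'

# Hypothetical helper. ASSUMPTION: every field is a bare integer (no
# quotes, no embedded commas), so String#split can stand in for a full
# CSV parser. Field length does not matter.
def read_int_columns(path, ncols)
  columns = Array.new(ncols) { [] }
  File.foreach(path).with_index do |line, i|
    next if i.zero? # skip the header row
    line.chomp.split(',').each_with_index do |field, j|
      columns[j] << Integer(field) # raises ArgumentError on non-integers
    end
  end
  columns
end

# Hypothetical sample standing in for data.csv.
file = Tempfile.new(['data', '.csv'])
file.write("x,y,z\n7,1234,5\n89,6,-10\n")
file.flush

cols = read_int_columns(file.path, 3)
file.close!
p cols # => [[7, 89], [1234, 6], [5, -10]]
```

`Integer()` is used instead of `to_i` so that a malformed field fails loudly rather than silently becoming 0.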

Thanks,

Mac
--
Posted via http://www.ruby-forum.com/.

All times are GMT +1.

