PHWinfo banniere

Titres
PORTAIL ANNUAIRE ARTICLES COMPARATEUR HÉBERGEURS DEVIS FORUMS RÉDUCTEUR D'URL
Précédent   PHWinfo > Autres forums > Forum Programmation & Conception > php.general > Very Large text file parsing
S'inscrire FAQ Membres Recherche Messages du jour Marquer les forums comme lus
Very Large text file parsing

Réponse
 
LinkBack Outils de la discussion
Vieux 20/09/2007, 12h23   #1
Paul Scott
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Very Large text file parsing


I have a very large text file that gets dumped into a directoory every
now and then. It is typically around 750MB long, at least, and my
question is:

What is the best method to parse this thing and insert the data into a
postgres db?

I have tried using file(), fget*() and some others, all with limited
success. It goes through OK (I am sending it to a background process on
the server and using a callback function to email me when done) but it
is really knocking the machine hard, besides taking a real long time to
finish.

Is there a better way of approaching this? Any would be greatly
appreciated.

--Paul


All Email originating from UWC is covered by disclaimer http://www.uwc.ac.za/portal/uwc2006/...imer/index.htm

  Réponse avec citation
Vieux 20/09/2007, 12h34   #2
Martin Marques
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: [PHP] Very Large text file parsing

Paul Scott wrote:
> I have a very large text file that gets dumped into a directoory every
> now and then. It is typically around 750MB long, at least, and my
> question is:
>
> What is the best method to parse this thing and insert the data into a
> postgres db?
>
> I have tried using file(), fget*() and some others, all with limited
> success. It goes through OK (I am sending it to a background process on
> the server and using a callback function to email me when done) but it
> is really knocking the machine hard, besides taking a real long time to
> finish.
>
> Is there a better way of approaching this? Any would be greatly
> appreciated.


First, which is your approach? I suspect that you are doing this with a
cron job through php-cli.

Now, to avoid using to many resources, try with fopen() and fgets().
Also work with persistent connections, so you don't have that overhead.

The problem with file() is that it will load all the file to memory, and
you don't want 700+MB in memory.

--
21:50:04 up 2 days, 9:07, 0 users, load average: 0.92, 0.37, 0.18
---------------------------------------------------------
Lic. Martín Marqués | SELECT 'mmarques' ||
Centro de Telemática | '@' || 'unl.edu.ar';
Universidad Nacional | DBA, Programador,
del Litoral | Administrador
---------------------------------------------------------
  Réponse avec citation
Vieux 20/09/2007, 12h42   #3
Lammi
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Very Large text file parsing

better split that file in small parts. it will take hours to parse a
750 mb file, it's not a good idea to allow a script to run for such a
long time. if you're on a multiprocessor machine you may use a single
processor for that job.

is that file updated from day to day and you need to insert the
changes into your database?

  Réponse avec citation
Vieux 20/09/2007, 12h50   #4
Edward Kay
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut RE: [PHP] Very Large text file parsing


> Paul Scott wrote:
> > I have a very large text file that gets dumped into a directoory every
> > now and then. It is typically around 750MB long, at least, and my
> > question is:
> >
> > What is the best method to parse this thing and insert the data into a
> > postgres db?
> >
> > I have tried using file(), fget*() and some others, all with limited
> > success. It goes through OK (I am sending it to a background process on
> > the server and using a callback function to email me when done) but it
> > is really knocking the machine hard, besides taking a real long time to
> > finish.
> >
> > Is there a better way of approaching this? Any would be greatly
> > appreciated.

>
> First, which is your approach? I suspect that you are doing this with a
> cron job through php-cli.
>
> Now, to avoid using to many resources, try with fopen() and fgets().
> Also work with persistent connections, so you don't have that overhead.
>
> The problem with file() is that it will load all the file to memory, and
> you don't want 700+MB in memory.


In addition to Martin's good suggestions (and also assuming you're running
php-cli via cron), you could use nice to stop it consuming too many
resources:

http://en.wikipedia.org/wiki/Nice_%28Unix%29

Edward
  Réponse avec citation
Vieux 20/09/2007, 12h55   #5
Paul Scott
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut RE: [PHP] Very Large text file parsing


On Thu, 2007-09-20 at 12:50 +0100, Edward Kay wrote:
> In addition to Martin's good suggestions (and also assuming you're running
> php-cli via cron), you could use nice to stop it consuming too many
> resources:
>


This is the current approach that I am taking, was just really wondering
if there was some kind of voodoo that would speed things up a bit.

Thanks both for your responses, appreciate it!

--Paul


All Email originating from UWC is covered by disclaimer http://www.uwc.ac.za/portal/uwc2006/...imer/index.htm

  Réponse avec citation
Vieux 20/09/2007, 13h03   #6
Robert Cummings
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut RE: [PHP] Very Large text file parsing

On Thu, 2007-09-20 at 13:55 +0200, Paul Scott wrote:
> On Thu, 2007-09-20 at 12:50 +0100, Edward Kay wrote:
> > In addition to Martin's good suggestions (and also assuming you're running
> > php-cli via cron), you could use nice to stop it consuming too many
> > resources:

>
> This is the current approach that I am taking, was just really wondering
> if there was some kind of voodoo that would speed things up a bit.
>
> Thanks both for your responses, appreciate it!


Post some samples of the data you are parsing and a sample of the code
you've written to parse them. If you're parsing 750 megs of data then
it's quite likely you could squeeze some performance out of the parse
routines themselves.

Cheers,
Rob.
--
.................................................. ..........
SwarmBuy.com - http://www.swarmbuy.com

Leveraging the buying power of the masses!
.................................................. ..........
  Réponse avec citation
Vieux 20/09/2007, 13h25   #7
Paul Scott
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut RE: [PHP] Very Large text file parsing


On Thu, 2007-09-20 at 08:03 -0400, Robert Cummings wrote:
> Post some samples of the data you are parsing and a sample of the code
> you've written to parse them. If you're parsing 750 megs of data then
> it's quite likely you could squeeze some performance out of the parse
> routines themselves.


Today's dataset is in a CSV (tab separated) , so I am using fgetcsv, it
looks like this (geo data):

936374 Roodepoort Roodepoort Roodeport-Maraisburg -26.1666667 27.8666667
P PPL ZA ZA 06 0 1759 Africa/Johannesburg 2004-05-11

Code:
[SNIP]
$row = 1;
$handle = fopen($csvfile, "r");
while (($data = fgetcsv($handle, 1000, "\t")) !== FALSE) {
$num = count($data);
$row++;
$insarr = array('userid' => $userid,
'geonameid' => $data[0],
'name' => $data[1],
'asciiname' => $data[2],
'alternatenames' => $data[3],
'latitude' => $data[4],
'longitude' => $data[5],
'featureclass' => $data[6],
'featurecode' => $data[7],
'countrycode' => $data[8],
'cc2' => $data[9],
'admin1code' => $data[10],
'admin2code' => $data[11],
'population' => $data[12],
'elevation' => $data[13],
'gtopo30' => $data[14],
'timezoneid' => $data[15],
'moddate' => $data[16]
);
$this->objDbGeo->insertRecord($insarr);
//$arr[] = $data;
}
fclose($handle);

--Paul



All Email originating from UWC is covered by disclaimer http://www.uwc.ac.za/portal/uwc2006/...imer/index.htm

  Réponse avec citation
Vieux 20/09/2007, 13h46   #8
Martin Marques
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: [PHP] Very Large text file parsing

Robert Cummings wrote:
> On Thu, 2007-09-20 at 13:55 +0200, Paul Scott wrote:
>> On Thu, 2007-09-20 at 12:50 +0100, Edward Kay wrote:
>>> In addition to Martin's good suggestions (and also assuming you're running
>>> php-cli via cron), you could use nice to stop it consuming too many
>>> resources:

>> This is the current approach that I am taking, was just really wondering
>> if there was some kind of voodoo that would speed things up a bit.
>>
>> Thanks both for your responses, appreciate it!

>
> Post some samples of the data you are parsing and a sample of the code
> you've written to parse them. If you're parsing 750 megs of data then
> it's quite likely you could squeeze some performance out of the parse
> routines themselves.


Adding to Robert's comment, try not to copy variables, but to reference
them, so you will use less resources.

Also free DB results after query.

--
21:50:04 up 2 days, 9:07, 0 users, load average: 0.92, 0.37, 0.18
---------------------------------------------------------
Lic. MartÃn Marqués | SELECT 'mmarques' ||
Centro de Telemática | '@' || 'unl.edu.ar';
Universidad Nacional | DBA, Programador,
del Litoral | Administrador
---------------------------------------------------------
  Réponse avec citation
Réponse


Outils de la discussion

Règles de messages
Vous ne pouvez pas créer de nouvelles discussions
Vous ne pouvez pas envoyer des réponses
Vous ne pouvez pas envoyer des pièces jointes
Vous ne pouvez pas modifier vos messages

Les balises BB sont activées : oui
Les smileys sont activés : oui
La balise [IMG] est activée : oui
Le code HTML peut être employé : non
Trackbacks are oui
Pingbacks are oui
Refbacks are oui


Fuseau horaire GMT +1. Il est actuellement 22h19.


Édité par : vBulletin® version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0 RC5 Tous droits réservés.
Version française #16 par l'association vBulletin francophone
PHWinfo est un site Éducation Sans Frontières
Ad Management by RedTyger
©Tous droits réservés par les parties respectives
Page generated in 0,16104 seconds with 16 queries