Optimizing tables to compare billions of rows. How?

30/10/2007, 17:43   #1
pim@impulzief.nl

Dear all,

Currently I am working on a project that has to do with logging
visitors traffic.
Let's say, every time a visitor has visited the website, one row will
be inserted into a table.

Thing is, just like OneStat, Nedstat or whatever, this project
retrieves it's input from a large number of websites. This may result
in let's say 1000 rows per minute from the start but probably a hell
of a lot more.

First question is: How many rows can be processed per minute, or
second?


Now my second question is more difficult. I will also need to compare
the results in a guide page that will compare all the data collected
and, for example, show which site has had the most visitors for the
last day, week, year... whatever.

My problem is that if I store all data in one table, this table
will very soon be very, very large. I don't know the maximum number of
rows that are allowed, but this will surely influence the speed. The
more rows, the longer it will of course take to compare and show
results in the Guide Page. And finally I will hit the maximum number of
rows anyhow.

What I am thinking of is to automatically generate a table for every
month but I am not sure if this is wise. Will I be able to compare
fast enough after a year? Let's say I have 12 tables and I want to
compare all rows, let's say there are one billion rows in each
table...

What I would actually like to know, I think, is how the database of NedStat
is more or less structured. They probably must have over a million rows
to store every minute... Can anyone tell me how they manage to store
and compare all this data?


Kind regards,


Pim Zeekoers

30/10/2007, 18:52   #2
Captain Paralytic

On 30 Oct, 15:43, p...@impulzief.nl wrote:
> Dear all,
>
> Currently I am working on a project that has to do with logging
> visitors traffic.

Surely either "visitor traffic" or "visitor's traffic".

> Let's say, every time a visitor has visited the website, one row will
> be inserted into a table.
>
> Thing is, just like OneStat, Nedstat or whatever, this project
> retrieves it's

Surely "its" (see http://groups.google.co.uk/group/alt....no.apostrophe)


> First question is: How many rows can be processed per minute, or
> second?

This depends on many things, for example the processor speed, the disk
speed, how many disks, ...

> Now my second question is more difficult.

The first one was impossible to answer, so how can this be more
difficult?

30/10/2007, 19:16   #3
Good Man

pim@impulzief.nl wrote in news:1193759019.881961.6310@y42g2000hsy.googlegroups.com:


> What I would actually like to know, I think, is how the database of NedStat
> is more or less structured. They probably must have over a million rows
> to store every minute... Can anyone tell me how they manage to store
> and compare all this data?


As far as I know, databases are not used at all for "web analytics" type of
things... they tend to use server logs for information, and process them
into useful information.
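
To make the log-processing idea concrete, here is a minimal sketch in Python. The log format (timestamp, site, path separated by spaces) and the function name count_hits_per_site are assumptions made up for this illustration; real web server logs look different and would need their own parsing.

```python
from collections import Counter


def count_hits_per_site(log_lines):
    """Count one hit per well-formed line, keyed by site name.

    Assumed (hypothetical) line format: "<timestamp> <site> <path>".
    """
    hits = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        site = parts[1]
        hits[site] += 1
    return hits


# Made-up sample data, only to exercise the function.
sample_log = [
    "2007-10-30T15:43:00 example.org /index.html",
    "2007-10-30T15:43:05 example.net /about.html",
    "2007-10-30T15:44:10 example.org /contact.html",
    "garbage",
]

hits = count_hits_per_site(sample_log)
for site, n in hits.most_common():
    print(site, n)
```

The raw lines never have to be stored in a database this way; only the much smaller per-site totals need to be kept.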

30/10/2007, 19:17   #4
pim@impulzief.nl

On 30 Oct, 17:52, Captain Paralytic <paul_laut...@yahoo.com> wrote:
> On 30 Oct, 15:43, p...@impulzief.nl wrote:
> > Dear all,
> >
> > Currently I am working on a project that has to do with logging
> > visitors traffic.
>
> Surely either "visitor traffic" or "visitor's traffic"
>
> > Let's say, every time a visitor has visited the website, one row will
> > be inserted into a table.
> >
> > Thing is, just like OneStat, Nedstat or whatever, this project
> > retrieves it's
>
> Surely "its" (see http://groups.google.co.uk/group/alt.possessive.its.has.no.apostrophe)
>
> > First question is: How many rows can be processed per minute, or
> > second?
>
> This depends on many things, for example the processor speed, the disk
> speed, how many disks, ...
>
> > Now my second question is more difficult.
>
> The first one was impossible to answer, so how can this be more
> difficult?


Dear Paul,

I am sorry my English does not meet your standards.
Although you had no answer, you seem to have been able to understand
my questions so I guess it wasn't that bad.

Of course, hardware will limit the number of rows that can be
processed every minute. I am not looking for a detailed answer but
would just like to know what could normally be achieved under regular
circumstances, for example on a regular dedicated server. What can be
seen as a hard maximum?

But the second question was a lot more important of course.

Do you, or someone else, have any tips on how to structure my database
so that I can insert an unlimited number of rows and still
be able to compare the results?

Regards,

Pim

30/10/2007, 19:25   #5
pim@impulzief.nl

On 30 Oct, 18:16, Good Man <he...@letsgo.com> wrote:
> p...@impulzief.nl wrote in news:1193759019.881961.6310@y42g2000hsy.googlegroups.com:
>
> > What I would actually like to know, I think, is how the database of NedStat
> > is more or less structured. They probably must have over a million rows
> > to store every minute... Can anyone tell me how they manage to store
> > and compare all this data?
>
> As far as I know, databases are not used at all for "web analytics" type of
> things... they tend to use server logs for information, and process them
> into useful information.


Dear Good Man,

Thank you for your reply.
They don't use databases? Hmm... there are probably reasons for that.
But how would I be able to store unlimited rows and still be able to
compare the results?

A friend of mine just suggested using tables in pairs, or couples,
that automatically store certain standard comparisons.
Is that an option?
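
One way to read that suggestion is a summary table kept next to the raw table, so that the guide page reads pre-counted totals instead of scanning every visit. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names (visits, daily_visits), column names, and helper functions are invented for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (site_id INTEGER, visited_at TEXT)")
conn.execute(
    """CREATE TABLE daily_visits (
           site_id INTEGER,
           day     TEXT,
           visits  INTEGER,
           PRIMARY KEY (site_id, day)
       )"""
)

# Made-up raw data: one row per visit.
raw = [
    (1, "2007-10-29 10:00:00"),
    (1, "2007-10-30 09:00:00"),
    (1, "2007-10-30 11:30:00"),
    (2, "2007-10-30 12:00:00"),
]
conn.executemany("INSERT INTO visits VALUES (?, ?)", raw)


def rebuild_daily_summary(conn):
    """Recompute the per-site, per-day counts from the raw table."""
    conn.execute("DELETE FROM daily_visits")
    conn.execute(
        """INSERT INTO daily_visits (site_id, day, visits)
           SELECT site_id, substr(visited_at, 1, 10), COUNT(*)
           FROM visits
           GROUP BY site_id, substr(visited_at, 1, 10)"""
    )


def top_sites(conn, first_day, last_day):
    """Rank sites by visits in a date range, reading only the summary."""
    return conn.execute(
        """SELECT site_id, SUM(visits) AS total
           FROM daily_visits
           WHERE day BETWEEN ? AND ?
           GROUP BY site_id
           ORDER BY total DESC, site_id""",
        (first_day, last_day),
    ).fetchall()


rebuild_daily_summary(conn)
ranking = top_sites(conn, "2007-10-29", "2007-10-30")
print(ranking)
```

The summary table grows with the number of sites and days, not with the number of visits, which is what keeps the comparison queries small. A real setup would update it incrementally rather than rebuilding it from scratch.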


Pim

30/10/2007, 20:33   #6
Jerry Stuckle

pim@impulzief.nl wrote:
> Dear all,
>
> Currently I am working on a project that has to do with logging
> visitors traffic.
> Let's say, every time a visitor has visited the website, one row will
> be inserted into a table.
>
> Thing is, just like OneStat, Nedstat or whatever, this project
> retrieves it's input from a large number of websites. This may result
> in let's say 1000 rows per minute from the start but probably a hell
> of a lot more.
>
> First question is: How many rows can be processed per minute, or
> second?
>


Too many factors to determine. It depends on everything from the data
being inserted and indexes being used to the operating system, memory,
disk speed, etc.

But 1K rows/min. should be easily doable with good hardware, as long as
it isn't huge amounts of data on each insert.
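
Since the achievable rate depends on the setup, the practical answer is to measure it. The following is a rough sketch of such a measurement, again using Python's sqlite3 module purely as a stand-in (an assumption; the numbers it prints say nothing about a MySQL server). The table layout and the insert_batch helper are hypothetical. It inserts many small rows inside a single transaction, which avoids paying the commit cost for every row.

```python
import sqlite3
import time


def insert_batch(conn, rows):
    """Insert all rows inside one transaction instead of one commit per row."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO visits (site_id, page_id, visitor_id) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (site_id INTEGER, page_id INTEGER, visitor_id INTEGER)"
)

# Made-up rows: three small integer ids each, as in a narrow logging table.
rows = [(i % 50, i % 200, i) for i in range(10_000)]

start = time.perf_counter()
insert_batch(conn, rows)
elapsed = time.perf_counter() - start

inserted = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(f"{inserted} rows in {elapsed:.3f} s")
```

Running the same kind of test against the real server, with the real indexes in place, gives a far more useful figure than any general estimate.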

>
> Now my second question is more difficult. I will also need to compare
> the results in a guide page that will compare all the data collected
> and, for example, show which site has had the most visitors for the
> last day, week, year... whatever.
>
> My problem is that if I store all data in one table, this table
> will very soon be very, very large. I don't know the maximum number of
> rows that are allowed, but this will surely influence the speed. The
> more rows, the longer it will of course take to compare and show
> results in the Guide Page. And finally I will hit the maximum number of
> rows anyhow.
>


In most cases you'll run into OS limitations first, i.e. the maximum
size of a file allowed. A 64-bit platform will give you more room. And
yes, it will take a while to search such large tables. Perhaps you
would be better off implementing replication, inserting to the master
and searching on the slave.


> What I am thinking of is to automatically generate a table for every
> month but I am not sure if this is wise. Will I be able to compare
> fast enough after a year? Let's say I have 12 tables and I want to
> compare all rows, let's say there are one billion rows in each
> table...
>


It will be slower to join tables. It means multiple indexes need to be
searched, etc.
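
To show what querying across monthly tables involves, here is a small sketch. It assumes one table per month with identical columns and combines them with UNION ALL (rather than a join in the strict sense) before counting. Python's sqlite3 module stands in for MySQL, and the table names (visits_2007_09, visits_2007_10) and the helper visits_across_months are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)

# Hypothetical naming scheme: one table per month, same columns in each.
months = ["2007_09", "2007_10"]
for m in months:
    conn.execute(f"CREATE TABLE visits_{m} (site_id INTEGER, visited_at TEXT)")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO visits_2007_09 VALUES (?, ?)",
    [(1, "2007-09-15"), (2, "2007-09-20")],
)
conn.executemany(
    "INSERT INTO visits_2007_10 VALUES (?, ?)",
    [(1, "2007-10-01"), (1, "2007-10-30"), (2, "2007-10-30")],
)


def visits_across_months(conn, months):
    """Count visits per site over several monthly tables.

    The table names are built from a trusted, hard-coded list; they must
    never come from user input, because they are spliced into the SQL text.
    """
    union = " UNION ALL ".join(
        f"SELECT site_id FROM visits_{m}" for m in months
    )
    sql = (
        f"SELECT site_id, COUNT(*) AS total FROM ({union}) AS all_visits "
        "GROUP BY site_id ORDER BY total DESC, site_id"
    )
    return conn.execute(sql).fetchall()


totals = visits_across_months(conn, months)
print(totals)
```

Every table named in the query has to be read, so a comparison over a full year still touches all twelve tables. Splitting by month mainly helps when a query needs only one or two of them.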

> What I would actually like to know, I think, is how the database of NedStat
> is more or less structured. They probably must have over a million rows
> to store every minute... Can anyone tell me how they manage to store
> and compare all this data?
>


Not here.

>
> Kind regards,
>
>
> Pim Zeekoers
>
>



--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

30/10/2007, 21:39   #7
pim@impulzief.nl


Dear Jerry,

Thank you for your answer.
It sure helps me to find the right answer.
The 'real' big table will only have 5 tables with id's so 1k will
never be reached.
I will see what I do; first I have to test your master/slave
solution.

Kind regards,


Pim
