Newsgroup: comp.databases.mysql
join to a table itself takes too long

15/02/2008, 08:47   #1
Frank Arthur

I wrote a query that joins a table to itself, but it takes too long to
execute. After 200 seconds I killed it.

SELECT COUNT(*)
FROM `statistics_psm` AS `psm1`
LEFT JOIN `statistics_psm` AS `psm2`
USING(`RemoteAddr`)

I know the query above makes no sense; it's just a simplified version of
the original.

Result of EXPLAIN:

+-------+-------+---------------+------------+---------------------+-------+-------------+
| table | type  | possible_keys | key        | ref                 | rows  | Extra       |
+-------+-------+---------------+------------+---------------------+-------+-------------+
| psm1  | index | NULL          | RemoteAddr | NULL                | 47034 | Using index |
| psm2  | ref   | RemoteAddr    | RemoteAddr | xxx.psm1.RemoteAddr |     3 | Using index |
+-------+-------+---------------+------------+---------------------+-------+-------------+


Table structure:

CREATE TABLE IF NOT EXISTS `priz24_statistics_psm` (
  `psm_id` int(10) unsigned NOT NULL default '0',
  `products_id` int(10) unsigned NOT NULL default '0',
  `RemoteAddr` varchar(39) NOT NULL default '',
  `Datetime` datetime NOT NULL default '0000-00-00 00:00:00',
  `Referer` varchar(255) NOT NULL default '',
  PRIMARY KEY (`psm_id`,`products_id`,`RemoteAddr`,`Datetime`),
  KEY `RemoteAddr` (`RemoteAddr`)
) TYPE=MyISAM;

Indexes:

Name        Type     Cardinality  Fields
PRIMARY     PRIMARY  47006        psm_id, products_id, RemoteAddr, Datetime
RemoteAddr  INDEX    15668        RemoteAddr

Row statistics:

Property           Value
Format             dynamic
Rows               47,006
Row length (avg)   41
Row size (avg)     85 bytes

I have tested the query on two different MySQL versions, 3.0.x and 5.x,
with no difference.
Can somebody tell me why this query is so slow and how to speed it up?

PS:
PSM means comparison shopping site.
The table counts the clicks to products in an online shop from different
comparison shopping sites.
15/02/2008, 09:38   #2
Luuk

Frank Arthur wrote:
> I wrote a query that joins a table to itself, but it takes too long to
> execute. After 200 seconds I killed it.
>
> SELECT COUNT(*)
> FROM `statistics_psm` AS `psm1`
> LEFT JOIN `statistics_psm` AS `psm2`
> USING(`RemoteAddr`)
>


You did not read the manual completely...
You need to specify the relationship between `psm1` and `psm2`:
http://dev.mysql.com/doc/refman/5.0/en/join.html

SELECT COUNT(*)
FROM `statistics_psm` AS `psm1`
LEFT JOIN `statistics_psm` AS `psm2`
ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
USING(`RemoteAddr`)


--
Luuk
15/02/2008, 10:22   #3
Frank Arthur

Luuk wrote:

> Frank Arthur wrote:
> You did not read the manual completely... You need to specify the
> relationship between `psm1` and `psm2`:
> http://dev.mysql.com/doc/refman/5.0/en/join.html
>
> SELECT COUNT(*)
> FROM `statistics_psm` AS `psm1`
> LEFT JOIN `statistics_psm` AS `psm2`
> ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
> USING(`RemoteAddr`)


You are wrong.

I specified the relationship with:
USING(`RemoteAddr`)
This is the same as:
ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`

I don't want to join by psm_id, because (for the original query) I need
the relation between RemoteAddr with and without psm_id.
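
The equivalence of USING and ON claimed above can be checked on a small table. This is only a sketch: it uses Python's built-in sqlite3 module in place of MySQL so that it runs on its own, and the three sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statistics_psm (psm_id INTEGER, RemoteAddr TEXT)")
conn.executemany(
    "INSERT INTO statistics_psm VALUES (?, ?)",
    [(1, "10.0.0.1"), (2, "10.0.0.1"), (1, "10.0.0.2")],  # made-up rows
)

# Self-join written with USING, as in the original query.
using_count = conn.execute(
    "SELECT COUNT(*) FROM statistics_psm AS psm1 "
    "LEFT JOIN statistics_psm AS psm2 USING (RemoteAddr)"
).fetchone()[0]

# The same self-join written with an explicit ON condition.
on_count = conn.execute(
    "SELECT COUNT(*) FROM statistics_psm AS psm1 "
    "LEFT JOIN statistics_psm AS psm2 "
    "ON psm1.RemoteAddr = psm2.RemoteAddr"
).fetchone()[0]

print(using_count, on_count)
```

Both counts come out the same, because each form pairs every row with every row that shares its RemoteAddr.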
15/02/2008, 11:11   #4
Luuk

Frank Arthur wrote:
> Luuk wrote:
>
>> Frank Arthur wrote:
>> You did not read the manual completely... You need to specify the
>> relationship between `psm1` and `psm2`:
>> http://dev.mysql.com/doc/refman/5.0/en/join.html
>>
>> SELECT COUNT(*)
>> FROM `statistics_psm` AS `psm1`
>> LEFT JOIN `statistics_psm` AS `psm2`
>> ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
>> USING(`RemoteAddr`)

>
> You are wrong.
>
> I specified the relationship with:
> USING(`RemoteAddr`)
> This is the same as:
> ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`
>
> I don't want to join by psm_id, because (for the original query) I need
> the relation between RemoteAddr with and without psm_id.


Sorry, I overlooked that.

But on a second look at your query, I think your result set can be large,
and the query slow because of that.

Every row is joined to every row that has the same `RemoteAddr`, so
if a `RemoteAddr` occurs often in your table you will get a lot of
results:

if every `RemoteAddr` is unique, you'll only get about 47K records
if every `RemoteAddr` is used on two records, your result set is
47K*2 = 94K records
...
if every `RemoteAddr` is used on 400 records, your result set is
47K*400 = 18.8 million records...


Can you post the results of this query?
select RemoteAddr, count(*) c from statistics_psm group by RemoteAddr
order by c desc limit 10;

--
Luuk
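
The arithmetic in this post can be written down in general form: a row whose address occurs c times matches c rows, so each address contributes c * c rows to the self-join, and the total is the sum of those squares. The sketch below checks that against a brute-force pairing; the function name `self_join_rows` and the sample data are hypothetical, not taken from the thread.

```python
from collections import Counter


def self_join_rows(addresses):
    """Rows produced by joining the list to itself on equal values."""
    counts = Counter(addresses)
    # An address seen c times yields c * c matching pairs.
    return sum(c * c for c in counts.values())


# Made-up sample: one unique address, one used twice, one used 400 times.
sample = ["a"] * 1 + ["b"] * 2 + ["c"] * 400

predicted = self_join_rows(sample)

# Brute-force check: count every matching pair explicitly.
brute_force = sum(1 for x in sample for y in sample if x == y)

print(predicted, brute_force)
```

A single heavily used address dominates the total, which is why the top of the per-address count list matters more than the average.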
15/02/2008, 11:51   #5
Frank Arthur

Luuk wrote:

> Frank Arthur wrote:
>> I specified the relationship with:
>> USING(`RemoteAddr`)
>> This is the same as:
>> ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`

>
> Sorry, I overlooked that.


No problem.^^

> But on a second look at your query, I think your result set can be large,
> and the query slow because of that.
>
> Every row is joined to every row that has the same `RemoteAddr`, so
> if a `RemoteAddr` occurs often in your table you will get a lot of
> results:
>
> if every `RemoteAddr` is unique, you'll only get about 47K records
> if every `RemoteAddr` is used on two records, your result set is
> 47K*2 = 94K records
> ...
> if every `RemoteAddr` is used on 400 records, your result set is
> 47K*400 = 18.8 million records...
>
>
> Can you post the results of this query?


mysql> SELECT `RemoteAddr`
    -> , COUNT(*) AS `c`
    -> FROM `statistics_psm`
    -> GROUP BY `RemoteAddr`
    -> ORDER BY `c` DESC
    -> LIMIT 10;
+----------------+-------+
| RemoteAddr     | c     |
+----------------+-------+
| 66.249.66.20   | 19303 |
| 38.98.120.68   |  3609 |
| 84.189.229.26  |   395 |
| 69.65.122.206  |   310 |
| 84.189.235.199 |   293 |
| 121.246.24.116 |   144 |
| 84.189.217.18  |    94 |
| 87.194.5.102   |    85 |
| 84.189.238.249 |    80 |
| 84.189.246.222 |    75 |
+----------------+-------+
10 rows in set (0.04 sec)

You may be right: 19303 * 19303 = 372605809 rows for the top address alone.
That is too much for a fast query.
Hmm, I may try using a temporary table and deleting the IPs that have too
many entries.
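
The temporary-table idea could look roughly like the sketch below. It is an illustration under assumptions, not a tested MySQL solution: it runs on Python's built-in sqlite3 module so that it is self-contained, the row counts are made up (only the addresses come from the result above), and the table name `psm_tmp` and the cut-off `MAX_HITS` are hypothetical. It also shows that the size of the self-join can be computed from the per-address counts without running the join at all.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statistics_psm (RemoteAddr TEXT)")
rows = (
    [("66.249.66.20",)] * 300      # made-up count for a heavy address
    + [("84.189.229.26",)] * 4
    + [("87.194.5.102",)] * 2
)
conn.executemany("INSERT INTO statistics_psm VALUES (?)", rows)

# Size of the self-join, derived from per-address counts instead of joining.
join_size = conn.execute(
    "SELECT SUM(c * c) FROM "
    "(SELECT COUNT(*) AS c FROM statistics_psm GROUP BY RemoteAddr)"
).fetchone()[0]

MAX_HITS = 100  # hypothetical cut-off for "too many entries"

# Copy the data, then delete the addresses that exceed the cut-off.
conn.execute("CREATE TEMPORARY TABLE psm_tmp AS SELECT * FROM statistics_psm")
conn.execute(
    "DELETE FROM psm_tmp WHERE RemoteAddr IN ("
    "  SELECT RemoteAddr FROM statistics_psm "
    "  GROUP BY RemoteAddr HAVING COUNT(*) > ?)",
    (MAX_HITS,),
)

# The self-join on the filtered copy is now small.
filtered_count = conn.execute(
    "SELECT COUNT(*) FROM psm_tmp AS psm1 "
    "LEFT JOIN psm_tmp AS psm2 USING (RemoteAddr)"
).fetchone()[0]

print(join_size, filtered_count)
```

The subquery in the DELETE reads from the original table rather than from the temporary copy, so the statement does not select from the table it is modifying.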