#1 |
I made a query with a join of one table to itself, but the execution time
of the query is too long. After 200 sec I killed it.

SELECT COUNT(*)
FROM `statistics_psm` AS `psm1`
LEFT JOIN `statistics_psm` AS `psm2`
USING(`RemoteAddr`)

I know the query above makes no sense; it's just a simplified version of the original.

Result of EXPLAIN:

+-------+-------+---------------+------------+---------------------+-------+-------------+
| table | type  | possible_keys | key        | ref                 | rows  | Extra       |
+-------+-------+---------------+------------+---------------------+-------+-------------+
| psm1  | index | NULL          | RemoteAddr | NULL                | 47034 | Using index |
| psm2  | ref   | RemoteAddr    | RemoteAddr | xxx.psm1.RemoteAddr |     3 | Using index |
+-------+-------+---------------+------------+---------------------+-------+-------------+

Table structure:

CREATE TABLE IF NOT EXISTS `priz24_statistics_psm` (
  `psm_id` int(10) unsigned NOT NULL default '0',
  `products_id` int(10) unsigned NOT NULL default '0',
  `RemoteAddr` varchar(39) NOT NULL default '',
  `Datetime` datetime NOT NULL default '0000-00-00 00:00:00',
  `Referer` varchar(255) NOT NULL default '',
  PRIMARY KEY (`psm_id`,`products_id`,`RemoteAddr`,`Datetime`),
  KEY `RemoteAddr` (`RemoteAddr`)
) TYPE=MyISAM;

Name        Type     Cardinality  Field
PRIMARY     PRIMARY  47006        psm_id, products_id, RemoteAddr, Datetime
RemoteAddr  INDEX    15668        RemoteAddr

Statement     Value
Format        dynamic
Rows          47,006
Row length ø  41
Row size ø    85 bytes

I have tested the query on two different MySQL versions, 3.0.x and 5.x, with no difference.

Can somebody tell me why this query is so slow and how to speed it up?

PS: PSM means comparison shopping site. The table counts the clicks on products in an online shop coming from different comparison shopping sites.
#2 |
Frank Arthur wrote:
> I made a query with a join of one table to itself, but the execution time
> of the query is too long. After 200 sec I killed it.
>
> SELECT COUNT(*)
> FROM `statistics_psm` AS `psm1`
> LEFT JOIN `statistics_psm` AS `psm2`
> USING(`RemoteAddr`)
>

You did not read the manual completely... You need to specify the
relationship between `psm1` and `psm2`:
http://dev.mysql.com/doc/refman/5.0/en/join.html

SELECT COUNT(*)
FROM `statistics_psm` AS `psm1`
LEFT JOIN `statistics_psm` AS `psm2`
ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
USING(`RemoteAddr`)

--
Luuk
#3 |
Luuk wrote:
> Frank Arthur wrote:
> You did not read the manual completely... You need to specify the
> relationship between `psm1` and `psm2`:
> http://dev.mysql.com/doc/refman/5.0/en/join.html
>
> SELECT COUNT(*)
> FROM `statistics_psm` AS `psm1`
> LEFT JOIN `statistics_psm` AS `psm2`
> ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
> USING(`RemoteAddr`)

You are wrong.

I specified the relationship with:
USING(`RemoteAddr`)
This is the same as:
ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`

I don't want to join by psm_id, because (for the original query) I need
the relation between RemoteAddr with and without psm_id.
#4 |
Frank Arthur wrote:
> Luuk wrote:
>
>> Frank Arthur wrote:
>> You did not read the manual completely... You need to specify the
>> relationship between `psm1` and `psm2`:
>> http://dev.mysql.com/doc/refman/5.0/en/join.html
>>
>> SELECT COUNT(*)
>> FROM `statistics_psm` AS `psm1`
>> LEFT JOIN `statistics_psm` AS `psm2`
>> ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
>> USING(`RemoteAddr`)
>
> You are wrong.
>
> I specified the relationship with:
> USING(`RemoteAddr`)
> This is the same as:
> ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`
>
> I don't want to join by psm_id, because (for the original query) I need
> the relation between RemoteAddr with and without psm_id.

Sorry, I overlooked that...

But on a second look at your query, I think your result set can be large,
and the query slow because of that.

For every `RemoteAddr` you are linking all other rows with the same
`RemoteAddr` value, so if a `RemoteAddr` is used often in your database you
will get a lot of results.

If `RemoteAddr` is unique, you'll only get about 47K records.
If `RemoteAddr` is used on two records, your result set is 47K*2 = 94K records.
....
If `RemoteAddr` is used on 400 records, your result set is 47K*400 = 18.8 million records...

Can you post the results of:
select RemoteAddr, count(*) c from statistics_psm group by RemoteAddr
order by c desc limit 10;

--
Luuk
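The size of such a self-join can be checked without running it: an address that occurs on c rows contributes c * c joined rows, so the total is the sum of the squared per-address counts. Below is a minimal sketch of that check, assuming Python's sqlite3 module as a stand-in for MySQL and a tiny made-up table that has only the `RemoteAddr` column (the sample addresses are hypothetical). The inner `SUM(c * c)` query should carry over to MySQL unchanged.

```python
import sqlite3

# Hypothetical miniature of statistics_psm: only RemoteAddr matters here.
# One address appears 5 times, one 3 times, one once.
rows = [("10.0.0.1",)] * 5 + [("10.0.0.2",)] * 3 + [("10.0.0.3",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statistics_psm (RemoteAddr TEXT NOT NULL)")
conn.executemany("INSERT INTO statistics_psm VALUES (?)", rows)

# Predicted join size: each address with c rows yields c * c joined rows.
predicted = conn.execute(
    "SELECT SUM(c * c) FROM ("
    "  SELECT COUNT(*) AS c FROM statistics_psm GROUP BY RemoteAddr"
    ") AS t"
).fetchone()[0]

# Actual size of the self-join from the thread, run on the tiny sample.
actual = conn.execute(
    "SELECT COUNT(*) FROM statistics_psm AS psm1 "
    "LEFT JOIN statistics_psm AS psm2 USING (RemoteAddr)"
).fetchone()[0]

print(predicted, actual)
```

On the real table the prediction only needs one pass over the `RemoteAddr` index, so it shows whether the join is feasible before anyone waits 200 seconds for it.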
#5 |
Luuk wrote:
> Frank Arthur wrote:
>> I specified the relationship with:
>> USING(`RemoteAddr`)
>> This is the same as:
>> ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`
>
> Sorry, I overlooked that...

No problem.^^

> But on a second look at your query, I think your result set can be large,
> and the query slow because of that.
>
> For every `RemoteAddr` you are linking all other rows with the same
> `RemoteAddr` value, so if a `RemoteAddr` is used often in your database you
> will get a lot of results.
>
> If `RemoteAddr` is unique, you'll only get about 47K records.
> If `RemoteAddr` is used on two records, your result set is 47K*2 = 94K records.
> ...
> If `RemoteAddr` is used on 400 records, your result set is 47K*400 = 18.8
> million records...
>
>
> Can you post the results of:

mysql> SELECT `RemoteAddr`
    -> , COUNT(*) AS `c`
    -> FROM `statistics_psm`
    -> GROUP BY `RemoteAddr`
    -> ORDER BY `c` DESC
    -> LIMIT 10;
+----------------+-------+
| RemoteAddr     | c     |
+----------------+-------+
| 66.249.66.20   | 19303 |
| 38.98.120.68   |  3609 |
| 84.189.229.26  |   395 |
| 69.65.122.206  |   310 |
| 84.189.235.199 |   293 |
| 121.246.24.116 |   144 |
| 84.189.217.18  |    94 |
| 87.194.5.102   |    85 |
| 84.189.238.249 |    80 |
| 84.189.246.222 |    75 |
+----------------+-------+
10 rows in set (0.04 sec)

You may be right.

19303 * 19303 = 372605809 rows

This is too much for a fast query.

Hmm, I may try to use a temporary table and delete the IPs with too many entries.
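One way the temporary-table idea could look is sketched below. It is an illustration, not the query from the thread. It assumes Python's sqlite3 module in place of MySQL, a made-up table with only the `RemoteAddr` column, a hypothetical table name `psm_filtered`, and a hypothetical cutoff `MAX_ROWS_PER_ADDR`. Instead of deleting the heavy addresses afterwards, it copies only the lightly used ones into the temporary table, which gives the same result.

```python
import sqlite3

MAX_ROWS_PER_ADDR = 3  # hypothetical cutoff, not a value from the thread

# Hypothetical sample: one heavy address (5 rows) and two light ones.
rows = [("10.0.0.1",)] * 5 + [("10.0.0.2",)] * 3 + [("10.0.0.3",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statistics_psm (RemoteAddr TEXT NOT NULL)")
conn.executemany("INSERT INTO statistics_psm VALUES (?)", rows)

# Temporary copy holding only rows of addresses at or below the cutoff.
conn.execute("CREATE TEMP TABLE psm_filtered (RemoteAddr TEXT NOT NULL)")
conn.execute(
    "INSERT INTO psm_filtered "
    "SELECT RemoteAddr FROM statistics_psm "
    "WHERE RemoteAddr IN ("
    "  SELECT RemoteAddr FROM statistics_psm "
    "  GROUP BY RemoteAddr HAVING COUNT(*) <= ?)",
    (MAX_ROWS_PER_ADDR,),
)

kept = conn.execute("SELECT COUNT(*) FROM psm_filtered").fetchone()[0]

# The self-join now runs on the filtered copy only.
joined = conn.execute(
    "SELECT COUNT(*) FROM psm_filtered AS psm1 "
    "LEFT JOIN psm_filtered AS psm2 USING (RemoteAddr)"
).fetchone()[0]

print(kept, joined)
```

With the counts posted above, dropping just the top address would already remove the 19303 * 19303 term from the join.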