|
|
|
#1 |
|
Messages: n/a
Hébergeur: |
I'm trying to determine why a subquery is slower than running two
separate queries. I have a simple many-to-many association using 3 tables: projects, tags and projects_tags. Here's the query I'm using to find the projects with a given tag: SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id FROM tags, projects_tags WHERE tags.name='foo' AND projects_tags.project_id=projects.id); (0.36 sec) Compare that with splitting it into two queries: SELECT projects_tags.project_id FROM tags, projects_tags WHERE tags.name='foo' AND projects_tags.project_id=projects.id (0.00 sec) /* returns 1, 2, 3 */ SELECT * FROM projects WHERE id IN (1, 2, 3); (0.00 sec) Why is it so much faster? Looking at the explain statement (below) of the one with the subquery, it appears it's not using the primary key index on the projects table. Why is it that MySQL doesn't perform this simple optimization? And is there a solution that will allow me to still use a subquery? I realize I can use a join instead of a subquery, but this is a simplified example. Here's the explain statement: *************************** 1. row *************************** id: 1 select_type: PRIMARY table: projects type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 15433 Extra: Using where *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: tags type: ref possible_keys: PRIMARY,index_tags_on_name key: index_tags_on_name key_len: 258 ref: const rows: 1 Extra: Using where; Using index *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: projects_tags type: ref possible_keys: tag_id key: tag_id key_len: 5 ref: my_database.tags.id rows: 10 Extra: Using where Here's the table dumps: CREATE TABLE `projects` ( `id` int(11) NOT NULL auto_increment, `name` varchar(255) default NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; CREATE TABLE `tags` ( `id` int(11) NOT NULL auto_increment, `name` varchar(255) default NULL, PRIMARY KEY (`id`), KEY `index_tags_on_name` (`name`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; CREATE TABLE `projects_tags` ( `project_id` int(11) default NULL, `tag_id` int(11) default NULL, KEY `tag_id` (`tag_id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; I'm using MySQL 5.0.37. Thanks in advance. Ryan |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
Ryan,
>Why is it so much faster? Subquery optimisation in MySQL is a problem. For ideas see 'The unbearable slowness of IN()' at http://www.artfulsoftware.com/infotree/queries.php. PB Ryan Bates wrote: > I'm trying to determine why a subquery is slower than running two > separate queries. I have a simple many-to-many association using 3 > tables: projects, tags and projects_tags. Here's the query I'm using > to find the projects with a given tag: > > SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id > FROM tags, projects_tags WHERE tags.name='foo' AND > projects_tags.project_id=projects.id); > (0.36 sec) > > Compare that with splitting it into two queries: > > SELECT projects_tags.project_id FROM tags, projects_tags WHERE > tags.name='foo' AND projects_tags.project_id=projects.id > (0.00 sec) /* returns 1, 2, 3 */ > > SELECT * FROM projects WHERE id IN (1, 2, 3); > (0.00 sec) > > Why is it so much faster? Looking at the explain statement (below) of > the one with the subquery, it appears it's not using the primary key > index on the projects table. Why is it that MySQL doesn't perform this > simple optimization? And is there a solution that will allow me to > still use a subquery? > > I realize I can use a join instead of a subquery, but this is a > simplified example. > > > Here's the explain statement: > > *************************** 1. row *************************** > id: 1 > select_type: PRIMARY > table: projects > type: ALL > possible_keys: NULL > key: NULL > key_len: NULL > ref: NULL > rows: 15433 > Extra: Using where > *************************** 2. row *************************** > id: 2 > select_type: DEPENDENT SUBQUERY > table: tags > type: ref > possible_keys: PRIMARY,index_tags_on_name > key: index_tags_on_name > key_len: 258 > ref: const > rows: 1 > Extra: Using where; Using index > *************************** 3. row *************************** > id: 2 > select_type: DEPENDENT SUBQUERY > table: projects_tags > type: ref > possible_keys: tag_id > key: tag_id > key_len: 5 > ref: my_database.tags.id > rows: 10 > Extra: Using where > > > Here's the table dumps: > > CREATE TABLE `projects` ( > `id` int(11) NOT NULL auto_increment, > `name` varchar(255) default NULL, > PRIMARY KEY (`id`) > ) ENGINE=InnoDB DEFAULT CHARSET=latin1; > > CREATE TABLE `tags` ( > `id` int(11) NOT NULL auto_increment, > `name` varchar(255) default NULL, > PRIMARY KEY (`id`), > KEY `index_tags_on_name` (`name`) > ) ENGINE=InnoDB DEFAULT CHARSET=latin1; > > CREATE TABLE `projects_tags` ( > `project_id` int(11) default NULL, > `tag_id` int(11) default NULL, > KEY `tag_id` (`tag_id`) > ) ENGINE=InnoDB DEFAULT CHARSET=latin1; > > > I'm using MySQL 5.0.37. Thanks in advance. > > Ryan > |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
Ryan Bates wrote:
> I'm trying to determine why a subquery is slower than running two > separate queries. I have a simple many-to-many association using 3 > tables: projects, tags and projects_tags. Here's the query I'm using to > find the projects with a given tag: > > SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id FROM > tags, projects_tags WHERE tags.name='foo' AND > projects_tags.project_id=projects.id); > (0.36 sec) As another poster said, this kind of subquery runs slowly, but just to elaborate on it: it's not every subquery that's a problem, just IN() and NOT IN(). Use EXPLAIN EXTENDED followed by SHOW WARNINGS to see what's happening. Baron |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
Indeed, as you say, Brent, correlated subqueries are not well-optimized
in MySQL. The specific subquery (the IN() subquery) demonstrated in the original post is, however, optimized in MySQL 6.0 ![]() More comments inline. Brent Baisley wrote: > You are using a correlated subquery, which MySQL is terrible at. > Whenever you find yourself doing a correlated subquery, see if you can > switch it to a derived table with a join, which MySQL is far better at. > A derived table is like a "virtual" table you create on the fly. It's > very simple, just assign a name to your query and then treat it as if it > is a regular table. Actually, in this case, no need for a derived table. A simple join will suffice: SELECT * FROM projects p JOIN project_tags pt ON p.project_id = pt.project_id JOIN tags t ON pt.tag_id = t.tag_id WHERE tags.name='foo'; Make sure you've got indexes on p (project_id), pt (project_id, tag_id), t (name) Cheers, Jay > So your query would look something like this: > SELECT projects.* FROM projects > JOIN (SELECT projects_tags.project_id FROM tags, projects_tags WHERE > tags.name='foo' AND projects_tags.project_id=projects.id) AS ptagids > ON project.id=ptagids.project_id > > Your IN has become a JOIN and mysql optimizes it far better. > > On Oct 19, 2007, at 6:57 PM, Ryan Bates wrote: > >> I'm trying to determine why a subquery is slower than running two >> separate queries. I have a simple many-to-many association using 3 >> tables: projects, tags and projects_tags. Here's the query I'm using >> to find the projects with a given tag: >> >> SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id >> FROM tags, projects_tags WHERE tags.name='foo' AND >> projects_tags.project_id=projects.id); >> (0.36 sec) >> >> Compare that with splitting it into two queries: >> >> SELECT projects_tags.project_id FROM tags, projects_tags WHERE >> tags.name='foo' AND projects_tags.project_id=projects.id >> (0.00 sec) /* returns 1, 2, 3 */ >> >> SELECT * FROM projects WHERE id IN (1, 2, 3); >> (0.00 sec) >> >> Why is it so much faster? Looking at the explain statement (below) of >> the one with the subquery, it appears it's not using the primary key >> index on the projects table. Why is it that MySQL doesn't perform this >> simple optimization? And is there a solution that will allow me to >> still use a subquery? >> >> I realize I can use a join instead of a subquery, but this is a >> simplified example. >> >> >> Here's the explain statement: >> >> *************************** 1. row *************************** >> id: 1 >> select_type: PRIMARY >> table: projects >> type: ALL >> possible_keys: NULL >> key: NULL >> key_len: NULL >> ref: NULL >> rows: 15433 >> Extra: Using where >> *************************** 2. row *************************** >> id: 2 >> select_type: DEPENDENT SUBQUERY >> table: tags >> type: ref >> possible_keys: PRIMARY,index_tags_on_name >> key: index_tags_on_name >> key_len: 258 >> ref: const >> rows: 1 >> Extra: Using where; Using index >> *************************** 3. row *************************** >> id: 2 >> select_type: DEPENDENT SUBQUERY >> table: projects_tags >> type: ref >> possible_keys: tag_id >> key: tag_id >> key_len: 5 >> ref: my_database.tags.id >> rows: 10 >> Extra: Using where >> >> >> Here's the table dumps: >> >> CREATE TABLE `projects` ( >> `id` int(11) NOT NULL auto_increment, >> `name` varchar(255) default NULL, >> PRIMARY KEY (`id`) >> ) ENGINE=InnoDB DEFAULT CHARSET=latin1; >> >> CREATE TABLE `tags` ( >> `id` int(11) NOT NULL auto_increment, >> `name` varchar(255) default NULL, >> PRIMARY KEY (`id`), >> KEY `index_tags_on_name` (`name`) >> ) ENGINE=InnoDB DEFAULT CHARSET=latin1; >> >> CREATE TABLE `projects_tags` ( >> `project_id` int(11) default NULL, >> `tag_id` int(11) default NULL, >> KEY `tag_id` (`tag_id`) >> ) ENGINE=InnoDB DEFAULT CHARSET=latin1; >> >> >> I'm using MySQL 5.0.37. Thanks in advance. >> >> Ryan >> >> -- >> MySQL General Mailing List >> For list archives: http://lists.mysql.com/mysql >> To unsubscribe: http://lists.mysql.com/mysql?unsub=brenttech@gmail.com >> > > |
|
![]() |
| Outils de la discussion | |
|
|