|
|
|
#1 |
|
Messages: n/a
Hébergeur: |
MSSQL Server 2000
I have a table with 150M rows, and I have an non-clustered index of an int type field named 'saved_date' . When I issue the SQL Query: SELECT * FROM mytable where saved_date >= 20080101 AND saved_date < 20080201 I always see a table scan taking place, and not an index scan or seek. I did run a DBCC SHOWCONTIG ('mytable') WITH ALL_INDEXES and all is in optimal condition. Why is it opting to do a table scan rather than using my index? |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
> SELECT * FROM mytable where saved_date >= 20080101 AND saved_date
> < 20080201 Are you sure you didn't mean to have single quotes around your dates? > Why is it opting to do a table scan rather than using my index? Because you are using SELECT *, so it needs to read all of the data pages to return all of the data anyway. You should see the behavior change if you switch to: SELECT saved_date FROM mytable WHERE ... Of course that column is probably not relevant all on its own. But choose a smaller column set than *, maybe a set that is covered by indexes. It's hard to recommend anything specific here because we have no idea what your table looks like, what columns have indexes, and which columns you really need. In any case, you should never use SELECT * in "real" code. A |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
Can you script out and send us the table schema and the script of the
indexes on the table? I'm guessing we don't have all the facts yet. The index should speed up the select if it's in the right format. Is that save_date column an INT column or a DATETIME column? If the index is in proper format, it will use the index to limit the number of rows, then join the index to the table to get the subset of rows from the table for the SELECT *, unless it's a small table and the optimizer determines that using the index is pointless because the gain wouldn't be enough. |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
On May 15, 1:29pm, Eric Isaacs <eisa...@gmail.com> wrote:
> Can you script out and send us the table schema and the script of the > indexes on the table? I'm guessing we don't have all the facts yet. > The index should speed up the select if it's in the right format. Is > that save_date column an INT column or a DATETIME column? The column saved_date is of type INT > > If the index is in proper format, it will use the index to limit the > number of rows, then join the index to the table to get the subset of > rows from the table for the SELECT *, unless it's a small table and > the optimizer determines that using the index is pointless because the > gain wouldn't be enough. The range of saved_date is 20070101 20080514 - so the subset of data that I'm attempting to retrieve is fairly small in comparison to the wholesome. I think I'm going to follow the advice of Aaron Bertrand and choose only the few columns that I need for the display. |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
On May 15, 12:47pm, "Aaron Bertrand [SQL Server MVP]"
<ten....@dnartreb.noraa> wrote: > > SELECT * FROM mytable where saved_date >= 20080101 AND saved_date > > < 20080201 > > Are you sure you didn't mean to have single quotes around your dates? > > > Why is it opting to do a table scan rather than using my index? > > Because you are using SELECT *, so it needs to read all of the data pages to > return all of the data anyway. I don't understand why selecting * would/should be any different than choosing specific columns, since shouldn't it be the index that leads as to where the data may be found, and not the columns that the SELECT involves? I don't know how the backend code of MSSQL Server is written, but as a coder, I can't see why the SELECT statement would play a role on the selection of the Index - where can I find more information about this? You should see the behavior change if you > switch to: > > SELECT saved_date FROM mytable WHERE ... > > Of course that column is probably not relevant all on its own. But choose a > smaller column set than *, maybe a set that is covered by indexes. It's > hard to recommend anything specific here because we have no idea what your > table looks like, what columns have indexes, and which columns you really > need. In any case, you should never use SELECT * in "real" code. > > A |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
shija03,
If the selectivity for that range is low (to many rows will be selected), then it could more expensive using the nonclustered index, than scanning the clustered one or the table (heap). That is why Aaron said that the columns being referenced in the column list have an impact on which index could be used. If the nonclustered index is not a covering one for this query, and I guess it is not because you are requesting all columns and the index key is just [saved_date] and you did not mention anything about included columns, so the engine has to use the table or the clustered index if there is one, to pull the data for the columns not in the index. AMB "shija03" wrote: > On May 15, 1:29 pm, Eric Isaacs <eisa...@gmail.com> wrote: > > Can you script out and send us the table schema and the script of the > > indexes on the table? I'm guessing we don't have all the facts yet. > > The index should speed up the select if it's in the right format. Is > > that save_date column an INT column or a DATETIME column? > > The column saved_date is of type INT > > > > > If the index is in proper format, it will use the index to limit the > > number of rows, then join the index to the table to get the subset of > > rows from the table for the SELECT *, unless it's a small table and > > the optimizer determines that using the index is pointless because the > > gain wouldn't be enough. > > The range of saved_date is 20070101 20080514 - so the subset of > data that I'm attempting to retrieve is fairly small in comparison to > the wholesome. > > I think I'm going to follow the advice of Aaron Bertrand and choose > only the few columns that I need for the display. > |
|
|
|
#7 |
|
Messages: n/a
Hébergeur: |
On May 15, 11:40 am, shija03 <shij...@gmail.com> wrote:
> MSSQL Server 2000 > > I have a table with 150M rows, and I have an non-clustered index of an > int type field named 'saved_date' . When I issue the SQL Query: > > SELECT * FROM mytable where saved_date >= 20080101 AND saved_date > < 20080201 > > I always see a table scan taking place, and not an index scan or seek. > > I did run a DBCC SHOWCONTIG ('mytable') WITH ALL_INDEXES and all is in > optimal condition. > > Why is it opting to do a table scan rather than using my index? your criteria may just be not selective enough to justify using an index - scanning the table may be more efficient |
|
|
|
#8 |
|
Messages: n/a
Hébergeur: |
On May 15, 1:43 pm, shija03 <shij...@gmail.com> wrote:
> On May 15, 12:47 pm, "Aaron Bertrand [SQL Server MVP]" > > <ten....@dnartreb.noraa> wrote: > > > SELECT * FROM mytable where saved_date >= 20080101 AND saved_date > > > < 20080201 > > > Are you sure you didn't mean to have single quotes around your dates? > > > > Why is it opting to do a table scan rather than using my index? > > > Because you are using SELECT *, so it needs to read all of the data pages to > > return all of the data anyway. > > I don't understand why selecting * would/should be any different than > choosing specific columns, since shouldn't it be the index that leads > as to where the data may be found, and not the columns that the SELECT > involves? I don't know how the backend code of MSSQL Server is > written, but as a coder, I can't see why the SELECT statement would > play a role on the selection of the Index - where can I find more > information about this? read about index covering http://www.devx.com/dbzone/Article/29530 |
|
![]() |
| Outils de la discussion | |
|
|