|
|
|
|
||||||
![]() |
|
|
LinkBack | Outils de la discussion |
|
|
#1 |
|
Messages: n/a
Hébergeur: |
Does the optimizer take into account server load and available memory - and
any other resources it needs? We have a database moved to SQL Server 2005, and like most people it suddenly runs slow. For one query in particular the logical I/O incresed by a factor of 10. My question is not about tuning this query, or tuning any other queries on this new 2005 production server that a customer has. SQL Server's optimizer uses heuristics to determine the best execution plan. In this case, the statistics on the "old" SQL 2000 and the "new" SQL 2005 machines were updated (with fullscan). The database on 2005 is a restore of the database that was on 2000. So if the optimizer is choosing different execution plans, it isn't because of the data distribution - it was the same data with all the rows resampled. So i'm thinking of what else could cause the optimizer to perform a query differently, if not based on DDL and data distribution statistics, and i think of the resources on the server. Perhaps the new 2005 machine has more simultaneous users, or more I/O requests, or it knows it has a slower (or faster) network connection. Perhaps it is in 2-phase commit with transaction log shipping. Maybe there's not enough RAM to handle all 20 live databases. Maybe if there isn't enough available memory, then the optimizer will decide to run the query and have to use a worktable, rather than being able to do everything in memory. Maybe if the network is too slow, it knows it can't dump the rows out the wire fast enough to free up it's own memory. Blah blah blah etc etc. My question is: would an SQL 2005 database engine run a query than another SQL 20005 engine (with identical DDL and statistics), based on availabe resources? Could a busier machine, with hundreds of simultaneous users, and less free RAM (because of hundreds of simultaneous users) choose to run a query differently? Or does the optimizer stick just to the DDL and statistics, and that's it? |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
Ian,
Well, one thing that the optimizer can take into account is availability of extra processors. On a server with more processors available at the time the execution plan is created, it might choose a parallel plan. But if the machine is very busy, it probably will not choose a parallel plan. This seems like a good idea, but many performance problems have been traced to parallelism issues. This is one issue discussed here: http://support.microsoft.com/kb/243589 FWIW, RLF "Ian Boyd" <ian.msnews010@avatopia.com> wrote in message news:%23q0Fn7r5IHA.4468@TK2MSFTNGP02.phx.gbl... > Does the optimizer take into account server load and available memory - > and any other resources it needs? > > We have a database moved to SQL Server 2005, and like most people it > suddenly runs slow. For one query in particular the logical I/O incresed > by a factor of 10. My question is not about tuning this query, or tuning > any other queries on this new 2005 production server that a customer has. > > SQL Server's optimizer uses heuristics to determine the best execution > plan. In this case, the statistics on the "old" SQL 2000 and the "new" SQL > 2005 machines were updated (with fullscan). The database on 2005 is a > restore of the database that was on 2000. > > So if the optimizer is choosing different execution plans, it isn't > because of the data distribution - it was the same data with all the rows > resampled. > > > So i'm thinking of what else could cause the optimizer to perform a query > differently, if not based on DDL and data distribution statistics, and i > think of the resources on the server. Perhaps the new 2005 machine has > more simultaneous users, or more I/O requests, or it knows it has a slower > (or faster) network connection. Perhaps it is in 2-phase commit with > transaction log shipping. Maybe there's not enough RAM to handle all 20 > live databases. > > Maybe if there isn't enough available memory, then the optimizer will > decide to run the query and have to use a worktable, rather than being > able to do everything in memory. Maybe if the network is too slow, it > knows it can't dump the rows out the wire fast enough to free up it's own > memory. Blah blah blah etc etc. > > > My question is: would an SQL 2005 database engine run a query than another > SQL 20005 engine (with identical DDL and statistics), based on availabe > resources? > > Could a busier machine, with hundreds of simultaneous users, and less free > RAM (because of hundreds of simultaneous users) choose to run a query > differently? Or does the optimizer stick just to the DDL and statistics, > and that's it? > > > |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
> This seems like a good idea, but many performance problems have been
> traced to parallelism issues. This is one issue discussed here: > http://support.microsoft.com/kb/243589 Are you sure that's the right kb article? That's an article about generally how to speed up your queries (make sure you have indexes, don't have join hints). They say if it was a parallel plan, for the hell of it try forcing it to single path with OPTION (MAXDOP 1) Not really a discussion of people having issues on multi-cpu/core machines having to either force on or off parallel executions. |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
Ian,
I figured that was enough to give you an idea since you asked what might change and parallelism is one answer. If you want more details, there are many interesting articles: http://technet.microsoft.com/en-us/l.../ms178065.aspx BOL on parallelism http://support.microsoft.com/kb/329204 http://bytes.com/forum/thread144731.html http://www.sql-server-pro.com/max-de...rallelism.html http://blogs.msdn.com/psssql/archive...r-count-s.aspx In some articles you will see that forcing parallelism off is not the only option. Better indexes, re-examining your code, and so forth may be better answers. All the best, RLF "Ian Boyd" <ian.msnews010@avatopia.com> wrote in message news:%232MsDoz5IHA.3420@TK2MSFTNGP05.phx.gbl... >> This seems like a good idea, but many performance problems have been >> traced to parallelism issues. This is one issue discussed here: >> http://support.microsoft.com/kb/243589 > > Are you sure that's the right kb article? That's an article about > generally how to speed up your queries (make sure you have indexes, don't > have join hints). > > They say if it was a parallel plan, for the hell of it try forcing it to > single path with OPTION (MAXDOP 1) > > Not really a discussion of people having issues on multi-cpu/core machines > having to either force on or off parallel executions. > |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
> I figured that was enough to give you an idea since you asked what might
> change and parallelism is one answer. If you want more details, there are > many interesting articles: i thought it would have talked about someone's real world problem, or talk about parallism and how it can affect the optimizer. > http://technet.microsoft.com/en-us/l.../ms178065.aspx BOL on > parallelism > http://support.microsoft.com/kb/329204 > http://bytes.com/forum/thread144731.html > http://www.sql-server-pro.com/max-de...rallelism.html > http://blogs.msdn.com/psssql/archive...r-count-s.aspx > > In some articles you will see that forcing parallelism off is not the only > option. Better indexes, re-examining your code, and so forth may be > better answers. i, like many people, are trying to understand why identical data with identical statistics and identical DDL can run one way on SQL 2000 engines, but vastly different by 2005 engines. i was hoping that it has to be a problem with the optimizer heuristics on the machine it's running, not a change in the optimizer heuristics themselves in 2005. If it's the latter: then everyone is faced with potentially re-writing every query in every application in order to trick the 2005 optimizer to run it like the 2000 optimizer did. |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
> i, like many people, are trying to understand why identical data with
> identical statistics and identical DDL can run one way on SQL 2000 > engines, but vastly different by 2005 engines. For example, just yesterday a guy named PVG posted a question in ..programming. A query of the form: IF ( SELECT COUNT(*) ...) = 0 BEGIN ... END runs in 10 seconds on 2000. On 2005 it takes 120 seconds. That's an order of magnitude slower. If he changed it to: DECLARE @TheCount int SELECT @TheCount=COUNT(*) ... IF @TheCount = 0 BEGIN ... END it ran fast again on 2005. Now everyone can see that the two forms are identical, but 2005 chose to run it very very poorly. But, as Roy Harvey says, "The optimizer will never be perfect, and there will always be cases we have to deal with. Such issues come up when upgrading on a fairly regular basis." Well why is that? You would think that a more advanced of the SQL Server engine would be better and determining identical queries, and running them identically. At the very least 2005 should be able to run a query as good as 2000. Roy goes on to say, "We would never think twice about them when they come up while writing an application because we would simply write the query a bit differently until performance was acceptable, and not even remember there was an issue." Which is simply not true. i come across problems all the time where a query runs poorly. When i rearrange it to an identical form, i post a question in the .programming newsgroup asking, "Why are these two queries not identical? If the optimizer is running them differently it MUST be because they have the potential to return different results. If they were truly identical, they would have identical plans." And the reponse always is, "The optimizer sometimes gets it wrong. If you don't like it, don't upgrade." |
|
|
|
#7 |
|
Messages: n/a
Hébergeur: |
Ian,
The optimizer code changes with every release of SQL Server and often changes to some degree with service packs as well. Overall in my experience the improvement is significant. However, with change to the optimizer some plans will not be as good as before. Naturally, we notice the pain points more than the improvements. (At least I do. Improvements put no burden on me, but degraded plans and performance mean a lot of effort to figure out what is happening and how to fix it.) I think that, using parallelism as an example, techniques such as better analysis and design of indexes is probably a safe fix for most releases. It exploits a fundamental control on the optimizer's decisions. Hints (whether MAXDOP or something else) either reflect a failing in the optimizer that we have to work around (it happens) or else a failure on our part to fully understand what we should have done. Sorry, that I do not have the information to give you a more detailed answer. You might find some interest in reading some of the Microsoft SQL Server Engine and Query blogs. Links can be found here: http://searchsqlserver.techtarget.co...132175,00.html RLF "Ian Boyd" <ian.msnews010@avatopia.com> wrote in message news:Ooi50Z05IHA.1176@TK2MSFTNGP02.phx.gbl... >> I figured that was enough to give you an idea since you asked what might >> change and parallelism is one answer. If you want more details, there >> are many interesting articles: > > i thought it would have talked about someone's real world problem, or talk > about parallism and how it can affect the optimizer. > >> http://technet.microsoft.com/en-us/l.../ms178065.aspx BOL on >> parallelism >> http://support.microsoft.com/kb/329204 >> http://bytes.com/forum/thread144731.html >> http://www.sql-server-pro.com/max-de...rallelism.html >> http://blogs.msdn.com/psssql/archive...r-count-s.aspx >> >> In some articles you will see that forcing parallelism off is not the >> only option. Better indexes, re-examining your code, and so forth may be >> better answers. > > i, like many people, are trying to understand why identical data with > identical statistics and identical DDL can run one way on SQL 2000 > engines, but vastly different by 2005 engines. > > i was hoping that it has to be a problem with the optimizer heuristics on > the machine it's running, not a change in the optimizer heuristics > themselves in 2005. If it's the latter: then everyone is faced with > potentially re-writing every query in every application in order to trick > the 2005 optimizer to run it like the 2000 optimizer did. > |
|
|
|
#8 |
|
Messages: n/a
Hébergeur: |
> My question is: would an SQL 2005 database engine run a query than another
> SQL 20005 engine (with identical DDL and statistics), based on availabe > resources? Yes this is possible but mainly in the area of when to use parallel processing and how many. So a query can easily run faster or slower, use more or less I/O etc. depending on the number of threads it chooses to generate. To some degree this may also be true of the type of joins or processing it does. For instance hashing & sorting are very memory intensive. If one machine has more available memory it may perform better with some of these. > Could a busier machine, with hundreds of simultaneous users, and less free > RAM (because of hundreds of simultaneous users) choose to run a query > differently? Or does the optimizer stick just to the DDL and statistics, > and that's it? Absolutely as explained above. Again this is especially true with parallel queries. A very busy machine will be much less likely to even use parallelism over one with just a few users given the same hardware. > SQL Server's optimizer uses heuristics to determine the best execution > plan. In this case, the statistics on the "old" SQL 2000 and the "new" SQL > 2005 machines were updated (with fullscan). The database on 2005 is a > restore of the database that was on 2000. While it is true the statistics have changed between 2000 and 2005 that is not the only determining factor in how a query plans gets generated. The optimizer is dramatically different in many areas and may very well choose a different plan on 2005 than 2000 given everything else the same. Every upgrade I have seen from 2000 to 2005 has had an increase in overall performance but there are definitely some individual plans that are worse that people have run across. This is due to the changes we mentioned. Most of the plans are the same or better but a few are not. Some you may just need to tweak. One prime example was a client who had a nasty query in which they used some sub selects with NOT IN etc. They came to me and said this runs much slower on 2005. When I looked at the query I immediately indicated that is not how I would write the query in the first place and in this case it should have been a Left Outer Join. When they changed it to that it ran considerably faster than in 2000. This is a classic example of when a query is written poorly the optimizer favors the correct syntax instead. I for one would rather have the optimizer do a better job on correctly written queries than poorly written one at the expense of the others. You can not get 100% of the queries optimized to 100% no matter how hard you try so they have to do what is best overall and some fall thru the cracks. This is true of any product like this. You may simply have to tweak some queries to get the best plan when upgrading but overall most should be fine. -- Andrew J. Kelly SQL MVP Solid Quality Mentors "Ian Boyd" <ian.msnews010@avatopia.com> wrote in message news:%23q0Fn7r5IHA.4468@TK2MSFTNGP02.phx.gbl... > Does the optimizer take into account server load and available memory - > and any other resources it needs? > > We have a database moved to SQL Server 2005, and like most people it > suddenly runs slow. For one query in particular the logical I/O incresed > by a factor of 10. My question is not about tuning this query, or tuning > any other queries on this new 2005 production server that a customer has. > > SQL Server's optimizer uses heuristics to determine the best execution > plan. In this case, the statistics on the "old" SQL 2000 and the "new" SQL > 2005 machines were updated (with fullscan). The database on 2005 is a > restore of the database that was on 2000. > > So if the optimizer is choosing different execution plans, it isn't > because of the data distribution - it was the same data with all the rows > resampled. > > > So i'm thinking of what else could cause the optimizer to perform a query > differently, if not based on DDL and data distribution statistics, and i > think of the resources on the server. Perhaps the new 2005 machine has > more simultaneous users, or more I/O requests, or it knows it has a slower > (or faster) network connection. Perhaps it is in 2-phase commit with > transaction log shipping. Maybe there's not enough RAM to handle all 20 > live databases. > > Maybe if there isn't enough available memory, then the optimizer will > decide to run the query and have to use a worktable, rather than being > able to do everything in memory. Maybe if the network is too slow, it > knows it can't dump the rows out the wire fast enough to free up it's own > memory. Blah blah blah etc etc. > > > My question is: would an SQL 2005 database engine run a query than another > SQL 20005 engine (with identical DDL and statistics), based on availabe > resources? > > Could a busier machine, with hundreds of simultaneous users, and less free > RAM (because of hundreds of simultaneous users) choose to run a query > differently? Or does the optimizer stick just to the DDL and statistics, > and that's it? > > > |
|
|
|
#9 |
|
Messages: n/a
Hébergeur: |
> However, with change to the optimizer some plans will not be as good as
> before. Naturally, we notice the pain points more than the improvements. > (At least I do. Improvements put no burden on me, but degraded plans and > performance mean a lot of effort to figure out what is happening and how > to fix it.) But you understand the confusion right? Even if it query was poorly written, we know that SQL Server has the ability to run it lightening fast. But instead it says, "*tsk* *tsk* *tsk* You shouldn't have done that. Look at this, this is all wrong. You're doing a correlated sub-query? No, no, no. As punishment i'm going to run this slow. Don't write your query so poorly next time." The real problem is a screaming customer, SCREAMING, who wants us to fix 2005 so everything runs like it did on 2000. AND doesn't want us tuning individual queries. "You fixed that slow query, but who's to say we won't find another slow query next week, or next month. i want you to fix it so that EVERYTHING runs as good as it did on 2000." My own reaction is, "F*ck you.", but then i'm now allowed to talk to customers. |
|
|
|
#10 |
|
Messages: n/a
Hébergeur: |
Sorry if you don't like the answers.
The optimizer chooses a plan based on statistics. The statistics are incomplete by their very nature, a description of the data rather than the data itself. This imprecision is one reason the optimizer makes mistakes. Another is that the optimizer makes assumptions about hardware performance, particularly the relative performance of processor and disk, but that is only a general estimate not the specific hardware being used. I can only see two ways for the optimizer can guarantee never to regress and give a worse plan than an earlier version. One would be if the new version is perfect. The second is if the new version is unchanged from the old version. But as long as they keep improving the optimizer without making it perfect, some queries will perform worse. Not many, but enough to raise these questions every time a new release (or sometimes just a service pack) comes out. I suppose I've gone through it enough times to be jaded, so I just rewrite the query and move on. Roy Harvey Beacon Falls, CT On Wed, 16 Jul 2008 09:35:05 -0400, "Ian Boyd" <ian.msnews010@avatopia.com> wrote: >> i, like many people, are trying to understand why identical data with >> identical statistics and identical DDL can run one way on SQL 2000 >> engines, but vastly different by 2005 engines. > >For example, just yesterday a guy named PVG posted a question in >.programming. > >A query of the form: > IF ( > SELECT COUNT(*) ...) = 0 > BEGIN > ... > END >runs in 10 seconds on 2000. On 2005 it takes 120 seconds. That's an order of >magnitude slower. > >If he changed it to: > DECLARE @TheCount int > SELECT @TheCount=COUNT(*) ... > IF @TheCount = 0 > BEGIN > ... > END > >it ran fast again on 2005. Now everyone can see that the two forms are >identical, but 2005 chose to run it very very poorly. But, as Roy Harvey >says, > "The optimizer will never be perfect, and there will always be cases we >have to deal with. Such issues come up when upgrading on a fairly >regular basis." > >Well why is that? You would think that a more advanced of the SQL Server >engine would be better and determining identical queries, and running them >identically. At the very least 2005 should be able to run a query as good as >2000. Roy goes on to say, > "We would never think twice about them when they come up while writing >an application because we would simply write the query a bit differently >until performance was acceptable, and not even remember there was an issue." > >Which is simply not true. i come across problems all the time where a query >runs poorly. When i rearrange it to an identical form, i post a question in >the .programming newsgroup asking, > "Why are these two queries not identical? If the optimizer is running >them differently it MUST be because they have the potential to return >different results. If they were truly identical, they would have identical >plans." > >And the reponse always is, "The optimizer sometimes gets it wrong. If you >don't like it, don't upgrade." > |
|
|
|
#11 |
|
Messages: n/a
Hébergeur: |
> some queries will perform worse. Not many, but enough to
> raise these questions every time a new release (or sometimes just a > service pack) comes out. I suppose I've gone through it enough times > to be jaded, so I just rewrite the query and move on. i would love to re-write the queries and move on. i mean it's obvious that the query can run better, and was probably already sub-optimal. The real problem here is that we have a customer who is demanding that we fix the entire system, without having to go though the system to find slow queries. "You fixed that query, who's to say that we won't find some other poorly performing query tomorrow? We want it to all be fixed, and we don't want to have to keep coming back to fix query after query." It's not me that doesn't like the answers, it's the salesmen and customers. |
|
|
|
#12 |
|
Messages: n/a
Hébergeur: |
> The real problem here is that we have a customer who is demanding that we
> fix the entire system, without having to go though the system to find slow > queries. "You fixed that query, who's to say that we won't find some other > poorly performing query tomorrow? We want it to all be fixed, and we don't > want to have to keep coming back to fix query after query." In reality there are probably only a handful of queries that are an issue and need to be re-written. Finding which ones these are and prioritizing them should take you less than an hour. If you fix the one that is called the most often and is inefficient first you will make the biggest bang for the buck. Usually just a few queries are 90% of the problems in a system. I know this because that is what I do for a living and see it day in and day out from client to client. Yes it is true that when ever you fix one bottleneck another will appear but usually at a rate much less than the previous one. If you tackle just a few of the top ones you can make a world of difference in the overall performance. -- Andrew J. Kelly SQL MVP Solid Quality Mentors |
|
|
|
#13 |
|
Messages: n/a
Hébergeur: |
> i, like many people, are trying to understand why identical data with
> identical statistics and identical DDL can run one way on SQL 2000 > engines, but vastly different by 2005 engines. How about because the 2005 optimizer was almost entirely rewritten? There were MANY instances of performance regression, including one just fixed recently in CU7. You may have stumbled upon another. I will note that in total the optimzier and engine for 2005 is MUCH better than 2000, but if you get hit by a 'hole' it can be painful. -- Kevin G. Boles Indicium Resources, Inc. SQL Server MVP kgboles a earthlink dt net "Ian Boyd" <ian.msnews010@avatopia.com> wrote in message news:Ooi50Z05IHA.1176@TK2MSFTNGP02.phx.gbl... >> I figured that was enough to give you an idea since you asked what might >> change and parallelism is one answer. If you want more details, there >> are many interesting articles: > > i thought it would have talked about someone's real world problem, or talk > about parallism and how it can affect the optimizer. > >> http://technet.microsoft.com/en-us/l.../ms178065.aspx BOL on >> parallelism >> http://support.microsoft.com/kb/329204 >> http://bytes.com/forum/thread144731.html >> http://www.sql-server-pro.com/max-de...rallelism.html >> http://blogs.msdn.com/psssql/archive...r-count-s.aspx >> >> In some articles you will see that forcing parallelism off is not the >> only option. Better indexes, re-examining your code, and so forth may be >> better answers. > > i, like many people, are trying to understand why identical data with > identical statistics and identical DDL can run one way on SQL 2000 > engines, but vastly different by 2005 engines. > > i was hoping that it has to be a problem with the optimizer heuristics on > the machine it's running, not a change in the optimizer heuristics > themselves in 2005. If it's the latter: then everyone is faced with > potentially re-writing every query in every application in order to trick > the 2005 optimizer to run it like the 2000 optimizer did. > |
|
|
|
#14 |
|
Messages: n/a
Hébergeur: |
1) Probably a good thing you aren't allowed to talk to customers. :-))
2) Why don't you simply roll back to 2000 for the product and/or client in question? 3) Better yet, why didn't your internal testing discover this (and other??) performance issues before deciding to migrate to 2005 and fix them during that pre-migration cycle - or decide to not migrate at all due to issues noted? 4) "No, no, no. As punishment i'm going to run this slow. Don't write your query so poorly next time." What a silly statement ... ![]() 5) "The real problem is a screaming customer, SCREAMING, who wants us to fix 2005 so everything runs like it did on 2000. " Seems to me that the real problem is not doing a good job on number 3 above. -- Kevin G. Boles Indicium Resources, Inc. SQL Server MVP kgboles a earthlink dt net "Ian Boyd" <ian.msnews010@avatopia.com> wrote in message news:O8lG1a15IHA.1204@TK2MSFTNGP04.phx.gbl... >> However, with change to the optimizer some plans will not be as good as >> before. Naturally, we notice the pain points more than the improvements. >> (At least I do. Improvements put no burden on me, but degraded plans and >> performance mean a lot of effort to figure out what is happening and how >> to fix it.) > > But you understand the confusion right? > > Even if it query was poorly written, we know that SQL Server has the > ability to run it lightening fast. But instead it says, "*tsk* *tsk* *tsk* > You shouldn't have done that. Look at this, this is all wrong. You're > doing a correlated sub-query? No, no, no. As punishment i'm going to run > this slow. Don't write your query so poorly next time." > > > The real problem is a screaming customer, SCREAMING, who wants us to fix > 2005 so everything runs like it did on 2000. AND doesn't want us tuning > individual queries. "You fixed that slow query, but who's to say we won't > find another slow query next week, or next month. i want you to fix it so > that EVERYTHING runs as good as it did on 2000." > > > My own reaction is, "F*ck you.", but then i'm now allowed to talk to > customers. > |
|
|
|
#15 |
|
Messages: n/a
Hébergeur: |
Andy;
I've been wondering about whether teh SQL Server query optimizer may choose a different plan depending on the current load level. Put parallelism aside for now, I'm not sure it does, and even if it does, I'm not sure it's a good idea as fundamentally that would destroy plan stability. But of course it depends on what it exactly means by the optimizer choosing a different plan depending on the current load level. First, it could mean that teh optimizer actively checks out the current load level. I highly doubt that would be the case. For one thing, it can be expensive for the query optimizer to correctly assess the current load level. And it could be really bad if it gets the load level wrong. And then, what happens if the current load level is very different from the load level when the query starts to execute, i.e. if the load level is rather dynamic? Second, it could mean that the current load level may cause the optimizer to do less or more work in evaluating plans, therefore indirectly impacting the plan generation. That sounds more sensible in terms of the impact the current load may have on plan generation. Still, I'm not sure it's a good idea, for the same reason of wanting to have plan stability. I don't recall any explicit discussions by the MS folks on how the current load level may influence the query optimizer. If anybody has a link to any MS official documentation on this, I'd appreciate it if you could post it. If the load level is an important factor into optimizer's decisions, we'd better know exactly how it may change a query plan. Linchi "Andrew J. Kelly" wrote: > > My question is: would an SQL 2005 database engine run a query than another > > SQL 20005 engine (with identical DDL and statistics), based on availabe > > resources? > > Yes this is possible but mainly in the area of when to use parallel > processing and how many. So a query can easily run faster or slower, use > more or less I/O etc. depending on the number of threads it chooses to > generate. To some degree this may also be true of the type of joins or > processing it does. For instance hashing & sorting are very memory > intensive. If one machine has more available memory it may perform better > with some of these. > > > Could a busier machine, with hundreds of simultaneous users, and less free > > RAM (because of hundreds of simultaneous users) choose to run a query > > differently? Or does the optimizer stick just to the DDL and statistics, > > and that's it? > > Absolutely as explained above. Again this is especially true with parallel > queries. A very busy machine will be much less likely to even use > parallelism over one with just a few users given the same hardware. > > > SQL Server's optimizer uses heuristics to determine the best execution > > plan. In this case, the statistics on the "old" SQL 2000 and the "new" SQL > > 2005 machines were updated (with fullscan). The database on 2005 is a > > restore of the database that was on 2000. > > While it is true the statistics have changed between 2000 and 2005 that is > not the only determining factor in how a query plans gets generated. The > optimizer is dramatically different in many areas and may very well choose a > different plan on 2005 than 2000 given everything else the same. Every > upgrade I have seen from 2000 to 2005 has had an increase in overall > performance but there are definitely some individual plans that are worse > that people have run across. This is due to the changes we mentioned. Most > of the plans are the same or better but a few are not. Some you may just > need to tweak. One prime example was a client who had a nasty query in which > they used some sub selects with NOT IN etc. They came to me and said this > runs much slower on 2005. When I looked at the query I immediately indicated > that is not how I would write the query in the first place and in this case > it should have been a Left Outer Join. When they changed it to that it ran > considerably faster than in 2000. This is a classic example of when a query > is written poorly the optimizer favors the correct syntax instead. I for one > would rather have the optimizer do a better job on correctly written queries > than poorly written one at the expense of the others. You can not get 100% > of the queries optimized to 100% no matter how hard you try so they have to > do what is best overall and some fall thru the cracks. This is true of any > product like this. You may simply have to tweak some queries to get the best > plan when upgrading but overall most should be fine. > > -- > Andrew J. Kelly SQL MVP > Solid Quality Mentors > > > "Ian Boyd" <ian.msnews010@avatopia.com> wrote in message > news:%23q0Fn7r5IHA.4468@TK2MSFTNGP02.phx.gbl... > > Does the optimizer take into account server load and available memory - > > and any other resources it needs? > > > > We have a database moved to SQL Server 2005, and like most people it > > suddenly runs slow. For one query in particular the logical I/O incresed > > by a factor of 10. My question is not about tuning this query, or tuning > > any other queries on this new 2005 production server that a customer has. > > > > SQL Server's optimizer uses heuristics to determine the best execution > > plan. In this case, the statistics on the "old" SQL 2000 and the "new" SQL > > 2005 machines were updated (with fullscan). The database on 2005 is a > > restore of the database that was on 2000. > > > > So if the optimizer is choosing different execution plans, it isn't > > because of the data distribution - it was the same data with all the rows > > resampled. > > > > > > So i'm thinking of what else could cause the optimizer to perform a query > > differently, if not based on DDL and data distribution statistics, and i > > think of the resources on the server. Perhaps the new 2005 machine has > > more simultaneous users, or more I/O requests, or it knows it has a slower > > (or faster) network connection. Perhaps it is in 2-phase commit with > > transaction log shipping. Maybe there's not enough RAM to handle all 20 > > live databases. > > > > Maybe if there isn't enough available memory, then the optimizer will > > decide to run the query and have to use a worktable, rather than being > > able to do everything in memory. Maybe if the network is too slow, it > > knows it can't dump the rows out the wire fast enough to free up it's own > > memory. Blah blah blah etc etc. > > > > > > My question is: would an SQL 2005 database engine run a query than another > > SQL 20005 engine (with identical DDL and statistics), based on availabe > > resources? > > > > Could a busier machine, with hundreds of simultaneous users, and less free > > RAM (because of hundreds of simultaneous users) choose to run a query > > differently? Or does the optimizer stick just to the DDL and statistics, > > and that's it? > > > > > > > > |
|
|
|
#16 |
|
Messages: n/a
Hébergeur: |
Linchi,
Well the focus was definitely on parallelism and it most definitely checks the current load level before running the query each and every time. If a query has a parallel plan (it always has a single threaded one) it will evaluate the current processor load, available memory, number of users etc. to determine if and how many threads it should create and run with. This time it may be 1 and next it may be 32 or anywhere in between. As for other non parallel plans I will have to go back and read Craig Freedman's chapter of Inside SQL Server to see where it affects things. But off hand I seem to remember from the 2000 days that some parts of the overall plan may deviate from one plan to the next based on available memory and such. That may not be true in 2005 anymore. But the performance of the operation can certainly deviate up or down depending on conditions., especially memory. For instance there is a certain amount of memory that a query needs before it will even start to process. If there is insufficient memory it will wait. Or if the amount of memory for a hash or sort operation was underestimated by the optimizer it will have to spill to disk and cause slower performance. So technically a query running with the same query plan can still run dramatically different from iteration to iteration apart from the normal blocking and I/O issues in a performance standpoint. -- Andrew J. Kelly SQL MVP Solid Quality Mentors "Linchi Shea" <LinchiShea@discussions.microsoft.com> wrote in message news:7F573532-1825-4FCE-903D-4482B7D089F9@microsoft.com... > Andy; > > I've been wondering about whether teh SQL Server query optimizer may > choose > a different plan depending on the current load level. Put parallelism > aside > for now, I'm not sure it does, and even if it does, I'm not sure it's a > good > idea as fundamentally that would destroy plan stability. > > But of course it depends on what it exactly means by the optimizer > choosing > a different plan depending on the current load level. First, it could mean > that teh optimizer actively checks out the current load level. I highly > doubt > that would be the case. For one thing, it can be expensive for the query > optimizer to correctly assess the current load level. And it could be > really > bad if it gets the load level wrong. And then, what happens if the current > load level is very different from the load level when the query starts to > execute, i.e. if the load level is rather dynamic? > > Second, it could mean that the current load level may cause the optimizer > to > do less or more work in evaluating plans, therefore indirectly impacting > the > plan generation. That sounds more sensible in terms of the impact the > current > load may have on plan generation. Still, I'm not sure it's a good idea, > for > the same reason of wanting to have plan stability. > > I don't recall any explicit discussions by the MS folks on how the current > load level may influence the query optimizer. If anybody has a link to any > MS > official documentation on this, I'd appreciate it if you could post it. > > If the load level is an important factor into optimizer's decisions, we'd > better know exactly how it may change a query plan. > > Linchi > > "Andrew J. Kelly" wrote: > >> > My question is: would an SQL 2005 database engine run a query than >> > another >> > SQL 20005 engine (with identical DDL and statistics), based on availabe >> > resources? >> >> Yes this is possible but mainly in the area of when to use parallel >> processing and how many. So a query can easily run faster or slower, use >> more or less I/O etc. depending on the number of threads it chooses to >> generate. To some degree this may also be true of the type of joins or >> processing it does. For instance hashing & sorting are very memory >> intensive. If one machine has more available memory it may perform better >> with some of these. >> >> > Could a busier machine, with hundreds of simultaneous users, and less >> > free >> > RAM (because of hundreds of simultaneous users) choose to run a query >> > differently? Or does the optimizer stick just to the DDL and >> > statistics, >> > and that's it? >> >> Absolutely as explained above. Again this is especially true with >> parallel >> queries. A very busy machine will be much less likely to even use >> parallelism over one with just a few users given the same hardware. >> >> > SQL Server's optimizer uses heuristics to determine the best execution >> > plan. In this case, the statistics on the "old" SQL 2000 and the "new" >> > SQL >> > 2005 machines were updated (with fullscan). The database on 2005 is a >> > restore of the database that was on 2000. >> >> While it is true the statistics have changed between 2000 and 2005 that >> is >> not the only determining factor in how a query plans gets generated. The >> optimizer is dramatically different in many areas and may very well >> choose a >> different plan on 2005 than 2000 given everything else the same. Every >> upgrade I have seen from 2000 to 2005 has had an increase in overall >> performance but there are definitely some individual plans that are worse >> that people have run across. This is due to the changes we mentioned. >> Most >> of the plans are the same or better but a few are not. Some you may just >> need to tweak. One prime example was a client who had a nasty query in >> which >> they used some sub selects with NOT IN etc. They came to me and said this >> runs much slower on 2005. When I looked at the query I immediately >> indicated >> that is not how I would write the query in the first place and in this >> case >> it should have been a Left Outer Join. When they changed it to that it >> ran >> considerably faster than in 2000. This is a classic example of when a >> query >> is written poorly the optimizer favors the correct syntax instead. I for >> one >> would rather have the optimizer do a better job on correctly written >> queries >> than poorly written one at the expense of the others. You can not get >> 100% >> of the queries optimized to 100% no matter how hard you try so they have >> to >> do what is best overall and some fall thru the cracks. This is true of >> any >> product like this. You may simply have to tweak some queries to get the >> best >> plan when upgrading but overall most should be fine. >> >> -- >> Andrew J. Kelly SQL MVP >> Solid Quality Mentors >> >> >> "Ian Boyd" <ian.msnews010@avatopia.com> wrote in message >> news:%23q0Fn7r5IHA.4468@TK2MSFTNGP02.phx.gbl... >> > Does the optimizer take into account server load and available memory - >> > and any other resources it needs? >> > >> > We have a database moved to SQL Server 2005, and like most people it >> > suddenly runs slow. For one query in particular the logical I/O >> > incresed >> > by a factor of 10. My question is not about tuning this query, or >> > tuning >> > any other queries on this new 2005 production server that a customer >> > has. >> > >> > SQL Server's optimizer uses heuristics to determine the best execution >> > plan. In this case, the statistics on the "old" SQL 2000 and the "new" >> > SQL >> > 2005 machines were updated (with fullscan). The database on 2005 is a >> > restore of the database that was on 2000. >> > >> > So if the optimizer is choosing different execution plans, it isn't >> > because of the data distribution - it was the same data with all the >> > rows >> > resampled. >> > >> > >> > So i'm thinking of what else could cause the optimizer to perform a >> > query >> > differently, if not based on DDL and data distribution statistics, and >> > i >> > think of the resources on the server. Perhaps the new 2005 machine has >> > more simultaneous users, or more I/O requests, or it knows it has a >> > slower >> > (or faster) network connection. Perhaps it is in 2-phase commit with >> > transaction log shipping. Maybe there's not enough RAM to handle all 20 >> > live databases. >> > >> > Maybe if there isn't enough available memory, then the optimizer will >> > decide to run the query and have to use a worktable, rather than being >> > able to do everything in memory. Maybe if the network is too slow, it >> > knows it can't dump the rows out the wire fast enough to free up it's >> > own >> > memory. Blah blah blah etc etc. >> > >> > >> > My question is: would an SQL 2005 database engine run a query than >> > another >> > SQL 20005 engine (with identical DDL and statistics), based on availabe >> > resources? >> > >> > Could a busier machine, with hundreds of simultaneous users, and less >> > free >> > RAM (because of hundreds of simultaneous users) choose to run a query >> > differently? Or does the optimizer stick just to the DDL and >> > statistics, >> > and that's it? >> > >> > >> > >> >> |
|
|
|
#17 |
|
Messages: n/a
Hébergeur: |
For non-parallel plans, the amount of memory used or required, spill of a hash to disk, and so on are all resource usage issues. None of these involves a different query plan or causes the optimizer to come up with a different plan. It's the same plan whose execution responds to the conditions of the system. Consider the situation when a page is not found in the buffer pool, the page is retrieved from disk. You wouldn't classify the changes in the number of pages read from disk or the changes in the number of pages flushed to disk as deviations from its original plan. Again, for non-parallel queries/plans, I would think it's a bad idea for the optimizer to include the current load level as an input parameter into its plan selection decision for the reasons stated previously. Linchi "Andrew J. Kelly" wrote: > Linchi, > > Well the focus was definitely on parallelism and it most definitely checks > the current load level before running the query each and every time. If a > query has a parallel plan (it always has a single threaded one) it will > evaluate the current processor load, available memory, number of users etc. > to determine if and how many threads it should create and run with. This > time it may be 1 and next it may be 32 or anywhere in between. As for other > non parallel plans I will have to go back and read Craig Freedman's chapter > of Inside SQL Server to see where it affects things. But off hand I seem to > remember from the 2000 days that some parts of the overall plan may deviate > from one plan to the next based on available memory and such. That may not > be true in 2005 anymore. But the performance of the operation can certainly > deviate up or down depending on conditions., especially memory. For instance > there is a certain amount of memory that a query needs before it will even > start to process. If there is insufficient memory it will wait. Or if the > amount of memory for a hash or sort operation was underestimated by the > optimizer it will have to spill to disk and cause slower performance. So > technically a query running with the same query plan can still run > dramatically different from iteration to iteration apart from the normal > blocking and I/O issues in a performance standpoint. > > -- > Andrew J. Kelly SQL MVP > Solid Quality Mentors > > > "Linchi Shea" <LinchiShea@discussions.microsoft.com> wrote in message > news:7F573532-1825-4FCE-903D-4482B7D089F9@microsoft.com... > > Andy; > > > > I've been wondering about whether teh SQL Server query optimizer may > > choose > > a different plan depending on the current load level. Put parallelism > > aside > > for now, I'm not sure it does, and even if it does, I'm not sure it's a > > good > > idea as fundamentally that would destroy plan stability. > > > > But of course it depends on what it exactly means by the optimizer > > choosing > > a different plan depending on the current load level. First, it could mean > > that teh optimizer actively checks out the current load level. I highly > > doubt > > that would be the case. For one thing, it can be expensive for the query > > optimizer to correctly assess the current load level. And it could be > > really > > bad if it gets the load level wrong. And then, what happens if the current > > load level is very different from the load level when the query starts to > > execute, i.e. if the load level is rather dynamic? > > > > Second, it could mean that the current load level may cause the optimizer > > to > > do less or more work in evaluating plans, therefore indirectly impacting > > the > > plan generation. That sounds more sensible in terms of the impact the > > current > > load may have on plan generation. Still, I'm not sure it's a good idea, > > for > > the same reason of wanting to have plan stability. > > > > I don't recall any explicit discussions by the MS folks on how the current > > load level may influence the query optimizer. If anybody has a link to any > > MS > > official documentation on this, I'd appreciate it if you could post it. > > > > If the load level is an important factor into optimizer's decisions, we'd > > better know exactly how it may change a query plan. > > > > Linchi > > > > "Andrew J. Kelly" wrote: > > > >> > My question is: would an SQL 2005 database engine run a query than > >> > another > >> > SQL 20005 engine (with identical DDL and statistics), based on availabe > >> > resources? > >> > >> Yes this is possible but mainly in the area of when to use parallel > >> processing and how many. So a query can easily run faster or slower, use > >> more or less I/O etc. depending on the number of threads it chooses to > >> generate. To some degree this may also be true of the type of joins or > >> processing it does. For instance hashing & sorting are very memory > >> intensive. If one machine has more available memory it may perform better > >> with some of these. > >> > >> > Could a busier machine, with hundreds of simultaneous users, and less > >> > free > >> > RAM (because of hundreds of simultaneous users) choose to run a query > >> > differently? Or does the optimizer stick just to the DDL and > >> > statistics, > >> > and that's it? > >> > >> Absolutely as explained above. Again this is especially true with > >> parallel > >> queries. A very busy machine will be much less likely to even use > >> parallelism over one with just a few users given the same hardware. > >> > >> > SQL Server's optimizer uses heuristics to determine the best execution > >> > plan. In this case, the statistics on the "old" SQL 2000 and the "new" > >> > SQL > >> > 2005 machines were updated (with fullscan). The database on 2005 is a > >> > restore of the database that was on 2000. > >> > >> While it is true the statistics have changed between 2000 and 2005 that > >> is > >> not the only determining factor in how a query plans gets generated. The > >> optimizer is dramatically different in many areas and may very well > >> choose a > >> different plan on 2005 than 2000 given everything else the same. Every > >> upgrade I have seen from 2000 to 2005 has had an increase in overall > >> performance but there are definitely some individual plans that are worse > >> that people have run across. This is due to the changes we mentioned. > >> Most > >> of the plans are the same or better but a few are not. Some you may just > >> need to tweak. One prime example was a client who had a nasty query in > >> which > >> they used some sub selects with NOT IN etc. They came to me and said this > >> runs much slower on 2005. When I looked at the query I immediately > >> indicated > >> that is not how I would write the query in the first place and in this > >> case > >> it should have been a Left Outer Join. When they changed it to that it > >> ran > >> considerably faster than in 2000. This is a classic example of when a > >&g |