|
|
|
|
||||||
![]() |
|
|
LinkBack | Outils de la discussion |
|
|
#1 |
|
Messages: n/a
Hébergeur: |
We've been dealing with this issue for 3 weeks now, and i've gotten tired,
frustrated, and as some of you may have noticed: a little cranky. We have an application that we wrote. It runs fine on our development 2005 machine, it runs fine on their test 2005 server, but when they installed it on their live 2005 machine, it runs differently. Our developement server is not too impressive (4GB, dual core), their test server was cheapo machine hanging around. Their live server is quad core 8GB monster. There is one representative (note: representative, for examples sake, a case in point) query that takes 90k logical reads here, and 90k logical reads on the test server, but 200k logical reads on the live server - and runs noticably slower. The execution plans are identical on development and test, but different on live. The query can be tuned, and then on live it has an identical execution plan to development and test. The data is identical (backuped and restord). All statistics are up to date WITH FULLSCAN. The question is why is it running poorly on live. In fact the question isn't even why is it running poorly on live, why is it running differenly on live? Boss: "Ian, what i don't understand is why is it running differently on live? It runs fine here on 2005 with the ninety thousand reads. Why is it running differently on live." Ian: (very nearly quoting Andrew J. Kelly) "That's just the way it is. The optimizer in 2005 is deciding to run the query different. This sort of thing is going to happen when people move to 2005 from 2000. It happened when people moved from 7 to 2000, and it happened when people moved from 65 to 7. The optimizer in 2005 is choosing to run the query differently, and there's nothing you can do about it except to tune the query. Most things will be faster in 2005 and those aren't the ones we see. There will be some queries that perform worse. And there's no choise to re-write them, and when they do they'll be faster than they were even on 2000." Boss: "i understand that. But this isn't an upgrade from 2000. We're 2005 here, they tested it on 2005. Why is it running differently on their live 2005 server? Of course you could tune queries, but the queries already run fine on 2005. What's going on that's making it perform poorly on live?" Ian: "i don't know. Nobody can answer that question. The optimizer is deciding what to do, and there's little you can do to understand why it chooses to do something some way. There's no way to understand why the optimizer chose to do something some way. " So, SQL Server community, i turn to you tired and exhausted. Is the answer, "i don't know?" Is the answer that they need to move the database back to a rinky dink server, which can run it better than the super-expensive live monster they have, that runs 20 other databases fine? Of course we could re-tune queries, and in an ideal world we would have retuned them 3 weeks ago and be done with it. But that's not the question. Customer wants to know why it runs good on other 2005 machines, but not good on the server they want it to be installed on. And my hand-waving arguments about optimizer differences between 2000 and 2005 just isn't gonna work. note: my past confusion came from knowing that years ago it was on 2000, but it had since been ported to 2005 - where the queries in question run identically as they did on 2000. |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
Ian,
Understanding why any optimizer picks a certain plan can be one of the most frustrating (and yet rewarding) experiences. I am not a SQL Server expert. I've converted from UDB/DB2/Oracle background. But I believe that optimizers all operate logically and there has to be a difference that you have not noted. You mentioned that the statistics were up-to-date, but did not mention whether they were identical. (yes, it sounds stupid but when something this confusing happens - it sometimes hlps to go back to square one and make sure the simple things really do match up and that you have what you think you have. Then, is there excessive fragmentation in the production file systems ? Are the indexes on the production tables fragmented? Are the test and production systems set for different levels of parallelization? You may have already verified all this, but I do not think the optimizer is magic. Something is different and it may be something more basic that you have not thought to recheck because it seemed to be obvious. |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
1) if you REALLY TRULY want to have the best shot at understanding what is
going on and why, you need to hire a performance tuning expert to come in and evaluate the system. There are simply too many variables and things at play for us to hunt-and-peck a solution from afar. Therer are a number of such people on this forum and more than a few consulting companies that specialize is such work. Note that there is no guarantee that a true reason will be found. 2) Likewise you can hire Microsoft Consulting Services to come in and figure it out. 3) IIRC, you were given lots of advice on configurations to evaluate in your previous threads (chief among them MAXDOP). Did you evaluate every single suggestion made to you? 4) Are the sql server versions EXACTLY IDENTICAL? Are both same bit level (32/64) and patch level? 5) What is the memory and server settings on the mondo box and how does the IO situation differ from test? 6) You certainly have been (more than) a bit cranky. And I assure you that has affected peoples' desire to you now and in the future. Note that 4 and 5 above are questions that do not require an answer. As I led off with, it is VERY unlikely that we can solve this problem to your satisfaction without access to both your test and production servers. -- Kevin G. Boles Indicium Resources, Inc. SQL Server MVP kgboles a earthlink dt net "Ian Boyd" <ian.msnews010@avatopia.com> wrote in message news:eBEWhOB6IHA.192@TK2MSFTNGP06.phx.gbl... > We've been dealing with this issue for 3 weeks now, and i've gotten tired, > frustrated, and as some of you may have noticed: a little cranky. > > > We have an application that we wrote. It runs fine on our development 2005 > machine, it runs fine on their test 2005 server, but when they installed > it on their live 2005 machine, it runs differently. Our developement > server is not too impressive (4GB, dual core), their test server was > cheapo machine hanging around. Their live server is quad core 8GB monster. > > There is one representative (note: representative, for examples sake, a > case in point) query that takes 90k logical reads here, and 90k logical > reads on the test server, but 200k logical reads on the live server - and > runs noticably slower. The execution plans are identical on development > and test, but different on live. The query can be tuned, and then on live > it has an identical execution plan to development and test. > > The data is identical (backuped and restord). All statistics are up to > date WITH FULLSCAN. > > The question is why is it running poorly on live. In fact the question > isn't even why is it running poorly on live, why is it running differenly > on live? > > > Boss: "Ian, what i don't understand is why is it running differently on > live? It runs fine here on 2005 with the ninety thousand reads. Why is it > running differently on live." > > Ian: (very nearly quoting Andrew J. Kelly) "That's just the way it is. > The optimizer in 2005 is deciding to run the query different. This sort of > thing is going to happen when people move to 2005 from 2000. It happened > when people moved from 7 to 2000, and it happened when people moved from > 65 to 7. The optimizer in 2005 is choosing to run the query differently, > and there's nothing you can do about it except to tune the query. Most > things will be faster in 2005 and those aren't the ones we see. There will > be some queries that perform worse. And there's no choise to re-write > them, and when they do they'll be faster than they were even on 2000." > > Boss: "i understand that. But this isn't an upgrade from 2000. We're > 2005 here, they tested it on 2005. Why is it running differently on their > live 2005 server? Of course you could tune queries, but the queries > already run fine on 2005. What's going on that's making it perform poorly > on live?" > > Ian: "i don't know. Nobody can answer that question. The optimizer is > deciding what to do, and there's little you can do to understand why it > chooses to do something some way. There's no way to understand why the > optimizer chose to do something some way. " > > > So, SQL Server community, i turn to you tired and exhausted. Is the > answer, "i don't know?" Is the answer that they need to move the database > back to a rinky dink server, which can run it better than the > super-expensive live monster they have, that runs 20 other databases fine? > Of course we could re-tune queries, and in an ideal world we would have > retuned them 3 weeks ago and be done with it. But that's not the question. > Customer wants to know why it runs good on other 2005 machines, but not > good on the server they want it to be installed on. > > And my hand-waving arguments about optimizer differences between 2000 and > 2005 just isn't gonna work. > > > note: my past confusion came from knowing that years ago it was on 2000, > but it had since been ported to 2005 - where the queries in question run > identically as they did on 2000. > |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
> 6) And I assure you that has affected peoples' desire to you now and
> in the future. k |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
> You mentioned that the statistics were up-to-date, but did not mention
> whether they were identical. (yes, it sounds stupid but when something > this > confusing happens - it sometimes hlps to go back to square one and make > sure > the simple things really do match up and that you have what you think you > have. Well, you're right. After reading off key densities, bucket sizes, and distinct value counts for all the indexes of the tables involved in the query on both servers, the index on one side seemed to indicate that it was about 25% empty. Looking for that index in both execution plans showed that the "live" server wasn't using that index at all. Ian: "How could it be that this index is mostly empty here?" Developer#1: "Oh, you know what that could be. That was probably when we restored their database here we began testing; and [tester#1] created an event with 30,000 people. After we restored their live database again on our development server - the query runs with the same poor execution plan. It runs 50% faster here, but the same plan at least. i'll chalk up the doubled execution time to their busy server. Now we just have to explain to them how more data made it faster. Thank you. And thanks be to Kevin, Andrew, Russell, Roy and Eric - wherever you are. |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
So, now you are seeing exactly the same plan in prod and dev. Are they doing
the same 200k logical reads in both environments? Also, how long does the slow query typically run? If it runs sufficiently long, I'd suggest looking at the I/O block size (Disk Bytes/Read) and the latency (Disk sec/Read) on the drive where the data file(s) resides during the query execution. Perhaps, there is a significant difference in the I//O performance in getting these pages into the buffer pool. In addition to that, if the amount of data that this query touches can fit into the buffer pool, execute the query several times and eliminate physical reads, and compare the performance in both environments (if it's possible to do that). This s to eliminate the I/O subsystem being a factor. Linchi "Ian Boyd" wrote: > > You mentioned that the statistics were up-to-date, but did not mention > > whether they were identical. (yes, it sounds stupid but when something > > this > > confusing happens - it sometimes hlps to go back to square one and make > > sure > > the simple things really do match up and that you have what you think you > > have. > > Well, you're right. After reading off key densities, bucket sizes, and > distinct value counts for all the indexes of the tables involved in the > query on both servers, the index on one side seemed to indicate that it was > about 25% empty. > > Looking for that index in both execution plans showed that the "live" server > wasn't using that index at all. > > Ian: "How could it be that this index is mostly empty here?" > > Developer#1: "Oh, you know what that could be. That was probably when we > restored their database here we began testing; and [tester#1] created an > event with 30,000 people. > > After we restored their live database again on our development server - the > query runs with the same poor execution plan. It runs 50% faster here, but > the same plan at least. i'll chalk up the doubled execution time to their > busy server. > > Now we just have to explain to them how more data made it faster. > > > Thank you. > > And thanks be to Kevin, Andrew, Russell, Roy and Eric - wherever you are. > > > |
|
![]() |
| Outils de la discussion | |
|
|