|
|
|
|
||||||
![]() |
|
|
LinkBack | Outils de la discussion |
|
|
#1 |
|
Messages: n/a
Hébergeur: |
I am experiencing a strange problem with SQL Server 2K.
Every so often our website application will lock up because it is waiting for SQL server. We run SQL2K on a dedicated server. The server has 3 RAID1 drives (total of 6 disks) setup in the following way: C: System Drive (RAID1) – Contains OS Files D: Data Drive (RAID1) – Contains SQL Databases E: Log Drive (RAID1) – Contains the Log files If I run performance counters on the database server, I can see that everything runs smoothly for a period of time, say 2-10 minutes, and then all of a sudden, the Disk Write Counter for drive D: (Data files) jumps to 100% - this is when everything grinds to a halt. After a few moments, the server recovers again. Please see the screen grabs below I have uploaded to flickr: Normal State - This screen grab shows how the performance counters normally look: http://www.flickr.com/photos/30394576@N03/2848279856/ Problem State - This screen grab shows how the performance counters look when the problem occurs: http://www.flickr.com/photos/30394576@N03/2848279806/ You can see from the second screen grab that the disk write time of drive D: (the data files drive) jumps to 100% yet the processor usage remains consistent through the problem period at about 10% usage. The log files drive E: is also very active during this time. The database constantly does this throughout the day ever 2-10 minutes; it is very annoying as our website application locks up every time this happens. Here is what I have tried so far: 1) I have disabled all maintenance tasks, scheduled jobs etc. but I still get the problem. 2) I have made sure the Auto Grow is not set to a percentage and monitored the size of the data and log files to see if they change size when the problem happens – but they do not change filesize size, so I don’t think this is cause of the problem. 3) I have run the profiler to try and find any long running queries that could be causing the problem, but all I can see are queries that normally run very quickly are taking much longer to run when the problem happens. 4) I have tried changing the Fill factor of clustered indexes of frequently updated tables to 70%. This has not solved the problem. I am now at a loss to what could be causing the problem. Has anybody experienced this type of problem before? Any would be greatly appreciated. Kind Regards George |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
Have you checked the perfmon counters for Check Point Pages Per Second
during that time? If not have a look. If this spikes at the same time it is due to a checkpoint processes. But 2 to 10 minutes is extreme unless you are doing lots of transactions. What about Anti-Virus software? Have you looked at task manager to see what is using CPU & Reads or Writes at that time? And with profiler don't look so much for the duration but rather the Reads or Writes. It could be lots of low duration tasks doing lots of IO. -- Andrew J. Kelly SQL MVP Solid Quality Mentors "george" <george@discussions.microsoft.com> wrote in message news:334B6CCB-15D2-4294-A1EE-4EB478AED901@microsoft.com... >I am experiencing a strange problem with SQL Server 2K. > > Every so often our website application will lock up because it is waiting > for SQL server. > > We run SQL2K on a dedicated server. The server has 3 RAID1 drives (total > of > 6 disks) setup in the following way: > > C: System Drive (RAID1) – Contains OS Files > D: Data Drive (RAID1) – Contains SQL Databases > E: Log Drive (RAID1) – Contains the Log files > > If I run performance counters on the database server, I can see that > everything runs smoothly for a period of time, say 2-10 minutes, and then > all > of a sudden, the Disk Write Counter for drive D: (Data files) jumps to > 100% - > this is when everything grinds to a halt. After a few moments, the server > recovers again. > > Please see the screen grabs below I have uploaded to flickr: > > Normal State - This screen grab shows how the performance counters > normally > look: > http://www.flickr.com/photos/30394576@N03/2848279856/ > > Problem State - This screen grab shows how the performance counters look > when the problem occurs: > http://www.flickr.com/photos/30394576@N03/2848279806/ > > You can see from the second screen grab that the disk write time of drive > D: > (the data files drive) jumps to 100% yet the processor usage remains > consistent through the problem period at about 10% usage. The log files > drive > E: is also very active during this time. > > The database constantly does this throughout the day ever 2-10 minutes; it > is very annoying as our website application locks up every time this > happens. > > > Here is what I have tried so far: > > 1) I have disabled all maintenance tasks, scheduled jobs etc. but I still > get the problem. > > 2) I have made sure the Auto Grow is not set to a percentage and monitored > the size of the data and log files to see if they change size when the > problem happens – but they do not change filesize size, so I don’t think > this > is cause of the problem. > > 3) I have run the profiler to try and find any long running queries that > could be causing the problem, but all I can see are queries that normally > run > very quickly are taking much longer to run when the problem happens. > > 4) I have tried changing the Fill factor of clustered indexes of > frequently > updated tables to 70%. This has not solved the problem. > > I am now at a loss to what could be causing the problem. > > Has anybody experienced this type of problem before? Any would be > greatly appreciated. > > Kind Regards > George > > |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
Dear Andrew, Thank you for your response. I don’t have any anti-virus software running on the server. Task manager shows sqlservr has performed the most write operations. There really isn’t anything else running on the server other than windows os. Check Point Pages Per Second Counter ------------------------------------ During normal running state, the counter is: 0 When the problem occurs, the counter jumps up and down to maximum of: 1875 SQL Profiler ------------ Tried running the profiler to return only queries with writes >= 100 During normal running state: None When the problem occurs: None Any ideas? The server is windows 2003R2 , has 4 Gb of RAM (with 1.8 GB free). "Andrew J. Kelly" wrote: > Have you checked the perfmon counters for Check Point Pages Per Second > during that time? If not have a look. If this spikes at the same time it is > due to a checkpoint processes. But 2 to 10 minutes is extreme unless you are > doing lots of transactions. What about Anti-Virus software? Have you looked > at task manager to see what is using CPU & Reads or Writes at that time? And > with profiler don't look so much for the duration but rather the Reads or > Writes. It could be lots of low duration tasks doing lots of IO. > > -- > Andrew J. Kelly SQL MVP > Solid Quality Mentors > > > "george" <george@discussions.microsoft.com> wrote in message > news:334B6CCB-15D2-4294-A1EE-4EB478AED901@microsoft.com... > >I am experiencing a strange problem with SQL Server 2K. > > > > Every so often our website application will lock up because it is waiting > > for SQL server. > > > > We run SQL2K on a dedicated server. The server has 3 RAID1 drives (total > > of > > 6 disks) setup in the following way: > > > > C: System Drive (RAID1) – Contains OS Files > > D: Data Drive (RAID1) – Contains SQL Databases > > E: Log Drive (RAID1) – Contains the Log files > > > > If I run performance counters on the database server, I can see that > > everything runs smoothly for a period of time, say 2-10 minutes, and then > > all > > of a sudden, the Disk Write Counter for drive D: (Data files) jumps to > > 100% - > > this is when everything grinds to a halt. After a few moments, the server > > recovers again. > > > > Please see the screen grabs below I have uploaded to flickr: > > > > Normal State - This screen grab shows how the performance counters > > normally > > look: > > http://www.flickr.com/photos/30394576@N03/2848279856/ > > > > Problem State - This screen grab shows how the performance counters look > > when the problem occurs: > > http://www.flickr.com/photos/30394576@N03/2848279806/ > > > > You can see from the second screen grab that the disk write time of drive > > D: > > (the data files drive) jumps to 100% yet the processor usage remains > > consistent through the problem period at about 10% usage. The log files > > drive > > E: is also very active during this time. > > > > The database constantly does this throughout the day ever 2-10 minutes; it > > is very annoying as our website application locks up every time this > > happens. > > > > > > Here is what I have tried so far: > > > > 1) I have disabled all maintenance tasks, scheduled jobs etc. but I still > > get the problem. > > > > 2) I have made sure the Auto Grow is not set to a percentage and monitored > > the size of the data and log files to see if they change size when the > > problem happens – but they do not change filesize size, so I don’t think > > this > > is cause of the problem. > > > > 3) I have run the profiler to try and find any long running queries that > > could be causing the problem, but all I can see are queries that normally > > run > > very quickly are taking much longer to run when the problem happens. > > > > 4) I have tried changing the Fill factor of clustered indexes of > > frequently > > updated tables to 70%. This has not solved the problem. > > > > I am now at a loss to what could be causing the problem. > > > > Has anybody experienced this type of problem before? Any would be > > greatly appreciated. > > > > Kind Regards > > George > > > > > > |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
sound like it is a disk issue. If the checkpoints are happening during this
time it means your disks are not configured properly to handle the load or you have a problem with the disks themselves. You probably need more drives for the D: array. A raid 1 can only handle so much IO and to get more you need to add more spindles which means a RAID 10. Another thing to look at is the cache configuration and amount on the controller. If you have 256MB see if it will take 512MB. But the most important thing is to change the cache read / write ratio to 100% write back and 0% read. -- Andrew J. Kelly SQL MVP Solid Quality Mentors "george" <george@discussions.microsoft.com> wrote in message news:9EF6F459-0591-427E-BFF2-7280EF504542@microsoft.com... > > Dear Andrew, > > Thank you for your response. > > > I don’t have any anti-virus software running on the server. > > Task manager shows sqlservr has performed the most write operations. > > There really isn’t anything else running on the server other than windows > os. > > > > Check Point Pages Per Second Counter > ------------------------------------ > > During normal running state, the counter is: 0 > > When the problem occurs, the counter jumps up and down to maximum of: 1875 > > > > SQL Profiler > ------------ > > Tried running the profiler to return only queries with writes >= 100 > > During normal running state: None > > When the problem occurs: None > > > Any ideas? > The server is windows 2003R2 , has 4 Gb of RAM (with 1.8 GB free). > > > "Andrew J. Kelly" wrote: > >> Have you checked the perfmon counters for Check Point Pages Per Second >> during that time? If not have a look. If this spikes at the same time it >> is >> due to a checkpoint processes. But 2 to 10 minutes is extreme unless you >> are >> doing lots of transactions. What about Anti-Virus software? Have you >> looked >> at task manager to see what is using CPU & Reads or Writes at that time? >> And >> with profiler don't look so much for the duration but rather the Reads or >> Writes. It could be lots of low duration tasks doing lots of IO. >> >> -- >> Andrew J. Kelly SQL MVP >> Solid Quality Mentors >> >> >> "george" <george@discussions.microsoft.com> wrote in message >> news:334B6CCB-15D2-4294-A1EE-4EB478AED901@microsoft.com... >> >I am experiencing a strange problem with SQL Server 2K. >> > >> > Every so often our website application will lock up because it is >> > waiting >> > for SQL server. >> > >> > We run SQL2K on a dedicated server. The server has 3 RAID1 drives >> > (total >> > of >> > 6 disks) setup in the following way: >> > >> > C: System Drive (RAID1) – Contains OS Files >> > D: Data Drive (RAID1) – Contains SQL Databases >> > E: Log Drive (RAID1) – Contains the Log files >> > >> > If I run performance counters on the database server, I can see that >> > everything runs smoothly for a period of time, say 2-10 minutes, and >> > then >> > all >> > of a sudden, the Disk Write Counter for drive D: (Data files) jumps to >> > 100% - >> > this is when everything grinds to a halt. After a few moments, the >> > server >> > recovers again. >> > >> > Please see the screen grabs below I have uploaded to flickr: >> > >> > Normal State - This screen grab shows how the performance counters >> > normally >> > look: >> > http://www.flickr.com/photos/30394576@N03/2848279856/ >> > >> > Problem State - This screen grab shows how the performance counters >> > look >> > when the problem occurs: >> > http://www.flickr.com/photos/30394576@N03/2848279806/ >> > >> > You can see from the second screen grab that the disk write time of >> > drive >> > D: >> > (the data files drive) jumps to 100% yet the processor usage remains >> > consistent through the problem period at about 10% usage. The log files >> > drive >> > E: is also very active during this time. >> > >> > The database constantly does this throughout the day ever 2-10 minutes; >> > it >> > is very annoying as our website application locks up every time this >> > happens. >> > >> > >> > Here is what I have tried so far: >> > >> > 1) I have disabled all maintenance tasks, scheduled jobs etc. but I >> > still >> > get the problem. >> > >> > 2) I have made sure the Auto Grow is not set to a percentage and >> > monitored >> > the size of the data and log files to see if they change size when the >> > problem happens – but they do not change filesize size, so I don’t >> > think >> > this >> > is cause of the problem. >> > >> > 3) I have run the profiler to try and find any long running queries >> > that >> > could be causing the problem, but all I can see are queries that >> > normally >> > run >> > very quickly are taking much longer to run when the problem happens. >> > >> > 4) I have tried changing the Fill factor of clustered indexes of >> > frequently >> > updated tables to 70%. This has not solved the problem. >> > >> > I am now at a loss to what could be causing the problem. >> > >> > Has anybody experienced this type of problem before? Any would be >> > greatly appreciated. >> > >> > Kind Regards >> > George >> > >> > >> >> |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
The strange thing is that the problem happens even during off-peek hours.
For example, I am seeing this happening when total transactions/sec = 300 (I don't expect 300 per/sec to be a busy server) Your response made me think that it could be a hardware disk error/problem. So I swapped the data and log files to opposite HD - the data files are now on drive E: and the logs are on drive D: ; this has made no difference, the problem has now swapped from drive D: to drive E: I will have to contact my hosting company to see if they can change the controller cache configuration (I guess this would be done at bios level?) I dont understand why I would be getting the same problem during busy and very quiet times with similar frequency. Could it be anything else? Do you think that it could be too many updates to an indexed table? Would changing the recovery model from full stop the checkpoints? The server hardware is quite substantial: PowerEdge 2950 2 x Duel Xeon Quad Core Processors 4GB RAM 6 x SCSI HD 15K (SAS) Dell \ 10G Drive Controller - SAS/SATA RAID \ PERC 6/i "Andrew J. Kelly" wrote: > sound like it is a disk issue. If the checkpoints are happening during this > time it means your disks are not configured properly to handle the load or > you have a problem with the disks themselves. You probably need more drives > for the D: array. A raid 1 can only handle so much IO and to get more you > need to add more spindles which means a RAID 10. Another thing to look at > is the cache configuration and amount on the controller. If you have 256MB > see if it will take 512MB. But the most important thing is to change the > cache read / write ratio to 100% write back and 0% read. > > > -- > Andrew J. Kelly SQL MVP > Solid Quality Mentors > > > "george" <george@discussions.microsoft.com> wrote in message > news:9EF6F459-0591-427E-BFF2-7280EF504542@microsoft.com... > > > > Dear Andrew, > > > > Thank you for your response. > > > > > > I don’t have any anti-virus software running on the server. > > > > Task manager shows sqlservr has performed the most write operations. > > > > There really isn’t anything else running on the server other than windows > > os. > > > > > > > > Check Point Pages Per Second Counter > > ------------------------------------ > > > > During normal running state, the counter is: 0 > > > > When the problem occurs, the counter jumps up and down to maximum of: 1875 > > > > > > > > SQL Profiler > > ------------ > > > > Tried running the profiler to return only queries with writes >= 100 > > > > During normal running state: None > > > > When the problem occurs: None > > > > > > Any ideas? > > The server is windows 2003R2 , has 4 Gb of RAM (with 1.8 GB free). > > > > > > "Andrew J. Kelly" wrote: > > > >> Have you checked the perfmon counters for Check Point Pages Per Second > >> during that time? If not have a look. If this spikes at the same time it > >> is > >> due to a checkpoint processes. But 2 to 10 minutes is extreme unless you > >> are > >> doing lots of transactions. What about Anti-Virus software? Have you > >> looked > >> at task manager to see what is using CPU & Reads or Writes at that time? > >> And > >> with profiler don't look so much for the duration but rather the Reads or > >> Writes. It could be lots of low duration tasks doing lots of IO. > >> > >> -- > >> Andrew J. Kelly SQL MVP > >> Solid Quality Mentors > >> > >> > >> "george" <george@discussions.microsoft.com> wrote in message > >> news:334B6CCB-15D2-4294-A1EE-4EB478AED901@microsoft.com... > >> >I am experiencing a strange problem with SQL Server 2K. > >> > > >> > Every so often our website application will lock up because it is > >> > waiting > >> > for SQL server. > >> > > >> > We run SQL2K on a dedicated server. The server has 3 RAID1 drives > >> > (total > >> > of > >> > 6 disks) setup in the following way: > >> > > >> > C: System Drive (RAID1) – Contains OS Files > >> > D: Data Drive (RAID1) – Contains SQL Databases > >> > E: Log Drive (RAID1) – Contains the Log files > >> > > >> > If I run performance counters on the database server, I can see that > >> > everything runs smoothly for a period of time, say 2-10 minutes, and > >> > then > >> > all > >> > of a sudden, the Disk Write Counter for drive D: (Data files) jumps to > >> > 100% - > >> > this is when everything grinds to a halt. After a few moments, the > >> > server > >> > recovers again. > >> > > >> > Please see the screen grabs below I have uploaded to flickr: > >> > > >> > Normal State - This screen grab shows how the performance counters > >> > normally > >> > look: > >> > http://www.flickr.com/photos/30394576@N03/2848279856/ > >> > > >> > Problem State - This screen grab shows how the performance counters > >> > look > >> > when the problem occurs: > >> > http://www.flickr.com/photos/30394576@N03/2848279806/ > >> > > >> > You can see from the second screen grab that the disk write time of > >> > drive > >> > D: > >> > (the data files drive) jumps to 100% yet the processor usage remains > >> > consistent through the problem period at about 10% usage. The log files > >> > drive > >> > E: is also very active during this time. > >> > > >> > The database constantly does this throughout the day ever 2-10 minutes; > >> > it > >> > is very annoying as our website application locks up every time this > >> > happens. > >> > > >> > > >> > Here is what I have tried so far: > >> > > >> > 1) I have disabled all maintenance tasks, scheduled jobs etc. but I > >> > still > >> > get the problem. > >> > > >> > 2) I have made sure the Auto Grow is not set to a percentage and > >> > monitored > >> > the size of the data and log files to see if they change size when the > >> > problem happens – but they do not change filesize size, so I don’t > >> > think > >> > this > >> > is cause of the problem. > >> > > >> > 3) I have run the profiler to try and find any long running queries > >> > that > >> > could be causing the problem, but all I can see are queries that > >> > normally > >> > run > >> > very quickly are taking much longer to run when the problem happens. > >> > > >> > 4) I have tried changing the Fill factor of clustered indexes of > >> > frequently > >> > updated tables to 70%. This has not solved the problem. > >> > > >> > I am now at a loss to what could be causing the problem. > >> > > >> > Has anybody experienced this type of problem before? Any would be > >> > greatly appreciated. > >> > > >> > Kind Regards > >> > George > >> > > >> > > >> > >> > > |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
1) do you have replication enabled, or database mirroring?
2) wat about backups? 3) Can you get more RAM for the box? If you have large databases 4GB just doesn't cut it. 300xactions/sec if they hit lots of data (oh, especially if that data isn't properly indexed) can cause lots of IOs when you have low ram amount. And as Andrew pointed out 2 spindles serving up data and 2 serving up tlog just isn't much at all. 4) I would check for poorly written queries and indexing opportunities. 5) Check out sys.dm_io_virtual_file_stats to see if IO delays are happening from within sql server. 6) There are a TON of other things. Consider getting a performance expert to remote into your system to give it a review. You will solve your problem MUCH quicker this way, and probably learn a lot in the process. -- Kevin G. Boles Indicium Resources, Inc. SQL Server MVP kgboles a earthlink dt net "george" <george@discussions.microsoft.com> wrote in message news:5E9C5FB6-7995-40BA-9428-C7299D092A66@microsoft.com... > The strange thing is that the problem happens even during off-peek hours. > > For example, I am seeing this happening when total transactions/sec = 300 > (I don't expect 300 per/sec to be a busy server) > > Your response made me think that it could be a hardware disk > error/problem. > > So I swapped the data and log files to opposite HD - the data files are > now > on drive E: and the logs are on drive D: ; this has made no difference, > the > problem has now swapped from drive D: to drive E: > > I will have to contact my hosting company to see if they can change the > controller cache configuration (I guess this would be done at bios level?) > > > I dont understand why I would be getting the same problem during busy and > very quiet times with similar frequency. > > > Could it be anything else? > Do you think that it could be too many updates to an indexed table? > > Would changing the recovery model from full stop the checkpoints? > > The server hardware is quite substantial: > PowerEdge 2950 > 2 x Duel Xeon Quad Core Processors > 4GB RAM > 6 x SCSI HD 15K (SAS) > Dell \ 10G Drive Controller - SAS/SATA RAID \ PERC 6/i > > > "Andrew J. Kelly" wrote: > >> sound like it is a disk issue. If the checkpoints are happening during >> this >> time it means your disks are not configured properly to handle the load >> or >> you have a problem with the disks themselves. You probably need more >> drives >> for the D: array. A raid 1 can only handle so much IO and to get more you >> need to add more spindles which means a RAID 10. Another thing to look >> at >> is the cache configuration and amount on the controller. If you have >> 256MB >> see if it will take 512MB. But the most important thing is to change the >> cache read / write ratio to 100% write back and 0% read. >> >> >> -- >> Andrew J. Kelly SQL MVP >> Solid Quality Mentors >> >> >> "george" <george@discussions.microsoft.com> wrote in message >> news:9EF6F459-0591-427E-BFF2-7280EF504542@microsoft.com... >> > >> > Dear Andrew, >> > >> > Thank you for your response. >> > >> > >> > I don't have any anti-virus software running on the server. >> > >> > Task manager shows sqlservr has performed the most write operations. >> > >> > There really isn't anything else running on the server other than >> > windows >> > os. >> > >> > >> > >> > Check Point Pages Per Second Counter >> > ------------------------------------ >> > >> > During normal running state, the counter is: 0 >> > >> > When the problem occurs, the counter jumps up and down to maximum of: >> > 1875 >> > >> > >> > >> > SQL Profiler >> > ------------ >> > >> > Tried running the profiler to return only queries with writes >= 100 >> > >> > During normal running state: None >> > >> > When the problem occurs: None >> > >> > >> > Any ideas? >> > The server is windows 2003R2 , has 4 Gb of RAM (with 1.8 GB free). >> > >> > >> > "Andrew J. Kelly" wrote: >> > >> >> Have you checked the perfmon counters for Check Point Pages Per Second >> >> during that time? If not have a look. If this spikes at the same time >> >> it >> >> is >> >> due to a checkpoint processes. But 2 to 10 minutes is extreme unless >> >> you >> >> are >> >> doing lots of transactions. What about Anti-Virus software? Have you >> >> looked >> >> at task manager to see what is using CPU & Reads or Writes at that >> >> time? >> >> And >> >> with profiler don't look so much for the duration but rather the Reads >> >> or >> >> Writes. It could be lots of low duration tasks doing lots of IO. >> >> >> >> -- >> >> Andrew J. Kelly SQL MVP >> >> Solid Quality Mentors >> >> >> >> >> >> "george" <george@discussions.microsoft.com> wrote in message >> >> news:334B6CCB-15D2-4294-A1EE-4EB478AED901@microsoft.com... >> >> >I am experiencing a strange problem with SQL Server 2K. >> >> > >> >> > Every so often our website application will lock up because it is >> >> > waiting >> >> > for SQL server. >> >> > >> >> > We run SQL2K on a dedicated server. The server has 3 RAID1 drives >> >> > (total >> >> > of >> >> > 6 disks) setup in the following way: >> >> > >> >> > C: System Drive (RAID1) - Contains OS Files >> >> > D: Data Drive (RAID1) - Contains SQL Databases >> >> > E: Log Drive (RAID1) - Contains the Log files >> >> > >> >> > If I run performance counters on the database server, I can see that >> >> > everything runs smoothly for a period of time, say 2-10 minutes, and >> >> > then >> >> > all >> >> > of a sudden, the Disk Write Counter for drive D: (Data files) jumps >> >> > to >> >> > 100% - >> >> > this is when everything grinds to a halt. After a few moments, the >> >> > server >> >> > recovers again. >> >> > >> >> > Please see the screen grabs below I have uploaded to flickr: >> >> > >> >> > Normal State - This screen grab shows how the performance counters >> >> > normally >> >> > look: >> >> > http://www.flickr.com/photos/30394576@N03/2848279856/ >> >> > >> >> > Problem State - This screen grab shows how the performance counters >> >> > look >> >> > when the problem occurs: >> >> > http://www.flickr.com/photos/30394576@N03/2848279806/ >> >> > >> >> > You can see from the second screen grab that the disk write time of >> >> > drive >> >> > D: >> >> > (the data files drive) jumps to 100% yet the processor usage remains >> >> > consistent through the problem period at about 10% usage. The log >> >> > files >> >> > drive >> >> > E: is also very active during this time. >> >> > >> >> > The database constantly does this throughout the day ever 2-10 >> >> > minutes; >> >> > it >> >> > is very annoying as our website application locks up every time this >> >> > happens. >> >> > >> >> > >> >> > Here is what I have tried so far: >> >> > >> >> > 1) I have disabled all maintenance tasks, scheduled jobs etc. but I >> >> > still >> >> > get the problem. >> >> > >> >> > 2) I have made sure the Auto Grow is not set to a percentage and >> >> > monitored >> >> > the size of the data and log files to see if they change size when >> >> > the >> >> > problem happens - but they do not change filesize size, so I don't >> >> > think >> >> > this >> >> > is cause of the problem. >> >> > >> >> > 3) I have run the profiler to try and find any long running queries >> >> > that >> >> > could be causing the problem, but all I can see are queries that >> >> > normally >> >> > run >> >> > very quickly are taking much longer to run when the problem happens. >> >> > >> >> > 4) I have tried changing the Fill factor of clustered indexes of >> >> > frequently >> >> > updated tables to 70%. This has not solved the problem. >> >> > >> >> > I am now at a loss to what could be causing the problem. >> >> > >> >> > Has anybody experienced this type of problem before? Any would >> >> > be >> >> > greatly appreciated. >> >> > >> >> > Kind Regards >> >> > George >> >> > >> >> > >> >> >> >> >> >> |
|
|
|
#7 |
|
Messages: n/a
Hébergeur: |
As Kevin pointed out while you have 8 cores which is great the rest of the
hardware isn't at the same level. 4GB for 8 cores is pretty trivial these days as is only a Raid 1 for the data drives. 300 trans per second can surely make for a large amount of dirty pages that will need to be written at one time during the checkpoint and I would expect a bottleneck when it happens. If the drives you mentioned are truly different physical Raid 1's then swapping the log with the data files tells a lot. If it were the drive the bottleneck would have stayed where it was. As for the issue still occurring during slow times that is a little strange if is is really at the same frequency. I can see it still occurring but at a less frequent interval. Unless it does have something to do with backups as Kevin also mentioned. A backup will force a checkpoint. And if your backups are going to the same drive as the data that is a problem as well. -- Andrew J. Kelly SQL MVP Solid Quality Mentors "george" <george@discussions.microsoft.com> wrote in message news:5E9C5FB6-7995-40BA-9428-C7299D092A66@microsoft.com... > The strange thing is that the problem happens even during off-peek hours. > > For example, I am seeing this happening when total transactions/sec = 300 > (I don't expect 300 per/sec to be a busy server) > > Your response made me think that it could be a hardware disk > error/problem. > > So I swapped the data and log files to opposite HD - the data files are > now > on drive E: and the logs are on drive D: ; this has made no difference, > the > problem has now swapped from drive D: to drive E: > > I will have to contact my hosting company to see if they can change the > controller cache configuration (I guess this would be done at bios level?) > > > I dont understand why I would be getting the same problem during busy and > very quiet times with similar frequency. > > > Could it be anything else? > Do you think that it could be too many updates to an indexed table? > > Would changing the recovery model from full stop the checkpoints? > > The server hardware is quite substantial: > PowerEdge 2950 > 2 x Duel Xeon Quad Core Processors > 4GB RAM > 6 x SCSI HD 15K (SAS) > Dell \ 10G Drive Controller - SAS/SATA RAID \ PERC 6/i > > > "Andrew J. Kelly" wrote: > >> sound like it is a disk issue. If the checkpoints are happening during >> this >> time it means your disks are not configured properly to handle the load >> or >> you have a problem with the disks themselves. You probably need more >> drives >> for the D: array. A raid 1 can only handle so much IO and to get more you >> need to add more spindles which means a RAID 10. Another thing to look >> at >> is the cache configuration and amount on the controller. If you have >> 256MB >> see if it will take 512MB. But the most important thing is to change the >> cache read / write ratio to 100% write back and 0% read. >> >> >> -- >> Andrew J. Kelly SQL MVP >> Solid Quality Mentors >> >> >> "george" <george@discussions.microsoft.com> wrote in message >> news:9EF6F459-0591-427E-BFF2-7280EF504542@microsoft.com... >> > >> > Dear Andrew, >> > >> > Thank you for your response. >> > >> > >> > I don’t have any anti-virus software running on the server. >> > >> > Task manager shows sqlservr has performed the most write operations. >> > >> > There really isn’t anything else running on the server other than >> > windows >> > os. >> > >> > >> > >> > Check Point Pages Per Second Counter >> > ------------------------------------ >> > >> > During normal running state, the counter is: 0 >> > >> > When the problem occurs, the counter jumps up and down to maximum of: >> > 1875 >> > >> > >> > >> > SQL Profiler >> > ------------ >> > >> > Tried running the profiler to return only queries with writes >= 100 >> > >> > During normal running state: None >> > >> > When the problem occurs: None >> > >> > >> > Any ideas? >> > The server is windows 2003R2 , has 4 Gb of RAM (with 1.8 GB free). >> > >> > >> > "Andrew J. Kelly" wrote: >> > >> >> Have you checked the perfmon counters for Check Point Pages Per Second >> >> during that time? If not have a look. If this spikes at the same time >> >> it >> >> is >> >> due to a checkpoint processes. But 2 to 10 minutes is extreme unless >> >> you >> >> are >> >> doing lots of transactions. What about Anti-Virus software? Have you >> >> looked >> >> at task manager to see what is using CPU & Reads or Writes at that >> >> time? >> >> And >> >> with profiler don't look so much for the duration but rather the Reads >> >> or >> >> Writes. It could be lots of low duration tasks doing lots of IO. >> >> >> >> -- >> >> Andrew J. Kelly SQL MVP >> >> Solid Quality Mentors >> >> >> >> >> >> "george" <george@discussions.microsoft.com> wrote in message >> >> news:334B6CCB-15D2-4294-A1EE-4EB478AED901@microsoft.com... >> >> >I am experiencing a strange problem with SQL Server 2K. >> >> > >> >> > Every so often our website application will lock up because it is >> >> > waiting >> >> > for SQL server. >> >> > >> >> > We run SQL2K on a dedicated server. The server has 3 RAID1 drives >> >> > (total >> >> > of >> >> > 6 disks) setup in the following way: >> >> > >> >> > C: System Drive (RAID1) – Contains OS Files >> >> > D: Data Drive (RAID1) – Contains SQL Databases >> >> > E: Log Drive (RAID1) – Contains the Log files >> >> > >> >> > If I run performance counters on the database server, I can see that >> >> > everything runs smoothly for a period of time, say 2-10 minutes, and >> >> > then >> >> > all >> >> > of a sudden, the Disk Write Counter for drive D: (Data files) jumps >> >> > to >> >> > 100% - >> >> > this is when everything grinds to a halt. After a few moments, the >> >> > server >> >> > recovers again. >> >> > >> >> > Please see the screen grabs below I have uploaded to flickr: >> >> > >> >> > Normal State - This screen grab shows how the performance counters >> >> > normally >> >> > look: >> >> > http://www.flickr.com/photos/30394576@N03/2848279856/ >> >> > >> >> > Problem State - This screen grab shows how the performance counters >> >> > look >> >> > when the problem occurs: >> >> > http://www.flickr.com/photos/30394576@N03/2848279806/ >> >> > >> >> > You can see from the second screen grab that the disk write time of >> >> > drive >> >> > D: >> >> > (the data files drive) jumps to 100% yet the processor usage remains >> >> > consistent through the problem period at about 10% usage. The log >> >> > files >> >> > drive >> >> > E: is also very active during this time. >> >> > >> >> > The database constantly does this throughout the day ever 2-10 >> >> > minutes; >> >> > it >> >> > is very annoying as our website application locks up every time this >> >> > happens. >> >> > >> >> > >> >> > Here is what I have tried so far: >> >> > >> >> > 1) I have disabled all maintenance tasks, scheduled jobs etc. but I >> >> > still >> >> > get the problem. >> >> > >> >> > 2) I have made sure the Auto Grow is not set to a percentage and >> >> > monitored >> >> > the size of the data and log files to see if they change size when >> >> > the >> >> > problem happens – but they do not change filesize size, so I don’t >> >> > think >> >> > this >> >> > is cause of the problem. >> >> > >> >> > 3) I have run the profiler to try and find any long running queries >> >> > that >> >> > could be causing the problem, but all I can see are queries that >> >> > normally >> >> > run >> >> > very quickly are taking much longer to run when the problem happens. >> >> > >> >> > 4) I have tried changing the Fill factor of clustered indexes of >> >> > frequently >> >> > updated tables to 70%. This has not solved the problem. >> >> > >> >> > I am now at a loss to what could be causing the problem. >> >> > >> >> > Has anybody experienced this type of problem before? Any would >> >> > be >> >> > greatly appreciated. >> >> > >> >> > Kind Regards >> >> > George >> >> > >> >> > >> >> >> >> >> >> |
|
|
|
#8 |
|
Messages: n/a
Hébergeur: |
Thanks so much for any !
To answer some of your questions: 1) no replication or mirroring 2) backup runs once every 24 hours, with transaction log backups every hour. 3) The box has 4 gb or ram but not all is being used, in fact box has 1.8gb free. I am using windows 2003 R2 and SQL serevr 2000 with service pack 4. I have added the /3GB switch to windows startup but I guess this hasnt ed as 1.8 gb is still not being used!! 4) Most transactions are reading data. would poorly written queries and indexes create these checkpoint problems? can dirty pages be created from reads or are they only created when data is updated or written? 5) I tried your suggestion and got: DbId FileId TimeStamp NumRd NumWr BytsRd BytsWr IoStallMS 1 1 145851906 5620 7 59613184 57344 56915 1 2 145851906 20 16 424960 142848 0 2 1 145851906 205120 269707 11070660608 11313397760 454895 2 2 145851906 168 782334 1732608 44177854464 12645 3 1 145851906 76 2 1703936 16384 154 3 2 145851906 13 9 396288 190976 16 4 1 145851906 8089 2695 91742208 24977408 44197 4 2 145851906 32 23115 292352 19606528 126 5 1 145851906 144 50 2482176 475136 1500 5 2 145851906 14 19 376832 217088 16 6 1 145851906 242 120 4833280 1638400 2171 6 2 145851906 9 28 401408 495104 16 7 1 145851906 16700029 6997988 814219427840 68674912256 1823298898 7 2 145851906 16239 3890965 10593645056 7459493376 686 8 1 145851906 1168 1 12083200 8192 18244 8 2 145851906 17 11 412672 500736 16 The main Database is ID 7. Not sure what all this data means. any ideas? 6) Will consider this option, but data is sensitive so will have to ask around for a recomendation. thanks again! "TheSQLGuru" wrote: > 1) do you have replication enabled, or database mirroring? > > 2) wat about backups? > > 3) Can you get more RAM for the box? If you have large databases 4GB just > doesn't cut it. 300xactions/sec if they hit lots of data (oh, especially if > that data isn't properly indexed) can cause lots of IOs when you have low > ram amount. And as Andrew pointed out 2 spindles serving up data and 2 > serving up tlog just isn't much at all. > > 4) I would check for poorly written queries and indexing opportunities. > > 5) Check out sys.dm_io_virtual_file_stats to see if IO delays are happening > from within sql server. > > 6) There are a TON of other things. Consider getting a performance expert > to remote into your system to give it a review. You will solve your problem > MUCH quicker this way, and probably learn a lot in the process. > > > -- > Kevin G. Boles > Indicium Resources, Inc. > SQL Server MVP > kgboles a earthlink dt net > > > "george" <george@discussions.microsoft.com> wrote in message > news:5E9C5FB6-7995-40BA-9428-C7299D092A66@microsoft.com... > > The strange thing is that the problem happens even during off-peek hours. > > > > For example, I am seeing this happening when total transactions/sec = 300 > > (I don't expect 300 per/sec to be a busy server) > > > > Your response made me think that it could be a hardware disk > > error/problem. > > > > So I swapped the data and log files to opposite HD - the data files are > > now > > on drive E: and the logs are on drive D: ; this has made no difference, > > the > > problem has now swapped from drive D: to drive E: > > > > I will have to contact my hosting company to see if they can change the > > controller cache configuration (I guess this would be done at bios level?) > > > > > > I dont understand why I would be getting the same problem during busy and > > very quiet times with similar frequency. > > > > > > Could it be anything else? > > Do you think that it could be too many updates to an indexed table? > > > > Would changing the recovery model from full stop the checkpoints? > > > > The server hardware is quite substantial: > > PowerEdge 2950 > > 2 x Duel Xeon Quad Core Processors > > 4GB RAM > > 6 x SCSI HD 15K (SAS) > > Dell \ 10G Drive Controller - SAS/SATA RAID \ PERC 6/i > > > > > > "Andrew J. Kelly" wrote: > > > >> sound like it is a disk issue. If the checkpoints are happening during > >> this > >> time it means your disks are not configured properly to handle the load > >> or > >> you have a problem with the disks themselves. You probably need more > >> drives > >> for the D: array. A raid 1 can only handle so much IO and to get more you > >> need to add more spindles which means a RAID 10. Another thing to look > >> at > >> is the cache configuration and amount on the controller. If you have > >> 256MB > >> see if it will take 512MB. But the most important thing is to change the > >> cache read / write ratio to 100% write back and 0% read. > >> > >> > >> -- > >> Andrew J. Kelly SQL MVP > >> Solid Quality Mentors > >> > >> > >> "george" <george@discussions.microsoft.com> wrote in message > >> news:9EF6F459-0591-427E-BFF2-7280EF504542@microsoft.com... > >> > > >> > Dear Andrew, > >> > > >> > Thank you for your response. > >> > > >> > > >> > I don't have any anti-virus software running on the server. > >> > > >> > Task manager shows sqlservr has performed the most write operations. > >> > > >> > There really isn't anything else running on the server other than > >> > windows > >> > os. > >> > > >> > > >> > > >> > Check Point Pages Per Second Counter > >> > ------------------------------------ > >> > > >> > During normal running state, the counter is: 0 > >> > > >> > When the problem occurs, the counter jumps up and down to maximum of: > >> > 1875 > >> > > >> > > >> > > >> > SQL Profiler > >> > ------------ > >> > > >> > Tried running the profiler to return only queries with writes >= 100 > >> > > >> > During normal running state: None > >> > > >> > When the problem occurs: None > >> > > >> > > >> > Any ideas? > >> > The server is windows 2003R2 , has 4 Gb of RAM (with 1.8 GB free). > >> > > >> > > >> > "Andrew J. Kelly" wrote: > >> > > >> >> Have you checked the perfmon counters for Check Point Pages Per Second > >> >> during that time? If not have a look. If this spikes at the same time > >> >> it > >> >> is > >> >> due to a checkpoint processes. But 2 to 10 minutes is extreme unless > >> >> you > >> >> are > >> >> doing lots of transactions. What about Anti-Virus software? Have you > >> >> looked > >> >> at task manager to see what is using CPU & Reads or Writes at that > >> >> time? > >> >> And > >> >> with profiler don't look so much for the duration but rather the Reads > >> >> or > >> >> Writes. It could be lots of low duration tasks doing lots of IO. > >> >> > >> >> -- > >> >> Andrew J. Kelly SQL MVP > >> >> Solid Quality Mentors > >> >> > >> >> > >> >> "george" <george@discussions.microsoft.com> wrote in message > >> >> news:334B6CCB-15D2-4294-A1EE-4EB478AED901@microsoft.com... > >> >> >I am experiencing a strange problem with SQL Server 2K. > >> >> > > >> >> > Every so often our website application will lock up because it is > >> >> > waiting > >> >> > for SQL server. > >> >> > > >> >> > We run SQL2K on a dedicated server. The server has 3 RAID1 drives > >> >> > (total > >> >> > of > >> >> > 6 disks) setup in the following way: > >> >> > > >> >> > C: System Drive (RAID1) - Contains OS Files > >> >> > D: Data Drive (RAID1) - Contains SQL Databases > >> >> > E: Log Drive (RAID1) - Contains the Log files > >> >> > > >> >> > If I run performance counters on the database server, I can see that > >> >> > everything runs smoothly for a period of time, say 2-10 minutes, and > >> >> > then > >> >> > all > >> >> > of a sudden, the Disk Write Counter for drive D: (Data files) jumps > >> >> > to > >> >> > 100% - > >> >> > this is when everything grinds to a halt. After a few moments, the > >> >> > server > >> >> > recovers again. > >> >> > > >> >> > Please see the screen grabs below I have uploaded to flickr: > >> >> > > >> >> > Normal State - This screen grab shows how the performance counters > >> >> > normally > >> >> > look: > >> >> > http://www.flickr.com/photos/30394576@N03/2848279856/ > >> >> > > >> >> > Problem State - This screen grab shows how the performance counters > >> >> > look > >> >> > when the problem occurs: > >> >> > http://www.flickr.com/photos/30394576@N03/2848279806/ > >> >> > > >> >> > You can see from the second screen grab that the disk write time of > >> >> > drive > >> >> > D: > >> >> > (the data files drive) jumps to 100% yet the processor usage remains > >> >> > consistent through the problem period at about 10% usage. The log > >> >> > files > >> >> > drive > >> >> > E: is also very active during this time. > >> >> > > >> >> > The database constantly does this throughout the day ever 2-10 > >> >> > minutes; > >> >> > it > >> >> > is very annoying as our website application locks up every time this > >> >> > happens. > >> >> > > >> >> > > >> >> > Here is what I have tried so far: > >> >> > > >> >> > 1) I have disabled all maintenance tasks, scheduled jobs etc. but I > >> >> > still > >> >> > get the problem. > >> >> > > >> >> > 2) I have made sure the Auto Grow is not set to a percentage and > >> >> > monitored > >> >> > the size of the data and log files to see if they change size when > >> >> > the > >> >> > problem happens - but they do not change filesize size, so I don't > >> >> > think > >> >> > this > >> >> > is cause of the problem. > >> >> > > >> >> > 3) I have run the profiler to try and find any long running queries > >> >> > that > >> >> > could be causing the problem, but all I can see are queries that > >> >> > normally > >> >> > run > >> >> > very quickly are taking much longer to run when the problem happens. > >> >> > > >> >> > 4) I have tried changing the Fill factor of clustered indexes of > >> >> > frequently > >> >> > updated tables to 70%. This has not solved the problem. > >> >> > > >> >> > I am now at a loss to what could be causing the problem. > >> >> > > >> >> > Has anybody experienced this type of problem before? Any would > >> >> > be > >> >> > greatly appreciated. > >> >> > > >> >> > Kind Regards > >> >> > George > >> >> > > >> >> > > >> >> > >> >> > >> > >> > > > |
|
|
|
#9 |
|
Messages: n/a
Hébergeur: |
Thanks andrew
after the server settled, the problem seemed to be happening with less frequency , more towards the every 10 minute mark. The backups go to the C: drive. The data file is backed up once every 24 hours and logs once every hour so I can’t imagine this being the problem. We are running SQL Server 2000 Service pack 4. We were going to upgrade to 2005 but have been waiting for 2008 instead. Would upgrading to 2008 fix this checkpoint issue? I have spoken with our server hosting company and they say the RAID controller card is configured at max performance. They suggested turning 4 of the 6 drives into a RAID10 array to speed up write performance. The server on has 6 drives, so this suggestion would leave me with: 2 drives configured as raid1 - C: drive 4 drives configured as raid10 - D: drive I am reluctant to run the OS from 1 drive, from past bad experiences, I would rather the OS be mirrored. With this setup I would put the logs on drive C: along with the OS and the backups (RAID1) The data file would then go on drive D: (RAID10) Personally I don’t like the sound of this, but it would be a cheaper option than replacing the whole server. If I were to replace the whole server , my hosting company can provide the following server: Windows 2003 Enterprise (so i can add more than 4GB) 8 GB RAM 2 x 73 GB 15k RPM SCSI/SAS HDD 8 x 300 GB 10K RPM SCSI/SAS HDD Initially we would need to run sql2000 until we upgraded to the 2008 version. I would appreciate any suggestions on the best way to configure the drives in the above system? Thanks again george "Andrew J. Kelly" wrote: > As Kevin pointed out while you have 8 cores which is great the rest of the > hardware isn't at the same level. 4GB for 8 cores is pretty trivial these > days as is only a Raid 1 for the data drives. 300 trans per second can > surely make for a large amount of dirty pages that will need to be written > at one time during the checkpoint and I would expect a bottleneck when it > happens. If the drives you mentioned are truly different physical Raid 1's > then swapping the log with the data files tells a lot. If it were the drive > the bottleneck would have stayed where it was. As for the issue still > occurring during slow times that is a little strange if is is really at the > same frequency. I can see it still occurring but at a less frequent > interval. Unless it does have something to do with backups as Kevin also > mentioned. A backup will force a checkpoint. And if your backups are going > to the same drive as the data that is a problem as well. > > -- > Andrew J. Kelly SQL MVP > Solid Quality Mentors > > > "george" <george@discussions.microsoft.com> wrote in message > news:5E9C5FB6-7995-40BA-9428-C7299D092A66@microsoft.com... > > The strange thing is that the problem happens even during off-peek hours. > > > > For example, I am seeing this happening when total transactions/sec = 300 > > (I don't expect 300 per/sec to be a busy server) > > > > Your response made me think that it could be a hardware disk > > error/problem. > > > > So I swapped the data and log files to opposite HD - the data files are > > now > > on drive E: and the logs are on drive D: ; this has made no difference, > > the > > problem has now swapped from drive D: to drive E: > > > > I will have to contact my hosting company to see if they can change the > > controller cache configuration (I guess this would be done at bios level?) > > > > > > I dont understand why I would be getting the same problem during busy and > > very quiet times with similar frequency. > > > > > > Could it be anything else? > > Do you think that it could be too many updates to an indexed table? > > > > Would changing the recovery model from full stop the checkpoints? > > > > The server hardware is quite substantial: > > PowerEdge 2950 > > 2 x Duel Xeon Quad Core Processors > > 4GB RAM > > 6 x SCSI HD 15K (SAS) > > Dell \ 10G Drive Controller - SAS/SATA RAID \ PERC 6/i > > > > > > "Andrew J. Kelly" wrote: > > > >> sound like it is a disk issue. If the checkpoints are happening during > >> this > >> time it means your disks are not configured properly to handle the load > >> or > >> you have a problem with the disks themselves. You probably need more > >> drives > >> for the D: array. A raid 1 can only handle so much IO and to get more you > >> need to add more spindles which means a RAID 10. Another thing to look > >> at > >> is the cache configuration and amount on the controller. If you have > >> 256MB > >> see if it will take 512MB. But the most important thing is to change the > >> cache read / write ratio to 100% write back and 0% read. > >> > >> > >> -- > >> Andrew J. Kelly SQL MVP > >> Solid Quality Mentors > >> > >> > >> "george" <george@discussions.microsoft.com> wrote in message > >> news:9EF6F459-0591-427E-BFF2-7280EF504542@microsoft.com... > >> > > >> > Dear Andrew, > >> > > >> > Thank you for your response. > >> > > >> > > >> > I don’t have any anti-virus software running on the server. > >> > > >> > Task manager shows sqlservr has performed the most write operations. > >> > > >> > There really isn’t anything else running on the server other than > >> > windows > >> > os. > >> > > >> > > >> > > >> > Check Point Pages Per Second Counter > >> > ------------------------------------ > >> > > >> > During normal running state, the counter is: 0 > >> > > >> > When the problem occurs, the counter jumps up and down to maximum of: > >> > 1875 > >> > > >> > > >> > > >> > SQL Profiler > >> > ------------ > >> > > >> > Tried running the profiler to return only queries with writes >= 100 > >> > > >> > During normal running state: None > >> > > >> > When the problem occurs: None > >> > > >> > > >> > Any ideas? > >> > The server is windows 2003R2 , has 4 Gb of RAM (with 1.8 GB free). > >> > > >> > > >> > "Andrew J. Kelly" wrote: > >> > > >> >> Have you checked the perfmon counters for Check Point Pages Per Second > >> >> during that time? If not have a look. If this spikes at the same time > >> >> it > >> >> is > >> >> due to a checkpoint processes. But 2 to 10 minutes is extreme unless > >> >> you > >> >> are > >> >> doing lots of transactions. What about Anti-Virus software? Have you > >> >> looked > >> >> at task manager to see what is using CPU & Reads or Writes at that > >> >> time? > >> >> And > >> >> with profiler don't look so much for the duration but rather the Reads > >> >> or > >> >> Writes. It could be lots of low duration tasks doing lots of IO. > >> >> > >> >> -- > >> >> Andrew J. Kelly SQL MVP > >> >> Solid Quality Mentors > >> >> > >> >> > >> >> "george" <george@discussions.microsoft.com> wrote in message > >> >> news:334B6CCB-15D2-4294-A1EE-4EB478AED901@microsoft.com... > >> >> >I am experiencing a strange problem with SQL Server 2K. > >> >> > > >> >> > Every so often our website application will lock up because it is > >> >> > waiting > >> >> > for SQL server. > >> >> > > >> >> > We run SQL2K on a dedicated server. The server has 3 RAID1 drives > >> >> > (total > >> >> > of > >> >> > 6 disks) setup in the following way: > >> >> > > >> >> > C: System Drive (RAID1) – Contains OS Files > >> >> > D: Data Drive (RAID1) – Contains SQL Databases > >> >> > E: Log Drive (RAID1) – Contains the Log files > >> >> > > >> >> > If I run performance counters on the database server, I can see that > >> >> > everything runs smoothly for a period of time, say 2-10 minutes, and > >> >> > then > >> >> > all > >> >> > of a sudden, the Disk Write Counter for drive D: (Data files) jumps > >> >> > to > >> >> > 100% - > >> >> > this is when everything grinds to a halt. After a few moments, the > >> >> > server > >> >> > recovers again. > >> >> > > >> >> > Please see the screen grabs below I have uploaded to flickr: > >> >> > > >> >> > Normal State - This screen grab shows how the performance counters > >> >> > normally > >> >> > look: > >> >> > http://www.flickr.com/photos/30394576@N03/2848279856/ > >> >> > > >> >> > Problem State - This screen grab shows how the performance counters > >> >> > look > >> >> > when the problem occurs: > >> >> > http://www.flickr.com/photos/30394576@N03/2848279806/ > >> >> > > >> >> > You can see from the second screen grab that the disk write time of > >> >> > drive > >> >> > D: > >> >> > (the data files drive) jumps to 100% yet the processor usage remains > >> >> > consistent through the problem period at about 10% usage. The log > >> >> > files > >> >> > drive > >> >> > E: is also very active during this time. > >> >> > > >> >> > The database constantly does this throughout the day ever 2-10 > >> >> > minutes; > >> >> > it > >> >> > is very annoying as our website application locks up every time this > >> >> > happens. > >> >> > > >> >> > > >> >> > Here is what I have tried so far: > >> >> > > >> >> > 1) I have disabled all maintenance tasks, scheduled jobs etc. but I > >> >> > still > >> >> > get the problem. > >> >> > > >> >> > 2) I have made sure the Auto Grow is not set to a percentage and > >> >> > monitored > >> >> > the size of the data and log files to see if they change size when > >> >> > the > >> >> > problem happens – but they do not change filesize size, so I don’t > >> >> > think > >> >> > this > >> >> > is cause of the problem. > >> >> > > >> >> > 3) I have run the profiler to try and find any long running queries > >> >> > that > >> >> > could be causing the problem, but all I can see are queries that > >> >> > normally > >> >> > run > >> >> > very quickly are taking much longer to run when the problem happens. > >> >> > > >> >> > 4) I have tried changing the Fill factor of clustered indexes of > >> >> > frequently > >> >> > updated tables to 70%. This has not solved the problem. > >> >> > > >> >> > I am now at a loss to what could be causing the problem. > >> >> > > >> >> > Has anybody experienced this type of problem before? Any would > >> >> > be > >> >> > greatly appreciated. > >> >> > > >> >> > Kind Regards > >> >> > George > >> >> > > >> >> > > >> >> > >> >> > >> > >> > > |
|
|
|
#10 |
|
Messages: n/a
Hébergeur: |
Well a backup will force a checkpoint but one will also occur when you get enough activity in the log to trigger one in between as well. I see from the other thread you say you have 1.8GB free. This is because you re running 2000 Standard edition which will top out at 2GB for SQL Server. Upgrading to 2008 std will let you use the server limit. If you are going to upgrade I would suggest you make the move to x64 bit OS and SQL server when you do so you don't have to bother with AWE and use that 8GB more efficiently. Check pointing has changed since 2000 and while it may be a little better if you have a disk bottleneck you will still see issues around it. Making the data array a Raid 10 instead of a Raid 1 should with the IO but that will force you to put the log files on the C: drive. That may work out OK but it is not a good idea to put backups on any drive you have the data or log files on. Does the hosting company have an option to allow you to backup to a share somewhere? Where is your tempdb log file located? If it is on the same drive as the data files you might want to try one thing first. Place the Tempdb log file on the same drive as the user db log files and see if that s. If tempdb is heavily used you can often get better performance by separating the tempdb logs from the data files as well. -- Andrew J. Kelly SQL MVP Solid Quality Mentors "george" <george@discussions.microsoft.com> wrote in message news:F7230539-70E4-40CD-AB1B-410D85C95457@microsoft.com... > Thanks andrew > > after the server settled, the problem seemed to be happening with less > frequency , more towards the every 10 minute mark. > > The backups go to the C: drive. The data file is backed up once every 24 > hours and logs once every hour so I can’t imagine this being the problem. > > We are running SQL Server 2000 Service pack 4. We were going to upgrade to > 2005 but have been waiting for 2008 instead. Would upgrading to 2008 fix > this > checkpoint issue? > > I have spoken with our server hosting company and they say the RAID > controller card is configured at max performance. They suggested turning 4 > of > the 6 drives into a RAID10 array to speed up write performance. > > The server on has 6 drives, so this suggestion would leave me with: > 2 drives configured as raid1 - C: drive > 4 drives configured as raid10 - D: drive > I am reluctant to run the OS from 1 drive, from past bad experiences, I > would rather the OS be mirrored. > > With this setup I would put the logs on drive C: along with the OS and the > backups (RAID1) > The data file would then go on drive D: (RAID10) > > Personally I don’t like the sound of this, but it would be a cheaper > option > than replacing the whole server. > > If I were to replace the whole server , my hosting company can provide the > following server: > Windows 2003 Enterprise (so i can add more than 4GB) > 8 GB RAM > 2 x 73 GB 15k RPM SCSI/SAS HDD > 8 x 300 GB 10K RPM SCSI/SAS HDD > Initially we would need to run |