|
|
|
|
||||||
![]() |
|
|
LinkBack | Outils de la discussion |
|
|
#1 |
|
Messages: n/a
Hébergeur: |
(csimam@gmail.com) writes:
>> The problem is that I ultimately only want the top 25 stores in total >> sales BUT with their subtotals as well... So I used ROLLUP to generate >> the totals and subtotals... and in the same query I generated a >> ranking of the data... and there I found my problem: the ranking does >> not distinguish between totals and subtotals. So the results look >> like: >> >> store1, realTotal, 1 >> store1, subTotal, 2 >> store2, realTotal, 3 >> store7, realTotal, 4 >> store10, subTotal, 5 >> >> Clearly, the rows with subtotals should NOT be ranked... but I am >> struggling with SQL to figure out how to get exactly what I want. An interesting challenge to do in one query and only hitting the table once, and I was not able to think of a solution without using the imaginary Previous() function that we have discussed internally among us MVPs. You can easily do in one query, if you hit the table twice, simply one query with GROUP BY store and one with GROUP BY store, month. Rank the first and join. And it's possible to hit the table once, if you first get the monthly aggregations into a temp table. But in a single query? I don't think it's possible is you want the format: Store1, Grand Total, 1000000 Store1, Jan 2000, 12000 Store1, Feb 2000, 10000, Store2, Grand Total, .... But I will have to see if I get Itzik Ben-Gan to tackle this one. > Well... I have a solution. But I'm not sure about it. It involves > something I learned about only recently: a user-defined aggregate > function that concatenates values from across rows. Using this > aggregate function, stragg, I can do another pass through the results > mentioned above while grouping by store, and produce: > > store1, 'realTotal subTotal', 1 > store2, 'realTotal subTotal', 3 > store7, 'realTotal subTotal', 4 > store10, 'realTotal subTotal', 5 One problem is that a UDA is limited to 8000 characters, so if there are many subtotals, you query will bomb. > ...but... a worry: is this REALLY better than doing a nested query? I > wonder if the use of the user-defined aggregate function is very > expensive... The way to find out is to run both queries a couple of times on an idle server. Just don't forget about the effects of caching. Either first run the queries once, so that you know that table is entirely in cache. If it's too big for that, run DBCC DROPCLEANBUFFERS between each run, so that both queries run under the same condition. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
Erland,
Thanks for pointing the MVPs to this thread. I think the following solution is efficient and accesses the table only once. The query below uses the Northwind sample database data. It requires a permanent table of Years (year ranges, actually, with one range for each year and one "grand total" range) that can work generally for all queries like this, and it also uses aspects of the data model in Northwind, specifically that there is a Customers table. It can probably also be done with some sort of ROLLUP and PIVOT/UNPIVOT, as queries with cross joins and MAX's often can. Also, I only tested it on the SQL Server 2008 preview, because at the moment, no other server is handy. Here's the query, preceded by the created Years table (here with inserts only sufficient for Northwind. You will have to add more.) Steve Kass Drew University http://www.stevekass.com -- Create a permanent table like this that will work for -- the entire range of time you will ever need CREATE TABLE Years( y0 datetime primary key, y1 datetime, tag varchar(20), ord int ); insert into Years values ('1996','1997','1996',1996); insert into Years values ('1997','1998','1997',1997); insert into Years values ('1998','1999','1998',1998); insert into Years values ('1900','2100','Total',0); go -- The query: with T(CustomerID, Tag, Ord, Total, drk) as ( select C.CustomerID, max(case when O.OrderDate >= y0 and O.OrderDate < y1 then tag end ) as Tag, max(case when O.OrderDate >= y0 and O.OrderDate < y1 then ord end ) as Ord, sum( case when O.OrderDate >= y0 and O.OrderDate < y1 then OD.Quantity * OD.UnitPrice end ) as Total, dense_rank() over (order by sum(OD.Quantity * OD.UnitPrice) desc) as drk from Orders O join Customers C on C.CustomerID = O.CustomerID join [Order Details] OD on O.OrderID = OD.OrderID cross join Years group by C.CustomerID, tag ) select CustomerID, Tag, Total from T where drk <= 25 order by drk, Ord ; go -- DROP TABLE Years; Erland Sommarskog wrote: > (csimam@gmail.com) writes: > >>>The problem is that I ultimately only want the top 25 stores in total >>>sales BUT with their subtotals as well... So I used ROLLUP to generate >>>the totals and subtotals... and in the same query I generated a >>>ranking of the data... and there I found my problem: the ranking does >>>not distinguish between totals and subtotals. So the results look >>>like: >>> >>>store1, realTotal, 1 >>>store1, subTotal, 2 >>>store2, realTotal, 3 >>>store7, realTotal, 4 >>>store10, subTotal, 5 >>> >>>Clearly, the rows with subtotals should NOT be ranked... but I am >>>struggling with SQL to figure out how to get exactly what I want. > > > An interesting challenge to do in one query and only hitting the table > once, and I was not able to think of a solution without using the > imaginary Previous() function that we have discussed internally among > us MVPs. > > You can easily do in one query, if you hit the table twice, simply > one query with GROUP BY store and one with GROUP BY store, month. > Rank the first and join. > > And it's possible to hit the table once, if you first get the > monthly aggregations into a temp table. > > But in a single query? I don't think it's possible is you want the > format: > > Store1, Grand Total, 1000000 > Store1, Jan 2000, 12000 > Store1, Feb 2000, 10000, > Store2, Grand Total, .... > > But I will have to see if I get Itzik Ben-Gan to tackle this one. > > >>Well... I have a solution. But I'm not sure about it. It involves >>something I learned about only recently: a user-defined aggregate >>function that concatenates values from across rows. Using this >>aggregate function, stragg, I can do another pass through the results >>mentioned above while grouping by store, and produce: >> >>store1, 'realTotal subTotal', 1 >>store2, 'realTotal subTotal', 3 >>store7, 'realTotal subTotal', 4 >>store10, 'realTotal subTotal', 5 > > > One problem is that a UDA is limited to 8000 characters, so if there > are many subtotals, you query will bomb. > > >>...but... a worry: is this REALLY better than doing a nested query? I >>wonder if the use of the user-defined aggregate function is very >>expensive... > > > The way to find out is to run both queries a couple of times on an > idle server. Just don't forget about the effects of caching. Either > first run the queries once, so that you know that table is entirely > in cache. If it's too big for that, run DBCC DROPCLEANBUFFERS between > each run, so that both queries run under the same condition. > > |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
Erland Sommarskog (esquel@sommarskog.se) writes:
> An interesting challenge to do in one query and only hitting the table > once, and I was not able to think of a solution without using the > imaginary Previous() function that we have discussed internally among > us MVPs. > > You can easily do in one query, if you hit the table twice, simply > one query with GROUP BY store and one with GROUP BY store, month. > Rank the first and join. > > And it's possible to hit the table once, if you first get the > monthly aggregations into a temp table. > > But in a single query? I don't think it's possible is you want the > format: > > Store1, Grand Total, 1000000 > Store1, Jan 2000, 12000 > Store1, Feb 2000, 10000, > Store2, Grand Total, .... > > But I will have to see if I get Itzik Ben-Gan to tackle this one. Indeed I was, and I also got a suggestion from Steve Kass that you have already seen, and one from Umachandar Jaychandran, a former MVP who is now with the Storage Engine team. All three are really bright guys when it comes to writing queries. Both Umachandar and Itzik composed query that logically visit the table only once. Note that the query Steve posted is a cheat: it assumes that you know all involved subheadings in advance. So in practice, Steve's solution is a two- pass solution, although retrieving all subheadings is fairly cheap if there is a non-clustered index on that column. But as I ran all proposals, save one from Itzik that would require dynamic SQL to get the subheadings dynamically, I got a big surprise that calls me to revisit what I said earlier in the thread. To wit, the fastest solution was one that I wrote that uses a temp table to hold the subtotals, and then I compute the grand totals from that one. But the really big surprise was that a fairly naïve query where I scan the table twice, once for the subtotals and one for the grand totals was a lot faster than the queries that were one-pass without using a temp table. The problem they had, is that the execution plan included a spool operator, that is a worktable introduced by the optimizer. The moral: you know what is best until you have benchmarked. To run the test, I used an inflated version of the Northwind database, that I call Northgale. You find the script for it at http://www.sommarskog.se/dynsearch/Northgale.sql. The test script with the six queries that I tested is attached. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
|
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
Erland Sommarskog (esquel@sommarskog.se) writes:
> To wit, the fastest solution was one that I wrote that uses a temp > table to hold the subtotals, and then I compute the grand totals from > that one. But the really big surprise was that a fairly naïve query > where I scan the table twice, once for the subtotals and one for the > grand totals was a lot faster than the queries that were one-pass > without using a temp table. The problem they had, is that the execution > plan included a spool operator, that is a worktable introduced by the > optimizer. But that was on SQL 2005. On SQL 2008 (currently in beta) solutions that used WITH ROLLUP or GROUPING SETS were twice as fast as my naïve query, and about the same speed as the temp table batch. Also, antother query that was fairly slow, was just a tad slower than the temp table on SQL 2008. > The moral[e]: you know [don't] what is best until you have benchmarked. Another morale: whatever conclusions you draw about performance applies only to that version of SQL Server. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx |
|
![]() |
| Outils de la discussion | |
|
|