|
|
|
|
||||||
![]() |
|
|
LinkBack | Outils de la discussion |
|
|
#1 |
|
Messages: n/a
Hébergeur: |
I haven't found the information I seek in the manual yet, but I
continue to look. I am hoping that someone will give me a clue in the mean time; hoping to shorten the time I need to fix this. I tried the following: SELECT *, max(price_date) FROM prices GROUP BY product_id WHERE product_id = 1 AND price_date = max(price_date); There is no knowing what the latest date is for a given item in the DB. But if I don't have the GROUP BY clause I get an error about misusing max(), and using it, I still get an uninformative error about my WHERE clause. What I want ought to be obvious: the latest price for a given product. How do I fix this SELECT statement to get what I want? Thanks Ted |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
Ted wrote:
> I haven't found the information I seek in the manual yet, but I > continue to look. I am hoping that someone will give me a clue in the > mean time; hoping to shorten the time I need to fix this. > > I tried the following: > > SELECT *, max(price_date) FROM prices GROUP BY product_id WHERE > product_id = 1 AND price_date = max(price_date); > > There is no knowing what the latest date is for a given item in the > DB. > > But if I don't have the GROUP BY clause I get an error about misusing > max(), and using it, I still get an uninformative error about my WHERE > clause. > > What I want ought to be obvious: the latest price for a given > product. How do I fix this SELECT statement to get what I want? > > Thanks > > Ted Search the archives of this group for "strawberry query" |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
On Oct 30, 3:43 pm, "Paul Lautman" <paul.laut...@btinternet.com>
wrote: > Ted wrote: > > I haven't found the information I seek in the manual yet, but I > > continue to look. I am hoping that someone will give me a clue in the > > mean time; hoping to shorten the time I need to fix this. > > > I tried the following: > > > SELECT *, max(price_date) FROM prices GROUP BY product_id WHERE > > product_id = 1 AND price_date = max(price_date); > > > There is no knowing what the latest date is for a given item in the > > DB. > > > But if I don't have the GROUP BY clause I get an error about misusing > > max(), and using it, I still get an uninformative error about my WHERE > > clause. > > > What I want ought to be obvious: the latest price for a given > > product. How do I fix this SELECT statement to get what I want? > > > Thanks > > > Ted > > Search the archives of this group for "strawberry query"- Hide quoted text - > > - Show quoted text - Thanks Paul, My simple distillation of that, and a number of related threads, resulted in the following: SELECT * FROM prices A WHERE price_date = (SELECT MAX(price_date) FROM prices B WHERE A.product_id = B.product_id AND A.product_id = 1); This seems to work fine. Thanks again Ted |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
On Tue, 30 Oct 2007 21:17:01 +0100, Ted <r.ted.byers@rogers.com> wrote:
> On Oct 30, 3:43 pm, "Paul Lautman" <paul.laut...@btinternet.com> > wrote: >> Ted wrote: >> > I haven't found the information I seek in the manual yet, but I >> > continue to look. I am hoping that someone will give me a clue in the >> > mean time; hoping to shorten the time I need to fix this. >> >> > I tried the following: >> >> > SELECT *, max(price_date) FROM prices GROUP BY product_id WHERE >> > product_id = 1 AND price_date = max(price_date); >> >> > There is no knowing what the latest date is for a given item in the >> > DB. >> >> > But if I don't have the GROUP BY clause I get an error about misusing >> > max(), and using it, I still get an uninformative error about my WHERE >> > clause. >> >> > What I want ought to be obvious: the latest price for a given >> > product. How do I fix this SELECT statement to get what I want? >> >> Search the archives of this group for "strawberry query" > > My simple distillation of that, and a number of related threads, > resulted in the following: > > SELECT * FROM prices A > WHERE price_date = > (SELECT MAX(price_date) FROM prices B > WHERE A.product_id = B.product_id AND A.product_id = 1); > This seems to work fine. Hmmm, you searched allright, but you've come to the answer that actually is the one being rectified (well, it works, it can work better though). SELECT A.* FROM prices A LEFT JOIN prices B ON A.product_id = B.product_id AND B.price_date > A.price_date WHERE B.price_date IS NULL AND A.product_id = 1; (allthough this way has more advantages when not retrieving a single product_id, but all 'last prices' from prices: SELECT A.* FROM prices A LEFT JOIN prices B ON A.product_id = B.product_id AND B.price_date > A.price_date WHERE B.price_date IS NULL ORDER BY product_id ![]() -- Rik Wasmus |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
On Oct 30, 6:11 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
> On Tue, 30 Oct 2007 21:17:01 +0100, Ted <r.ted.by...@rogers.com> wrote: > > On Oct 30, 3:43 pm, "Paul Lautman" <paul.laut...@btinternet.com> > > wrote: > >> Ted wrote: > >> > I haven't found the information I seek in the manual yet, but I > >> > continue to look. I am hoping that someone will give me a clue in the > >> > mean time; hoping to shorten the time I need to fix this. > > >> > I tried the following: > > >> > SELECT *, max(price_date) FROM prices GROUP BY product_id WHERE > >> > product_id = 1 AND price_date = max(price_date); > > >> > There is no knowing what the latest date is for a given item in the > >> > DB. > > >> > But if I don't have the GROUP BY clause I get an error about misusing > >> > max(), and using it, I still get an uninformative error about my WHERE > >> > clause. > > >> > What I want ought to be obvious: the latest price for a given > >> > product. How do I fix this SELECT statement to get what I want? > > >> Search the archives of this group for "strawberry query" > > > My simple distillation of that, and a number of related threads, > > resulted in the following: > > > SELECT * FROM prices A > > WHERE price_date = > > (SELECT MAX(price_date) FROM prices B > > WHERE A.product_id = B.product_id AND A.product_id = 1); > > This seems to work fine. > > Hmmm, you searched allright, but you've come to the answer that actually > is the one being rectified (well, it works, it can work better though). > > SELECT A.* > FROM prices A > LEFT JOIN prices B > ON A.product_id = B.product_id > AND B.price_date > A.price_date > WHERE B.price_date IS NULL AND A.product_id = 1; > > (allthough this way has more advantages when not retrieving a single > product_id, but all 'last prices' from prices: > SELECT A.* > FROM prices A > LEFT JOIN prices B > ON A.product_id = B.product_id > AND B.price_date > A.price_date > WHERE B.price_date IS NULL > ORDER BY product_id ![]() > -- > Rik Wasmus- Hide quoted text - > > - Show quoted text - OK, but I don't see how your modification can work if price_date can not be NULL. And here is a related problem: I have the following stored procedures: DROP PROCEDURE IF EXISTS `get_etf_stocks`; CREATE PROCEDURE `get_etf_stocks`(id INTEGER) BEGIN SELECT stocks.stock_id AS stock_id, stocks.symbol AS symbol, stocks.`name` AS stock_name FROM `etf_stocks` LEFT JOIN stocks ON etf_stocks.stock_id = stocks.stock_id WHERE etf_id = id; END // CREATE PROCEDURE `get_latest_stock_prices`(id INTEGER) BEGIN SELECT `close` AS close_price FROM stockprices A WHERE price_date = (SELECT MAX(price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = id); END // I am finding that the latter stored procedure is a bottle neck when I need to invoke it on each of the stock_ids returned by the former. I am guessing (hoping) that if I can combine them, performance will be considerably improved. Based on what you wrote, I'd hazard a guess that what I need is a modification of your second variant. Right? Thanks, Ted |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
On 31 Oct, 05:22, Ted <r.ted.by...@rogers.com> wrote:
> On Oct 30, 6:11 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote: > > > > > > > On Tue, 30 Oct 2007 21:17:01 +0100, Ted <r.ted.by...@rogers.com> wrote: > > > On Oct 30, 3:43 pm, "Paul Lautman" <paul.laut...@btinternet.com> > > > wrote: > > >> Ted wrote: > > >> > I haven't found the information I seek in the manual yet, but I > > >> > continue to look. I am hoping that someone will give me a clue in the > > >> > mean time; hoping to shorten the time I need to fix this. > > > >> > I tried the following: > > > >> > SELECT *, max(price_date) FROM prices GROUP BY product_id WHERE > > >> > product_id = 1 AND price_date = max(price_date); > > > >> > There is no knowing what the latest date is for a given item in the > > >> > DB. > > > >> > But if I don't have the GROUP BY clause I get an error about misusing > > >> > max(), and using it, I still get an uninformative error about my WHERE > > >> > clause. > > > >> > What I want ought to be obvious: the latest price for a given > > >> > product. How do I fix this SELECT statement to get what I want? > > > >> Search the archives of this group for "strawberry query" > > > > My simple distillation of that, and a number of related threads, > > > resulted in the following: > > > > SELECT * FROM prices A > > > WHERE price_date = > > > (SELECT MAX(price_date) FROM prices B > > > WHERE A.product_id = B.product_id AND A.product_id = 1); > > > This seems to work fine. > > > Hmmm, you searched allright, but you've come to the answer that actually > > is the one being rectified (well, it works, it can work better though). > > > SELECT A.* > > FROM prices A > > LEFT JOIN prices B > > ON A.product_id = B.product_id > > AND B.price_date > A.price_date > > WHERE B.price_date IS NULL AND A.product_id = 1; > > > (allthough this way has more advantages when not retrieving a single > > product_id, but all 'last prices' from prices: > > SELECT A.* > > FROM prices A > > LEFT JOIN prices B > > ON A.product_id = B.product_id > > AND B.price_date > A.price_date > > WHERE B.price_date IS NULL > > ORDER BY product_id ![]() > > -- > > Rik Wasmus- Hide quoted text - > > > - Show quoted text - > > OK, but I don't see how your modification can work if price_date can > not be NULL. price_date will be NULL when the LEFT JOIN fails to find a match. See: http://dev.mysql.com/doc/refman/5.0/...group-row.html for the full explanation. |
|
|
|
#7 |
|
Messages: n/a
Hébergeur: |
On 31 Oct, 05:22, Ted <r.ted.by...@rogers.com> wrote:
> I am finding that the latter stored procedure is a bottle neck when I > need to invoke it on each of the stock_ids returned by the former. I > am guessing (hoping) that if I can combine them, performance will be > considerably improved. Based on what you wrote, I'd hazard a guess > that what I need is a modification of your second variant. Right? You need to use the proper solution that Rik pointed out. The other way is far too inefficient. |
|
|
|
#8 |
|
Messages: n/a
Hébergeur: |
On Oct 31, 5:29 am, Captain Paralytic <paul_laut...@yahoo.com> wrote:
> On 31 Oct, 05:22, Ted <r.ted.by...@rogers.com> wrote:> I am finding that the latter stored procedure is a bottle neck when I > > need to invoke it on each of the stock_ids returned by the former. I > > am guessing (hoping) that if I can combine them, performance will be > > considerably improved. Based on what you wrote, I'd hazard a guess > > that what I need is a modification of your second variant. Right? > > You need to use the proper solution that Rik pointed out. The other > way is far too inefficient. Define "proper". I define it as whatever works, and then recognise that there may be several "proper" solutions, some of which being faster than others. I spend most of my time developing C++ or Java code, and generally work towards getting a given function provably correct, and only then follow it by getting it fast. Anyway, using my stored procedure, I get the correct answer in 0.0011s, while with the solution you and Rik recommend, it takes 0.0013s (about 15% slower). In both cases, the number in the following parentheses is about 59s. This is with a table that has more than 112 MB of data. Now, this doesn't seem good enough since the application is quite unresponsive while the stored procedure is called on a few dozen stock_ids. There are about 500 etfs in the database and about 500 stocks, and each etf may have a few dozen stocks. Now, with my combined query: SELECT stocks.stock_id AS stock_id, stocks.symbol AS symbol, stocks.name AS stock_name, A.`close` AS close_price, A.price_date AS price_date FROM etf_stocks LEFT JOIN stocks ON etf_stocks.stock_id = stocks.stock_id LEFT JOIN stockprices A ON stocks.stock_id = A.stock_id LEFT JOIN stockprices B ON A.stock_id = B.stock_id AND B.price_date > A.price_date WHERE B.price_date IS NULL AND etf_id = 397; the MySQL Query Browser says it takes 0.0034s to get the correct 28 rows (but the number within the parentheses that follow that time, is 467s - almost 8 minutes). Now, seeing the restriction on etfid at the end, I thought maybe the following would be quicker. SELECT stocks.stock_id AS stock_id, stocks.symbol AS symbol, stocks.name AS stock_name, A.`close` AS close_price, A.price_date AS price_date FROM etf_stocks LEFT JOIN stocks ON etf_stocks.stock_id = stocks.stock_id AND etf_id = 397 LEFT JOIN stockprices A ON stocks.stock_id = A.stock_id LEFT JOIN stockprices B ON A.stock_id = B.stock_id AND B.price_date > A.price_date WHERE B.price_date IS NULL; The reason I thought this was that I'd thought that putting the restriction earlier in the query would have the last join work on only the 28 stocks in the etf instead of all 500 in the DB. But no, the number within the parentheses is almost exactly the same, and MySQL Query Browser says it took 0.0038s to get the same 28 rows. I don't know if it seems these last queries are taking forever because of the query or the Query Browser, but I can't have my users sitting around for 5 to 10 minutes waiting for results. What makes this all the more puzzling is that in other queries, the amount of data retrieved is substantially greater (e.g. for stock_id = 1, two thirds of a megabyte as opposed to 8 bytes; and some time series have 5 to 10 megabytes of data) and the time to retrieve that numeric data and construct time series plots is generally less than 20s. (e.g. a simple SELECT * FROM stockprices WHERE stock_id = 1, plus the overhead required to create a properly labelled time series chart de novo, is less than 20s). This is in a Java application using JDBC, and Java does not have a reputation for being fast (in contrast to C++). The data from the last two queries I show here are correct, but I need to know if there is a way to make them faster. Embedding the query in a stored procedure does not appear to improve things. In fact, doing so makes it worse, taking 0.0044s (with the value within parentheses being about the same). One oddity is that when I execute the select statement, I get only 28 rows (where etf_id=397), which is correct. When I embed it in a stored procedure, I get the same 28 rows PLUS an additional 466 rows containing NULLs in all fields. The SELECT statement within the stored procedure is identical to that executed directly, so somehow it is handled differently within the stored procedure. I suspect this is part of the reason the stored procedure is about 15% slower. BUT WHY is this happening? Thanks for your Rik and Captain Paralytic! Cheers, Ted |
|
|
|
#9 |
|
Messages: n/a
Hébergeur: |
On Wed, 31 Oct 2007 16:30:29 +0100, Ted <r.ted.byers@rogers.com> wrote:
> Anyway, using my stored procedure, I get the correct answer in > 0.0011s, while with the solution you and Rik recommend, it takes > 0.0013s (about 15% slower). In both cases, the number in the > following parentheses is about 59s. This is with a table that has > more than 112 MB of data. Now, this doesn't seem good enough since > the application is quite unresponsive while the stored procedure is > called on a few dozen stock_ids. > > There are about 500 etfs in the database and about 500 stocks, and > each etf may have a few dozen stocks. Now, with my combined query: > > SELECT stocks.stock_id AS stock_id, > stocks.symbol AS symbol, > stocks.name AS stock_name, > A.`close` AS close_price, > A.price_date AS price_date > FROM etf_stocks LEFT JOIN stocks ON etf_stocks.stock_id = > stocks.stock_id What is the reason for choosing a LEFT JOIN instead of a JOIN here? > LEFT JOIN stockprices A ON stocks.stock_id = A.stock_id If you can rely on the fact that there certainly is a row for every stock_id in etf_stocks in stocks, and there is at least one price for every stock from stocks in stockprices, I'd forget about the 'LEFT' bit in the last two joins. > LEFT JOIN stockprices B ON A.stock_id = B.stock_id > AND B.price_date > A.price_date > WHERE B.price_date IS NULL > AND etf_id = 397; > > the MySQL Query Browser says it takes 0.0034s to get the correct 28 > rows (but the number within the parentheses that follow that time, is > 467s - almost 8 minutes). I've never used that tool. What is the actual time taken, and what does the other one mean? > Now, seeing the restriction on etfid at the end, I thought maybe the > following would be quicker. > > SELECT stocks.stock_id AS stock_id, > stocks.symbol AS symbol, > stocks.name AS stock_name, > A.`close` AS close_price, > A.price_date AS price_date > FROM etf_stocks LEFT JOIN stocks ON etf_stocks.stock_id = > stocks.stock_id AND etf_id = 397 > LEFT JOIN stockprices A ON stocks.stock_id = A.stock_id > LEFT JOIN stockprices B ON A.stock_id = B.stock_id > AND B.price_date > A.price_date > WHERE B.price_date IS NULL; > > > The reason I thought this was that I'd thought that putting the > restriction earlier in the query would have the last join work on only > the 28 stocks in the etf instead of all 500 in the DB. But no, the > number within the parentheses is almost exactly the same, and MySQL > Query Browser says it took 0.0038s to get the same 28 rows. > The data from the last two queries I show here are correct, but I need > to know if there is a way to make them faster. Do an explain on the queries, and check wether it can use proper indexes.. > Embedding the query in > a stored procedure does not appear to improve things. In fact, doing > so makes it worse, taking 0.0044s (with the value within parentheses > being about the same). One oddity is that when I execute the select > statement, I get only 28 rows (where etf_id=397), which is correct. > When I embed it in a stored procedure, I get the same 28 rows PLUS an > additional 466 rows containing NULLs in all fields. (..) BUT WHY is this > happening? No idea, can't see your procedure. -- Rik Wasmus |
|
|
|
#10 |
|
Messages: n/a
Hébergeur: |
On Oct 31, 11:56 am, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
> On Wed, 31 Oct 2007 16:30:29 +0100, Ted <r.ted.by...@rogers.com> wrote: > > Embedding the query in > > a stored procedure does not appear to improve things. In fact, doing > > so makes it worse, taking 0.0044s (with the value within parentheses > > being about the same). One oddity is that when I execute the select > > statement, I get only 28 rows (where etf_id=397), which is correct. > > When I embed it in a stored procedure, I get the same 28 rows PLUS an > > additional 466 rows containing NULLs in all fields. (..) BUT WHY is this > > happening? > I have primary keys on all the tables in question, and a couple extra indeces. Explain says that in both cases, all are used. Should I take that to mean that it can't be made faster, or are there options involving temporary tables and multiple SQL statements within the stored procedure that may produce faste results (in the same sense that fortran or C++ loops can often be made faster by breaking up complex statements and moving loop invariant expressions to occur prior to entry into the loop)? If so, any thoughts on options to examine? OK, here are the results for explain for the two select statements. Indeces are used throughout. The only difference between the two is that when I have the etf_id clause early in the statement (i.e. on the line with the first left join) it seems it processes all 494 rows in etf_stocks table while if the etf_id clause is last in the WHERE clause, it looks only at the 28 rows with stock_ids for stocks in the specified etf. This makes the odd result for the stored procedure make sense, in that it was the first form I used in that procedure, and I am guessing that MySQL Query Browser ignores rows in which all columns are null. i don't know how else to explain it. I am not sure why restricting the value of etf_id in the first ON clause wought to generate so many rows with all null values, though. Thanks for your . Ted |
|
|
|
#11 |
|
Messages: n/a
Hébergeur: |
On Oct 31, 11:56 am, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
> On Wed, 31 Oct 2007 16:30:29 +0100, Ted <r.ted.by...@rogers.com> wrote: > > Anyway, using my stored procedure, I get the correct answer in > > 0.0011s, while with the solution you and Rik recommend, it takes > > 0.0013s (about 15% slower). In both cases, the number in the > > following parentheses is about 59s. This is with a table that has > > more than 112 MB of data. Now, this doesn't seem good enough since > > the application is quite unresponsive while the stored procedure is > > called on a few dozen stock_ids. > > > There are about 500 etfs in the database and about 500 stocks, and > > each etf may have a few dozen stocks. Now, with my combined query: > > > SELECT stocks.stock_id AS stock_id, > > stocks.symbol AS symbol, > > stocks.name AS stock_name, > > A.`close` AS close_price, > > A.price_date AS price_date > > FROM etf_stocks LEFT JOIN stocks ON etf_stocks.stock_id = > > stocks.stock_id > > What is the reason for choosing a LEFT JOIN instead of a JOIN here? > > > LEFT JOIN stockprices A ON stocks.stock_id = A.stock_id > > If you can rely on the fact that there certainly is a row for every > stock_id in etf_stocks in stocks, and there is at least one price for > every stock from stocks in stockprices, I'd forget about the 'LEFT' bit in > the last two joins. > > > LEFT JOIN stockprices B ON A.stock_id = B.stock_id > > AND B.price_date > A.price_date > > WHERE B.price_date IS NULL > > AND etf_id = 397; > > > the MySQL Query Browser says it takes 0.0034s to get the correct 28 > > rows (but the number within the parentheses that follow that time, is > > 467s - almost 8 minutes). > > I've never used that tool. What is the actual time taken, and what does > the other one mean? > > > > > > > Now, seeing the restriction on etfid at the end, I thought maybe the > > following would be quicker. > > > SELECT stocks.stock_id AS stock_id, > > stocks.symbol AS symbol, > > stocks.name AS stock_name, > > A.`close` AS close_price, > > A.price_date AS price_date > > FROM etf_stocks LEFT JOIN stocks ON etf_stocks.stock_id = > > stocks.stock_id AND etf_id = 397 > > LEFT JOIN stockprices A ON stocks.stock_id = A.stock_id > > LEFT JOIN stockprices B ON A.stock_id = B.stock_id > > AND B.price_date > A.price_date > > WHERE B.price_date IS NULL; > > > The reason I thought this was that I'd thought that putting the > > restriction earlier in the query would have the last join work on only > > the 28 stocks in the etf instead of all 500 in the DB. But no, the > > number within the parentheses is almost exactly the same, and MySQL > > Query Browser says it took 0.0038s to get the same 28 rows. > > The data from the last two queries I show here are correct, but I need > > to know if there is a way to make them faster. > > Do an explain on the queries, and check wether it can use proper indexes. > > > Embedding the query in > > a stored procedure does not appear to improve things. In fact, doing > > so makes it worse, taking 0.0044s (with the value within parentheses > > being about the same). One oddity is that when I execute the select > > statement, I get only 28 rows (where etf_id=397), which is correct. > > When I embed it in a stored procedure, I get the same 28 rows PLUS an > > additional 466 rows containing NULLs in all fields. (..) BUT WHY is this > > happening? > > No idea, can't see your procedure. > -- > Rik Wasmus- Hide quoted text - > > - Show quoted text -- Hide quoted text - > > - Show quoted text - The LEFT JOIN was used following the example you gave. I suspect the times given first represent CPU time and the times in parentheses represent clock on the wall time (which fits how responsive, or not, the system is when the these queries are executed. My original stored procedure give the best result, of 0.0011 s, but the time in the parentheses is 59s, which best fits how long it takes for the data returned to be displayed (or how long the MySQL Query browser is frozen). For the really long stored procedure, I will recreate it and post the source for it after lunch. Thanks Ted |
|
![]() |
| Outils de la discussion | |
|
|