|
|
|
|
||||||
![]() |
|
|
LinkBack | Outils de la discussion |
|
|
#1 |
|
Messages: n/a
Hébergeur: |
Hello:
I have the following table select * from addressbook +----+--------------------+ | id | email | +----+--------------------+ | 1 | yasheshb@gmail.com | | 2 | yasheshb@gmail.com | | 3 | yasheshb@gmail.com | +----+--------------------+ 3 rows in set (0.00 sec) Now i wanted to find a list of duplicate contacts wherein i can get the 1st contact with same email and merge with the others. so in the above case id 1 has duplicates 2 and 3, 2 has 1 & 3 and 3 and 1 & 2. I'm only interested in getting the first set of duplicates i.e. "1 has duplicates 2 & 3". So i tried the query select t1.id as id1, t2.id as id2, t1.email as email1, t2.email as email2 from addressbook t1, addressbook t2 where t1.email = t2.email and t1.id != t2.id order by t1.id +-----+-----+--------------------+--------------------+ | id1 | id2 | email1 | email2 | +-----+-----+--------------------+--------------------+ | 1 | 2 | yasheshb@gmail.com | yasheshb@gmail.com | | 1 | 3 | yasheshb@gmail.com | yasheshb@gmail.com | | 2 | 1 | yasheshb@gmail.com | yasheshb@gmail.com | | 2 | 3 | yasheshb@gmail.com | yasheshb@gmail.com | | 3 | 1 | yasheshb@gmail.com | yasheshb@gmail.com | | 3 | 2 | yasheshb@gmail.com | yasheshb@gmail.com | +-----+-----+--------------------+--------------------+ then i tried the query similar to the one suggested in the MySQL Cookbook Recipe 14.5 select DISTINCT if(t1.id < t2.id, t1.id, t2.id) as id1, if(t1.id < t2.id, t1.email, t2.email) as email1, if(t1.id < t2.id, t2.id, t1.id) as id2, if(t1.id < t2.id, t2.email, t1.email) as email2 from addressbook t1, addressbook t2 where t1.email = t2.email and t1.id != t2.id order by t1.id +-----+--------------------+-----+--------------------+ | id1 | email1 | id2 | email2 | +-----+--------------------+-----+--------------------+ | 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com | | 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com | | 2 | yasheshb@gmail.com | 3 | yasheshb@gmail.com | +-----+--------------------+-----+--------------------+ I'm stuck trying to get a query that will give me only +-----+--------------------+-----+--------------------+ | id1 | email1 | id2 | email2 | +-----+--------------------+-----+--------------------+ | 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com | | 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com | +-----+--------------------+-----+--------------------+ Any , feeback is deeply appreciated. Thanks a bunch in advance. Yashesh Bhatia |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
Hi Yashesh, all !
Yashesh Bhatia wrote: > Hello: > > I have the following table > > select * from addressbook > +----+--------------------+ > | id | email | > +----+--------------------+ > | 1 | yasheshb@gmail.com | > | 2 | yasheshb@gmail.com | > | 3 | yasheshb@gmail.com | > +----+--------------------+ > 3 rows in set (0.00 sec) > > Now i wanted to find a list of duplicate contacts wherein i can get > the 1st contact with same email > and merge with the others. so in the above case id 1 has duplicates 2 > and 3, 2 has 1 & 3 and 3 and 1 & 2. > > I'm only interested in getting the first set of duplicates i.e. "1 has > duplicates 2 & 3". > > [[...]] > > select DISTINCT if(t1.id < t2.id, t1.id, t2.id) as id1, > if(t1.id < t2.id, t1.email, t2.email) as email1, > if(t1.id < t2.id, t2.id, t1.id) as id2, > if(t1.id < t2.id, t2.email, t1.email) as email2 > from addressbook t1, addressbook t2 > where t1.email = t2.email and t1.id != t2.id > order by t1.id Using and t1.id < t2.id would make your life a lot easier, allow you to drop the "if" and even the "distinct" (assuming your "id" values are unique). > +-----+--------------------+-----+--------------------+ > | id1 | email1 | id2 | email2 | > +-----+--------------------+-----+--------------------+ > | 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com | > | 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com | > | 2 | yasheshb@gmail.com | 3 | yasheshb@gmail.com | > +-----+--------------------+-----+--------------------+ > > I'm stuck trying to get a query that will give me only > +-----+--------------------+-----+--------------------+ > | id1 | email1 | id2 | email2 | > +-----+--------------------+-----+--------------------+ > | 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com | > | 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com | > +-----+--------------------+-----+--------------------+ > > Any , feeback is deeply appreciated. Thanks a bunch in advance. Tasks related to duplicate values often might be solved by using grouping and groupwise counting: "... GROUP BY ... HAVING COUNT(*) ..." If you don't insist on getting it in this tabular form, you might use "group_concat()". Try along these lines (untested): SELECT MIN(id), email, GROUP_CONCAT(id) FROM addressbook GROUP BY email HAVING COUNT(*) > 1 This would repeat the minimum id ("1" in your example data) in the concatenated list. For tabular form, try along these lines (untested): SELECT MIN(id), email, id FROM addressbook WHERE id > MIN(id) GROUP BY email HAVING COUNT(*) > 1 I wish you success, Joerg -- Joerg Bruehe, Senior Production Engineer MySQL AB, www.mysql.com |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
Taking it step by step, this query will give you all the lowest ids,
for those records with duplicates. SELECT min(id), email, count(*) AS cnt FROM addressbook GROUP BY email HAVING cnt>1 Now think of that query as an already existing table, which you can do, you just need to name the query result, in this case I'm calling it t1. Then you LEFT JOIN it with the addressbook table, but filtering out the ids you already have. SELECT id1, email1, t2.id AS id2, t2.email AS email2 FROM ( SELECT min(id) AS id1, email AS email1, count(*) AS cnt FROM addressbook GROUP BY email HAVING cnt>1 ) AS t1 LEFT JOIN addressbook AS t2 ON t1.email1=t2.email AND t1.id1!=t2.id ORDER BY email1 I haven't tested it, but that query should work and give you the output you want. I don't recall if it works in v4.0, but v4.1 and above should work fine. Brent On Dec 12, 2007, at 8:35 AM, Yashesh Bhatia wrote: > Hello: > > I have the following table > > select * from addressbook > +----+--------------------+ > | id | email | > +----+--------------------+ > | 1 | yasheshb@gmail.com | > | 2 | yasheshb@gmail.com | > | 3 | yasheshb@gmail.com | > +----+--------------------+ > 3 rows in set (0.00 sec) > > Now i wanted to find a list of duplicate contacts wherein i can get > the 1st contact with same email > and merge with the others. so in the above case id 1 has duplicates 2 > and 3, 2 has 1 & 3 and 3 and 1 & 2. > > I'm only interested in getting the first set of duplicates i.e. "1 has > duplicates 2 & 3". > > So i tried the query > select t1.id as id1, t2.id as id2, t1.email as email1, t2.email as > email2 > from addressbook t1, addressbook t2 > where t1.email = t2.email and t1.id != t2.id > order by t1.id > +-----+-----+--------------------+--------------------+ > | id1 | id2 | email1 | email2 | > +-----+-----+--------------------+--------------------+ > | 1 | 2 | yasheshb@gmail.com | yasheshb@gmail.com | > | 1 | 3 | yasheshb@gmail.com | yasheshb@gmail.com | > | 2 | 1 | yasheshb@gmail.com | yasheshb@gmail.com | > | 2 | 3 | yasheshb@gmail.com | yasheshb@gmail.com | > | 3 | 1 | yasheshb@gmail.com | yasheshb@gmail.com | > | 3 | 2 | yasheshb@gmail.com | yasheshb@gmail.com | > +-----+-----+--------------------+--------------------+ > > then i tried the query similar to the one suggested in the MySQL > Cookbook Recipe 14.5 > > select DISTINCT if(t1.id < t2.id, t1.id, t2.id) as id1, > if(t1.id < t2.id, t1.email, t2.email) as email1, > if(t1.id < t2.id, t2.id, t1.id) as id2, > if(t1.id < t2.id, t2.email, t1.email) as email2 > from addressbook t1, addressbook t2 > where t1.email = t2.email and t1.id != t2.id > order by t1.id > +-----+--------------------+-----+--------------------+ > | id1 | email1 | id2 | email2 | > +-----+--------------------+-----+--------------------+ > | 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com | > | 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com | > | 2 | yasheshb@gmail.com | 3 | yasheshb@gmail.com | > +-----+--------------------+-----+--------------------+ > > I'm stuck trying to get a query that will give me only > +-----+--------------------+-----+--------------------+ > | id1 | email1 | id2 | email2 | > +-----+--------------------+-----+--------------------+ > | 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com | > | 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com | > +-----+--------------------+-----+--------------------+ > > Any , feeback is deeply appreciated. Thanks a bunch in advance. > > Yashesh Bhatia > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql? > unsub=brenttech@gmail.com > |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
Hi all,
I have to follow up on my own mail: Joerg Bruehe wrote: > [[...]] > > For tabular form, try along these lines (untested): > > SELECT MIN(id), email, id > FROM addressbook > WHERE id > MIN(id) > GROUP BY email > HAVING COUNT(*) > 1 I cannot explain what made me write that, other than lack of concentration: This *cannot* achieve what Yashesh asked for, because it creates only one line per group ("email") and so can report only one "id" value. MySQL allows such a statement (an extension over the standard, which does not allow "id" in the column list because it is neither the grouping column nor a group function), but for an "email" that occurs with three (or more) "id" values it will report only one of these duplicates. If such an iterative approach is ok (report one duplicate, handle it by deleting one; repeat that), the standard SQL way would be: SELECT MIN(id), email, MAX(id) FROM addressbook GROUP BY email HAVING COUNT(*) > 1 But if one report should show all duplicates, either the join proposed in another reply or the group_concat() in my first mail should be used. Jörg -- Joerg Bruehe, Senior Production Engineer MySQL AB, www.mysql.com |
|
![]() |
| Outils de la discussion | |
|
|