PHWinfo banniere

Titres
PORTAIL ANNUAIRE ARTICLES COMPARATEUR HÉBERGEURS DEVIS FORUMS RÉDUCTEUR D'URL
Précédent   PHWinfo > Autres forums > Forum Programmation & Conception > mysql.general > Eliminating duplicates from self join results
S'inscrire FAQ Membres Recherche Messages du jour Marquer les forums comme lus
Eliminating duplicates from self join results

Réponse
 
LinkBack Outils de la discussion
Vieux 12/12/2007, 14h35   #1
Yashesh Bhatia
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Eliminating duplicates from self join results

Hello:

I have the following table

select * from addressbook
+----+--------------------+
| id | email |
+----+--------------------+
| 1 | yasheshb@gmail.com |
| 2 | yasheshb@gmail.com |
| 3 | yasheshb@gmail.com |
+----+--------------------+
3 rows in set (0.00 sec)

Now i wanted to find a list of duplicate contacts wherein i can get
the 1st contact with same email
and merge with the others. so in the above case id 1 has duplicates 2
and 3, 2 has 1 & 3 and 3 and 1 & 2.

I'm only interested in getting the first set of duplicates i.e. "1 has
duplicates 2 & 3".

So i tried the query
select t1.id as id1, t2.id as id2, t1.email as email1, t2.email as email2
from addressbook t1, addressbook t2
where t1.email = t2.email and t1.id != t2.id
order by t1.id
+-----+-----+--------------------+--------------------+
| id1 | id2 | email1 | email2 |
+-----+-----+--------------------+--------------------+
| 1 | 2 | yasheshb@gmail.com | yasheshb@gmail.com |
| 1 | 3 | yasheshb@gmail.com | yasheshb@gmail.com |
| 2 | 1 | yasheshb@gmail.com | yasheshb@gmail.com |
| 2 | 3 | yasheshb@gmail.com | yasheshb@gmail.com |
| 3 | 1 | yasheshb@gmail.com | yasheshb@gmail.com |
| 3 | 2 | yasheshb@gmail.com | yasheshb@gmail.com |
+-----+-----+--------------------+--------------------+

then i tried the query similar to the one suggested in the MySQL
Cookbook Recipe 14.5

select DISTINCT if(t1.id < t2.id, t1.id, t2.id) as id1,
if(t1.id < t2.id, t1.email, t2.email) as email1,
if(t1.id < t2.id, t2.id, t1.id) as id2,
if(t1.id < t2.id, t2.email, t1.email) as email2
from addressbook t1, addressbook t2
where t1.email = t2.email and t1.id != t2.id
order by t1.id
+-----+--------------------+-----+--------------------+
| id1 | email1 | id2 | email2 |
+-----+--------------------+-----+--------------------+
| 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com |
| 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com |
| 2 | yasheshb@gmail.com | 3 | yasheshb@gmail.com |
+-----+--------------------+-----+--------------------+

I'm stuck trying to get a query that will give me only
+-----+--------------------+-----+--------------------+
| id1 | email1 | id2 | email2 |
+-----+--------------------+-----+--------------------+
| 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com |
| 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com |
+-----+--------------------+-----+--------------------+

Any , feeback is deeply appreciated. Thanks a bunch in advance.

Yashesh Bhatia
  Réponse avec citation
Vieux 12/12/2007, 17h39   #2
Joerg Bruehe
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Eliminating duplicates from self join results

Hi Yashesh, all !


Yashesh Bhatia wrote:
> Hello:
>
> I have the following table
>
> select * from addressbook
> +----+--------------------+
> | id | email |
> +----+--------------------+
> | 1 | yasheshb@gmail.com |
> | 2 | yasheshb@gmail.com |
> | 3 | yasheshb@gmail.com |
> +----+--------------------+
> 3 rows in set (0.00 sec)
>
> Now i wanted to find a list of duplicate contacts wherein i can get
> the 1st contact with same email
> and merge with the others. so in the above case id 1 has duplicates 2
> and 3, 2 has 1 & 3 and 3 and 1 & 2.
>
> I'm only interested in getting the first set of duplicates i.e. "1 has
> duplicates 2 & 3".
>
> [[...]]
>
> select DISTINCT if(t1.id < t2.id, t1.id, t2.id) as id1,
> if(t1.id < t2.id, t1.email, t2.email) as email1,
> if(t1.id < t2.id, t2.id, t1.id) as id2,
> if(t1.id < t2.id, t2.email, t1.email) as email2
> from addressbook t1, addressbook t2
> where t1.email = t2.email and t1.id != t2.id
> order by t1.id


Using and t1.id < t2.id
would make your life a lot easier, allow you to drop the "if" and even
the "distinct" (assuming your "id" values are unique).

> +-----+--------------------+-----+--------------------+
> | id1 | email1 | id2 | email2 |
> +-----+--------------------+-----+--------------------+
> | 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com |
> | 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com |
> | 2 | yasheshb@gmail.com | 3 | yasheshb@gmail.com |
> +-----+--------------------+-----+--------------------+
>
> I'm stuck trying to get a query that will give me only
> +-----+--------------------+-----+--------------------+
> | id1 | email1 | id2 | email2 |
> +-----+--------------------+-----+--------------------+
> | 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com |
> | 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com |
> +-----+--------------------+-----+--------------------+
>
> Any , feeback is deeply appreciated. Thanks a bunch in advance.


Tasks related to duplicate values often might be solved by using
grouping and groupwise counting: "... GROUP BY ... HAVING COUNT(*) ..."

If you don't insist on getting it in this tabular form, you might use
"group_concat()". Try along these lines (untested):

SELECT MIN(id), email, GROUP_CONCAT(id)
FROM addressbook
GROUP BY email
HAVING COUNT(*) > 1

This would repeat the minimum id ("1" in your example data) in the
concatenated list.


For tabular form, try along these lines (untested):

SELECT MIN(id), email, id
FROM addressbook
WHERE id > MIN(id)
GROUP BY email
HAVING COUNT(*) > 1


I wish you success,
Joerg

--
Joerg Bruehe, Senior Production Engineer
MySQL AB, www.mysql.com
  Réponse avec citation
Vieux 13/12/2007, 03h21   #3
Brent Baisley
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Eliminating duplicates from self join results

Taking it step by step, this query will give you all the lowest ids,
for those records with duplicates.
SELECT min(id), email, count(*) AS cnt
FROM addressbook
GROUP BY email
HAVING cnt>1

Now think of that query as an already existing table, which you can
do, you just need to name the query result, in this case I'm calling
it t1.
Then you LEFT JOIN it with the addressbook table, but filtering out
the ids you already have.
SELECT id1, email1,
t2.id AS id2, t2.email AS email2
FROM
(
SELECT min(id) AS id1, email AS email1, count(*) AS cnt
FROM addressbook
GROUP BY email
HAVING cnt>1
) AS t1
LEFT JOIN
addressbook AS t2
ON t1.email1=t2.email AND t1.id1!=t2.id
ORDER BY email1

I haven't tested it, but that query should work and give you the
output you want. I don't recall if it works in v4.0, but v4.1 and
above should work fine.

Brent


On Dec 12, 2007, at 8:35 AM, Yashesh Bhatia wrote:

> Hello:
>
> I have the following table
>
> select * from addressbook
> +----+--------------------+
> | id | email |
> +----+--------------------+
> | 1 | yasheshb@gmail.com |
> | 2 | yasheshb@gmail.com |
> | 3 | yasheshb@gmail.com |
> +----+--------------------+
> 3 rows in set (0.00 sec)
>
> Now i wanted to find a list of duplicate contacts wherein i can get
> the 1st contact with same email
> and merge with the others. so in the above case id 1 has duplicates 2
> and 3, 2 has 1 & 3 and 3 and 1 & 2.
>
> I'm only interested in getting the first set of duplicates i.e. "1 has
> duplicates 2 & 3".
>
> So i tried the query
> select t1.id as id1, t2.id as id2, t1.email as email1, t2.email as
> email2
> from addressbook t1, addressbook t2
> where t1.email = t2.email and t1.id != t2.id
> order by t1.id
> +-----+-----+--------------------+--------------------+
> | id1 | id2 | email1 | email2 |
> +-----+-----+--------------------+--------------------+
> | 1 | 2 | yasheshb@gmail.com | yasheshb@gmail.com |
> | 1 | 3 | yasheshb@gmail.com | yasheshb@gmail.com |
> | 2 | 1 | yasheshb@gmail.com | yasheshb@gmail.com |
> | 2 | 3 | yasheshb@gmail.com | yasheshb@gmail.com |
> | 3 | 1 | yasheshb@gmail.com | yasheshb@gmail.com |
> | 3 | 2 | yasheshb@gmail.com | yasheshb@gmail.com |
> +-----+-----+--------------------+--------------------+
>
> then i tried the query similar to the one suggested in the MySQL
> Cookbook Recipe 14.5
>
> select DISTINCT if(t1.id < t2.id, t1.id, t2.id) as id1,
> if(t1.id < t2.id, t1.email, t2.email) as email1,
> if(t1.id < t2.id, t2.id, t1.id) as id2,
> if(t1.id < t2.id, t2.email, t1.email) as email2
> from addressbook t1, addressbook t2
> where t1.email = t2.email and t1.id != t2.id
> order by t1.id
> +-----+--------------------+-----+--------------------+
> | id1 | email1 | id2 | email2 |
> +-----+--------------------+-----+--------------------+
> | 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com |
> | 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com |
> | 2 | yasheshb@gmail.com | 3 | yasheshb@gmail.com |
> +-----+--------------------+-----+--------------------+
>
> I'm stuck trying to get a query that will give me only
> +-----+--------------------+-----+--------------------+
> | id1 | email1 | id2 | email2 |
> +-----+--------------------+-----+--------------------+
> | 1 | yasheshb@gmail.com | 2 | yasheshb@gmail.com |
> | 1 | yasheshb@gmail.com | 3 | yasheshb@gmail.com |
> +-----+--------------------+-----+--------------------+
>
> Any , feeback is deeply appreciated. Thanks a bunch in advance.
>
> Yashesh Bhatia
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?
> unsub=brenttech@gmail.com
>


  Réponse avec citation
Vieux 13/12/2007, 18h03   #4
Joerg Bruehe
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Eliminating duplicates from self join results

Hi all,


I have to follow up on my own mail:

Joerg Bruehe wrote:
> [[...]]
>
> For tabular form, try along these lines (untested):
>
> SELECT MIN(id), email, id
> FROM addressbook
> WHERE id > MIN(id)
> GROUP BY email
> HAVING COUNT(*) > 1


I cannot explain what made me write that, other than lack of concentration:
This *cannot* achieve what Yashesh asked for, because it creates only
one line per group ("email") and so can report only one "id" value.

MySQL allows such a statement
(an extension over the standard, which does not allow "id" in the column
list because it is neither the grouping column nor a group function),
but for an "email" that occurs with three (or more) "id" values it will
report only one of these duplicates.

If such an iterative approach is ok (report one duplicate, handle it by
deleting one; repeat that), the standard SQL way would be:
SELECT MIN(id), email, MAX(id)
FROM addressbook
GROUP BY email
HAVING COUNT(*) > 1
But if one report should show all duplicates, either the join proposed
in another reply or the group_concat() in my first mail should be used.


Jörg

--
Joerg Bruehe, Senior Production Engineer
MySQL AB, www.mysql.com

  Réponse avec citation
Réponse


Outils de la discussion

Règles de messages
Vous ne pouvez pas créer de nouvelles discussions
Vous ne pouvez pas envoyer des réponses
Vous ne pouvez pas envoyer des pièces jointes
Vous ne pouvez pas modifier vos messages

Les balises BB sont activées : oui
Les smileys sont activés : oui
La balise [IMG] est activée : oui
Le code HTML peut être employé : non
Trackbacks are oui
Pingbacks are oui
Refbacks are oui


Fuseau horaire GMT +1. Il est actuellement 04h15.


Édité par : vBulletin® version 3.7.4
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0 RC5 Tous droits réservés.
Version française #16 par l'association vBulletin francophone
PHWinfo est un site Éducation Sans Frontières ©2000-2008
Ad Management by RedTyger
©Tous droits réservés par les parties respectives
Page generated in 0,20238 seconds with 12 queries