PHWinfo banniere

Titres
PORTAIL ANNUAIRE ARTICLES COMPARATEUR HÉBERGEURS DEVIS FORUMS RÉDUCTEUR D'URL
Précédent   PHWinfo > Autres forums > Forum Programmation & Conception > mysql.general > Two/more seperately unique columns in a table?
S'inscrire FAQ Membres Recherche Messages du jour Marquer les forums comme lus
Two/more seperately unique columns in a table?

Réponse
 
LinkBack Outils de la discussion
Vieux 31/08/2007, 11h21   #1
Esbach, Brandon
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Two/more seperately unique columns in a table?

Hi all,

I'm guessing this is a fairly common question, but I've drawn a blank so
far with all workarounds possible.
Here's the rough scenario:
We have data streaming in constantly from various offsite locations.
This comprises of several details: a session, a unit of that session,
and a measurement of that unit.
Data is captured and transferred on a timed process, so often data is
repeated in the data transfer packets. This repeating is unavoidable as
the software used to capture dumps this data for each measurement and
each unit for the session it's working on.

Due to the volume, a bulk update is done using "values()" with an insert
statement.

Unfortunately, often there are repeats of either session, unit, or
measurement (as there could be new measurement or unit for the capture
that needs to be associated to the session).
The problem I've been experiencing is fairly straightforward (I hope): I
have two, sometimes three columns in any given record that need to
always be unique. This comprises an ID (the key column), and one
(depending on the table, sometimes two) GUIDs which should be unique at
all times for the entire table.

I've tried setting the additional columns to be a primary key (which in
turn sets them up to be "unique" when viewed under "Schema Indices" on
the MySQL Administrator tool); however this does not give an error (or
fail silently with "insert ignore") when I insert a duplicate - mySQL
seems quite happy to add the duplicate record.

At the moment, I'm running a process in the beginning which simply
gathers all guids from the database and compares them as it runs through
the data (then adds new ones as it runs).. This is hardly reliable, and
also means starting the service would take several hours to gather the
existing guids at current data levels... almost frightening to think,
what will end up happening as the data expands.

I'm hoping that I'm just missing something really daft and that there is
a much easier way to ensure several columns are always unique in my
table, while still benefitting from the bulk load insert?


Details on installation used:
MySQL version: v 4.1.14 (scheduled for upgrade in 2008)
OS: Windows Server 2003 (std edition)
Memory: 2GB



  Réponse avec citation
Vieux 31/08/2007, 14h27   #2
Baron Schwartz
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Two/more seperately unique columns in a table?

Hi,

Esbach, Brandon wrote:
> Hi all,
>
> I'm guessing this is a fairly common question, but I've drawn a blank so
> far with all workarounds possible.
> Here's the rough scenario:
> We have data streaming in constantly from various offsite locations.
> This comprises of several details: a session, a unit of that session,
> and a measurement of that unit.
> Data is captured and transferred on a timed process, so often data is
> repeated in the data transfer packets. This repeating is unavoidable as
> the software used to capture dumps this data for each measurement and
> each unit for the session it's working on.
>
> Due to the volume, a bulk update is done using "values()" with an insert
> statement.
>
> Unfortunately, often there are repeats of either session, unit, or
> measurement (as there could be new measurement or unit for the capture
> that needs to be associated to the session).
> The problem I've been experiencing is fairly straightforward (I hope): I
> have two, sometimes three columns in any given record that need to
> always be unique. This comprises an ID (the key column), and one
> (depending on the table, sometimes two) GUIDs which should be unique at
> all times for the entire table.
>
> I've tried setting the additional columns to be a primary key (which in
> turn sets them up to be "unique" when viewed under "Schema Indices" on
> the MySQL Administrator tool); however this does not give an error (or
> fail silently with "insert ignore") when I insert a duplicate - mySQL
> seems quite happy to add the duplicate record.
>
> At the moment, I'm running a process in the beginning which simply
> gathers all guids from the database and compares them as it runs through
> the data (then adds new ones as it runs).. This is hardly reliable, and
> also means starting the service would take several hours to gather the
> existing guids at current data levels... almost frightening to think,
> what will end up happening as the data expands.


It sounds like you need a separate primary key and unique index:

create table t (
id int not null,
guid char(32) not null,
unique key (guid),
primary key(id)
);

Then you can do REPLACE or IGNORE with the LOAD DATA INFILE. I can't
tell if you are actually using LOAD DATA INFILE or if your "bulk load"
is a big INSERT statement. If you're using an INSERT with multiple
VALUES() sections, you can also use ON DUPLICATE KEY UPDATE.

I agree the current strategy won't hold up well over time.

Baron
  Réponse avec citation
Vieux 31/08/2007, 14h46   #3
Esbach, Brandon
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut RE: Two/more seperately unique columns in a table?

Hi, and thanks Baron;
I should have been a bit clearer on the bulk insert - I am using a bulk
insert statement, as you assumed.
I'll put this onto the db server and check, I think that's a more future
proof method. Will this affect any of my linked tables (linked via the
row's primary key(id))?

-----Original Message-----
From: Baron Schwartz [mailto:baron@xaprb.com]
Sent: 31 August 2007 14:28
To: Esbach, Brandon
Cc: MySQL User Group
Subject: Re: Two/more seperately unique columns in a table?

Hi,

Esbach, Brandon wrote:
> Hi all,
>
> I'm guessing this is a fairly common question, but I've drawn a blank
> so far with all workarounds possible.
> Here's the rough scenario:
> We have data streaming in constantly from various offsite locations.
> This comprises of several details: a session, a unit of that session,
> and a measurement of that unit.
> Data is captured and transferred on a timed process, so often data is
> repeated in the data transfer packets. This repeating is unavoidable
> as the software used to capture dumps this data for each measurement
> and each unit for the session it's working on.
>
> Due to the volume, a bulk update is done using "values()" with an
> insert statement.
>
> Unfortunately, often there are repeats of either session, unit, or
> measurement (as there could be new measurement or unit for the capture


> that needs to be associated to the session).
> The problem I've been experiencing is fairly straightforward (I hope):


> I have two, sometimes three columns in any given record that need to
> always be unique. This comprises an ID (the key column), and one
> (depending on the table, sometimes two) GUIDs which should be unique
> at all times for the entire table.
>
> I've tried setting the additional columns to be a primary key (which
> in turn sets them up to be "unique" when viewed under "Schema Indices"


> on the MySQL Administrator tool); however this does not give an error
> (or fail silently with "insert ignore") when I insert a duplicate -
> mySQL seems quite happy to add the duplicate record.
>
> At the moment, I'm running a process in the beginning which simply
> gathers all guids from the database and compares them as it runs
> through the data (then adds new ones as it runs).. This is hardly
> reliable, and also means starting the service would take several hours


> to gather the existing guids at current data levels... almost
> frightening to think, what will end up happening as the data expands.


It sounds like you need a separate primary key and unique index:

create table t (
id int not null,
guid char(32) not null,
unique key (guid),
primary key(id)
);

Then you can do REPLACE or IGNORE with the LOAD DATA INFILE. I can't
tell if you are actually using LOAD DATA INFILE or if your "bulk load"
is a big INSERT statement. If you're using an INSERT with multiple
VALUES() sections, you can also use ON DUPLICATE KEY UPDATE.

I agree the current strategy won't hold up well over time.

Baron
  Réponse avec citation
Vieux 04/09/2007, 11h29   #4
Esbach, Brandon
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut RE: Two/more seperately unique columns in a table?

Baron,

Just feedback - worked perfectly.
Thanks again!

-----Original Message-----
From: Esbach, Brandon
Sent: 31 August 2007 14:46
To: Baron Schwartz
Cc: MySQL User Group
Subject: RE: Two/more seperately unique columns in a table?

Hi, and thanks Baron;
I should have been a bit clearer on the bulk insert - I am using a bulk
insert statement, as you assumed.
I'll put this onto the db server and check, I think that's a more future
proof method. Will this affect any of my linked tables (linked via the
row's primary key(id))?

-----Original Message-----
From: Baron Schwartz [mailto:baron@xaprb.com]
Sent: 31 August 2007 14:28
To: Esbach, Brandon
Cc: MySQL User Group
Subject: Re: Two/more seperately unique columns in a table?

Hi,

Esbach, Brandon wrote:
> Hi all,
>
> I'm guessing this is a fairly common question, but I've drawn a blank
> so far with all workarounds possible.
> Here's the rough scenario:
> We have data streaming in constantly from various offsite locations.
> This comprises of several details: a session, a unit of that session,
> and a measurement of that unit.
> Data is captured and transferred on a timed process, so often data is
> repeated in the data transfer packets. This repeating is unavoidable
> as the software used to capture dumps this data for each measurement
> and each unit for the session it's working on.
>
> Due to the volume, a bulk update is done using "values()" with an
> insert statement.
>
> Unfortunately, often there are repeats of either session, unit, or
> measurement (as there could be new measurement or unit for the capture


> that needs to be associated to the session).
> The problem I've been experiencing is fairly straightforward (I hope):


> I have two, sometimes three columns in any given record that need to
> always be unique. This comprises an ID (the key column), and one
> (depending on the table, sometimes two) GUIDs which should be unique
> at all times for the entire table.
>
> I've tried setting the additional columns to be a primary key (which
> in turn sets them up to be "unique" when viewed under "Schema Indices"


> on the MySQL Administrator tool); however this does not give an error
> (or fail silently with "insert ignore") when I insert a duplicate -
> mySQL seems quite happy to add the duplicate record.
>
> At the moment, I'm running a process in the beginning which simply
> gathers all guids from the database and compares them as it runs
> through the data (then adds new ones as it runs).. This is hardly
> reliable, and also means starting the service would take several hours


> to gather the existing guids at current data levels... almost
> frightening to think, what will end up happening as the data expands.


It sounds like you need a separate primary key and unique index:

create table t (
id int not null,
guid char(32) not null,
unique key (guid),
primary key(id)
);

Then you can do REPLACE or IGNORE with the LOAD DATA INFILE. I can't
tell if you are actually using LOAD DATA INFILE or if your "bulk load"
is a big INSERT statement. If you're using an INSERT with multiple
VALUES() sections, you can also use ON DUPLICATE KEY UPDATE.

I agree the current strategy won't hold up well over time.

Baron

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/mysql?unsub=e...lectronics.com
  Réponse avec citation
Réponse


Outils de la discussion

Règles de messages
Vous ne pouvez pas créer de nouvelles discussions
Vous ne pouvez pas envoyer des réponses
Vous ne pouvez pas envoyer des pièces jointes
Vous ne pouvez pas modifier vos messages

Les balises BB sont activées : oui
Les smileys sont activés : oui
La balise [IMG] est activée : oui
Le code HTML peut être employé : non
Trackbacks are oui
Pingbacks are oui
Refbacks are oui


Fuseau horaire GMT +1. Il est actuellement 05h56.


Édité par : vBulletin® version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0 RC5 Tous droits réservés.
Version française #16 par l'association vBulletin francophone
PHWinfo est un site Éducation Sans Frontières
Ad Management by RedTyger
©Tous droits réservés par les parties respectives
Page generated in 0,13579 seconds with 12 queries