PHWinfo banniere

Titres
PORTAIL ANNUAIRE ARTICLES COMPARATEUR HÉBERGEURS DEVIS FORUMS RÉDUCTEUR D'URL
Précédent   PHWinfo > Autres forums > Forum Programmation & Conception > comp.databases.mysql > Distance and database
S'inscrire FAQ Membres Recherche Messages du jour Marquer les forums comme lus
Distance and database

Réponse
 
LinkBack Outils de la discussion
Vieux 07/10/2008, 18h40   #1
Alain Reymond
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Distance and database

Hello,

I have a small problem between maths and databases.

I have a table in a database with 55 columns of reals and at least 200.000 lines (but up to 2.000.000 if I upload everything). So, I am in a 55 dimensions space with as many points as lines in my database.

Here is my problem : I try to find the closest neighbour (limited to 1.000 points) of a point taken at random in the database. The distance is a classical euclidian distance in a 55 dimensions space.

The brutal method is to browse the entire database, calculate the distance to the point for each row, order it and limit to the first 1000 rows. This is a single SQL instruction but it consumes a lot of resources.

I suppose that it is possible to reduce the amount of calculus by eliminating points for which we know that they are to far away. Let's take an analogy : if you want to evaluate the closest cities to Chamonix - the famous French ski station! -, you will eliminate the cities situated in the south hemisphere as you know before evaluating anything that they are to far.

Are there techniques to manage that kind of problem ?

Best regards


Alain
  Réponse avec citation
Vieux 13/10/2008, 13h24   #2
sheldonlg
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Distance and database

Alain Reymond wrote:
> Hello,
>
> I have a small problem between maths and databases.
>
> I have a table in a database with 55 columns of reals and at least 200.000 lines (but up to 2.000.000 if I upload everything). So, I am in a 55 dimensions space with as many points as lines in my database.
>
> Here is my problem : I try to find the closest neighbour (limited to 1.000 points) of a point taken at random in the database. The distance is a classical euclidian distance in a 55 dimensions space.
>
> The brutal method is to browse the entire database, calculate the distance to the point for each row, order it and limit to the first 1000 rows. This is a single SQL instruction but it consumes a lot of resources.
>
> I suppose that it is possible to reduce the amount of calculus by eliminating points for which we know that they are to far away. Let's take an analogy : if you want to evaluate the closest cities to Chamonix - the famous French ski station! -, you will eliminate the cities situated in the south hemisphere as you know before evaluating anything that they are to far.
>
> Are there techniques to manage that kind of problem ?
>
> Best regards
>
>
> Alain


What is in the 55 columns? If you are talking about the distance
between places on the earth, then there are only two columns of
importance -- latitude and longitude.

What is the purpose of this calculation? Is this a homework problem,
perhaps?
  Réponse avec citation
Vieux 13/10/2008, 16h41   #3
Jerry Stuckle
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Distance and database

sheldonlg wrote:
> Alain Reymond wrote:
>> Hello,
>>
>> I have a small problem between maths and databases.
>>
>> I have a table in a database with 55 columns of reals and at least
>> 200.000 lines (but up to 2.000.000 if I upload everything). So, I am
>> in a 55 dimensions space with as many points as lines in my database.
>>
>> Here is my problem : I try to find the closest neighbour (limited to
>> 1.000 points) of a point taken at random in the database. The distance
>> is a classical euclidian distance in a 55 dimensions space.
>>
>> The brutal method is to browse the entire database, calculate the
>> distance to the point for each row, order it and limit to the first
>> 1000 rows. This is a single SQL instruction but it consumes a lot of
>> resources.
>> I suppose that it is possible to reduce the amount of calculus by
>> eliminating points for which we know that they are to far away. Let's
>> take an analogy : if you want to evaluate the closest cities to
>> Chamonix - the famous French ski station! -, you will eliminate the
>> cities situated in the south hemisphere as you know before evaluating
>> anything that they are to far.
>>
>> Are there techniques to manage that kind of problem ?
>>
>> Best regards
>>
>>
>> Alain

>
> What is in the 55 columns? If you are talking about the distance
> between places on the earth, then there are only two columns of
> importance -- latitude and longitude.
>
> What is the purpose of this calculation? Is this a homework problem,
> perhaps?
>


As he said - this is 55 dimension space. Sounds like an advanced
math/physics problem.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

  Réponse avec citation
Vieux 13/10/2008, 18h54   #4
Alain Reymond
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Distance and database

Jerry Stuckle a écrit :
> sheldonlg wrote:
>> Alain Reymond wrote:
>>> Hello,
>>>
>>> I have a small problem between maths and databases.
>>>
>>> I have a table in a database with 55 columns of reals and at least
>>> 200.000 lines (but up to 2.000.000 if I upload everything). So, I am
>>> in a 55 dimensions space with as many points as lines in my database.
>>>
>>> Here is my problem : I try to find the closest neighbour (limited to
>>> 1.000 points) of a point taken at random in the database. The
>>> distance is a classical euclidian distance in a 55 dimensions space.
>>>
>>> The brutal method is to browse the entire database, calculate the
>>> distance to the point for each row, order it and limit to the first
>>> 1000 rows. This is a single SQL instruction but it consumes a lot of
>>> resources.
>>> I suppose that it is possible to reduce the amount of calculus by
>>> eliminating points for which we know that they are to far away.
>>> Let's take an analogy : if you want to evaluate the closest cities
>>> to Chamonix - the famous French ski station! -, you will eliminate
>>> the cities situated in the south hemisphere as you know before
>>> evaluating anything that they are to far.
>>>
>>> Are there techniques to manage that kind of problem ?
>>>
>>> Best regards
>>>
>>>
>>> Alain

>>
>> What is in the 55 columns? If you are talking about the distance
>> between places on the earth, then there are only two columns of
>> importance -- latitude and longitude.
>>
>> What is the purpose of this calculation? Is this a homework problem,
>> perhaps?
>>

>
> As he said - this is 55 dimension space. Sounds like an advanced
> math/physics problem.


Thanks for your answer.
In fact, there is a solution using kd-trees.

Best regards

Alain
  Réponse avec citation
Vieux 13/10/2008, 21h01   #5
Gordon Burditt
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Distance and database

>> I have a small problem between maths and databases.
>>
>> I have a table in a database with 55 columns of reals and at least

>200.000 lines (but up to 2.000.000 if I upload everything). So, I am in
>a 55 dimensions space with as many points as lines in my database.
>>
>> Here is my problem : I try to find the closest neighbour (limited to

>1.000 points) of a point taken at random in the database. The distance
>is a classical euclidian distance in a 55 dimensions space.


If there's a limit on the distance then there's a limit on the
difference in each individual dimension (the "bounding box").
How many points can you eliminate with a clause:
WHERE x13 BETWEEN 23.15 AND 25.15
? Even if you don't do this on all 55 dimensions, try doing it on
a couple with the most variation in range. Put an index on each
of those columns. Hope that MySQL will use it before applying the
computationally expensive formula to the rows. I believe MySQL will
only use one index but it should pick the best one.

>> The brutal method is to browse the entire database, calculate the

>distance to the point for each row, order it and limit to the first 1000
>rows. This is a single SQL instruction but it consumes a lot of
>resources.



>> I suppose that it is possible to reduce the amount of calculus by

>eliminating points for which we know that they are to far away. Let's
>take an analogy : if you want to evaluate the closest cities to Chamonix
>- the famous French ski station! -, you will eliminate the cities
>situated in the south hemisphere as you know before evaluating anything
>that they are to far.


Exactly - but you need to know that your entire database isn't from
Australia before you do that, or have an explicit distance limit.

>> Are there techniques to manage that kind of problem ?

  Réponse avec citation
Réponse


Outils de la discussion

Règles de messages
Vous ne pouvez pas créer de nouvelles discussions
Vous ne pouvez pas envoyer des réponses
Vous ne pouvez pas envoyer des pièces jointes
Vous ne pouvez pas modifier vos messages

Les balises BB sont activées : oui
Les smileys sont activés : oui
La balise [IMG] est activée : oui
Le code HTML peut être employé : non
Trackbacks are oui
Pingbacks are oui
Refbacks are oui


Fuseau horaire GMT +1. Il est actuellement 01h30.


Édité par : vBulletin® version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0 RC5 Tous droits réservés.
Version française #16 par l'association vBulletin francophone
PHWinfo est un site Éducation Sans Frontières ©2000-2008
Ad Management by RedTyger
©Tous droits réservés par les parties respectives
Page generated in 1,98751 seconds with 13 queries