PHWinfo banniere

Titres
PORTAIL ANNUAIRE ARTICLES COMPARATEUR HÉBERGEURS DEVIS FORUMS RÉDUCTEUR D'URL
Précédent   PHWinfo > Autres forums > Forum Programmation & Conception > comp.lang.cplus > Best algorithm...
S'inscrire FAQ Membres Recherche Messages du jour Marquer les forums comme lus
Best algorithm...

Réponse
 
LinkBack Outils de la discussion
Vieux 05/02/2008, 13h20   #1
ahmadcorp.m@googlemail.com
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Best algorithm...

Hi,

Need some ideas for this problem:

If you had a file containing boundaries [A,B] (up to 25 million) and a
file containing a list of points P (up to 10 billion). What would be
the fastest way to determine which of the listed boundaries contain a
point P?

Standard ugly std::count_if and map inserts and lookup are far to
slow for this problem. Any other ideas?

(problem is a simpler version of a derivative default map issue I'm
having)

Thanks for any .
  Réponse avec citation
Vieux 05/02/2008, 14h18   #2
Phil Endecott
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Best algorithm...

ahmadcorp.m@googlemail.com wrote:
> If you had a file containing boundaries [A,B] (up to 25 million) and a
> file containing a list of points P (up to 10 billion). What would be
> the fastest way to determine which of the listed boundaries contain a
> point P?


The 25 million points will happily fit in memory.

Make a binary tree where each node represents an interval min to max,
and the two sub-trees represent intervals min to mid and mid to max.
Then store intervals that cross mid in the node. Pseudo-code:

class Node {
point_t min;
point_t mid;
point_t max;

Node_ptr leftnode;
Node_ptr rightnode;

set< pair<point_t,point_t> > cross_mid;

public:
Node(point_t min_, point_t max_):
min(min_), mid((min_+max_)/2), max(max_),
leftnode(new Node(min,mid)), // you need to postpone this until
rightnode(new Node(mid,max)) // it's needed
{}

void insert(pair<point_t,point_t> b) {
if (b.second<mid) left->insert(b)
else if (b.first>mid) right->insert(b)
else cross_mid.insert(b);
}

set< pair<point_t,point_t> > find(point_t p) {
return cross_mid.find(p) // left as an exercise :-)
union ( (p<mid) ? leftnode : rightnode ) -> find(p);
}
};


Like all binary trees, this suffers if it becomes unbalanced. Consider
inserting in a random order, perhaps. (Are your files ordered?)

The subdivision will work best if the ranges are fairly evenly
distributed. In this case, the inserts should be O(N log N) and the
lookups each O(log N + M). If the ranges are strongly clustered some
other approach might be better.


Phil.
  Réponse avec citation
Vieux 05/02/2008, 15h29   #3
ahmadcorp.m@googlemail.com
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Best algorithm...

Thanks for the info. I have also considered a binary tree, however I
forgot to mention a key point that might not work for a binary tree.

We need to determine all applicable boundary values for a specific
point. Eg. point 3 is contained within 2000 of the 25 million
boundaries specified.

A binary tree would imply multiple searches to find all applicable
boundaries. Or some trickery based on the depth of node on a tree.
  Réponse avec citation
Vieux 05/02/2008, 18h07   #4
Phil Endecott
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Best algorithm...

ahmadcorp.m@googlemail.com wrote:
> Thanks for the info. I have also considered a binary tree, however I
> forgot to mention a key point that might not work for a binary tree.
>
> We need to determine all applicable boundary values for a specific
> point. Eg. point 3 is contained within 2000 of the 25 million
> boundaries specified.
>
> A binary tree would imply multiple searches to find all applicable
> boundaries.


No, why?

As you descend the tree, you accumulate all the ranges that include your
point.
  Réponse avec citation
Vieux 06/02/2008, 08h00   #5
Dario Saccavino
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Best algorithm...

On 5 Feb, 16:29, "ahmadcor...@googlemail.com"
<ahmadcor...@googlemail.com> wrote:
> Thanks for the info. I have also considered a binary tree, however I
> forgot to mention a key point that might not work for a binary tree.
>
> We need to determine all applicable boundary values for a specific
> point. Eg. point 3 is contained within 2000 of the 25 million
> boundaries specified.
>


After wiki-ing a while, I've found the following:

http://en.wikipedia.org/wiki/Interval_tree
http://en.wikipedia.org/wiki/Segment_tree

Phil's solution is very similar to "Alternative 1" in the first
article and I think it's the most appropriate for your problem.

Dario
  Réponse avec citation
Vieux 06/02/2008, 13h48   #6
ahmadcorp.m@googlemail.com
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Best algorithm...

On 6 Feb, 08:00, Dario Saccavino <kath...@gmail.com> wrote:
> On 5 Feb, 16:29, "ahmadcor...@googlemail.com"
>
> <ahmadcor...@googlemail.com> wrote:
> > Thanks for the info. I have also considered a binary tree, however I
> > forgot to mention a key point that might not work for a binary tree.

>
> > We need to determine all applicable boundary values for a specific
> > point. Eg. point 3 is contained within 2000 of the 25 million
> > boundaries specified.

>
> After wiki-ing a while, I've found the following:
>
> http://en.wikipedia.org/wiki/Interva...i/Segment_tree
>
> Phil's solution is very similar to "Alternative 1" in the first
> article and I think it's the most appropriate for your problem.
>
> Dario


Thanks for the references. The algorithm is much clearer now. Thanks
to all. Immensely appreciate it.
  Réponse avec citation
Réponse


Outils de la discussion

Règles de messages
Vous ne pouvez pas créer de nouvelles discussions
Vous ne pouvez pas envoyer des réponses
Vous ne pouvez pas envoyer des pièces jointes
Vous ne pouvez pas modifier vos messages

Les balises BB sont activées : oui
Les smileys sont activés : oui
La balise [IMG] est activée : oui
Le code HTML peut être employé : non
Trackbacks are oui
Pingbacks are oui
Refbacks are oui


Fuseau horaire GMT +1. Il est actuellement 11h21.


Édité par : vBulletin® version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0 RC5 Tous droits réservés.
Version française #16 par l'association vBulletin francophone
PHWinfo est un site Éducation Sans Frontières
Ad Management by RedTyger
©Tous droits réservés par les parties respectives
Page generated in 0,17324 seconds with 14 queries