|
|
|
#1 |
|
Messages: n/a
Hébergeur: |
Maybe not specific related to C++ but inspired by the recent discussions
about unordered_set in the upcoming C++x0 release. When using universal hashing a new hash-function from a family of hash-functions is used to map a key into the table. Lets assume that hash-function H1 has been used to map key k1 to the table and H2 to map k2 to the table. I would now like to make a lookup on k1. But from the point of view of lookup how does it know which hash-function to use? |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
On Apr 7, 2:45 pm, desktop <asd...@asd.com> wrote:
> Maybe not specific related to C++ but inspired by the recent discussions > about unordered_set in the upcoming C++x0 release. > > When using universal hashing a new hash-function from a family of > hash-functions is used to map a key into the table. > > Lets assume that hash-function H1 has been used to map key k1 to the > table and H2 to map k2 to the table. > > I would now like to make a lookup on k1. But from the point of view of > lookup how does it know which hash-function to use? Presumably this hash-function would be a template parameter to the std::unordered_set, and would therefore be the same for all elements in the set. Using different hash-function when inserting elements defeats the purpose of hashing. The way that today's std::set<> and std::map<> work are by specifying a predicate operator for equivalence testing is very similar to the above argument. T |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
tommy.hinks@gmail.com wrote:
> On Apr 7, 2:45 pm, desktop <asd...@asd.com> wrote: >> Maybe not specific related to C++ but inspired by the recent discussions >> about unordered_set in the upcoming C++x0 release. >> >> When using universal hashing a new hash-function from a family of >> hash-functions is used to map a key into the table. >> >> Lets assume that hash-function H1 has been used to map key k1 to the >> table and H2 to map k2 to the table. >> >> I would now like to make a lookup on k1. But from the point of view of >> lookup how does it know which hash-function to use? > > Presumably this hash-function would be a template parameter to the > std::unordered_set, and would therefore be the same for all elements > in the set. Using different hash-function when inserting elements > defeats the purpose of hashing. The way that today's std::set<> and > std::map<> work are by specifying a predicate operator for equivalence > testing is very similar to the above argument. > > T Universal hashing as I understand is used to avoid that all elements (in theory possible) hash to the same "bucket". Its a method to distribute the keys as even as possible. This means that you don't provide a single hash-function as a template parameter but a family of hash-functions. But I still don't see how this can work when you make a lookup. |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
> Lets assume that hash-function H1 has been used to map key k1 to the
> table and H2 to map k2 to the table. > > I would now like to make a lookup on k1. But from the point of view of > lookup how does it know which hash-function to use? Read the TR1 specs here: http://www.open-std.org/jtc1/sc22/wg...2005/n1836.pdf Specifically page 98 has the unordered_set<> header: namespace std { namespace tr1 { // [6.3.4.3] Class template unordered_set template <class Value, class Hash = hash<Value>, class Pred = std::equal_to<Value>, class Alloc = std::allocator<Value> > class unordered_set; section 6.3.1 describes the hash as 3. Each unordered associative container is parameterized by Key, by a function object Hash that acts as a hash function for values of type Key, and by a binary predicate Pred that induces an equivalence relation on values of type Key. Additionally, unordered_map and unordered_multimap associate an arbitrary mapped type T with the Key. 4. A hash function is a function object that takes a single argument of type Key and returns a value of type std::size_t. So to answer your question, the container knows at compile time which hash function to use. marioc@computer.org http://securitymario.spaces.live.com/ |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
outlaw wrote:
>> Lets assume that hash-function H1 has been used to map key k1 to the >> table and H2 to map k2 to the table. >> >> I would now like to make a lookup on k1. But from the point of view of >> lookup how does it know which hash-function to use? > > > Read the TR1 specs here: http://www.open-std.org/jtc1/sc22/wg...2005/n1836.pdf > Specifically page 98 has the unordered_set<> header: > > > namespace std { > namespace tr1 { > // [6.3.4.3] Class template unordered_set > template <class Value, > class Hash = hash<Value>, > class Pred = std::equal_to<Value>, > class Alloc = std::allocator<Value> > > class unordered_set; > > section 6.3.1 describes the hash as > > 3. > Each unordered associative container is parameterized by Key, by a > function object Hash that acts as a hash function > for values of type Key, and by a binary predicate Pred that induces an > equivalence relation on values of type Key. > Additionally, unordered_map and unordered_multimap associate an > arbitrary mapped type T with the Key. > > 4. > A hash function is a function object that takes a single argument of > type Key and returns a value of type std::size_t. > > So to answer your question, the container knows at compile time which > hash function to use. > > marioc@computer.org > http://securitymario.spaces.live.com/ But thats not very universal. The point with universal hashing is to randomly select a hash-function each time an operation is made (like insert). This is described in Introduction to Algorithms in Cormen et. al. |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
desktop wrote:
> outlaw wrote: >>> Lets assume that hash-function H1 has been used to map key k1 to the >>> table and H2 to map k2 to the table. >>> >>> I would now like to make a lookup on k1. But from the point of view of >>> lookup how does it know which hash-function to use? >> >> >> Read the TR1 specs here: >> http://www.open-std.org/jtc1/sc22/wg...2005/n1836.pdf >> Specifically page 98 has the unordered_set<> header: >> >> >> namespace std { >> namespace tr1 { >> // [6.3.4.3] Class template unordered_set >> template <class Value, >> class Hash = hash<Value>, >> class Pred = std::equal_to<Value>, >> class Alloc = std::allocator<Value> > >> class unordered_set; >> >> section 6.3.1 describes the hash as >> >> 3. >> Each unordered associative container is parameterized by Key, by a >> function object Hash that acts as a hash function >> for values of type Key, and by a binary predicate Pred that induces an >> equivalence relation on values of type Key. >> Additionally, unordered_map and unordered_multimap associate an >> arbitrary mapped type T with the Key. >> >> 4. >> A hash function is a function object that takes a single argument of >> type Key and returns a value of type std::size_t. >> >> So to answer your question, the container knows at compile time which >> hash function to use. >> >> marioc@computer.org >> http://securitymario.spaces.live.com/ > > But thats not very universal. The point with universal hashing is to > randomly select a hash-function each time an operation is made (like > insert). This is described in Introduction to Algorithms in Cormen et. al. > I think you misunderstand universal hashing. You don't choose a new hash function for each insertion. Each time you instantiate a new hash table (e.g. each time the program is run), you choose a new hash function. The general idea, in somewhat less than precise terms, is that any given hash function (even one belonging to a universal class) has a set of "bad" inputs. However, by choosing a random hash function each time we create a hash table, we can say that the probability is low that any particular input will be bad. Choosing from a universal class just lets us do some math to show what exactly what is mean by "the probability is low". As a concrete example, compilers commonly store the symbol table for the code they are compiling in a hash table. If a deterministic hash function was used, this would mean there would be some program that would compile very slowly because the variable names the programmer chose happen to have a high number of collisions. Choosing a random hash function, though, means that no such program exists. It is possible that the hash function that is randomly chosen happens to be poor for a particular program that is being compiled, but by carefully selecting the family of hash functions we are choosing from (a universal class) we limit the probability of that occurring. -- Alan Johnson |
|
|
|
#7 |
|
Messages: n/a
Hébergeur: |
> You don't choose a new hash function for each insertion. Each time you
> instantiate a new hash table (e.g. each time the program is run), you > choose a new hash function. No, but the type has to be known at compile time with C++ generic programming. Now, What you do in your datatype at runtime is up to you, if you wanted to simply "return 1;" from the hash function each time, or return BCryptGenRandom() each time, so be it. You'd have a linear time ordered container with the former; and constant time inserts with the latter, without an easy way of retrieving it afterwards. The OP seemed to want to do something akin to std::tr1::unordered_set<HisObjs, new HisHash<HisObjs> >; which isn't syntactically correct. |
|
![]() |
| Outils de la discussion | |
|
|