|
|
|
#1 |
|
Messages: n/a
Hébergeur: |
I am reading TC++PL3 and on page 685 it is written:
"Producing a good random-number generator isn't easy, and unfortunately not all systems deliver a good rand(). In particular, the low-order bits of a random number are often suspect, so rand() %n is not a good portable way of generating a random number between 0 and n-1. Often, int((double(rand())/RAND_MAX) *n) gives acceptable results. However, to seriously use that formula, we must take care of the minuscule probability that the result will be n". Q1: What are the "low-order bits"? Q2: "we must take care of the minuscule probability that the result will be n". This means the result of rand() call, or the result of the formula? And why? |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
Ioannis Vranos a écrit :
> I am reading TC++PL3 and on page 685 it is written: > > "Producing a good random-number generator isn't easy, and unfortunately > not all systems deliver a good rand(). In particular, the low-order bits > of a random number are often suspect, so rand() %n is not a good > portable way of generating a random number between 0 and n-1. Often, > int((double(rand())/RAND_MAX) *n) gives acceptable results. However, to > seriously use that formula, we must take care of the minuscule > probability that the result will be n". > > > Q1: What are the "low-order bits"? They are the least significant bits of the integer. > > Q2: "we must take care of the minuscule probability that the result will > be n". This means the result of rand() call, or the result of the > formula? And why? rand() can yield RAND_MAX so rand()/RAND_MAX may be one and int((double(rand())/RAND_MAX) *n) may be n. What is wanted is a number in the range [0;n) therefore n should be excluded. Michael |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
Ioannis Vranos wrote:
> I am reading TC++PL3 and on page 685 it is written: > > "Producing a good random-number generator isn't easy, and unfortunately > not all systems deliver a good rand(). In particular, the low-order bits > of a random number are often suspect, so rand() %n is not a good > portable way of generating a random number between 0 and n-1. Often, > int((double(rand())/RAND_MAX) *n) gives acceptable results. However, to > seriously use that formula, we must take care of the minuscule > probability that the result will be n". > > > Q1: What are the "low-order bits"? The low order bits are the ones that determine the congruence classes mod 2^n for small n. The lowest bit determines whether a number es even or odd. Linear congruence RNGs can suffer from a small period in low-order bits. Some are so weak that they produce even and odd results alternatingly. (I have, however, never seen a rand() that does that, but I then again, I didn't search for one either:-) > Q2: "we must take care of the minuscule probability that the result will > be n". This means the result of rand() call, or the result of the > formula? And why? The formula. The reason is that rand() might return RAND_MAX. BTW: no simple rescaling formula is going to give you a mathematically correct uniform sample in [0,n) for an n that does not divide evenly into RAND_MAX+1. In order to do it right (and not just approximately right), you will have to discard some values and throw the dice again. Also, it is a little tricky to upscale things for the case n > RAND_MAX+1. Whether the difference between "right" and "approximately right" matters depends on your application. The next version of the standard will have uniform_int_distribution, which is supposed to take care of this kind of problem. Currently, there is Boost::random, but its uniform_int has a bug (wasn't fixed last time I checked) that makes it non-uniform in some cases. Best Kai-Uwe Bux |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
On Wed, 16 Jan 2008 07:45:30 -0500, jkherciueh wrote:
> > Linear congruence RNGs can suffer from a small period in low-order bits. > Some are so weak that they produce even and odd results alternatingly. > (I have, however, never seen a rand() that does that, but I then again, > I didn't search for one either:-) FYI, the "quick and dirty" PRNG "ranqd1" in "Numerical Recipes in C" does this. It's a one-liner linear congruential generator which uses one multiply and one add (32-bit unsigned int) and is thus about as fast as it gets. Despite the (very) weak low-order randomness, I'd hazard that it not much worse than many system-supplied PRNGs - so long as you know what you're dealing with and, of course you don't mod it! -- Lionel B |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
On Jan 16, 1:45 pm, jkherci...@gmx.net wrote:
> Ioannis Vranos wrote: > > I am reading TC++PL3 and on page 685 it is written: > > "Producing a good random-number generator isn't easy, and unfortunately > > not all systems deliver a good rand(). In particular, the low-order bits > > of a random number are often suspect, so rand() %n is not a good > > portable way of generating a random number between 0 and n-1. Often, > > int((double(rand())/RAND_MAX) *n) gives acceptable results. However, to > > seriously use that formula, we must take care of the minuscule > > probability that the result will be n". > > Q1: What are the "low-order bits"? > The low order bits are the ones that determine the congruence > classes mod 2^n for small n. The lowest bit determines whether > a number es even or odd. > Linear congruence RNGs can suffer from a small period in > low-order bits. Some are so weak that they produce even and > odd results alternatingly. (I have, however, never seen a > rand() that does that, but I then again, I didn't search for > one either:-) They can, but they don't have to. (And I've seen many that do. ) IMHO, I don't think the advice above is really that good. If the random number generator is bad, it's bad. Changing the way you exploit the value may hide the badness superficially, but it doesn't make the results any more random. > > Q2: "we must take care of the minuscule probability that the > > result will be n". This means the result of rand() call, or > > the result of the formula? And why? > The formula. The reason is that rand() might return RAND_MAX. > BTW: no simple rescaling formula is going to give you a mathematically > correct uniform sample in [0,n) for an n that does not divide evenly into > RAND_MAX+1. In order to do it right (and not just approximately right), you > will have to discard some values and throw the dice again. Also, it is a > little tricky to upscale things for the case n > RAND_MAX+1. Whether the > difference between "right" and "approximately right" matters depends on > your application. > The next version of the standard will have uniform_int_distribution, which > is supposed to take care of this kind of problem. Currently, there is > Boost::random, but its uniform_int has a bug (wasn't fixed last time I > checked) that makes it non-uniform in some cases. Handling the problem you mention isn't that hard, and has been discussed here several times. What's good in the random generators from Boost (and TR1?) is that they have guaranteed behavior. The characteristics are part of the specification. -- James Kanze (GABI Software) email:james.kanze@gmail.com Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34 |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
James Kanze wrote:
> On Jan 16, 1:45 pm, jkherci...@gmx.net wrote: >> Ioannis Vranos wrote: >> > I am reading TC++PL3 and on page 685 it is written: > >> > "Producing a good random-number generator isn't easy, and unfortunately >> > not all systems deliver a good rand(). In particular, the low-order >> > bits of a random number are often suspect, so rand() %n is not a good >> > portable way of generating a random number between 0 and n-1. Often, >> > int((double(rand())/RAND_MAX) *n) gives acceptable results. However, to >> > seriously use that formula, we must take care of the minuscule >> > probability that the result will be n". > >> > Q1: What are the "low-order bits"? > >> The low order bits are the ones that determine the congruence >> classes mod 2^n for small n. The lowest bit determines whether >> a number es even or odd. > >> Linear congruence RNGs can suffer from a small period in >> low-order bits. Some are so weak that they produce even and >> odd results alternatingly. (I have, however, never seen a >> rand() that does that, but I then again, I didn't search for >> one either:-) > > They can, but they don't have to. (And I've seen many that do. ) > > IMHO, I don't think the advice above is really that good. If > the random number generator is bad, it's bad. Changing the way > you exploit the value may hide the badness superficially, but it > doesn't make the results any more random. That is not entirely correct. A linear congruence RNG with modulus 2^n can be a perfectly sound implementation of a fair coin if you use the highest bit as the result of the toss. If you use the lowest bit instead, you are very likely to get a rather poor random bit. And this is not superficially hiding problems: the first implementation can (and likely will) pass statistical tests whereas the second can (and likely will) fail. In some sense the lower order bits of linear congruence RNGS could be considered part of their internal state that should not really be exposed to the client. >> > Q2: "we must take care of the minuscule probability that the >> > result will be n". This means the result of rand() call, or >> > the result of the formula? And why? > >> The formula. The reason is that rand() might return RAND_MAX. > >> BTW: no simple rescaling formula is going to give you a mathematically >> correct uniform sample in [0,n) for an n that does not divide evenly into >> RAND_MAX+1. In order to do it right (and not just approximately right), >> you will have to discard some values and throw the dice again. Also, it >> is a little tricky to upscale things for the case n > RAND_MAX+1. Whether >> the difference between "right" and "approximately right" matters depends >> on your application. > >> The next version of the standard will have uniform_int_distribution, >> which is supposed to take care of this kind of problem. Currently, there >> is Boost::random, but its uniform_int has a bug (wasn't fixed last time I >> checked) that makes it non-uniform in some cases. > > Handling the problem you mention isn't that hard, and has been > discussed here several times. The only problem of which I claimed it is non-trivial is to find a random number in [0,n) using calls to an underlying RNG that gives values in [0,m) for some fixed m < n. In that case you need to collect several samples and assemble the result from there (and still you might need to discard the outcome and start over). The problem that has been discussed on this list several times is for m > n. > What's good in the random > generators from Boost (and TR1?) is that they have guaranteed > behavior. The characteristics are part of the specification. More or less. This goes for the engines (the building blocks). As far as adaptors are concerned, their specifications are a little more vague (as one would expect). E.g., it is not specified whether uniform_int used will be more sensitive to low-order bits or to high-order bits of the underlying engine. It is also not specified how many calls (on average) of the underlying engine will be needed. Also, what happens if you use unform_int with an underlying URNG that has a floating point type as a result_type is somewhat beyond comprehension. It is a good thing that C++0X moved away from the specifications in TR1 and reworked <random> entirely. Of course, the flip-side of your observation that Boost and TR1 provide specified characteristics is that you have no guarantees at all for rand(). If you are coding to a given platform, the implementation should document the characteristics of rand(), but if you are writing code that is meant to be portable, things get tricky: the library vendor has to implement rand() based on the best guesses about the needs of the customers and different library vendors will cater to different needs. Thus, you have no idea which RNG you will get when using rand(). Of course, whether this matters depends on the application. Best Kai-Uwe Bux |
|
|
|
#7 |
|
Messages: n/a
Hébergeur: |
jkherci...@gmx.net wrote:
> James Kanze wrote: > > On Jan 16, 1:45 pm, jkherci...@gmx.net wrote: > >> Ioannis Vranos wrote: > >> > I am reading TC++PL3 and on page 685 it is written: > >> > "Producing a good random-number generator isn't easy, and unfortunately > >> > not all systems deliver a good rand(). In particular, the low-order > >> > bits of a random number are often suspect, so rand() %n is not a good > >> > portable way of generating a random number between 0 and n-1. Often, > >> > int((double(rand())/RAND_MAX) *n) gives acceptable results. However, to > >> > seriously use that formula, we must take care of the minuscule > >> > probability that the result will be n". > >> > Q1: What are the "low-order bits"? > >> The low order bits are the ones that determine the congruence > >> classes mod 2^n for small n. The lowest bit determines whether > >> a number es even or odd. > >> Linear congruence RNGs can suffer from a small period in > >> low-order bits. Some are so weak that they produce even and > >> odd results alternatingly. (I have, however, never seen a > >> rand() that does that, but I then again, I didn't search for > >> one either:-) > > They can, but they don't have to. (And I've seen many that do. ) > > IMHO, I don't think the advice above is really that good. If > > the random number generator is bad, it's bad. Changing the way > > you exploit the value may hide the badness superficially, but it > > doesn't make the results any more random. > That is not entirely correct. A linear congruence RNG with modulus 2^n can > be a perfectly sound implementation of a fair coin if you use the highest > bit as the result of the toss. If you use the lowest bit instead, you are > very likely to get a rather poor random bit. And this is not superficially > hiding problems: the first implementation can (and likely will) pass > statistical tests whereas the second can (and likely will) fail. > In some sense the lower order bits of linear congruence RNGS could be > considered part of their internal state that should not really be exposed > to the client. I based my statement on the article "Random Number Generators: Good Ones Are Hard to Find", by Park and Miller (CACM, Oct. 1988). Could you point me to some further analysis to support what you are saying? > >> > Q2: "we must take care of the minuscule probability that the > >> > result will be n". This means the result of rand() call, or > >> > the result of the formula? And why? > >> The formula. The reason is that rand() might return RAND_MAX. > >> BTW: no simple rescaling formula is going to give you a > >> mathematically correct uniform sample in [0,n) for an n > >> that does not divide evenly into RAND_MAX+1. In order to do > >> it right (and not just approximately right), you will have > >> to discard some values and throw the dice again. Also, it > >> is a little tricky to upscale things for the case n > > >> RAND_MAX+1. Whether the difference between "right" and > >> "approximately right" matters depends on your application. > >> The next version of the standard will have > >> uniform_int_distribution, which is supposed to take care of > >> this kind of problem. Currently, there is Boost::random, > >> but its uniform_int has a bug (wasn't fixed last time I > >> checked) that makes it non-uniform in some cases. > > Handling the problem you mention isn't that hard, and has > > been discussed here several times. > The only problem of which I claimed it is non-trivial is to > find a random number in [0,n) using calls to an underlying RNG > that gives values in [0,m) for some fixed m < n. In that case > you need to collect several samples and assemble the result > from there (and still you might need to discard the outcome > and start over). > The problem that has been discussed on this list several times > is for m > n. OK. I misunderstood that. > > What's good in the random generators from Boost (and TR1?) > > is that they have guaranteed behavior. The characteristics > > are part of the specification. > More or less. This goes for the engines (the building blocks). > As far as adaptors are concerned, their specifications are a > little more vague (as one would expect). E.g., it is not > specified whether uniform_int used will be more sensitive to > low-order bits or to high-order bits of the underlying engine. > It is also not specified how many calls (on average) of the > underlying engine will be needed. Yes, but if you use a good engine, it doesn't matter. > Also, what happens if you use unform_int with an underlying > URNG that has a floating point type as a result_type is > somewhat beyond comprehension. It is a good thing that C++0X > moved away from the specifications in TR1 and reworked > <random> entirely. > Of course, the flip-side of your observation that Boost and > TR1 provide specified characteristics is that you have no > guarantees at all for rand(). Exactly. As far as the standard goes, int rand() { return 0 ; } is perfectly compliant (although obviously not conform with the intent). Practically, the C standard gives an example implementation which corresponds to a very poor generated (at least according to Park and Miller), and a lot of implementations of the C library use (or used) this example. > If you are coding to a given platform, the implementation > should document the characteristics of rand(), but if you are > writing code that is meant to be portable, things get tricky: > the library vendor has to implement rand() based on the best > guesses about the needs of the customers and different library > vendors will cater to different needs. Thus, you have no idea > which RNG you will get when using rand(). Of course, whether > this matters depends on the application. It's probably best to suppose that rand() isn't very good, and try for something better, if it matters. -- James Kanze (GABI Software) email:james.kanze@gmail.com Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34 |
|
|
|
#8 |
|
Messages: n/a
Hébergeur: |
James Kanze wrote:
> jkherci...@gmx.net wrote: >> James Kanze wrote: > >> > On Jan 16, 1:45 pm, jkherci...@gmx.net wrote: >> >> Ioannis Vranos wrote: >> >> > I am reading TC++PL3 and on page 685 it is written: > >> >> > "Producing a good random-number generator isn't easy, and >> >> > unfortunately not all systems deliver a good rand(). In particular, >> >> > the low-order bits of a random number are often suspect, so rand() >> >> > %n is not a good portable way of generating a random number between >> >> > 0 and n-1. Often, int((double(rand())/RAND_MAX) *n) gives acceptable >> >> > results. However, to seriously use that formula, we must take care >> >> > of the minuscule probability that the result will be n". > >> >> > Q1: What are the "low-order bits"? > >> >> The low order bits are the ones that determine the congruence >> >> classes mod 2^n for small n. The lowest bit determines whether >> >> a number es even or odd. > >> >> Linear congruence RNGs can suffer from a small period in >> >> low-order bits. Some are so weak that they produce even and >> >> odd results alternatingly. (I have, however, never seen a >> >> rand() that does that, but I then again, I didn't search for >> >> one either:-) > >> > They can, but they don't have to. (And I've seen many that do. ) > >> > IMHO, I don't think the advice above is really that good. If >> > the random number generator is bad, it's bad. Changing the way >> > you exploit the value may hide the badness superficially, but it >> > doesn't make the results any more random. > >> That is not entirely correct. A linear congruence RNG with modulus 2^n >> can be a perfectly sound implementation of a fair coin if you use the >> highest bit as the result of the toss. If you use the lowest bit instead, >> you are very likely to get a rather poor random bit. And this is not >> superficially hiding problems: the first implementation can (and likely >> will) pass statistical tests whereas the second can (and likely will) >> fail. > >> In some sense the lower order bits of linear congruence RNGS could be >> considered part of their internal state that should not really be exposed >> to the client. > > I based my statement on the article "Random Number Generators: > Good Ones Are Hard to Find", by Park and Miller (CACM, Oct. > 1988). That paper mostly concerns itself with choosing the right modulus and multiplier. It does, however, in passing, also address the question how to turn the result into a random bit: on page 1200 left-column it points out that the k-th bit it Grogono's RNG cycles with period 2^k. And they quote Knuth to point out that this is typical for mixed linear congruence RNGs with modulus 2^n. We shall see below what the reason is. > Could you point me to some further analysis to support what you are > saying? What I was thinking of is mostly the Table 1 in Section 3.3.4 of Knuth TAOCP (I have the third edition, published several years after Park and Miller). It lists the results of the spectral test for various linear congruence RNGs and shows that there are some pretty decent ones even with modulus 2^n. Knuth also points out a reason as to why the spectral test is relevant: "all generators known to be bad actually fail it" [beginning of 3.3.4]. Note that Park and Miller also acknowledge that with a carefully chosen multiplier linear congruence RNGs with modulus 2^n can have period 2^n/4 and mixed linear congruence RNGs can have full period 2^n: in fact they do if the multiplier is congruent 1 mod 4 and the shift constant is odd [Knuth, 3.2.1.2 Theorem A]. Now for the claim that any (mixed) linear congruence RNG with modulus 2^n will have weak low order bits: consider a mixed linear congruence RNG x = ( a * x + c ) % 2^n that has full period (i.e., a is cong 1 mod 4 and c is odd). Then the lowest k bits obey the recursion y = ( a * y + c ) % 2^k. which also has full period 2^k (since a and c did not change!). In particular, the lowest-order bit always alternates. On the other hand, the highest-order bit alone has full period 2^n. For contradiction, assume it had period 2^m with m < n (note that the period must be a divisor of 2^n and hence a power of 2). Then, since the first n-1 bits have period 2^(n-1), the overall period would be LCM(2^m, 2^(n-1)) which is 2^(n-1) contradicting the assumption that the RNG has full period 2^n. As for the quality of high-order bits, one also should consider what results of the spectral tests (and many others) mean. The spectral test by and in itself applies to a sequence of real numbers in [0,1) that are presented as a sequence of independent uniformly distributed values. The results from the RNG are translated into real numbers by dividing each by the modulus (2^n). Note that the highest-order bit tells you which half of the unit-interval contains the number. Thus, if tests say that the sequence is good, they also say (at least morally) that the high-order bits are good whereas the lowest order bits are insignificant since they only amount to small perturbations of the real number. My remark that the low-order bits should be considered part of the internal state derives largely from this. E.g., in Table 1 [mentioned above], line 26 there is a good multiplier for the modulus 2^64. I would not hesitate to use that with an unsigned long long as the internal state. However, I would have the generator return only and unsigned long exposing just the highest 32 bits of the state to the client. Then, even the lowest bit of the observable sequence will have period 2^33 and the resulting RNG can be expected to pass all tests with flying colors (at least it will pass the spectral test). Best Kai-Uwe Bux |
|
|
|
#9 |
|
Messages: n/a
Hébergeur: |
Michael DOUBEZ wrote:
> Ioannis Vranos a écrit : >> I am reading TC++PL3 and on page 685 it is written: >> >> "Producing a good random-number generator isn't easy, and >> unfortunately not all systems deliver a good rand(). In particular, >> the low-order bits of a random number are often suspect, so rand() %n >> is not a good portable way of generating a random number between 0 and >> n-1. Often, >> int((double(rand())/RAND_MAX) *n) gives acceptable results. However, >> to seriously use that formula, we must take care of the minuscule >> probability that the result will be n". >> >> Q2: "we must take care of the minuscule probability that the result >> will be n". This means the result of rand() call, or the result of the >> formula? And why? > > rand() can yield RAND_MAX so rand()/RAND_MAX may be one and > int((double(rand())/RAND_MAX) *n) may be n. > > What is wanted is a number in the range [0;n) therefore n should be > excluded. I suppose the range [0, n) is [0, n-1] here. So what if we considered the formula to be for range [0, n]? Is there some flaw in this? |
|
|
|
#10 |
|
Messages: n/a
Hébergeur: |
Ioannis Vranos wrote:
> Michael DOUBEZ wrote: >> Ioannis Vranos a écrit : >>> I am reading TC++PL3 and on page 685 it is written: >>> >>> "Producing a good random-number generator isn't easy, and >>> unfortunately not all systems deliver a good rand(). In particular, >>> the low-order bits of a random number are often suspect, so rand() %n >>> is not a good portable way of generating a random number between 0 and >>> n-1. Often, >>> int((double(rand())/RAND_MAX) *n) gives acceptable results. However, >>> to seriously use that formula, we must take care of the minuscule >>> probability that the result will be n". >>> >>> Q2: "we must take care of the minuscule probability that the result >>> will be n". This means the result of rand() call, or the result of the >>> formula? And why? >> >> rand() can yield RAND_MAX so rand()/RAND_MAX may be one and >> int((double(rand())/RAND_MAX) *n) may be n. >> >> What is wanted is a number in the range [0;n) therefore n should be >> excluded. > > I suppose the range [0, n) is [0, n-1] here. So what if we considered > the formula to be for range [0, n]? Is there some flaw in this? You mean other than that it grossly non-uniform? Suppose n = 9. If you use the formula above to create a random decimal digit in {0,1,2,3,4,5,6,7,8,9}, the probability of getting 9 is 1/(1+RAND_MAX) which will not be 0.1 unless RAND_MAX=9 (which is rather unlikely). Best Kai-Uwe Bux |
|
![]() |
| Outils de la discussion | |
|
|