|
|
|
|
||||||
![]() |
|
|
LinkBack | Outils de la discussion |
|
|
#1 |
|
Messages: n/a
Hébergeur: |
A piece of code I am working on uses qsort() to sort buffers containing
packed binary fixed length words, however the number of bytes in a word will vary from one call of qsort() to another. We've discussed here before how qsort() doesn't provide any nice way to pass in extra data, so here the number of bytes is smuggled into the comparison function using the global integer variable gbl_bws (gbl_bws == GloBaL Binary Word Size). The word size varies but isn't likely to be huge in any case, with a maximum of probably around 8 bytes. The current code is this: static int compare_bw1(const void *p1, const void *p2){ unsigned char *uc1 = (unsigned char *)p1; unsigned char *uc2 = (unsigned char *)p2; int i; for(i=0; i < gbl_bws; i++){ if( uc1[i] > uc2[i] ){ return 1; } else if( uc1[i] < uc2[i] ){ return -1;} } return 0; } Another way of doing this is: static int compare_bw2(const void *p1, const void *p2){ return(memcmp(p1,p2,gbl_bws); } Any thoughts on whether the second function form would usually be faster, slower, or the same? It seems to me to be pretty compiler dependent in that if the compiler actually makes a function call to memcmp() that call overhead is going to add up in a hurry when sorting many billions of words. On the other hand, if the compiler expands memcmp() in place that code is likely to be heavily optimized. I know that generally using memcpy() ends up being faster than using my own code, is that going to be generally true for memcmp() as well? On a related note, what does the standard say memcmp() will return for the following 4 cases? My guesses are shown in the last column. p1 p2 Guess (why) 1 null null 0 (because they are identical, albeit undefined) 2 null !null -1 (nothing < something ) 3 !null null 1 (something > nothing ) ----------------------------------------------- 4 n == 0 0 {0 bytes of anything is always nothing) Since memcmp() only returns a comparison value, and not an error value which is really what's called for here, it presumably has these "comparisons" defined in the standard. The man pages I have seen do not address this point. Thanks, David Mathog |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
In article <fnqnln$8bg$1@naig.caltech.edu>,
David Mathog <mathog@caltech.edu> wrote: >On a related note, what does the standard say memcmp() will return for >the following 4 cases? My guesses are shown in the last column. > > p1 p2 Guess (why) >1 null null 0 (because they are identical, albeit undefined) >2 null !null -1 (nothing < something ) >3 !null null 1 (something > nothing ) >----------------------------------------------- >4 n == 0 0 {0 bytes of anything is always nothing) >Since memcmp() only returns a comparison value, and not an error value >which is really what's called for here, it presumably has these >"comparisons" defined in the standard. The man pages I have seen do >not address this point. The standard does not specifically say what happens for null pointers or for n == 0. I would think it likely that using a null pointer would be considered to be outside the range of valid parameters and so that the behaviour would be undefined. Returning 0 for n == 0 sounds like proper behaviour to me, but on different grounds: that all empty buffers are identical. Returning non-zero would imply that there was a difference found between the empty buffers, which could not be the case. -- "No one has the right to destroy another person's belief by demanding empirical evidence." -- Ann Landers |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
Walter Roberson wrote:
> In article <fnqnln$8bg$1@naig.caltech.edu>, > David Mathog <mathog@caltech.edu> wrote: > >> On a related note, what does the standard say memcmp() will return for >> the following 4 cases? My guesses are shown in the last column. >> >> p1 p2 Guess (why) >> 1 null null 0 (because they are identical, albeit undefined) >> 2 null !null -1 (nothing < something ) >> 3 !null null 1 (something > nothing ) >> ----------------------------------------------- >> 4 n == 0 0 {0 bytes of anything is always nothing) > >> Since memcmp() only returns a comparison value, and not an error value >> which is really what's called for here, it presumably has these >> "comparisons" defined in the standard. The man pages I have seen do >> not address this point. > > The standard does not specifically say what happens for null pointers > or for n == 0. 7.21.1p1 seems pretty specific: Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. Over in 7.1.4p1, we find: [...] unless explicitly stated otherwise [...] If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, [...]) [...] the behavior is undefined. So, since the description of memcmp() mentions no exceptions, it's legal for the third argument to be zero but not legal for either of the first two to be NULL. -- Eric.Sosman@sun.com |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
rober...@ibd.nrc-cnrc.gc.ca (Walter Roberson) wrote:
> David Mathog <mat...@caltech.edu> wrote: > > On a related note, what does the standard say memcmp() > > will return for the following 4 cases? My guesses are > > shown in the last column. > > > > p1 p2 Guess (why) > > 1 null null 0 (because they are identical, > > albeit undefined) > > 2 null !null -1 (nothing < something ) > > 3 !null null 1 (something > nothing ) > > ----------------------------------------------- > > 4 n == 0 0 {0 bytes of anything is always > > nothing) > > Since memcmp() only returns a comparison value, and not > > an error alue which is really what's called for here, it > > presumably has these "comparisons" defined in the > > standard. The man pages I have seen do not address this > > point. > > The standard does not specifically say what happens for > null pointers or for n == 0. 7.21.1p2: Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. On such a call, a function that locates a character finds no occurrence, a function that compares two character sequences returns zero, and a function that copies characters copies zero characters. 7.21.4.1p2 The memcmp function compares the first n characters of the object pointed to by s1 to the first n characters of the object pointed to by s2. So, if either pointer is null, the behaviour is undefined. If 0 is passed (and both pointers are valid,) 0 will be returned. -- Peter |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
David Mathog wrote:
> > Any thoughts on whether the second function form would usually be > faster, slower, or the same? I forgot to mention, on gcc 4.1.2 on an opteron system the memcmp() form runs about 25% faster on a test with a sort of 125M words of 5 bytes each. So I knew that it worked better with _some_ compiler, I was just curious how common that was likely to be. Regards, David Mathog |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
David Mathog wrote:
> David Mathog wrote: >> >> Any thoughts on whether the second function form would usually be >> faster, slower, or the same? > > I forgot to mention, on gcc 4.1.2 on an opteron system the memcmp() form > runs about 25% faster on a test with a sort of 125M words of 5 bytes > each. So I knew that it worked better with _some_ compiler, I was just > curious how common that was likely to be. Turns out the fastest variant so far is neither of the above (all tests at gcc -03). Using memcmp() ed, dropping the run time to 164 seconds. Still, the one below ran in just 144 seconds. The program does a lot of IO, unclear how much of the time was in qsort() and how much in IO, but the IO sections were unchanged and should have run in exactly the same time. In the tests gbl_bws was 5 and there were around 120M words in the sort buffer. /* based on the memcpm() implementation here: http://mia.ece.uic.edu/cgi-bin/lxr/h...=openvpn-1.4.2 */ int compare_bw(const void *s1, const void *s2){ register unsigned const char *p1 = s1, *p2 = s2; register int d; register int n; if (gbl_bws > 0){ n = gbl_bws; while (n-- > 0){ d = *p1++ - *p2++; if (d != 0)return(d); } } return 0; } Probably this program shouldn't be using the qsort() from the dynamically linked library anyway, since that can't inline the comparison function into the sort function. No big deal to pick up a qsort source and put it into this code, still, why should the programmer have to do this? Do any C compilers offer an option to compile and statically link their qsort() in order to achieve better optimization? In other words, rather than the end user having to physically paste the qsort code in somewhere, can some compilers do it automagically? This pretty much only comes up with qsort() since it makes a lot of calls to the comparison function, and so is ripe for some optimization at compile time. Regards, David Mathog |
|
|
|
#7 |
|
Messages: n/a
Hébergeur: |
On Wed, 30 Jan 2008 16:43:48 -0600, David Mathog wrote
(in article <fnquj3$b30$1@naig.caltech.edu>): > David Mathog wrote: >> >> Any thoughts on whether the second function form would usually be >> faster, slower, or the same? > > I forgot to mention, on gcc 4.1.2 on an opteron system the memcmp() form > runs about 25% faster on a test with a sort of 125M words of 5 bytes > each. So I knew that it worked better with _some_ compiler, I was just > curious how common that was likely to be. memcmp() was hideously slow in some versions of Microsoft's library, it may have improved recently, I don't use it anymore. -- Randy Howard (2reply remove FOOBAR) "The power of accurate observation is called cynicism by those who have not got it." - George Bernard Shaw |
|
|
|
#8 |
|
Messages: n/a
Hébergeur: |
David Mathog wrote:
> > David Mathog wrote: > > David Mathog wrote: > >> > >> Any thoughts on whether the second function form would usually be > >> faster, slower, or the same? > > > > I forgot to mention, > > on gcc 4.1.2 on an opteron system the memcmp() form > > runs about 25% faster on a test with a sort of 125M words of 5 bytes > > each. So I knew that it worked better with _some_ compiler, > > I was just curious how common that was likely to be. > > Turns out the fastest variant > so far is neither of the above (all tests at gcc -03). > > Using memcmp() ed, dropping the run time to 164 seconds. > Still, the > one below ran in just 144 seconds. The program does a lot of IO, > unclear how much of the time was in qsort() and how much in IO, > but the > IO sections were unchanged and > should have run in exactly the same time. > In the tests gbl_bws was 5 and there were around 120M words in the > sort buffer. > > /* based on the memcpm() implementation here: > http://mia.ece.uic.edu/cgi-bin/lxr/h...=openvpn-1.4.2 > */ > int compare_bw(const void *s1, const void *s2){ > register unsigned const char *p1 = s1, *p2 = s2; > register int d; > register int n; > > if (gbl_bws > 0){ > n = gbl_bws; > while (n-- > 0){ > d = *p1++ - *p2++; > if (d != 0)return(d); > } > } > return 0; > } > > Probably this program shouldn't be using the qsort() from the > dynamically linked library anyway, since that can't inline the > comparison function into the sort function. No big deal to pick > up a qsort source and put it into this code, still, why should > the programmer have to do this? I don't know. If you're sorting arrays of pointers, then you could try ucptrsort. If you're sorting arrays of arrays, then I'd have to write a MOV(A,B) macro. /* BEGIN ucptrsort.h */ #ifndef H_UCPTRSORT_H #define H_UCPTRSORT_H #include <stddef.h> typedef unsigned char * e_type; extern size_t gbl_bws; void ucptrsort(e_type *array, size_t nmemb); void hsortm(e_type *array, size_t nmemb); void i_sort(e_type *array, size_t nmemb); /* unstable insertionsort */ #endif /* END ucptrsort.h */ /* BEGIN ucptrsort.c */ #include <assert.h> #include <string.h> #include "ucptrsort.h" #define GT(A, B) (memcmp(A, B, gbl_bws) > 0) #define SMALL_PARTITION 50 /* ** Reduce SMALL_PARTITION for slower moving data types. ** for example: 50 for unsigned, 20 for long double. ** SMALL_PARTITION must be made to be greater than or equal to 4. ** assert(SMALL_PARTITION >= 4); ** Use i_sort instead of ucptrsort, for sorting small arrays. */ #define LU_RAND_SEED 123456789LU #define LU_RAND(S) ((S) * 69069 + 362437) /* ** Use: ** #define LU_RAND(S) ((S) * 69069 + 362437) ** for better sorting ** Use: ** #define LU_RAND(S) ((S) * 69069 + 362437 & 0XFFFFFFFF) ** to avoid implementation defined behavior */ #define SWAP(A, B, T) ((void)((T) = *(A), *(A) = *(B), *(B) = (T))) #define SORT_3(A, B, C, T) \ if (GT((A), (B))) { \ (T) = *(A); \ if (GT((C), (B))) { \ *(A) = *(B); \ if (GT(&(T), (C))) { \ *(B) = *(C); \ *(C) = (T); \ } else { \ *(B) = (T); \ } \ } else { \ *(A) = *(C); \ *(C) = (T); \ } \ } else { \ if (GT((B), (C))) { \ if (GT((A), (C))) { \ (T) = *(A); \ *(A) = *(C); \ *(C) = *(B); \ *(B) = (T); \ } else { \ SWAP((B), (C), (T)); \ } \ } \ } #define SIFTDOWNM(A, I, N, P, C, T) \ { \ (P) = (I); \ (T) = (A)[(P)]; \ (C) = (P) * 2; \ while ((N) > (C)) { \ if (GT((A) + (C) + 1, (A) + (C))) { \ ++(C); \ } \ if (GT((A) + (C), &(T))) { \ (A)[(P)] = (A)[(C)]; \ (P) = (C); \ (C) *= 2; \ } else { \ break; \ } \ } \ if ((N) == (C) && GT((A) + (C), &(T))) { \ (A)[(P)] = (A)[(C)]; \ (P) = (C); \ } \ (A)[(P)] = (T); \ } static int sorted(e_type *array, size_t nmemb); static int rev_sorted(e_type *array, size_t nmemb); static void rev_array(e_type *array, size_t nmemb); static long unsigned ucptrloop(e_type *array, size_t nmemb, size_t d, long unsigned seed); void ucptrsort(e_type *array, size_t nmemb) { size_t d, n; if (nmemb > 1) { if (!rev_sorted(array, nmemb)) { d = 2; for (n = nmemb / 4; n; n /= 2) { ++d; } ucptrloop(array, nmemb, d * 2, LU_RAND_SEED); } else { rev_array(array, nmemb); } } } void hsortm(e_type *array, size_t nmemb) { size_t i, child, parent; e_type temp; if (nmemb > 1) { i = --nmemb / 2; do { SIFTDOWNM(array, i, nmemb, parent, child, temp); } while (i-- != 0); SWAP(array, array + nmemb, temp); while (--nmemb != 0) { SIFTDOWNM(array, 0, nmemb, parent, child, temp); SWAP(array, array + nmemb, temp); } } } void i_sort(e_type *array, size_t nmemb) { e_type temp, *first, *middle; size_t counter; if (nmemb > 1) { first = middle = 1 + array; counter = --nmemb; while (--counter != 0) { ++first; if (GT(middle, first)) { middle = first; } } if (GT(array, middle)) { SWAP(array, middle, temp); } ++array; while (--nmemb != 0) { first = array++; if (GT(first, array)) { middle = array; temp = *middle; do { *middle-- = *first--; } while (GT(first, &temp)); *middle = temp; } } } } static int sorted(e_type *array, size_t nmemb) { while (--nmemb != 0 && !GT(array, array + 1)) { ++array; } return !nmemb; } static int rev_sorted(e_type *array, size_t nmemb) { while (--nmemb != 0 && !GT(array + 1, array)) { ++array; } return !nmemb; } static void rev_array(e_type *array, size_t nmemb) { e_type temp, *end; for (end = array + nmemb; --end > array; ++array) { SWAP(array, end, temp); } } static long unsigned ucptrloop(e_type *array, size_t nmemb, size_t d, long unsigned seed) { e_type temp, *first, *last; assert(SMALL_PARTITION >= 4); while (nmemb > SMALL_PARTITION) { if (sorted(array, nmemb)) { return seed; } if (!d--) { hsortm(array, nmemb); return seed; } seed = LU_RAND(seed); first = seed % --nmemb + array; SWAP(array, first, temp); first = 1 + array; last = nmemb + array; SORT_3(first, array, last, temp); do { ++first; } while (GT(array, first)); do { --last; } while (GT(last, array)); while (last > first) { SWAP(last, first, temp); do { ++first; } while (GT(array, first)); do { --last; } while (GT(last, array)); } SWAP(array, last, temp); seed = ucptrloop(last + 1, nmemb + array - last, d, seed); nmemb = last - array; } i_sort(array, nmemb); return seed; } /* END ucptrsort.c */ -- pete |
|
![]() |
| Outils de la discussion | |
|
|