|
|
|
#1 |
|
Messages: n/a
Hébergeur: |
Hello:
Consider the following nested for loop: uint64 TABLE[8][256]; for (i=0; i<=7; i++) for (j=1; j<=7; j++) for (k=1; k<=(1<<j)-1; k++) TABLE [i][((1<<j)+k)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k]); I understand that unrolling just the inner most for-loop would give me best performance and this is required for my project. But I'm unable to figure out how to unroll just the innermost for-loop. Can someone please ! Thanks! |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
V wrote:
> Hello: > > Consider the following nested for loop: > > uint64 TABLE[8][256]; > Better to use standard types, uint64_t. All caps for variable names isn't a good idea, all caps is usually used for macros. > for (i=0; i<=7; i++) > for (j=1; j<=7; j++) > for (k=1; k<=(1<<j)-1; k++) > TABLE [i][((1<<j)+k)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k]); > > I understand that unrolling just the inner most for-loop would give me > best performance and this is required for my project. But I'm unable > to figure out how to unroll just the innermost for-loop. > Have you profiled the code? > Can someone please ! > Your compiler probably can. Loop unrolling is a common optimisation. Check the assembly output for your code with various optimisation settings. -- Ian Collins. |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
I have tried to use compile time option to unroll this and checked its
assembly code, but it doesn't seem to have unrolled the loop. Any suggestions? Thanks. |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
V wrote:
> I have tried to use compile time option to unroll this and checked its > assembly code, but it doesn't seem to have unrolled the loop. > > Any suggestions? > A better compiler? The most important thing to do is to profile your application and see if there really is a bottleneck in this code. At the moment, manual unrolling of this loop looks like premature micro-optimisation. If there is an issue and your compiler isn't up the job, just repeat the inner assignment N times and change the inner loop to increment its counter by N. This assumes the limit is a multiple of N. Something like (untested) for (k=1; k<=(1<<j)-1; k += 2) { TABLE [i][((1<<j)+k)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k]); TABLE [i][((1<<j)+k+1)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k+1]); } -- Ian Collins. |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
On 10 May 2008 at 8:02, Ian Collins wrote:
> This assumes the limit is a multiple of N. Something like (untested) > > for (k=1; k<=(1<<j)-1; k += 2) > { > TABLE [i][((1<<j)+k)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k]); > TABLE [i][((1<<j)+k+1)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k+1]); > } This fails when j=1. The problem is that the number of iterations is variable, and ranges between 1 and 127. Finding common divisors of (powers of 2 minus 1) as an unrolling factor is problematic... (Since the number of iterations is growing exponentially, even just unrolling the last loop is attractive, but 127 is prime, so there'll be some fiddling needed. Just unpacking what on earth the table looks like is an effort. The outer index i is irrelevant, as there's no interaction between these outer "layers". The inner 256-long arrays are arranged like this: j -1 | 0 | 0 | 1 | 1 | 2 | 3 | 2 | 4 | 5 | 6 | 7 | 3 | 8 | 9 |10 |11 |12 |13 |14 |15 | 4 ... 5 ... 6 ... 7 |128|129| ... |255| The left-hand column is an input column. To fill in a given row, look at the value in the left-hand column and xor it in turn with the elements earlier in the array, in the order 0,1,2,3,... Given the way these dependencies work, it is also possible to fill in columns in turn from the left, rather than working row-wise. This offers some potential for unrolling, but would still be complicated and the resulting code would be large. |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
On May 10, 7:44am, V <vishal.st...@gmail.com> wrote:
> I have tried to use compile time option to unroll this and checked its > assembly code, but it doesn't seem to have unrolled the loop. > > Any suggestions? > > Thanks. I'm not surprised. The inner loop is very complicated. Is there any way of simplifying it? Perhaps a rethink of what you're trying to achieve might . What factor of speedup do you need? -- Bartc |
|
|
|
#7 |
|
Messages: n/a
Hébergeur: |
Bart wrote:
> On May 10, 7:44 am, V <vishal.st...@gmail.com> wrote: >> I have tried to use compile time option to unroll this and checked its >> assembly code, but it doesn't seem to have unrolled the loop. >> >> Any suggestions? >> >> Thanks. > > I'm not surprised. The inner loop is very complicated. Is there any > way of simplifying it? > The first compiler I tried was happy to unroll the loop, it's not that hard (for a machine!). -- Ian Collins. |
|
|
|
#8 |
|
Messages: n/a
Hébergeur: |
Antoninus Twink wrote:
> On 10 May 2008 at 8:02, Ian Collins wrote: >> This assumes the limit is a multiple of N. Something like (untested) >> >> for (k=1; k<=(1<<j)-1; k += 2) >> { >> TABLE [i][((1<<j)+k)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k]); >> TABLE [i][((1<<j)+k+1)] = (TABLE[i][(1<<j)]) ^ (TABLE[i][k+1]); >> } > > This fails when j=1. > I know, that why I said "This assumes the limit is a multiple of N", N == 2 in this case. -- Ian Collins. |
|
![]() |
| Outils de la discussion | |
|
|