|
|
|
#26 |
|
Messages: n/a
Host: |
On May 10, 1:18pm, moi <r...@invalid.address.org> wrote:
> On Sat, 10 May 2008 09:49:58 -0700, kumarchi wrote:
> > guys:
> > i have zeroed in and created a simple test program. This program just
> > has floating point addition and integer addition. it does 20 loops x
> > 1million times. in release version of visual c it takes 0 time. gcc O3
> > takes 6 secs in my machine.
> > :
> > this cannot be rocket science; there seems to be some fundamental
> > deficiency in gcc. i will treat this as a bug. This should have serious
> > implications for linux platforms
>
> > here is the code; test it for yourself
>
> In gcc 4.1.2, with -O3 , on a 686,
> the whole function is eliminated and inlined,
> leading to :
>
> $ time ./a.out
> times=1000000000 loops=20 dtime = 0
>
> real 0m0.001s
> user 0m0.000s
> sys 0m0.003s
> $
> In this case, there is no difference in generated code when
> -march=i686 -msse2 are added to the -O3 flag.
>
> I guess, you'll have to invent a better benchmark :-)
>
> AvK

You are right; my ubuntu box has gcc 4.2.1 and the gcc in cygwin is 3.4.4. But what a difference!! So my alarm hopefully is a false alarm.

I was also confused: my laptop is an Intel Core Duo T2400 and my hardy box is an AMD 4200 X2, and the laptop + visual c beat out the ubuntu + gcc on my linux box also. If gcc 4.2.2 can hold up against visual c, then that means intel beats out amd!!

I will still do more tests, and hope gcc 4.2.2 is a champ. I do not want to be on the dark side (=!!
|
|
|
#27 |
|
On Fri, 09 May 2008 17:59:22 -0700, kumarchi wrote:
> I cannot believe such a blatant difference will go unnoticed for long

Isn't it possible to try out a recent GCC release? After all, GCC 3.4.4 was released way back in May 2005 and the latest GCC is at 4.3.0.

Rui Maciel
|
|
|
#28 |
|
kumarchi@gmail.com wrote:
<snip>
> thanx all of you for responding. I was totally unprepared for such a
> vast performance difference (2x msvc vs gcc) and my code is not at all
> special(no UI, complicated classes etc).

Those are not the types of things which cause modern compilers to stumble. They seem complex abstractly, but to a compiler are fairly transparent.

> it simply does lots of floating point array(mainly through fft) and normal
> integer operations

These are things for which compilers are designed to be highly optimized, and for which failure to enable a seemingly minor feature can cause huge variance in performance, like you're seeing.

> I used -O3 flag and in my case so far it seems to be better than O2.
> msvc by default uses their own O2.
>
> I cannot believe such a blatant difference will go unnoticed for long

It's not gone unnoticed, it's merely gone largely uncommented in this group because this is the wrong group to ask. GCC is not that far behind MSVC. All the numbers and graphs I've seen suggest that something else is involved here.

Yes, you should try GCC 4.x, because GCC 4.x implemented a different optimization framework. But this probably doesn't account for the 2x difference.

It's highly likely that the difference you're seeing is a failure to tell GCC how and which CPU feature sets to target. GCC depends on the build system to dictate all the various platform-dependent optimizations; because GCC is often used for cross-compiling, and for various other reasons, that kind of logic, like CPU-specific optimizations (as opposed to mere architecture), is farmed out. It's likely that MSVC, or Visual Studio, is turning knobs which in GCC must be turned manually.

I recommend that you find another newsgroup, or web page, or other documentation, to help you turn GCC's knobs. The suggestions here, like -msse, etc, are a _start_, but by no means exhaustive, or even sufficient.

There are lots of flags to know about; for instance, only the very latest version of GCC 4.x (4.3, I think) automatically vectorizes, but only if you specify -msse and/or -mtune or something like that. Prior to 4.x, you need to not only enable the instruction set, but enable vectorizing. And there are probably tools/applications you can find which will help you turn those knobs.

I'm just not familiar enough to give detailed advice. And while many people in this group are far more knowledgeable, you're apt to get conflicting or confusing advice because this issue isn't the group's focus.
|
|
|
#29 |
|
On May 10, 9:49am, kumar...@gmail.com wrote:
> guys:
> i have zeroed in and created a simple test program. This program just
> has floating point addition and integer addition. it does 20 loops x
> 1million times. in release version of visual c it takes 0 time. gcc O3
> takes 6 secs in my machine.
>
> this cannot be rocket science; there seems to be some fundamental
> deficiency in gcc. i will treat this as a bug. This should have
> serious implications for linux platforms
>
> here is the code; test it for yourself
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <time.h>
>
> static double loop (long times)
> {
>     long i=0;
>     double a=0;
>
>     for (i=1; i<times; i++)
>     {
>         double x1 = i-1;

In my very limited experience, optimizers tend to focus on statements as opposed to declarations. By placing these declarations in the loop and performing non-trivial computations in the initialization, you have added an unnecessary "extra level of confusion" to your primary objective. What happens if you define x1 etc at function scope and use simple assignment statements here? The initialization for n and y is superfluous.

>         double x2 = i;
>         double y = 0;
>         long n=0;
>
>         y = x1+x2;
>         n = i+ i -1;
>
>         y=x1*x2*y;
>
>         a=y;
>     }
>
>     return a;
>
> }
>
> int main (int argc, char **argv)
> {
>     unsigned long times = 0;
>     long i=0;
>     time_t t=0;
>     time_t t1=0;
>     double dt=0;
>     long lcnt=20;
>     double a=0;
>
>     times = (long) (1e9);
>
>     /*
>     if(argc > 1)
>     {
>         times = atoi (argv[1]);
>
>         times *= 1e6;
>     }
>
>     if(argc > 2)
>         lcnt = atoi (argv[2]);
>
>     if(lcnt < 20)
>         lcnt = 20;
>     */
>     time (&t);
>
>     for (i=0; i<20; i++)

Shouldn't the limit be lcnt, not 20?

>     {
>         a = loop (times);
>         /* you need this for visual c show any elapsed time
>         printf ("\n %lg \n", a);

printf is a non-trivial function which further camouflages the real point of your effort. Why not move this out of the loop and change the function call to
    a += loop(times);

>         */
>     }
>
>     time (&t1);
>
>     dt = difftime (t1, t);
>
>     printf ("\n times=%ld loops=%ld dtime = %lg \n", times, lcnt, dt);
>
>     exit (0);
|
|
|
#30 |
|
kumar...@gmail.com wrote:
> I recently compiled a numerically intensive c project under cygwin
> gcc 3.4.4 and microsoft visual c. ...
> ... the most surprising thing was visual c optimized was 2x
> performance over gcc optimized.
>
> is anybody else seeing the same thing. if this is true microsoft c
> compiler is in a different league altogether

Why the surprise? GNU's gcc is intended to be a fast compiler.
It was not designed to produce ultra-fast executables.

--
Peter
|
|
|
#31 |
|
Peter Nilsson wrote:
> kumar...@gmail.com wrote:
>> I recently compiled a numerically intensive c project under cygwin
>> gcc 3.4.4 and microsoft visual c. ...
>> ... the most surprising thing was visual c optimized was 2x
>> performance over gcc optimized.
>>
>> is anybody else seeing the same thing. if this is true microsoft c
>> compiler is in a different league altogether
>
> Why the surprise? GNU's gcc is intended to be a fast compiler.
> It was not designed to produce ultra-fast executables.
>

I guess you are comparing a current Microsoft compiler (with OpenMP support) against an obsolete (as stated) gcc version (no OpenMP or auto-vectorization). I can't see how you can do numerically intensive work and not be interested in vectorizing compilers, such as current gcc, or any commercial compiler other than Microsoft.

As this is a C newsgroup, the C++ considerations which limit cygwin to such an old compiler shouldn't stop you from downloading a current gcc/gfortran for cygwin.
|
|
|
#32 |
|
Barry Schwarz wrote:

> > for (i=1; i<times; i++)
> > {
> >     double x1 = i-1;
>
> In my very limited experience, optimizers tend to focus on statements
> as opposed to declarations. By placing these declarations in the loop
> and performing non-trivial computations in the initialization, you
> have added an unnecessary "extra level of confusion" to your primary
> objective. What happens if you define x1 etc at function scope and
> use simple assignment statements here? The initialization for n and y
> is superfluous.
>

In the compiler tools we write, the generated code for the x1 initialization would be the same size whether the declaration is inside the for loop or at function scope.

Initialization in a declaration is handled as if it were a statement and is optimized as part of the overall optimization. For example

    double x1 = i-1;

is the same as if it were written as

    double x1 ;
    x1 = i-1;

This is probably true in most C compilers.

Regards

--
Walter Banks
Byte Craft Limited
Tel. (519) 888-6911
http://www.bytecraft.com
walter@bytecraft.com
|
|
|
#33 |
|
Barry Schwarz wrote:
> kumar...@gmail.com wrote:
>
... snip ...
>>
>> for (i=1; i<times; i++) {
>>     double x1 = i-1;
>> }
>
> In my very limited experience, optimizers tend to focus on statements
> as opposed to declarations. By placing these declarations in the loop
> and performing non-trivial computations in the initialization, you
> have added an unnecessary "extra level of confusion" to your primary
> objective. What happens if you define x1 etc at function scope and
> use simple assignment statements here? The initialization for n and y
> is superfluous.

If you just stand back and look at that time-waster statement, you will see that x1 is discarded after the loop and never used, and that i is set to times. Therefore the optimizer can simply generate:

    i = times;

for the whole loop, and the timing is constant.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
        Try the download section.

** Posted from http://www.teranews.com **
|
|
|
#34 |
|
CBFalconer <cbfalconer@yahoo.com> writes:
> Barry Schwarz wrote:
>> kumar...@gmail.com wrote:
>>
> ... snip ...
>>>
>>> for (i=1; i<times; i++) {
>>>     double x1 = i-1;
>>> }
>>
>> In my very limited experience, optimizers tend to focus on statements
>> as opposed to declarations.
<snip>
> If you just stand back and look at that time-waster statement, you
> will see that x1 is discarded after the loop and never used, and
> that i is set to times. Therefore the optimizer can simply
> generate:
>
>     i = times;
>
> for the whole loop,

No it can't. times might be <= 0.

> and the timing is constant.

--
Ben.
|
|
|
#35 |
|
On May 10, 3:13 pm, kumar...@gmail.com wrote:
> On May 10, 1:18 pm, moi <r...@invalid.address.org> wrote:
>
> > On Sat, 10 May 2008 09:49:58 -0700, kumarchi wrote:
> > > guys:
> > > i have zeroed in and created a simple test program. This program just
> > > has floating point addition and integer addition. it does 20 loops x
> > > 1million times. in release version of visual c it takes 0 time. gcc O3
> > > takes 6 secs in my machine.
> > > :
> > > this cannot be rocket science; there seems to be some fundamental
> > > deficiency in gcc. i will treat this as a bug. This should have serious
> > > implications for linux platforms
>
> > > here is the code; test it for yourself
>
> > In gcc 4.1.2, with -O3 , on a 686,
> > the whole function is eliminated and inlined,
> > leading to :
>
> > $ time ./a.out
> > times=1000000000 loops=20 dtime = 0
>
> > real 0m0.001s
> > user 0m0.000s
> > sys 0m0.003s
> > $
> > In this case, there is no difference in generated code when
> > -march=i686 -msse2 are added to the -O3 flag.
>
> > I guess, you'll have to invent a better benchmark :-)
>
> > AvK
>
> ok that was the behavior in visual c. i was using cygwin , i will test
> it out in my ubuntu hardy box. thanx

update:

1. I was able to install gcc 4.3 on my cygwin laptop.

2. I now compared my original program against visual c (release -- uses ms -O2 option) and gcc with -O3 -mtune=core2 -march=core2 -msse4. Visual c is faster by 2.5x!!

3. I tried a switch in my program which deploys a different floating point algorithm. This algorithm is dominated by floating point additions as opposed to multiplications in the 'standard' program. The vc performance does not change. The gcc performance deteriorates and it is now 3.5x slower than visual c.

4. I will try to create a simpler test program to represent the above behavior. My belief is the difference has something to do with the floating point.

I still find it very hard to believe such a huge performance difference will exist. I can understand 10-20%, not 2.5x to 3.5x.
|
|
|
#36 |
|
On 30 May 2008 at 16:09, kumarchi@gmail.com wrote:
> 2. I now compared by original program again visual c (release -- uses
> ms -O2 option) and gcc with -O3 -mtune=core2 -march=core2 -msse4.
> Visual c is faster by 2.5x!!

Optimizing a program is an incredibly complex business. There are more parameters, code paths, trade-offs and possibilities than you could enumerate before the sun becomes a red giant and we all melt off into oblivion. Why should a factor of 2 surprise you when two completely different compilers try this task?

> 3. I try a switch in my program which deploys a different floating
> point algorithm. This algorithm is dominated by floating point
> additions as opposed to multiplications in the 'standard' program.
> The vc performance does not change. The gcc performance deteriorates
> and it is now 3.5x slower than visual c.

If you care, disassemble the key routines, and compare what's going on under the hood. If gcc is doing something obviously silly, submit a patch to them.

> 4. I will try to create a simpler test program to represent the above
> behavior. My belief is the difference has something to do with the
> floating point

Good idea. I believe your proposed explanation makes no sense as stated.
|
|
|
#37 |
|
"Ulrich Eckhardt" <doomster@knuut.de> wrote in message
news:68jqmrF2t8f3hU1@mid.uni-berlin.de...
> jacob navia wrote:
>> Microsoft is running in one platform exclusively.
>
> Sorry, but that's untrue. The platforms I know are IA32, Intel's and AMD's
> 64 bit platforms, MIPS, ARM, SH and maybe some more. Note that the latter
> are used for MS' embedded platform.

NT4 ran on a DEC Alpha as well. Also don't forget PowerPC in their XBOX!

;^)
|
|
|
#38 |
|
On May 30, 6:59pm, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Ulrich Eckhardt" <dooms...@knuut.de> wrote in message
>
> news:68jqmrF2t8f3hU1@mid.uni-berlin.de...
>
> > jacob navia wrote:
> >> Microsoft is running in one platform exclusively.
>
> > Sorry, but that's untrue. The platforms I know are IA32, Intel's and AMD's
> > 64 bit platforms, MIPS, ARM, SH and maybe some more. Note that the latter
> > are used for MS' embedded platform.
>
> NT4 ran on a DEC Alpha as well. Also don't forget PowerPC in their XBOX!

NT4 also ran on PPC and MIPS.
|
|
|