Re: Is there an algorithm for integer multiply and divide at the sametime?
On Jul 3, 7:06 am, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
> Jerry Coffin wrote:
> > In article <g4geok$fv...@aioe.org>, jkherci...@gmx.net says...
> > [ ... ]
> Elsethread the OP already remarked that he could use double
> but was afraid that the performance penalty might be too high.
Which makes me wonder. On my machine, a multiply followed by a
divide is faster in double than in int.
If speed is really an issue, and the hardware is an x86-64, then
his problem can be solved with two machine instructions:
multiplying two 64 bit values results in a 128 bit value, and
division is a 128 bit value divided by a 64 bit value.
Historically, having a multiply instruction which returned a
value with twice the width of its operands, and having division
which took a numerator of twice the width of the denominator and
the result were pretty much standard practice at the assembler
level. Modern architectures seem to have gotten away from this,
however, because no high level language supports it, and the
extra registers it uses introduced awkward constraints in the
register allocation routines in the compiler. (On the IBM 360's
architecture, the C compiler I used always used R0/R1 for
multiplication and division, and never used these registers for
anything else. Which, of course, doesn't result in particularly
good optimization.)
> I would not expect the paper and pencil method in binary to
> beat hardware division. However, only measurement will show.
I would normally expect that if you wrote a*b/c, a good compiler
on an Intel architecture would generate something which would
actually give the right results, even if there was an
intervening overflow (at least for signed values---the standard
doesn't allow it for unsigned), not for any intrinsic reasons,
but because the fastest code to evaluate this expression will
give the right results (and according to the standard, if
overflow occurs, the behavior is undefined, so it's certainly
not incorrect to give the correct results). A quick look at the
code generated by g++, however (for 32 bits, since the only
Intel architectures I have access to here are 32 bits), shows
that it systematically inserts an extra cltd instruction between
the multiply and the divide. (CLTD seems to correspond to what
Intel names CDQ.) Whether this is intentional, because g++
defines behavior in case of overflow, or whether it is simply
careless code generation (if the operand to the multiply comes
from any source other than the results of a division, it is
necessary), I don't know.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
|