Quadruple 128-bit Floating Point Library
-
Version 1.0
A library providing a signed 128-bit floating point
data type, with 64 effective bits of precision (vs. 53
for the built-in Double type) and a 64-bit exponent
(vs. 11 bits for Double). With greater precision and
far greater range (a Double cannot represent
magnitudes above about 1.8 × 10^308, or below about
4.9 × 10^-324 even as a subnormal), Quads are
especially useful when dealing with very large or very
small values, such as those in probabilistic models.
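To make the representation concrete, here is a minimal sketch in Python (used only for brevity; it is not the library's .NET code). It assumes a value is stored as ±significand × 2^exponent, with a normalized 64-bit significand whose top bit is always set and a signed 64-bit exponent. The class name QuadSketch, the truncating (round-toward-zero) multiplication, the separate sign flag, and the omission of zero, infinities and NaN are all assumptions of this sketch rather than details of the library; a real 128-bit layout has to fit the sign somewhere, for example by leaving the leading significand bit implicit. Multiplication multiplies the two significands into a 128-bit product, keeps its top 64 bits, and adds the exponents.

```python
from dataclasses import dataclass
from fractions import Fraction

INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1
TOP_BIT = 1 << 63  # a normalized 64-bit significand always has bit 63 set


@dataclass(frozen=True)
class QuadSketch:
    """Hypothetical model: value = (-1)**negative * significand * 2**exponent.

    significand is a normalized 64-bit unsigned integer (bit 63 set), so all
    64 bits carry precision; exponent is a signed 64-bit integer. Zero,
    infinities, NaN and rounding modes are deliberately not modeled.
    """
    negative: bool
    significand: int
    exponent: int

    @staticmethod
    def from_int(n: int) -> "QuadSketch":
        if n == 0:
            raise ValueError("zero is not modeled in this sketch")
        m = abs(n)
        shift = 64 - m.bit_length()
        # Shift so that bit 63 is the leading one; extra low bits are truncated.
        sig = m << shift if shift >= 0 else m >> -shift
        return QuadSketch(n < 0, sig, -shift)

    def multiply(self, other: "QuadSketch") -> "QuadSketch":
        # Two significands in [2**63, 2**64) give a product in [2**126, 2**128).
        product = self.significand * other.significand
        exponent = self.exponent + other.exponent
        # Keep the top 64 bits of the product, truncating the rest.
        drop = 64 if product >> 127 else 63
        exponent += drop
        if not INT64_MIN <= exponent <= INT64_MAX:
            raise OverflowError("exponent outside the signed 64-bit range")
        return QuadSketch(self.negative != other.negative, product >> drop, exponent)

    def to_fraction(self) -> Fraction:
        # Exact rational value, used here only to check results.
        magnitude = Fraction(self.significand) * Fraction(2) ** self.exponent
        return -magnitude if self.negative else magnitude


# 2000 factors of 0.5: Double underflows to 0.0, the sketch keeps 2**-2000.
HALF = QuadSketch(False, TOP_BIT, -64)  # 2**63 * 2**-64 == 0.5
quad_product = HALF
double_product = 0.5
for _ in range(1999):
    quad_product = quad_product.multiply(HALF)
    double_product *= 0.5

print(double_product)                                        # 0.0
print(quad_product.to_fraction() == Fraction(1, 2 ** 2000))  # True
```

Running it prints 0.0 followed by True: 2000 factors of 0.5 underflow a Double to zero (its smallest subnormal is 2^-1074), while the 64-bit exponent holds 2^-2000 exactly. Long products of small probabilities behave the same way.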
Adopting a larger fixed precision rather than an
arbitrary precision type (such as Java's BigDecimal)
means that, although Quad arithmetic is still slower
than built-in arithmetic, the penalty is an order of
magnitude or less, which keeps it feasible in many
math-heavy applications. For example, on an Intel
Core i5-2410M laptop, a billion multiplications take
17 seconds with Double values, 135 seconds with Quad
values using the overloaded * operator, and just 76
seconds using the Multiply() method, i.e. roughly 8x
and 4.5x slower than Double, respectively (the higher
overhead of * is due to the poor inlining logic of
the .NET compiler/JIT optimizer). By comparison, the
commonly used workaround for multiplication underflow
and overflow,