precision - Floating Point operators correctly rounded implementation -

I'm implementing a floating-point library in C, based on the IEEE standard. I've started with addition.

I have a problem understanding rounding.

Maybe an example will help:

x  = -3.652248e-11    y  = 1.263346e-10
cz = 8.981213e-11     mz = 8.981214e-11
cz = 8.98121e-11      mz = 8.98121e-11
x  = ae20a0a7         y  = 2f0ae808
cf = 2ec57fbc         mf = 2ec57fbd
cz = 457fbc           mz = 457fbd
ce = 5d               me = 5d

In the above, x and y are random input values, cz is the result of the C operator +, and mz is the result of my implementation.

cf and mf are the bit representations of the final results. As you can see, the final bit is different, and I don't understand why. I took inspiration for my implementation from the Handbook of Floating-Point Arithmetic.

What I don't understand, I guess, is how the rounding is performed. The addition algorithm is based on the identity

x + y = (-1)^{sx} 2^{ex} (|x| + (-1)^{sx xor sy} |y| 2^{ey-ex})

where we name the quantity in parentheses

|z| = |x| + (-1)^{sx xor sy} |y| 2^{ey-ex}

Basically, the problem arises when I need to post-normalize the result using left shifting; I am careful in the case where |z| is positive. Which rounding technique should be applied here?

My copy of Muller et al. is on loan to a friend, so I can't double-check the algorithm you're using specifically, but let's walk through the addition of the values you list:

x = 0xae20a0a7 = -b1.01000001010000010100111 * 2^-35
y = 0x2f0ae808 = +b1.00010101110100000001000 * 2^-33
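The decoding above can be reproduced mechanically. The following is a minimal sketch, assuming normal (not zero, subnormal, infinite or NaN) binary32 inputs; the names `fields32` and `decode32` are hypothetical and not taken from the question's code.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of a normal binary32 value. */
typedef struct {
    uint32_t sign; /* 0 for positive, 1 for negative */
    int32_t  exp;  /* unbiased exponent */
    uint32_t sig;  /* 24-bit significand, implicit leading 1 included */
} fields32;

/* Split a binary32 bit pattern into its fields.
   Assumption: the biased exponent is neither 0 nor 255. */
static fields32 decode32(uint32_t bits)
{
    fields32 f;
    f.sign = bits >> 31;
    f.exp  = (int32_t)((bits >> 23) & 0xffu) - 127;
    f.sig  = (bits & 0x7fffffu) | 0x800000u;
    return f;
}
```

For x, `decode32(0xae20a0a7u)` gives sign 1, exponent -35 and significand 0xa0a0a7, which is the binary string b1.01000001010000010100111 written above.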

If we normalize x and y to a common exponent and add, we get the un-normalized, infinitely precise result:

  b100.01010111010000000100000 * 2^-35
-   b1.01000001010000010100111 * 2^-35
--------------------------------------
   b11.00010101111111101111001 * 2^-35
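The same subtraction can be done on integers, which is roughly what a software implementation has to do. This is only a sketch under stated assumptions: the operands have opposite signs, the larger one has the strictly larger exponent, and the exponent gap is small enough for the shift to fit in 64 bits. The function name `exact_diff` is hypothetical.

```c
#include <stdint.h>

/* Exact magnitude of (big - small), expressed in units of 2^(exp_small - 23).
   sig_big and sig_small are 24-bit significands with the leading 1 included.
   Assumptions: exp_big > exp_small, and exp_big - exp_small <= 39 so that
   the shifted significand fits in 64 bits. */
static uint64_t exact_diff(uint32_t sig_big, int exp_big,
                           uint32_t sig_small, int exp_small)
{
    uint64_t aligned = (uint64_t)sig_big << (exp_big - exp_small);
    return aligned - sig_small;
}
```

With the significands of y and x, `exact_diff(0x8ae808u, -33, 0xa0a0a7u, -35)` returns 0x18aff79, which is b11.00010101111111101111001 with 23 fraction bits. No bits have been discarded at this point, so the rounding decision can still be made exactly.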

Now we normalize, without rounding yet:

b1.10001010111111110111100 1 * 2^-34
                          ^
                          rounding point

The infinitely precise result is exactly halfway between the two nearest floating-point numbers, so we choose the even one and round down to

b1.10001010111111110111100 * 2^-34 = 0x2ec57fbc
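One way to confirm this value is to let the hardware do the addition on the same bit patterns. The sketch below assumes that `float` is IEEE 754 binary32 and that the default round-to-nearest mode is active; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes float is binary32). */
static float bits_to_float(uint32_t b)
{
    float f;
    memcpy(&f, &b, sizeof f);
    return f;
}

/* Reinterpret a float as its 32-bit pattern. */
static uint32_t float_to_bits(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

/* Add two floats given as bit patterns and return the result's bit pattern.
   The volatile store forces the sum to be rounded to float even if the
   compiler evaluates the expression in a wider format. */
static uint32_t add_bits(uint32_t a, uint32_t b)
{
    volatile float sum = bits_to_float(a) + bits_to_float(b);
    return float_to_bits(sum);
}
```

Under those assumptions, `add_bits(0xae20a0a7u, 0x2f0ae808u)` should return 0x2ec57fbc, matching the cf value reported in the question.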

Given that this is an exact halfway case, the likely explanation for why you're not getting the correct answer is that you're not handling the ties-to-even part of the rounding rule correctly. If you try to round by adding half an ulp and truncating, you get the result you are observing.
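The difference between the two rules can be sketched as follows. This is not the code from the question (which is not shown), only an illustration of the two behaviours; both function names are hypothetical, and both assume 0 < shift < 64.

```c
#include <stdint.h>

/* Drop the low `shift` bits of v, rounding to nearest with ties to even.
   Assumption: 0 < shift < 64. */
static uint64_t round_nearest_even(uint64_t v, unsigned shift)
{
    uint64_t kept = v >> shift;                        /* bits that survive */
    uint64_t rem  = v & ((UINT64_C(1) << shift) - 1);  /* discarded bits */
    uint64_t half = UINT64_C(1) << (shift - 1);        /* exactly half an ulp */

    /* Round up above the halfway point, or on a tie when kept is odd. */
    if (rem > half || (rem == half && (kept & 1)))
        kept += 1; /* may carry into a new top bit; the caller must renormalize */
    return kept;
}

/* Add half an ulp and truncate: ties always round up.
   Assumption: 0 < shift < 64 and the addition does not overflow. */
static uint64_t round_half_up(uint64_t v, unsigned shift)
{
    return (v + (UINT64_C(1) << (shift - 1))) >> shift;
}
```

For the exact difference 0x18aff79 with one bit to drop, `round_nearest_even(0x18aff79, 1)` gives 0xc57fbc (fraction field 0x457fbc, the correct cz), while `round_half_up(0x18aff79, 1)` gives 0xc57fbd (fraction field 0x457fbd, the observed mz).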
