Floating Point Numbers

Not Just Whole Numbers

Of course, digital devices can represent more than just whole numbers. But how can numbers with decimal points be represented in binary? The most intuitive way might be to first treat the number as a whole number, and then always place the decimal point at a fixed position in its decimal equivalent. Consider the following 32-bit representation:

11011111110001011101111111000101
3754287045
375428.7045

This assumes that the number is always represented to the 1/10,000ths in precision, but this is arbitrary; we could have chosen any different number of decimal places. However, we would need to choose this number beforehand, and it would apply to any number we encoded with this representation. This is called fixed-point representation, because the decimal point is always in the same place.
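The conversion above can be sketched in a few lines of Python (the bit pattern is the one from the example, and the choice of four decimal places is the fixed convention described in the text):

```python
# Decode a 32-bit pattern as a fixed-point number with four decimal places.
bits = "11011111110001011101111111000101"
raw = int(bits, 2)      # first, interpret the bits as an unsigned whole number
value = raw / 10_000    # then insert the decimal point: always four places
print(raw)              # 3754287045
print(value)            # 375428.7045
```

Note that the division by 10,000 is where the "fixed" convention lives: every number encoded this way gets its decimal point in exactly the same place.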

Because of this, fixed-point numbers are very limited in the range of values they can represent. In our example above, we could encode 0–429496.7295. That seems like a fairly large range, but we couldn’t even encode one million using this 32-bit, four-decimal-place, fixed-point representation!
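We can check this range limit directly: the largest 32-bit unsigned value is 2³² − 1, and dividing by 10,000 gives the largest fixed-point value this format can hold.

```python
# The largest value a 32-bit, four-decimal-place fixed-point format can hold.
MAX_RAW = 2**32 - 1                  # largest 32-bit unsigned whole number
print(MAX_RAW / 10_000)              # 429496.7295
print(1_000_000 > MAX_RAW / 10_000)  # True: one million is out of range
```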

Floating Point Numbers

To circumvent this limitation, modern computers use a floating-point representation. This means that the binary representation encodes not only the numeric value, but also where the decimal point is located.

The standard convention for 32-bit floating point numbers—called the IEEE Standard for Floating-Point Arithmetic (IEEE 754)—splits the 32 bits into three groups: 1 sign bit, 8 exponent bits, and 23 mantissa bits.

The mantissa is the numeric portion of the encoding, the exponent indicates where to place the decimal point, and the sign denotes whether the number is negative or positive. If you have ever used scientific notation, then you have written a value similar to the way computers store and process floating point numbers.
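We can pull these three fields out of a real number's 32-bit encoding using Python's standard `struct` module. This is an illustrative sketch (the helper name `ieee754_fields` is our own, not a library function):

```python
import struct

def ieee754_fields(x: float):
    """Split a number's 32-bit IEEE 754 encoding into (sign, exponent, mantissa).
    Illustrative helper, not part of any standard library."""
    # Pack x as a big-endian float32, then reinterpret those 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign     = bits >> 31            # 1 bit: 0 = positive, 1 = negative
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits (the leading 1 is implicit)
    return sign, exponent, mantissa

# -6.25 is -1.5625 x 2^2, so: sign = 1, exponent = 2 + 127 = 129
print(ieee754_fields(-6.25))  # (1, 129, 4718592)
```

The stored exponent is "biased" (127 is added to the true exponent) so that negative exponents can be stored without a separate sign bit of their own.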

Note that moving the decimal point allows us to encode a much wider range of numbers than we could with fixed-point representation. Using only 32 bits, we can encode positive values from 1.17549 × 10^-38 to 3.40282 × 10^38, or 0.0000000000000000000000000000000000000117549 to 340282000000000000000000000000000000000! Note the number of zeroes in each. We can’t encode all the values between these two numbers, just values to a certain degree of precision. So, as you can see, a finite representation (a floating point number) is used to model the infinite mathematical concept of a number.
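Both the extreme range and the limited precision can be seen by reinterpreting specific bit patterns as 32-bit floats. The helper name `f32` below is our own; the bit patterns are the standard IEEE 754 extremes:

```python
import struct

def f32(bits_hex: str) -> float:
    # Reinterpret a 32-bit pattern (written in hex) as an IEEE 754 float32.
    return struct.unpack(">f", bytes.fromhex(bits_hex))[0]

print(f32("7f7fffff"))  # largest finite float32:       ~3.40282 x 10^38
print(f32("00800000"))  # smallest positive normal:     ~1.17549 x 10^-38
print(f32("3f800000") == 1.0)  # True: 1.0 is exactly representable
# Precision is finite: 0.1 has no exact float32 encoding,
# so the closest pattern decodes to a value slightly above 0.1.
print(f32("3dcccccd"))  # closest float32 to 0.1 (not exactly 0.1)
```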