Notes Spring 2009
Floating Point

Overflow occurs when the value that we compute cannot fit into the number of bits we have allocated for the result. In computing we call all noninteger values real values. That is, a real number has a whole number part and a fractional part, either of which may be zero. First, we need to look at binary fractions.

Binary Fractions

We use a radix point in the same role as the decimal point in decimal notation. That is, the digits to the left of the radix point represent the integer part of the value. The digits to the right represent the fractional part of the value.

 _    _    _   .   _     _     _
2^2  2^1  2^0     2^-1  2^-2  2^-3
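The place values above can be turned into a small conversion routine. This is a minimal sketch, not part of the original notes: the function name binary_to_fraction and the sample bit strings are chosen here for illustration, and exact fractions are used so that no rounding hides the result.

```python
from fractions import Fraction


def binary_to_fraction(bits):
    """Convert a binary string such as '101.101' to an exact Fraction.

    Digits left of the radix point have weights 2^0, 2^1, 2^2, ...
    Digits right of the radix point have weights 2^-1, 2^-2, 2^-3, ...
    """
    whole, _, frac = bits.partition(".")
    value = Fraction(0)
    # Integer part: the rightmost digit has weight 2^0.
    for position, digit in enumerate(reversed(whole)):
        if digit == "1":
            value += Fraction(2 ** position)
    # Fractional part: the first digit after the point has weight 2^-1.
    for position, digit in enumerate(frac, start=1):
        if digit == "1":
            value += Fraction(1, 2 ** position)
    return value


print(binary_to_fraction("101.101"))  # 4 + 1 + 1/2 + 1/8 = 45/8
print(binary_to_fraction("0.1"))      # 1/2
```

Calling float() on the result gives the familiar decimal form, for example 5.625 for "101.101".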
2^-1 = 1/2

Floating Point Representation

How is this number represented in a computer? We store the number in a notation similar to scientific notation and include information showing where the radix point is, and what the sign is. Any real value can be described by three properties:
This is floating point notation. Let's look at some decimal numbers first, to get a feel for what needs to be done. First we need to know how many digits are in the mantissa. Let's choose 5. We can convert our real numbers into a 5-digit mantissa with an exponent. Floating point notation puts the radix point in front of the mantissa, so we will look at that form as well.
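The decimal conversion just described can be sketched as follows. The function name to_decimal_float and the sample values are assumptions made here for illustration. The sketch places the radix point in front of a 5-digit mantissa and simply drops any digits that do not fit.

```python
from decimal import Decimal


def to_decimal_float(text, digits=5):
    """Return (sign, mantissa, exponent) for a decimal string.

    The value is read as  sign 0.mantissa x 10^exponent, with the radix
    point in front of the mantissa. Extra digits are truncated, not rounded.
    """
    sign_bit, digit_tuple, exp = Decimal(text).as_tuple()
    all_digits = "".join(str(d) for d in digit_tuple).lstrip("0")
    sign = "-" if sign_bit else "+"
    if not all_digits:
        # Zero has no non-zero leading digit; use an all-zero mantissa.
        return (sign, "0" * digits, 0)
    # Moving the radix point in front of the first digit sets the exponent.
    exponent = len(all_digits) + exp
    mantissa = all_digits[:digits].ljust(digits, "0")
    return (sign, mantissa, exponent)


print(to_decimal_float("123.45"))     # ('+', '12345', 3)
print(to_decimal_float("-0.00123"))   # ('-', '12300', -2)
print(to_decimal_float("1234.5678"))  # ('+', '12345', 4), the digits 678 are lost
```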
The last example shows us one of the problems with floating-point notation: truncation error. Part of the value being stored is lost because the mantissa field is not large enough. If the mantissa holds only 5 digits, then the remaining (least significant) digits are lost. Note that the exponent needs both positive and negative values, and that the mantissa always starts with a non-zero digit.

Now let's switch to binary. A common standard uses 64 bits for a floating-point number: 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa. We will use an abbreviated version that fits in one byte: 1 bit for the sign, 3 bits for the exponent, and 4 bits for the mantissa.

The sign is easy enough: 0 or 1. The exponent has 3 bits and must be able to be either + or -. We could use 2's complement, but excess notation is used instead! 3 bits means excess 4 notation. That leaves 4 bits for the mantissa.

Why excess notation? It gives the best range of exponents and ease of wiring the circuitry.

Review:
Rules:
Examples:
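A minimal sketch of how the one-byte format described above could be decoded and encoded. It rests on several assumptions that the notes do not spell out: the fields are laid out in the order sign, exponent, mantissa; a sign bit of 1 means negative; the radix point sits in front of the 4 mantissa bits; and extra mantissa bits are truncated. The function names and sample bit patterns are illustrative only.

```python
from fractions import Fraction


def decode_excess4(bits):
    """Read 3 bits as excess 4: the unsigned value minus 4 (range -4..3)."""
    return int(bits, 2) - 4


def decode_byte(bits):
    """Decode an 8-character bit string: 1 sign, 3 exponent, 4 mantissa bits."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 8 binary digits")
    sign = -1 if bits[0] == "1" else 1      # assumption: 1 means negative
    exponent = decode_excess4(bits[1:4])
    mantissa = Fraction(int(bits[4:], 2), 16)  # .mmmm, radix point in front
    return sign * mantissa * Fraction(2) ** exponent


def encode_byte(value):
    """Encode a non-zero value, truncating the mantissa to 4 bits."""
    value = Fraction(value)
    if value == 0:
        raise ValueError("zero has no normalized form in this sketch")
    sign = "1" if value < 0 else "0"
    magnitude = abs(value)
    exponent = 0
    # Normalize so that 1/2 <= magnitude < 1 (mantissa starts with a 1 bit).
    while magnitude >= 1:
        magnitude /= 2
        exponent += 1
    while magnitude < Fraction(1, 2):
        magnitude *= 2
        exponent -= 1
    if not -4 <= exponent <= 3:
        raise OverflowError("exponent does not fit in 3 bits of excess 4")
    mantissa = int(magnitude * 16)          # keep 4 bits, drop the rest
    return sign + format(exponent + 4, "03b") + format(mantissa, "04b")


print(float(decode_byte("01101011")))  # .1011 x 2^2 = 10.11 binary = 2.75
print(encode_byte(Fraction(21, 8)))    # 2 5/8 = 10.101 binary -> 01101010
print(float(decode_byte("01101010")))  # 2.5, the last bit was truncated
```

The last two lines show the truncation error discussed earlier: 2 5/8 needs five mantissa bits (.10101), so storing it in four bits and reading it back yields 2 1/2.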