DSP processors store data in fixed or floating point formats. Fixed point format is not quite the same as integer format. The integer format is straightforward: it represents whole numbers up to the largest magnitude that fits in the available number of bits. Fixed point format is used to represent fractions, that is, numbers whose magnitude lies between 0 and 1, with a 'binary point' assumed to lie just after the most significant bit. In both formats the most significant bit carries the sign of the number, so in the usual two's complement representation a fixed point word covers the range from -1 up to just under +1.
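The difference between the two formats is only one of interpretation, which a short sketch can make concrete. The example below assumes a 16-bit two's complement word (the text does not fix a word length), and the helper names `to_signed`, `as_integer` and `as_fixed_point` are hypothetical, chosen for illustration only.

```python
WORD_BITS = 16  # assumed word length; real DSPs may use 16, 24 or 32 bits


def to_signed(word, bits=WORD_BITS):
    """Interpret a raw bit pattern as a two's complement signed integer."""
    word &= (1 << bits) - 1
    if word & (1 << (bits - 1)):      # sign bit set, so the value is negative
        return word - (1 << bits)
    return word


def as_integer(word):
    """Integer format: the bit pattern is a whole number."""
    return to_signed(word)


def as_fixed_point(word):
    """Fixed point format: binary point assumed just after the sign bit."""
    return to_signed(word) / (1 << (WORD_BITS - 1))


# The same bit patterns, read both ways.
for pattern in (0x4000, 0x7FFF, 0x8000, 0xC000):
    print(f"{pattern:#06x}  integer={as_integer(pattern):6d}  "
          f"fixed={as_fixed_point(pattern):+.6f}")
```

The hardware stores identical bits in both cases; only the assumed position of the binary point changes what the word means.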
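To make the best use of the available word length in fixed point format, the programmer has to make some decisions. A number that becomes too large for the word length has to be scaled down by shifting it to the right, which may lose low-order bits. A number that is small uses only a few of the available bits, and may be scaled up by shifting it to the left so that it uses more of the word length.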
Whether a number is scaled down or scaled up, the programmer has to keep track of how far the binary point has been shifted, in order to restore all numbers to the same scale at some later stage.
Floating point format scales every number automatically, moving the binary point and keeping track of it, so that all numbers use the full available word length without overflowing, unless the exponent itself runs out of range. Floating point numbers have two parts: the mantissa, which is similar to the fixed point part of the number, and the exponent, which keeps track of how far the binary point has been shifted. Every number is scaled by the floating point hardware: a number that becomes too large for the word length is shifted to the right, and a small number is shifted to the left so that it uses the full word length of the mantissa.
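As a rough software illustration of the mantissa-and-exponent split, the sketch below uses Python's standard `math.frexp` and `math.ldexp`. This only mimics what floating point hardware does internally, and `split_float` is a hypothetical wrapper name. Note that `frexp` normalises the mantissa magnitude into the range 0.5 up to (but not including) 1, which is one convention among several; hardware formats may place the binary point differently.

```python
import math


def split_float(x):
    """Return (mantissa, exponent) such that x == mantissa * 2**exponent."""
    return math.frexp(x)


for value in (0.001, 0.75, 6.0, 40000.0):
    mantissa, exponent = split_float(value)
    # ldexp reverses the split exactly: mantissa * 2**exponent
    assert math.ldexp(mantissa, exponent) == value
    print(f"{value:>10} = {mantissa:.6f} * 2**{exponent}")
```

Small and large values end up with mantissas of similar size; the difference in magnitude is carried entirely by the exponent.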
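Whether a number is shifted right or left, the exponent counts how many places it has been shifted. In floating point numbers the binary point comes after the second most significant bit in the mantissa.
The block floating point format provides some of the benefits of floating point by scaling blocks of numbers rather than each individual number. Block floating point numbers are actually stored using the full word length of a fixed point format, with a single exponent shared by the whole block. If any number in a block becomes too large for the word length, every number in the block is scaled down by shifting to the right; if the largest number in the block is small, every number in the block is scaled up by shifting to the left.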
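A minimal sketch of block scaling done in software follows. It assumes 16-bit signed words, assumes the incoming values may come from a wider accumulator, and uses hypothetical names (`normalise_block`, `restore`); it is not taken from any particular processor's support library.

```python
import math

WORD_BITS = 16                           # assumed word length
MAX_MAG = (1 << (WORD_BITS - 1)) - 1     # largest magnitude kept: 32767


def normalise_block(block, exponent=0):
    """Shift every value by the same amount so the peak just fits the word.

    Returns (scaled_block, exponent), where each original value is
    approximately scaled_value * 2**exponent.
    """
    block = list(block)
    peak = max((abs(v) for v in block), default=0)
    if peak == 0:
        return block, exponent           # nothing to scale
    # Scale down: shift right while the largest value does not fit.
    while peak > MAX_MAG:
        block = [v >> 1 for v in block]  # low-order bits are lost here
        peak = max(abs(v) for v in block)
        exponent += 1
    # Scale up: shift left while the largest value would still fit.
    while peak * 2 <= MAX_MAG:
        block = [v << 1 for v in block]
        peak *= 2
        exponent -= 1
    return block, exponent


def restore(block, exponent):
    """Bring the block back to its original scale (as floats)."""
    return [math.ldexp(v, exponent) for v in block]


small, e_small = normalise_block([100, -250, 30])
print(small, e_small)                    # [12800, -32000, 3840] -7
large, e_large = normalise_block([70000, -1000, 5])
print(large, e_large)                    # [17500, -250, 1] 2
```

In the second block the value 5 comes back as 4 after `restore`, which shows the price of scaling a whole block to suit its largest member: the smallest values lose precision.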
Whether a block is scaled down or scaled up, the exponent counts how many places the numbers in the block have been shifted. Some specialised processors, such as those from Zilog, have special features to support block floating point format; more usually, it is up to the programmer to test each block of numbers and carry out the necessary scaling.
The floating point format has one further advantage over fixed point: it can be faster. In fixed point, because of quantisation error, a basic direct form 1 IIR filter second order section requires an extra multiplier to scale numbers and avoid overflow. The floating point hardware automatically scales every number to avoid overflow, so this extra multiplier is not required.
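The cost of avoiding overflow in fixed point can be seen in a much smaller case than a full filter section. The sketch below is not an IIR filter; it only adds two fractions, assuming 16-bit fixed point with wrap-around arithmetic (many DSPs can saturate instead) and a scale factor of one half applied by a right shift. The names `to_q15`, `wrap16` and `q15_add` are hypothetical.

```python
Q15_ONE = 1 << 15    # assumed 16-bit word: 15 fraction bits after the sign


def to_q15(x):
    """Convert a fraction in [-1, 1) to its 16-bit fixed point integer."""
    return int(round(x * Q15_ONE))


def wrap16(n):
    """Keep only 16 bits, as wrap-around (non-saturating) hardware would."""
    return ((n + 0x8000) & 0xFFFF) - 0x8000


def q15_add(a, b):
    """Fixed point addition with 16-bit wrap-around."""
    return wrap16(a + b)


a, b = to_q15(0.75), to_q15(0.5)

# Unscaled: the true sum 1.25 does not fit, so the result wraps.
wrapped = q15_add(a, b) / Q15_ONE

# Scaled: halve both inputs first (the extra scaling step), then add.
# The result is half the true sum, and the programmer must remember that.
scaled = q15_add(a >> 1, b >> 1) / Q15_ONE

# Floating point needs no explicit scaling step.
floating = 0.75 + 0.5

print(wrapped, scaled, floating)    # -0.75 0.625 1.25
```

The scaled fixed point result is correct only once the factor of one half is accounted for, whereas the floating point sum needs neither the extra operation nor the bookkeeping.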
