Re: Layout floating point numbers

From: Steve Judd (sjudd_at_ffd2.com)
Date: 2002-10-04 18:38:42

Hola,

I just wanted to expand on what John has said.

On Fri, 4 Oct 2002, John West McKenna wrote:

> >Questions:
> >- It says the first bit is used for the sign. So what is the function of $66
> >???
> 
> There are two floating point formats used in the C64.  'packed' has the
> sign in the first bit, and occupies five(?) bytes.  'unpacked' has the
> sign in a separate byte.  It's easier for processing, but takes more
> memory.  Packed is used for variables, unpacked in the FACs.

Well, I think the answer is that it's used for processing convenience --
for example, when multiplying numbers you need to keep track of the signs
separately, but also need to set the mantissa's high bit to 1 before
multiplying, etc.

What it isn't, of course, is part of the 5-byte FP format.

> >- If I understand well the other 31 bits hold the data representing the
> >number right of the decimal point. But in what format? 
> 
> It's been a long, long time.  There should be one sign bit, some number
> of exponent bits (probably stored in excess-N format), then the rest is
> the fraction.
> 
> value = sign * 2^(exp-N) * 1.fraction

Just to expand on this, for Ruud et al.

A number in binary like

	101.0110

has the value

	1*2^2 + 0*2^1 + 1*2^0 + 0*2^-1 + 1*2^-2 + 1*2^-3 + 0*2^-4

The point here is that digits to the right of the point are inverse
powers of two.  You can also write the above number as

	10.10110 * 2
	1.010110 * 2^2
	.1010110 * 2^3
	.01010110 * 2^4

etc.  You can always write any number in such a way that it is 

	1.xxxx * 2^N 

i.e. has its leading 1 just to the left of the point; the only
exception is zero.
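Expressed as a quick sketch (Python here, purely for illustration -- the
helper names are made up), these are the two steps above: evaluate the
digits, then shift until a single 1 remains left of the point:

```python
def binary_value(bits, point):
    """Value of a binary digit string with the binary point placed
    after `point` digits -- binary_value("1010110", 3) is 101.0110."""
    return sum(int(b) * 2.0 ** (point - 1 - i) for i, b in enumerate(bits))

def normalize(bits, point):
    """Rewrite the number as 1.fraction * 2^exp, the form in which
    only the fraction and the exponent need to be stored."""
    first = bits.index("1")            # position of the leading 1
    return bits[first + 1:], point - first - 1

print(binary_value("1010110", 3))      # 5.375
print(normalize("1010110", 3))         # ('010110', 2), i.e. 1.010110 * 2^2
```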

And that's all an FP number is.  You have a four-byte mantissa and a
one-byte exponent:

	1.xxxxxxxxxxx * 2^exponent

Since there is always a 1 as the first digit, it doesn't need to be
stored; instead, it is used for the sign.  Special values, in this case
zero, are specified using the exponent -- modern IEEE FP numbers also have
special values like NaN (not a number) and Inf (infinity) with defined
operations (for example, atan(Inf)).

Multiplying or dividing numbers is easy -- to multiply, you multiply the
mantissas and add the exponents (to divide, divide the mantissas and
subtract the exponents).  Multiplying two 4-byte mantissas together
produces an 8-byte result, of which the lower 4 bytes are thrown away.
To preserve accuracy during a calculation, however, a fifth mantissa byte
is used to remember the lower bits.
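A hypothetical Python sketch of the multiply, with a 32-bit mantissa m
(top bit set) standing for the value (m / 2^32) * 2^e:

```python
def fp_mul(m1, e1, m2, e2):
    """Multiply two normalized mantissa/exponent pairs by multiplying
    the mantissas and adding the exponents."""
    prod = m1 * m2                # 64-bit product of the two mantissas
    exp = e1 + e2
    if not prod >> 63:            # product fell into [0.25, 0.5):
        prod <<= 1                # renormalize with one left shift
        exp -= 1
    return prod >> 32, exp        # throw away the lower 4 bytes (this is
                                  # where a fifth byte would save the top
                                  # of the discarded bits for rounding)

# 3.0 is 0xC0000000 * 2^2 and 5.0 is 0xA0000000 * 2^3:
m, e = fp_mul(0xC0000000, 2, 0xA0000000, 3)
print(hex(m), e)                  # 0xf0000000 4, i.e. 0.9375 * 16 = 15.0
```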

Adding two numbers is harder, since you can't just add, say 1.000*2^6
and 1.000*2^4 by adding mantissas.  Instead, you have to first shift the
numbers to have the same exponent; in the above case, you first convert
the second number to 0.010*2^6 and then add mantissas to get 1.010*2^6.

This is why you can't add large numbers to small numbers, even though they
are well within the FP range.  For example, try adding 0.000001 to 100000
(or whatever) over and over.  In converting the small number to the larger
exponent the lower bits get shifted out of existence, and you add zero.
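The alignment shift (and the bits falling off the end of it) can be
sketched the same way -- again a hypothetical routine with 32-bit
mantissas standing for (m / 2^32) * 2^e:

```python
def fp_add(m1, e1, m2, e2):
    """Add two normalized mantissa/exponent pairs by shifting the
    smaller-exponent operand right until the exponents match."""
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1   # make operand 1 the larger
    m2 >>= e1 - e2                # align: the low bits shift out of existence
    total = m1 + m2
    if total >> 32:               # carry out of the top bit: renormalize
        total >>= 1
        e1 += 1
    return total, e1

# 1.0 + 0.25: the exponents differ by 2, so two zero bits shift out harmlessly
m, e = fp_add(0x80000000, 1, 0x80000000, -1)
print(hex(m), e)                  # 0xa0000000 1, i.e. 1.25
# 1.0 + 2^-40: the 40-bit shift pushes the entire 32-bit mantissa to zero
m, e = fp_add(0x80000000, 1, 0x80000000, -39)
print(hex(m), e)                  # 0x80000000 1: still exactly 1.0
```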

> There are a few constants in the ROMs, such as 1.0, 10.0, PI, and a few
> others.  You should be able to work out how many bits the exponent and
> fraction take, and what the exponent excess is.

Yeah, I agree -- try converting some of the ROM values by hand, and you'll
quickly see how it all works.

cu,

-Steve


       Message was sent through the cbm-hackers mailing list

Archive generated by hypermail 2.1.4.