Tuesday, May 19, 2009

Python's float makes me want to smoke crack.

Ran into another annoying bug with Python's 'float' numbers today, and it reminded me of an old post about how Decimals should be the default in Python 3.

Witness Python's supreme silliness when printing out time.time()...

$ python3 -c "import time;print(time.time())"
1242730404.02

$ python2.5 -c "import time;print(time.time())"
1242730412.33

$ python2.6 -c "import time;print(time.time())"
1242730416.87

$ python2.6 -c "import time;print(repr(time.time()))"
1242730432.5543971

$ python2.6 -c "import time;import decimal;print(decimal.Decimal(str(time.time())))"
1242730432.55



Notice how it cuts the float off at only two decimal places?! So silly, and likely the cause of many subtle bugs in programs throughout the land. If you repr it, fewer places are cut off (print keeps 12 significant digits, repr keeps 17).
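
A small sketch of where those digit counts come from. It assumes that the old str() formats with 12 significant digits and the old repr() with 17, so the '%.12g' and '%.17g' format strings stand in for them here and the result does not depend on which Python version runs it. The timestamp is the one from the repr() example above.

```python
# Timestamp taken from the repr() example above.
t = 1242730432.5543971

# Stand-ins for the old str() and repr(): 12 and 17 significant digits.
as_str = '%.12g' % t
as_repr = '%.17g' % t

print(as_str)   # 1242730432.55
print(as_repr)  # 1242730432.5543971
```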

The last example shows the really bad part... say you want to use a Decimal... Python strangely wants the argument to Decimal to be a str, not a float. Then your float gets truncated by the str() function... and BAM!! POW!


The problem with Python's decimal.Decimal(1.0) is that it gives you this error message if you pass it a float...
TypeError: Cannot convert float to Decimal.  First convert the float to a string


People's first response is to do... decimal.Decimal(str(123456789.12345670))... BAM!! POW!... you just truncated your number, and lost precision (and accuracy).

Many pieces of code make this mistake with Python's silly float handling.
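
A sketch of the mistake, written so that it also reproduces on newer Pythons where str() of a float may no longer round to 12 digits. The helper name old_str is made up for this illustration and simply mimics the 12-significant-digit str() described here; going through repr() instead keeps enough digits to get the same float back.

```python
from decimal import Decimal

def old_str(x):
    # Hypothetical stand-in for the str(float) discussed in this post,
    # which kept only 12 significant digits.
    return '%.12g' % x

x = 123456789.12345670

lossy = Decimal(old_str(x))   # the common mistake
safer = Decimal(repr(x))      # repr() keeps enough digits to round-trip

print(lossy)  # 123456789.123
print(safer)
```

Converting `safer` back with float() gives the original value again, while `lossy` does not.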


Decimals are not as silly as a Python float for printing:

$ python2.6 -c "import time;import decimal;print(decimal.Decimal('%23.23f' % time.time()))"
1242732616.70742297172546386718750


Note the "%23.23f": the first 23 is the minimum total field width, and the second 23 is the number of digits to print after the decimal point. So if you want to print, or str() a float... and expect something sensible to show up... don't use print(afloat) or str(afloat) or even repr(afloat)... use that weird syntax instead (with an appropriate width and precision chosen)!
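
To make the two numbers in the format string concrete, here is a small sketch using a fixed, made-up timestamp rather than time.time(). It shows the precision controlling the digits after the point, and the width only padding when the result would otherwise be shorter.

```python
# A fixed value for illustration (not a real measurement).
t = 1242732616.707423

millis = '%.3f' % t       # 3 digits after the decimal point
padded = '%30.15f' % t    # 15 digits after the point, padded to 30 chars
wide = '%23.23f' % t      # 23 digits after the point; width 23 is exceeded

print(millis)             # 1242732616.707
print(repr(padded))       # leading spaces make up the 30 characters
print(len(wide))          # 34: 10 digits + '.' + 23 digits
```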



The Cobra language (which Simon wrote about recently) does use Decimals by default, and it has many other elegant features.

10 comments:

Thomas Crawley said...

Hi,

Floats are imprecise binary representations of a Number. They are correct only to a certain number of places. A float does not represent a number exactly and cannot be taken as doing so. This is why all Financial Transactions use Decimal and why the need for decimals arises.

Decimals are exact representations of a Number.

From the Zen of Python:
In the face of ambiguity, refuse the temptation to guess.

The reason why Python does not convert floats to Decimals is that such a conversion is not generally possible.

The same is true in all other languages which handle floats, e.g. Java. The only solution is to use Decimals universally, but this has efficiency implications and also leads to problems when dealing with C Extensions which do not support Decimal types natively.

Compare the implementation of Decimal in Python to BigDecimal in Java and see how much more elegant the Python implementation is.

You should try to understand the issues before you criticize or demean language features.

See

http://www.python.org/dev/peps/pep-0327

for more information.

Tom

joep said...

printing and internal representation are not the same thing. just use the print format instead of string or use repr, which in this case should also preserve the precision.

import time
print('%30.15f' % time.time())

floats are still the fastest thing around for calculations.

Jiri said...

I do not think that float is bad - there is no silver bullet. You have to know details about Decimal calculations anyway.

See some examples of rounding and division, although decimal precision is better, division can break it.
>>> from decimal import Decimal
>>> print 0.99999999999999999 # beware of rounding!
1.0
>>> print Decimal("0.99999999999999999")
0.99999999999999999
>>> print 1.0/3 # float precision is bad
0.333333333333
>>> print Decimal("1.0")/Decimal("3") # even decimal is bad
0.3333333333333333333333333333
>>> print Decimal("1.0")/Decimal("3")*Decimal("3")
0.9999999999999999999999999999
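
The division result above depends on the precision of the active decimal context, which defaults to 28 significant digits. A minimal sketch, using a deliberately tiny precision of 6 chosen only for illustration, shows the same loss on a smaller scale:

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 6                      # illustrative; the default is 28
    third = Decimal(1) / Decimal(3)   # rounded to 6 significant digits
    back = third * 3                  # does not return to exactly 1

print(third)  # 0.333333
print(back)   # 0.999999
```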

Michael Foord said...

As others have said - I think you're missing the point a bit with floats.

They're inaccurate anyway. Having the string representation less noisy than the repr is a feature.

Having to use a string or integer to construct a decimal rather than a float is intended to drive home the point about float accuracy. If you need the accuracy of decimals then you shouldn't be starting with a float.

Mark Dickinson said...

When Python 3.1 arrives, you'll be able to use the Decimal.from_float method, which does an exact conversion:

>>> Decimal.from_float(1.1)
Decimal('1.100000000000000088817841970012523233890533447265625')
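
For reference, a short runnable sketch of that conversion. Decimal.from_float is the method named above; from Python 2.7 and 3.2 onwards the Decimal constructor also accepts a float directly and gives the same exact result.

```python
from decimal import Decimal

exact = Decimal.from_float(1.1)   # exact value of the binary double
direct = Decimal(1.1)             # accepted directly on Python 2.7 / 3.2+

print(exact)
print(direct == exact)
```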

illume said...

hello,

nice to hear about Decimal.from_float()

pythons print(str()), and repr() do not reflect the precision of floats.

They truncate to 13(print) and 18(repr) characters. So if the number is like: 0.12345678901234 it seems fine... and lots of the precision is there.

However when you have a large number (like a unix time) then it doesn't show the precision very nicely.

Which is nicer?
1242730432.5543971
1242730432.55
Is that reducing noise? No, it's just truncating for an implementation detail.

For this use case Python prints out a number with only 1/100th of a second, when you probably want to see 1/10000th of a second.

The comment in the source code says:
/* Precisions used by repr() and str(), respectively.

The repr() precision (17 significant decimal digits) is the minimal number that is guaranteed to have enough precision so that if the number is read back in the exact same binary value is recreated. This is true for IEEE floating point by design, and also happens to work for all other modern hardware.

The str() precision is chosen so that in most cases, the rounding noise created by various operations is suppressed, while giving plenty of precision for practical use.

*/


Unfortunately the implementations of str() and repr() fail at what they are trying to achieve.

They also fail with regard to surprise, since they print out plenty of precision on the right in some cases... but when the number gets large, they only show a small amount of precision on the right.

There's a reason why the float format characters allow you to specify precision on the left and the right.

Python's float is buggy, broken and worst of all - it's not elegant.

Float's time has passed. Bring on the Decimal (or at least fix print)!


See: IEEE 754-2008
http://en.wikipedia.org/wiki/IEEE_754

illume said...

About 7.7 years ago...
>>> (((242782361 / 60.) / 60.) / 24.) / 365.
7.6985781646372393

unix time rolled over to use another digit.

Sometime before then, Python's float/print behaviour would have displayed milliseconds.

However, for about 7.7 years now, Python has been without the ability to have print time.time() show milliseconds.
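
The arithmetic behind the rollover, as a sketch: with 12 significant digits, a 9-digit integer part leaves 3 digits for the fraction, while a 10-digit one leaves only 2. The '%.12g' format stands in for the old str() here, and both timestamps are made up for illustration.

```python
before = 999999999.123    # 9 digits before the point
after = 1000000000.123    # 10 digits before the point

print('%.12g' % before)   # 999999999.123
print('%.12g' % after)    # 1000000000.12
```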

Sad panda.

illume said...

Here's the link pointing out that the str truncation problem affects real code.

http://www.google.com/codesearch?hl=en&lr=&q=%22Decimal(str%22+lang%3Apython&sbtn=Search

See how many projects make the mistake I described?

Including Django, ibm db, Satchmo(a shop written with python), py-postgresql, jython, and others...

It's also in situations where they are trying to get more precision - but instead Python's float drops their precision to 12 digits.

Mark Dickinson said...

How would you like repr(x) to behave for a float x? Can you specify the behaviour that you'd like to see, including what should happen for very large (~1e300) or very small (~1e-320) floats, and everything in between?

Bear in mind that to write the exact value of an IEEE 754 double in base 10 would require more than 750 characters in the worst case (some subnormal floats). This is not Python's fault: it's an unavoidable consequence of using the hardware's representation of floating-point numbers. To do otherwise would incur massive performance penalties.
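
To get a feel for that figure, here is a sketch that counts the digits in the exact decimal expansion of the smallest positive subnormal double, 2**-1074 (written 5e-324 in Python). It uses the Decimal.from_float method mentioned earlier in the comments, so it assumes Python 2.7 or 3.1 and later. This is one example, not necessarily the worst case.

```python
from decimal import Decimal

tiny = 5e-324                       # smallest positive subnormal double
exact = Decimal.from_float(tiny)    # exact conversion, no rounding

digits = len(exact.as_tuple().digits)
print(digits)                       # 751 significant digits
```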

What this discussion is telling me is that maybe time.time() shouldn't be returning a float. Perhaps a pair (n, x) would be more appropriate, where n is an integer giving the number of days since the epoch, and x is a float giving the time as a proportion of the current day.
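
A rough sketch of what such a pair could look like, derived here from the existing float purely for illustration. The function name split_epoch is hypothetical and not part of the time module; a real implementation would want to build the pair from an integer clock rather than from a float that has already been rounded.

```python
SECONDS_PER_DAY = 86400.0

def split_epoch(t):
    # Hypothetical helper: split seconds-since-epoch into
    # (whole days, fraction of the current day).
    days, seconds = divmod(t, SECONDS_PER_DAY)
    return int(days), seconds / SECONDS_PER_DAY

n, x = split_epoch(1242730432.5543971)
print(n)   # 14383
print(x)   # fraction of the day, between 0 and 1
```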

> pythons print(str()), and repr() do not reflect the
> precision of floats.

> They truncate to 13(print) and 18(repr)
> characters.

That's not quite right: they truncate to 12 and 17 significant digits, respectively, and also remove any trailing zeros (for the sake of aesthetics). The number of characters can be larger, especially for large or small floats where an exponent is required.

Actually, Python 3.1 will use a somewhat different algorithm for repr, which outputs the minimum number of significant digits required to recover the float exactly.
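
A sketch of the difference, assuming it is run on Python 3.1 or later (Python 2.7 behaves the same way); '%.17g' stands in for the older repr:

```python
x = 1.1

old_style = '%.17g' % x   # 17 significant digits, as the older repr did
new_style = repr(x)       # shortest string that round-trips

print(old_style)          # 1.1000000000000001
print(new_style)          # 1.1
print(float(new_style) == x)
```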

illume said...

hello,

@Mark, good to hear that one part of this bug is going to be fixed in Python 3.1.

Main points in summary:

- 12 digits is too small for str().
- repr with 17 is too small.
- the comments on the repr(float) and str(float) implementations show that they fail to meet their own specification.
- Decimals instead of floats would be better.
- there should be a number of digits on the left, and the right... not lumping them in together.
- Decimal error messages should be improved... as many projects have made the mistake of Decimal(str(afloat)), as proved by the code search link in open source projects.
- computers and the world have moved on, so Python needs to keep up and move on to denser numbers... 64-bit ints, Decimals etc.