BUG: float truncation in eval with py 2 #14255

Closed
wants to merge 6 commits into from
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.19.0.txt
@@ -1567,7 +1567,7 @@ Bug Fixes
 - Bug in ``DataFrame.to_csv()`` with ``MultiIndex`` columns in which a stray empty line was added (:issue:`6618`)
 - Bug in ``DatetimeIndex``, ``TimedeltaIndex`` and ``PeriodIndex.equals()`` may return ``True`` when input isn't ``Index`` but contains the same values (:issue:`13107`)
 - Bug in assignment against datetime with timezone may not work if it contains datetime near DST boundary (:issue:`14146`)
-
+- Bug in ``pd.eval()`` and ``HDFStore`` query truncating long float literals with python 2 (:issue:`14241`)
 - Bug in ``Index`` raises ``KeyError`` displaying incorrect column when column is not in the df and columns contains duplicate values (:issue:`13822`)
 - Bug in ``Period`` and ``PeriodIndex`` creating wrong dates when frequency has combined offset aliases (:issue:`13874`)
 - Bug in ``.to_string()`` when called with an integer ``line_width`` and ``index=False`` raises an UnboundLocalError exception because ``idx`` referenced before assignment.
5 changes: 5 additions & 0 deletions pandas/computation/ops.py
@@ -166,6 +166,11 @@ def _resolve_name(self):
     def name(self):
         return self.value
 
+    def __unicode__(self):
+        # in python 2 str() of float
+        # can truncate shorter than repr()
+        return repr(self.name)
Contributor:

I bet this is a similar problem when selecting from pytables, can you add a test for that?

Contributor Author:

Added a test/fix for this



 _bool_op_map = {'not': '~', 'and': '&', 'or': '|'}
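The truncation that the new `__unicode__` guards against can be reproduced on any Python version with a small sketch. It assumes that Python 2's `str(float)` behaves like `'%.12g'` formatting (12 significant digits); the helper name `py2_style_str` is hypothetical and only approximates the real behaviour, since Python 2 also appends `.0` to integral results.

```python
def py2_style_str(x):
    """Approximate Python 2's str(float): 12 significant digits.

    Assumption for illustration only; the real str() also appends
    '.0' when the formatted text has no decimal point.
    """
    return '%.12g' % x


cutoff = 1000000000.0006

truncated = py2_style_str(cutoff)  # digits beyond the 12th are dropped
full = repr(cutoff)                # repr() is designed to round-trip

# Parsing the truncated text does not give back the original float,
# while parsing the repr() text does.
print(truncated, float(truncated) == cutoff)
print(full, float(full) == cutoff)
```

On Python 3 `str()` and `repr()` of a float agree, which is why the issue title singles out py 2.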
6 changes: 5 additions & 1 deletion pandas/computation/pytables.py
@@ -611,10 +611,14 @@ def __init__(self, value, converted, kind):
     def tostring(self, encoding):
         """ quote the string if not encoded
             else encode and return """
-        if self.kind == u('string'):
+        if self.kind == u'string':
             if encoding is not None:
                 return self.converted
             return '"%s"' % self.converted
+        elif self.kind == u'float':
+            # python 2 str(float) is not always
+            # round-trippable so use repr()
+            return repr(self.converted)
         return self.converted
 
 
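The dispatch on `kind` in `tostring` can be sketched as a standalone function. This is a simplified illustration, not the pandas implementation: `render_literal` is a hypothetical name, the `encoding` branch is omitted, and the fallback uses `str()` so that the function always returns text.

```python
def render_literal(converted, kind):
    """Hypothetical helper: render a converted value as query text."""
    if kind == 'string':
        # Strings are wrapped in double quotes for the query expression.
        return '"%s"' % converted
    if kind == 'float':
        # repr() keeps enough digits for the text to parse back
        # to the same float.
        return repr(converted)
    # Other kinds are passed through as plain text in this sketch.
    return str(converted)


print(render_literal('foo', 'string'))
print(render_literal(1000000000.0011, 'float'))
print(render_literal(5, 'integer'))
```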
25 changes: 25 additions & 0 deletions pandas/computation/tests/test_eval.py
@@ -678,6 +678,31 @@ def test_line_continuation(self):
         result = pd.eval(exp, engine=self.engine, parser=self.parser)
         self.assertEqual(result, 12)
 
+    def test_float_truncation(self):
+        # GH 14241
+        exp = '1000000000.006'
+        result = pd.eval(exp, engine=self.engine, parser=self.parser)
+        expected = np.float64(exp)
+        self.assertEqual(result, expected)
+
+        df = pd.DataFrame({'A': [1000000000.0009,
+                                 1000000000.0011,
+                                 1000000000.0015]})
+        cutoff = 1000000000.0006
+        result = df.query("A < %.4f" % cutoff)
+        self.assertTrue(result.empty)
+
+        cutoff = 1000000000.0010
+        result = df.query("A > %.4f" % cutoff)
+        expected = df.loc[[1, 2], :]
+        tm.assert_frame_equal(expected, result)
+
+        exact = 1000000000.0011
+        result = df.query('A == %.4f' % exact)
Contributor:

oh I didn't mean for this last one to be ==, it might work but I wouldn't guarantee it (as float equality esp when you store may not be guaranteed)

Contributor Author:

I know float precision is thorny generally, but shouldn't this be guaranteed with HDF5?

+        expected = df.loc[[1], :]
+        tm.assert_frame_equal(expected, result)
+
+
 
 class TestEvalNumexprPython(TestEvalNumexprPandas):
 
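The `==` question raised in the review thread above can be explored without pandas or HDF5. The sketch below assumes the column is stored as a 64-bit IEEE 754 double with no lossy conversion, and uses `struct` as a stand-in for that storage; whether a particular HDF5 configuration satisfies this assumption is not something the sketch verifies.

```python
import struct

exact = 1000000000.0011

# Stand-in for storing and reloading a float64 value: the 8 bytes
# round-trip bit for bit.
stored = struct.pack('<d', exact)
loaded = struct.unpack('<d', stored)[0]

# The query literal as the test builds it, then parsed at full precision.
literal = '%.4f' % exact
print(loaded == float(literal))

# The same literal cut to 12 significant digits (an approximation of
# Python 2's str(float)) no longer matches the stored value.
print(loaded == float('%.12g' % exact))
```

Under that assumption, equality depends only on whether the literal survives the trip through the expression string, which is what the `repr()` change addresses.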
23 changes: 23 additions & 0 deletions pandas/io/tests/test_pytables.py
@@ -5002,6 +5002,29 @@ def test_read_from_py_localpath(self):
 
         tm.assert_frame_equal(expected, actual)
 
+    def test_query_long_float_literal(self):
+        # GH 14241
+        df = pd.DataFrame({'A': [1000000000.0009,
+                                 1000000000.0011,
+                                 1000000000.0015]})
+
+        with ensure_clean_store(self.path) as store:
+            store.append('test', df, format='table', data_columns=True)
+
+            cutoff = 1000000000.0006
+            result = store.select('test', "A < %.4f" % cutoff)
+            self.assertTrue(result.empty)
+
+            cutoff = 1000000000.0010
+            result = store.select('test', "A > %.4f" % cutoff)
+            expected = df.loc[[1, 2], :]
+            tm.assert_frame_equal(expected, result)
+
+            exact = 1000000000.0011
+            result = store.select('test', 'A == %.4f' % exact)
+            expected = df.loc[[1], :]
+            tm.assert_frame_equal(expected, result)
+
 
 class TestHDFComplexValues(Base):
     # GH10447
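To see why the `>` case in these tests is sensitive to the bug, the selection can be emulated in plain Python. This is a sketch only: `select_greater` is a hypothetical stand-in for the `A > ...` query, and `'%.12g'` is an assumed approximation of Python 2's `str(float)`.

```python
values = [1000000000.0009, 1000000000.0011, 1000000000.0015]


def select_greater(values, literal_text):
    """Hypothetical stand-in for a query such as 'A > <literal>'."""
    threshold = float(literal_text)
    return [v for v in values if v > threshold]


cutoff = 1000000000.0010

# Literal rendered at full precision: only the last two rows qualify.
full_result = select_greater(values, repr(cutoff))

# Literal cut to 12 significant digits: the threshold collapses to
# 1000000000 and every row is selected.
truncated_result = select_greater(values, '%.12g' % cutoff)

print(len(full_result), len(truncated_result))
```

The `<` case with cutoff `1000000000.0006` returns an empty result under either rendering in this emulation, so the `>` and `==` checks are the ones that distinguish the fixed behaviour from the truncated one.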