-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Integer Overflow in read_json with big number in string #30329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Integer Overflow in read_json with big number in string #30329
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Some minor suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure about the expected result being a float as that would be rather inprecise - is there a way to coerce to object instead?
The result is only float if all inputs in a column are int/float. Actually, I've used a wrong testcase here, the Error only occurs when both int and str are used for one column. I will rectify that. As for the floats, I will look into keeping the whole column as object instead of float when parsing big numbers. |
def test_frame_int_overflow(self): | ||
# GH 30320 | ||
encoded_json = json.dumps([{"col": "31900441201190696999"}, {"col": "Text"}]) | ||
expected = DataFrame({"col": ["31900441201190696999", "Text"]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC from the issue, the expected output is an object column containing a number and a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They have written the number in a float form. I'll look into generating an object.
3.190044e+19
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the string not being coerced, although I think the issue may need further discussion. see #30320 (comment).
What happens now when the encoded_json
is an int
? Maybe we should test for this too here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins Please see and provide guidance on this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins Please see and provide guidance on this comment
not sure about this. @WillAyd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the answer either. Probably needs to be done in C but if you see a way to make it happen in the error handler works as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc comment + @simonjayhawkins comments
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -803,6 +803,7 @@ I/O | |||
- Bug in :func:`read_json` where default encoding was not set to ``utf-8`` (:issue:`29565`) | |||
- Bug in :class:`PythonParser` where str and bytes were being mixed when dealing with the decimal field (:issue:`29650`) | |||
- :meth:`read_gbq` now accepts ``progress_bar_type`` to display progress bar while the data downloads. (:issue:`29857`) | |||
- Bug in :meth: `read_json` where integer overflow was occuring when json contains big number strings. (:issue:`30320`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no space before the `
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should move this to 1.1 at this point
can you merge master and will look again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rohitkg98 can you merge master and address comments?
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -803,6 +803,7 @@ I/O | |||
- Bug in :func:`read_json` where default encoding was not set to ``utf-8`` (:issue:`29565`) | |||
- Bug in :class:`PythonParser` where str and bytes were being mixed when dealing with the decimal field (:issue:`29650`) | |||
- :meth:`read_gbq` now accepts ``progress_bar_type`` to display progress bar while the data downloads. (:issue:`29857`) | |||
- Bug in :meth: `read_json` where integer overflow was occuring when json contains big number strings. (:issue:`30320`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should move this to 1.1 at this point
@rohitkg98 if you'd rebase and update for @WillAyd comment. |
Hello @rohitkg98! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-01-21 03:17:05 UTC |
removed whitespace from test_pandas
…ndle_json_int_overflow
lgtm. @WillAyd if you had further questions. |
Thanks @rohitkg98 |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff