Skip to content

BUG: Integer Overflow in read_json with big number in string #30329

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 24, 2020

Conversation

rohitkg98
Copy link
Contributor

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Some minor suggestions

@WillAyd WillAyd added the IO JSON read_json, to_json, json_normalize label Dec 18, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure about the expected result being a float as that would be rather inprecise - is there a way to coerce to object instead?

@rohitkg98
Copy link
Contributor Author

The result is only float if all inputs in a column are int/float. Actually, I've used a wrong testcase here, the Error only occurs when both int and str are used for one column. I will rectify that. As for the floats, I will look into keeping the whole column as object instead of float when parsing big numbers.

def test_frame_int_overflow(self):
# GH 30320
encoded_json = json.dumps([{"col": "31900441201190696999"}, {"col": "Text"}])
expected = DataFrame({"col": ["31900441201190696999", "Text"]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC from the issue, the expected output is an object column containing a number and a string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have written the number in a float form. I'll look into generating an object.

3.190044e+19

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the string not being coerced, although I think the issue may need further discussion. see #30320 (comment).

What happens now when the encoded_json is an int? Maybe we should test for this too here?

Copy link
Contributor Author

@rohitkg98 rohitkg98 Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins Please see and provide guidance on this comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins Please see and provide guidance on this comment

not sure about this. @WillAyd?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the answer either. Probably needs to be done in C but if you see a way to make it happen in the error handler works as well

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc comment + @simonjayhawkins comments

@@ -803,6 +803,7 @@ I/O
- Bug in :func:`read_json` where default encoding was not set to ``utf-8`` (:issue:`29565`)
- Bug in :class:`PythonParser` where str and bytes were being mixed when dealing with the decimal field (:issue:`29650`)
- :meth:`read_gbq` now accepts ``progress_bar_type`` to display progress bar while the data downloads. (:issue:`29857`)
- Bug in :meth: `read_json` where integer overflow was occuring when json contains big number strings. (:issue:`30320`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no space before the `

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move this to 1.1 at this point

@jreback
Copy link
Contributor

jreback commented Dec 27, 2019

can you merge master and will look again

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohitkg98 can you merge master and address comments?

@@ -803,6 +803,7 @@ I/O
- Bug in :func:`read_json` where default encoding was not set to ``utf-8`` (:issue:`29565`)
- Bug in :class:`PythonParser` where str and bytes were being mixed when dealing with the decimal field (:issue:`29650`)
- :meth:`read_gbq` now accepts ``progress_bar_type`` to display progress bar while the data downloads. (:issue:`29857`)
- Bug in :meth: `read_json` where integer overflow was occuring when json contains big number strings. (:issue:`30320`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move this to 1.1 at this point

@jreback
Copy link
Contributor

jreback commented Jan 18, 2020

@rohitkg98 if you'd rebase and update for @WillAyd comment.

@rohitkg98
Copy link
Contributor Author

Sorry for the late reply. I'll do it ASAP @jreback @WillAyd

@pep8speaks
Copy link

pep8speaks commented Jan 21, 2020

Hello @rohitkg98! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-21 03:17:05 UTC

Kaushal Rohit added 2 commits January 21, 2020 08:46
@jreback jreback added this to the 1.1 milestone Jan 24, 2020
@jreback
Copy link
Contributor

jreback commented Jan 24, 2020

lgtm. @WillAyd if you had further questions.

@WillAyd WillAyd merged commit 47922d3 into pandas-dev:master Jan 24, 2020
@WillAyd
Copy link
Member

WillAyd commented Jan 24, 2020

Thanks @rohitkg98

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read_json overflow error when json contains big number strings
5 participants