
ENH: json_normalize() avoid loss of precision for int64 with missing values #16918

Closed
@jzwinck

Description

This code:

import pandas as pd

x = 1234567890123456789
x - pd.io.json.json_normalize([{'x': x}, {}]).loc[0, 'x'].astype(int)

gives 21, when reasonable users might expect it to give 0.

This inaccuracy occurs when a field holds an int64 value but is missing from some records, which triggers conversion of the Series dtype to float64. Pandas does this conversion so that it can put NaN where no value exists.
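The off-by-21 result can be reproduced without pandas at all; it is pure float64 rounding (an illustrative sketch, not part of the original report):

```python
x = 1234567890123456789

# float64 has a 53-bit significand; near 1.2e18, adjacent representable
# floats are 2**(60-52) = 256 apart, so converting x to float rounds it
# to the nearest multiple of 256, which is 21 below x.
print(x - int(float(x)))   # 21
print(float(x) == x - 21)  # True
```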

One solution could be to add a fill_value parameter, as seen in add(), unstack(), and other Pandas functions. It would be good for this to support a dict as well as a single value, in case different fill values are required for different columns.

The usage might be like this:

pd.io.json.json_normalize([{'x': x}, {}], fill_value=-1).x
# or
pd.io.json.json_normalize([{'x': x}, {}], fill_value={'x': -1}).x
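Until such a parameter exists, one workaround is to pre-fill missing keys before normalizing, so no NaN (and hence no float64 upcast) is ever introduced. A minimal sketch for flat, non-nested records; `json_normalize_filled` is a hypothetical helper, not a pandas API:

```python
import pandas as pd

def json_normalize_filled(records, fill_value):
    """Hypothetical helper: fill missing keys before normalizing so that
    json_normalize never introduces NaN and never upcasts int64 to float64.
    Only handles flat (non-nested) records."""
    if not isinstance(fill_value, dict):
        # promote a scalar fill_value to a per-column dict
        keys = set().union(*(r.keys() for r in records))
        fill_value = {k: fill_value for k in keys}
    filled = [{**fill_value, **r} for r in records]
    # pd.json_normalize is the modern spelling of pd.io.json.json_normalize
    return pd.json_normalize(filled)

x = 1234567890123456789
df = json_normalize_filled([{'x': x}, {}], fill_value=-1)
print(df['x'].dtype)       # int64
print(df.loc[0, 'x'] - x)  # 0
```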

Then instead of the current result:

0    1.234568e+18
1             NaN
Name: x, dtype: float64

The result would be:

0    1234567890123456789
1                     -1
Name: x, dtype: int64

I'm using Pandas 0.20.1.
