Description
This code:
import pandas as pd

x = 1234567890123456789
x - pd.io.json.json_normalize([{'x': x}, {}]).loc[0, 'x'].astype(int)
Gives 21, when reasonable users might expect it to give 0.
This inaccuracy occurs when a field holding an int64 value is missing from some records, which triggers a conversion of the Series dtype to float64. Of course, Pandas does this conversion so that it can put NaN where no value exists, but float64 cannot represent every integer above 2**53 exactly, so large values get rounded.
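The rounding itself can be reproduced without Pandas. This is a minimal sketch that assumes only that a plain Python float is an IEEE 754 double, the same representation as float64; the variable names are mine.

```python
x = 1234567890123456789

# A Python float is an IEEE 754 double, the same representation as float64.
as_float = float(x)

# Converting back to int exposes the value that was actually stored.
round_trip = int(as_float)
diff = x - round_trip
print(diff)

# Above 2**53, consecutive integers are no longer distinguishable as floats.
limit = 2 ** 53
collides = float(limit) == float(limit + 1)
print(collides)
```

The difference printed here matches the 21 reported above, which suggests the loss comes from the float64 conversion rather than from json_normalize itself.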
One solution could be to add a fill_value parameter, as seen in add(), unstack(), and other Pandas functions. It would be good for this to support a dict as well as a single value, in case different fill values are required for different columns.
The usage might be like this:
pd.io.json.json_normalize([{'x': x}, {}], fill_value=-1).x
# or
pd.io.json.json_normalize([{'x': x}, {}], fill_value={'x': -1}).x
Then instead of the current result:
0 1.234568e+18
1 NaN
Name: x, dtype: float64
The result would be:
0 1234567890123456789
1 -1
Name: x, dtype: int64
I'm using Pandas 0.20.1.
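In the meantime, a possible workaround is to fill the missing keys in the records before normalizing them. The sketch below is only an illustration: fill_missing is a hypothetical helper, not part of Pandas, and it assumes flat records (top-level keys only, no nested dicts).

```python
def fill_missing(records, fill_value):
    """Return copies of records with absent top-level keys filled in.

    Hypothetical helper, not a Pandas API. fill_value may be a single
    value applied to every key, or a dict mapping key -> fill value.
    """
    # Collect every key seen in any record, preserving first-seen order.
    keys = []
    for rec in records:
        for key in rec:
            if key not in keys:
                keys.append(key)

    filled = []
    for rec in records:
        row = dict(rec)  # copy so the caller's records are not mutated
        for key in keys:
            if key in row:
                continue
            if isinstance(fill_value, dict):
                # Only fill keys that the dict names; leave others absent.
                if key in fill_value:
                    row[key] = fill_value[key]
            else:
                row[key] = fill_value
        filled.append(row)
    return filled


x = 1234567890123456789
scalar_filled = fill_missing([{'x': x}, {}], -1)
dict_filled = fill_missing([{'x': x}, {}], {'x': -1})
print(scalar_filled)
```

Passing the filled list to pd.io.json.json_normalize should then leave no gaps in the x column, so I would expect it to stay int64, though I have only reasoned about this for flat records.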