Code Sample, a copy-pastable example if possible
import json
from pandas.io.json import json_normalize
the_json = """
[{"id": 99,
"data": [{"one": 1, "two": 2}]
}]
"""
print(json_normalize(json.loads(the_json),
                     record_path=['data'], meta=['id']))
Problem description
Through 0.25.3, this program generates a DataFrame with one row. In 1.0.0 it fails with an exception:
Traceback (most recent call last):
  File "foo.py", line 11, in <module>
    record_path=['data'], meta=['id']))
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/util/_decorators.py", line 66, in wrapper
    return alternative(*args, **kwargs)
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 327, in _json_normalize
    _recursive_extract(data, record_path, {}, level=0)
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 314, in _recursive_extract
    meta_val = _pull_field(obj, val[level:])
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 246, in _pull_field
    f"{js} has non iterable value {result} for path {spec}. "
TypeError: {'id': 99, 'data': [{'one': 1, 'two': 2}]} has non iterable value 99 for path ['id']. Must be iterable or null.
I don't see anything in the documentation that indicates a backwards-incompatible change. All of my json_normalize calls that don't use meta work as before.
Expected Output
Through 0.25.3, the output was:
   one  two  id
0    1    2  99
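For reference, here is a rough plain-Python sketch of the flattening I expect from record_path=['data'], meta=['id']. It is not how pandas implements json_normalize. `flatten_records` is a hypothetical helper written only for this report, and it assumes a single-level record path and top-level meta keys.

```python
import json

the_json = """
[{"id": 99,
  "data": [{"one": 1, "two": 2}]
}]
"""


def flatten_records(records, record_path, meta):
    """Hypothetical helper (not a pandas API): emit one row per item found
    under `record_path`, copying each top-level `meta` key onto that row."""
    rows = []
    for record in records:
        for item in record[record_path]:
            row = dict(item)  # copy the nested record's own fields
            for key in meta:
                # Scalar meta values such as the integer id are copied as-is.
                row[key] = record[key]
            rows.append(row)
    return rows


rows = flatten_records(json.loads(the_json), record_path="data", meta=["id"])
print(rows)  # [{'one': 1, 'two': 2, 'id': 99}]
```

Passing `rows` to pandas.DataFrame should give the same one-row frame shown above, which may serve as a stopgap on 1.0.0 for simple inputs like this one.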
Output of pd.show_versions()
From my virtualenv with pandas 1.0.0:
pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None
From my virtualenv with 0.25.x:
pandas : 0.25.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : 1.3.7
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None