json_normalize in 1.0.0 with meta path specified - expects iterable #31507

Closed
@tturocy

Code Sample, a copy-pastable example if possible

import json
from pandas.io.json import json_normalize

the_json = """
[{"id": 99,
  "data": [{"one": 1, "two": 2}]
}]
"""

print(json_normalize(json.loads(the_json),
                     record_path=['data'], meta=['id']))

Problem description

Through 0.25.3, this program generates a DataFrame with one row. In 1.0.0 it fails with an exception:

Traceback (most recent call last):
  File "foo.py", line 11, in <module>
    record_path=['data'], meta=['id']))
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/util/_decorators.py", line 66, in wrapper
    return alternative(*args, **kwargs)
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 327, in _json_normalize
    _recursive_extract(data, record_path, {}, level=0)
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 314, in _recursive_extract
    meta_val = _pull_field(obj, val[level:])
  File "/home/dataczar/venvs/test/lib/python3.7/site-packages/pandas/io/json/_normalize.py", line 246, in _pull_field
    f"{js} has non iterable value {result} for path {spec}. "
TypeError: {'id': 99, 'data': [{'one': 1, 'two': 2}]} has non iterable value 99 for path ['id']. Must be iterable or null.
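The error message suggests that the value pulled for the meta path is being put through an iterability check. The following is a minimal sketch of how such a check would produce this message; pull_field_checked is a hypothetical stand-in reconstructed from the traceback alone, not the pandas source.

```python
from collections.abc import Iterable


def pull_field_checked(js, spec):
    """Hypothetical stand-in, NOT the pandas implementation.

    Walk the keys in `spec` into `js`, then insist that the value
    found is iterable (or None), as the error message implies.
    """
    result = js
    for key in spec:
        result = result[key]
    if result is not None and not isinstance(result, Iterable):
        raise TypeError(
            f"{js} has non iterable value {result} for path {spec}. "
            "Must be iterable or null."
        )
    return result


obj = {"id": 99, "data": [{"one": 1, "two": 2}]}

# The record path points at a list, so the check passes.
records = pull_field_checked(obj, ["data"])

# The meta path points at the scalar 99, so the same check raises.
try:
    pull_field_checked(obj, ["id"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Under that assumption, a check that is reasonable for record_path (which must lead to a list of records) becomes a problem when it is also applied to meta values, which are normally scalars.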

I don't see any documentation changes that suggest this is an intentional backwards-incompatible change. All my calls to json_normalize that don't pass meta work as before.

Expected Output

Through 0.25.3, the output was:

   one  two  id
0    1    2  99
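For reference, the expected row can be described without pandas. This is only a sketch of the intended semantics (each record under record_path becomes a row, and each meta field is copied from the parent object); flatten_with_meta is a hypothetical helper, not part of pandas.

```python
import json


def flatten_with_meta(objs, record_key, meta_keys):
    """Hypothetical helper: one output row per record, plus parent meta fields."""
    rows = []
    for obj in objs:
        for record in obj[record_key]:
            row = dict(record)  # copy so the input is not mutated
            for key in meta_keys:
                row[key] = obj[key]  # scalar meta values are copied as-is
            rows.append(row)
    return rows


the_json = '[{"id": 99, "data": [{"one": 1, "two": 2}]}]'
expected_rows = flatten_with_meta(json.loads(the_json), "data", ["id"])
print(expected_rows)  # [{'one': 1, 'two': 2, 'id': 99}]
```

The single dictionary corresponds to the one-row DataFrame shown above, with columns one, two and id.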

Output of pd.show_versions()

From my virtualenv with pandas 1.0.0:

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.2.0-042stab120.16
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

From my virtualenv with 0.25.x:

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.2.0-042stab120.16
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 0.25.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : 1.3.7
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

Labels: IO JSON (read_json, to_json, json_normalize); Regression (functionality that used to work in a prior pandas version)
