Skip to content

json_normalize does not normalize subrecords properly if any subrecords values are NoneType #20030

Closed
@aerymilts

Description

@aerymilts

Code Sample, a copy-pastable example if possible

data_fail_to_normalize = \
        [{'info': None}, 
         
        {'info': 
         {'created_at': '11/08/1993', 'last_updated': '26/05/2012'},
        'author_name': 
         {'first': 'Jane', 'last_name': 'Doe'}
        }]

data_partial_fail = \
        [{'info': None, 
         'author_name': 
         {'first': 'Smith', 'last_name': 'Appleseed'}
        }, 
        
        {'info': 
         {'created_at': '11/08/1993', 'last_updated': '26/05/2012'},
        'author_name': 
         {'first': 'Jane', 'last_name': 'Doe'}
        }]

>>> import pandas as pd
>>> pd.io.json.json_normalize(data_fail_to_normalize)

Output 1

author_name info
0 nan None
1 {'first': 'Jane', 'last_name': 'Doe'} {'created_at': '11/08/1993', 'last_updated': '26/05/2012'}
>>> pd.io.json.json_normalize(data_partial_fail)

Output 2

author_name.first author_name.last_name info info.created_at info.last_updated
0 Smith Appleseed nan nan nan
1 Jane Doe nan 11/08/1993 26/05/2012

Problem description

I expected that the json_normalize function takes into account the presence of NoneTypes in the dictionaries. This leads to 2 separate issues (If I should open this as 2 separate issues, let me know).

I have already written a fix that solves this issue - if anyone else can validate that this is not working as intended, I can set up a PR.

Output 1

Does not unnest json after encountering NoneType at first instance of subrecord, line 192 of pandas/io/json/normalize.py

Output 2

Keeps the None value when encountered, [{k: {'alpha': 'foo', 'beta': 'bar'}}, {k: None}], see nested_to_record function. Creates additional column of nans which would not otherwise occur if that particular key was removed.

Expected Output

Output 1

author_name.first author_name.last_name info.created_at info.last_updated
0 nan nan nan nan
1 Jane Doe 11/08/1993 26/05/2012

Output 2

author_name.first author_name.last_name info.created_at info.last_updated
0 Smith Appleseed nan nan
1 Jane Doe 11/08/1993 26/05/2012

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-104-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugIO JSONread_json, to_json, json_normalizeMissing-datanp.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions