ENH: json_normalize flatten lists as well #42311

Open
@cosama

Description

Problem

Right now json_normalize will leave lists encountered within dictionaries intact:

import pandas as pd
df = pd.json_normalize([{"a": [1, 1]}, {"a": [1, 2]}])
print(df)

output:

        a
0  [1, 1]
1  [1, 2]

Each entry is a list object in this case, and I am not sure how that is useful. For example, if I want to work with the first element of each row, I first have to convert the column into yet another DataFrame with something like:

df2 = pd.DataFrame({f"a.{k}": [i[k] for i in df['a']] for k in range(len(df['a'][0]))})
print(df2)

output:

   a.0  a.1
0    1    1
1    1    2

Solution

I think it would be really useful to have a flag or similar option that flattens lists directly as well, something like json_normalize(data, flatten_list=True). The list index would then be used as a string in the record name, e.g. "a.0.b", "a.1.b", etc.
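
To illustrate the naming scheme, here is a minimal sketch of how the proposed behavior can be approximated today. The flatten_list flag does not exist in pandas, and lists_to_dicts is a hypothetical helper written only for this example: it replaces every list with a dict keyed by the stringified index, so that json_normalize flattens it like any other nested dict.

```python
import pandas as pd

def lists_to_dicts(obj):
    """Hypothetical helper: recursively turn lists into index-keyed dicts."""
    if isinstance(obj, dict):
        return {key: lists_to_dicts(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return {str(i): lists_to_dicts(value) for i, value in enumerate(obj)}
    return obj

data = [{"a": [{"b": 1}, {"b": 2}]}, {"a": [{"b": 3}, {"b": 4}]}]

# json_normalize already flattens nested dicts with "." as the separator.
df = pd.json_normalize([lists_to_dicts(record) for record in data])
print(df.columns.tolist())  # ['a.0.b', 'a.1.b']
```

This preprocessing copies the whole input once before normalizing, which is the kind of intermediate step a built-in option could avoid.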

API breaking implications

I don't think this would break any API, provided the new flag is off by default.

Alternatives

A few packages already offer some of this capability, but they require additional dependencies and intermediate objects, which slows down the conversion:


Labels: Enhancement, IO JSON (read_json, to_json, json_normalize), Nested Data (data where the values are collections: lists, sets, dicts, objects, etc.)
