Description
Problem
Right now json_normalize will leave lists encountered within dictionaries intact:
import pandas as pd
df = pd.json_normalize([{"a": [1, 1]}, {"a": [1, 2]}])
print(df)
output:
        a
0  [1, 1]
1  [1, 2]
Each entry is a list object in this case, and I am not sure how that is useful. For example, to do anything with the first element of each row, I would first have to convert the column into yet another DataFrame with something like:
df2 = pd.DataFrame({f"a.{k}": [i[k] for i in df['a']] for k in range(len(df['a'][0]))})
print(df2)
output:
   a.0  a.1
0    1    1
1    1    2
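For the flat case shown here, a shorter workaround is possible with existing pandas calls. This is only a minimal sketch, assuming every cell of column a holds a list; it still builds an intermediate DataFrame, so it does not remove the extra step described above.

```python
import pandas as pd

df = pd.json_normalize([{"a": [1, 1]}, {"a": [1, 2]}])

# Expand each list into its own row of a new frame; the columns are the
# integer positions 0, 1, ... within the list.
expanded = pd.DataFrame(df["a"].tolist(), index=df.index)

# Prefix the positions so the column names follow the "a.<index>" pattern.
df2 = expanded.add_prefix("a.")
print(df2)
```

Lists of unequal length are padded with missing values in this approach, which may or may not be what you want.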
Solution
I think it would be really useful to have a flag that enables flattening lists directly as well, something like json_normalize(data, flatten_list=True). The list index would then be used as a string in the record name, e.g. "a.0.b", "a.1.b", etc.
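To illustrate the requested naming scheme, here is a rough sketch of how the behavior could be emulated today by preprocessing the records. The helper flatten_lists is hypothetical and not part of pandas; it simply turns every list into a dict keyed by the string index, so that the existing nested-dict flattening in json_normalize produces names like "a.0.b".

```python
import pandas as pd


def flatten_lists(obj):
    """Hypothetical helper: recursively replace lists with dicts keyed by
    the string form of each element's index."""
    if isinstance(obj, dict):
        return {key: flatten_lists(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return {str(i): flatten_lists(value) for i, value in enumerate(obj)}
    return obj


data = [
    {"a": [{"b": 1}, {"b": 2}]},
    {"a": [{"b": 3}, {"b": 4}]},
]

# json_normalize already flattens nested dicts, joining keys with ".".
df = pd.json_normalize([flatten_lists(record) for record in data])
print(df.columns.tolist())
```

This walks every record in Python before pandas sees it, which is part of why a built-in flag would be preferable.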
API breaking implications
I don't think this would break any API, as long as the flag is opt-in and the default keeps the current behavior.
Alternatives
A few packages already offer some of this ability, but they require additional dependencies and intermediate products, which slows down the conversion: