Is your feature request related to a problem?
Yes. Currently, the df.explode method always returns object dtype for the column being exploded, so the dtype information of the exploded column is lost.
E.g.
s = pd.Series([1,2,3]) # <- dtype('int64')
df = pd.DataFrame({'A': [s, s, s, s], 'B': 1})
df.explode("A").dtypes
0 | |
---|---|
A | object |
B | int64 |
It would be great if pandas could return the underlying dtype when it is consistent across all rows, or otherwise fall back to the best common dtype (int -> float -> object).
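The int -> float -> object fallback can be pictured with NumPy's own promotion rules. The sketch below is only an illustration: `best_dtype` is a hypothetical helper name, not part of pandas, and it assumes every cell has a plain NumPy dtype (no extension dtypes).

```python
import numpy as np

def best_dtype(*dtypes):
    """Hypothetical helper: common NumPy dtype that can hold all inputs."""
    # np.result_type applies NumPy's promotion rules to the given dtypes.
    return np.result_type(*[np.dtype(d) for d in dtypes])

int_only = best_dtype("int64", "int64")                # all cells are int
with_float = best_dtype("int64", "float64")            # one float cell promotes to float
with_object = best_dtype("int64", "float64", object)   # any object cell falls back to object
```

Under this rule, a column whose cells all share one dtype keeps it, and mixed cells get the narrowest dtype that can represent all of them.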
Describe the solution you'd like
- solution 1: In the best case, pandas would directly infer the dtype if it is consistent (ignoring NaNs) across the rows.
s = pd.Series([1,None,3]) # <- dtype('float64')
df = pd.DataFrame({'A': [s, s, s, s], 'B': 1})
df.explode("A").dtypes
0 | |
---|---|
A | float64 |
B | int64 |
- solution 2: Providing an argument to force inferring the dtype:
s = pd.Series([1,None,3]) # <- dtype('float64')
df = pd.DataFrame({'A': [s, s, s, s], 'B': 1})
df.explode("A", infer_type=True).dtypes
0 | |
---|---|
A | float64 |
B | int64 |
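Until something like the proposed `infer_type` argument exists, similar behaviour can be approximated with the existing `infer_objects` method. This is a rough sketch, not how pandas would implement it: `explode_infer` is a hypothetical wrapper name, and it relies on `infer_objects` being able to soft-convert the exploded object column.

```python
import pandas as pd

def explode_infer(df, column):
    """Hypothetical wrapper: explode `column`, then let pandas re-infer its dtype."""
    out = df.explode(column)
    # infer_objects attempts a soft conversion of object data to a better dtype;
    # values that cannot be converted stay as object.
    out[column] = out[column].infer_objects()
    return out

s = pd.Series([1, None, 3])  # dtype('float64')
df = pd.DataFrame({"A": [s, s, s, s], "B": 1})
result = explode_infer(df, "A")
```

Unlike an explicit cast, this leaves the column as object when the exploded values are genuinely mixed.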
Describe alternatives you've considered
Currently, I use the following workaround:
import pandas as pd

s = pd.Series([1, None, 3])  # <- dtype('float64')
df = pd.DataFrame({'A': [s, s, s, s], 'B': 1})
d = df['A'].iloc[0].dtype  # dtype of the first cell; assumes all cells share it
df2 = df.explode('A')
df2['A'] = df2['A'].astype(d)  # cast the exploded column back from object
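The workaround above only looks at the first cell. A slightly more general sketch collects the dtype of every Series or ndarray cell and casts to their common dtype. `explode_keep_dtype` is a hypothetical helper name, and the fallback to the plain exploded result when the cast fails (for example, an integer target with NaNs produced by empty cells) is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd

def explode_keep_dtype(df, column):
    """Hypothetical helper: explode `column` and cast it to the common dtype of its cells."""
    # Only Series/ndarray cells carry a dtype; plain lists and scalars are skipped.
    cell_dtypes = [
        np.asarray(cell).dtype
        for cell in df[column]
        if isinstance(cell, (pd.Series, np.ndarray))
    ]
    out = df.explode(column)
    if not cell_dtypes:
        return out
    target = np.result_type(*cell_dtypes)  # e.g. int64 + float64 -> float64
    try:
        out[column] = out[column].astype(target)
    except (TypeError, ValueError):
        pass  # cast not possible; keep the object column as returned by explode
    return out

ints = pd.Series([1, 2])         # dtype('int64')
floats = pd.Series([1.5, None])  # dtype('float64')
mixed = pd.DataFrame({"A": [ints, floats], "B": 1})
result = explode_keep_dtype(mixed, "A")
```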
API breaking implications
Not sure.