Description
Code Sample, a copy-pastable example if possible
import pandas as pd
A = pd.DataFrame({'x': [['hoge', 'piyo'], ['fuga']], 'y': ['1', '2']})
B = pd.DataFrame({'x': [['hoge2', 'piyo2'], ['fuga2'], ['meta2']], 'y': ['1', '2', '3']})
A = A.append(B)  # append keeps each frame's own index labels, so 0 and 1 now appear twice
# >>> print(A)
                x  y
0    [hoge, piyo]  1
1          [fuga]  2
0  [hoge2, piyo2]  1
1         [fuga2]  2
2         [meta2]  3
# >>> A.explode('x')
       x  y
0   hoge  1
0   piyo  1
0  hoge2  1
0  piyo2  1
0   hoge  1
0   piyo  1
0  hoge2  1
0  piyo2  1
1   fuga  2
1  fuga2  2
1   fuga  2
1  fuga2  2
2  meta2  3
Problem description
The ideal output of A.explode('x') is as follows:
x y
hoge 1
piyo 1
fuga 2
hoge2 1
piyo2 1
fuga2 2
meta2 3
However, the actual output contains extra rows because the index of DataFrame A has duplicate labels (0 and 1 each appear twice after the append).
This can easily lead to mistakes for users of the explode method.
Expected Output
A workaround is to call reset_index before explode, so that every row has a unique index label:
# >>> A.reset_index(drop=True).explode('x')
       x  y
0   hoge  1
0   piyo  1
1   fuga  2
2  hoge2  1
2  piyo2  1
3  fuga2  2
4  meta2  3
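An alternative to resetting the index afterwards might be to avoid the duplicate labels when the frames are combined. The sketch below assumes pd.concat with ignore_index=True is an acceptable replacement for A.append(B); it is one possible workaround, not necessarily how the library itself should handle this.

```python
import pandas as pd

A = pd.DataFrame({'x': [['hoge', 'piyo'], ['fuga']], 'y': ['1', '2']})
B = pd.DataFrame({'x': [['hoge2', 'piyo2'], ['fuga2'], ['meta2']], 'y': ['1', '2', '3']})

# ignore_index=True relabels the rows 0..4, so no index label is repeated.
combined = pd.concat([A, B], ignore_index=True)

# With a unique index, each list element becomes exactly one row.
exploded = combined.explode('x')
print(exploded)
```

The exploded rows keep the label of the row they came from, so the labels 0 and 2 each appear twice in the result, matching the reset_index output above.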
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : ja_JP.UTF-8
pandas : 0.25.3
numpy : 1.16.4
pytz : 2018.9
dateutil : 2.7.5
pip : 19.2.3
setuptools : 41.0.1
Cython : 0.29.4
pytest : 4.4.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.3.3
html5lib : None
pymysql : None
psycopg2 : 2.8.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.4.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : 0.3.0+7.g59da8cd
lxml.etree : 4.3.3
matplotlib : 3.0.3
numexpr : None
odfpy : None
openpyxl : 2.6.0
pandas_gbq : 0.10.0
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : 1.3.4
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
Sorry for ugly English. Thanks.