Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
exi = pd.read_parquet('/home/yoh/Documents/code/data/existing.parquet', engine='fastparquet')
new = pd.read_parquet('/home/yoh/Documents/code/data/new.parquet', engine='fastparquet')
to_record = pd.concat([exi, new])
Please find the files attached. I did not manage to recreate the faulty dataframes manually.
faulty_dataframes.zip
(size of each is 5 rows x 8 columns)
Problem description
Before installing pandas 1.3.0, I was using pandas 1.2.5 and fastparquet 0.6.4.dev0, and this extract of data caused no problem.
After I installed pandas 1.3.0, the concat command now raises the following error:
to_record = pd.concat([exi, new])
Traceback (most recent call last):
  File "<ipython-input-2-9967cb321e9e>", line 4, in <module>
    to_record = pd.concat([exi, new])
  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 307, in concat
    return op.get_result()
  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 532, in get_result
    new_data = concatenate_managers(
  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/internals/concat.py", line 222, in concatenate_managers
    values = _concatenate_join_units(join_units, concat_axis, copy=copy)
  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/internals/concat.py", line 486, in _concatenate_join_units
    to_concat = [
  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/internals/concat.py", line 487, in <listcomp>
    ju.get_reindexed_values(empty_dtype=empty_dtype, upcasted_na=upcasted_na)
  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/internals/concat.py", line 403, in get_reindexed_values
    values = self.block.get_values()
  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 1360, in get_values
    return np.asarray(values).reshape(self.shape)
ValueError: cannot reshape array of size 5 into shape (1,0)
I noticed that reading the files back with pyarrow yields dataframes that do not cause any error.
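Since the two engines give different results, one thing that might help narrow this down is comparing the column dtypes of the frame read with each engine. The sketch below is only an illustration: `compare_dtypes` is a hypothetical helper, and the two toy frames are made-up stand-ins for the real reads (the categorical column is an assumption chosen to show a difference, not a claim about what the attached files contain).

```python
import pandas as pd


def compare_dtypes(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the columns whose dtype differs between the two frames."""
    dtypes = pd.DataFrame(
        {
            "left": left.dtypes.astype(str),
            "right": right.dtypes.astype(str),
        }
    )
    # Keep only the rows (column names) where the dtype strings disagree.
    return dtypes[dtypes["left"] != dtypes["right"]]


# Toy stand-ins (assumption): in practice these would be the same file
# read once with engine='fastparquet' and once with engine='pyarrow'.
via_fastparquet = pd.DataFrame(
    {"id": [1, 2], "side": pd.Categorical(["buy", "sell"])}
)
via_pyarrow = pd.DataFrame({"id": [1, 2], "side": ["buy", "sell"]})

diff = compare_dtypes(via_fastparquet, via_pyarrow)
print(diff)
```

Any column listed in the result would be a candidate for closer inspection; an empty result would suggest the difference lies elsewhere than in the dtypes.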
I also tried concatenating various extracts of the dataframes, selecting columns one by one or even several at once, and this does not raise the error. For instance, the following concat calls do not cause trouble:
to_record = pd.concat([exi[['timestamp','period','side']], new[['side','timestamp','period']]])
to_record = pd.concat([exi[['period','id']], new[['id','period']]])
to_record = pd.concat([exi['tracking'], new['tracking']])
# etc...
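The manual trials above could be made systematic by growing the column selection one column at a time and stopping at the first selection that fails. This is a minimal sketch under assumptions: `first_failing_prefix` and its `combine` parameter are hypothetical names, and the toy frames are healthy stand-ins for `exi` and `new`, so nothing fails here.

```python
import pandas as pd


def first_failing_prefix(frames, columns, combine=pd.concat):
    """Grow the column selection one column at a time.

    Returns the first list of columns for which `combine` raises,
    or None if every selection concatenates cleanly.
    """
    for stop in range(1, len(columns) + 1):
        subset = list(columns[:stop])
        try:
            combine([frame[subset] for frame in frames])
        except Exception:
            return subset
    return None


# Toy stand-ins for `exi` and `new` (assumption); column order differs
# between the two, as in the examples above.
exi_toy = pd.DataFrame(
    {"id": [1, 2], "period": ["1h", "1h"], "side": ["buy", "sell"]}
)
new_toy = pd.DataFrame({"side": ["buy"], "id": [3], "period": ["1h"]})

result = first_failing_prefix([exi_toy, new_toy], list(exi_toy.columns))
print(result)  # prints None, since these toy frames are healthy
```

With the real frames, the last column of the returned list would be the one whose addition first triggers the error, which may help when no single column fails on its own.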
I am at a loss as to how to narrow the problem down to its root cause.
Would anyone have some advice, please?
Expected Output
No error :)
Output of pd.show_versions()
INSTALLED VERSIONS
commit : f00ed8f
python : 3.8.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-59-generic
Version : #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : 0.29.23
pytest : 6.2.4
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : 1.4.4
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.06.0
fastparquet : 0.6.4.dev0
gcsfs : None
matplotlib : 3.3.4
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.19
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1