Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Unfortunately, none is available: the errors occur only on proprietary datasets that are very large and complex, so it is not feasible to distill an example.
Problem description
I am seeing MemoryError exceptions all over the place, on seemingly random lines. I measure memory usage with psutil, which confirms that vast amounts of free memory remain on the node. These exceptions are making pandas completely unusable for me: it fails to allocate 22 MiB when over 2 TiB of memory is free.
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/frame.py", line 7950, in merge
    return merge(
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 74, in merge
    op = _MergeOperation(
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 652, in __init__
    ) = self._get_merge_keys()
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1063, in _get_merge_keys
    self.right = self.right._drop_labels_or_levels(right_drop)
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/generic.py", line 1637, in _drop_labels_or_levels
    dropped = self.copy()
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/generic.py", line 5665, in copy
    data = self._mgr.copy(deep=deep)
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 811, in copy
    res = self.apply("copy", deep=deep)
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 409, in apply
    applied = getattr(b, f)(**kwargs)
  File "/opt/conda/envs/condaenv/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 679, in copy
    values = values.copy()
MemoryError: Unable to allocate 22.1 MiB for an array with shape (1, 2896551) and data type object
Expected Output
These exceptions should not occur when there is sufficient free memory. Many prior issues report the same exception, which suggests a persistent bug in pandas.
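As a sanity check on the size quoted in the MemoryError message, the arithmetic can be reproduced by hand. The sketch below assumes that each element of an object-dtype array is an 8-byte pointer, which is what I would expect on a 64-bit build (python-bits is 64 here). It only counts the pointer array itself, not the Python objects the pointers refer to.

```python
# Figures copied from the MemoryError message in the traceback.
shape = (1, 2896551)

# Assumption: object-dtype elements are 8-byte pointers on a 64-bit build.
POINTER_BYTES = 8

n_elements = shape[0] * shape[1]
requested_bytes = n_elements * POINTER_BYTES
requested_mib = requested_bytes / 2**20

print(f"{requested_bytes} bytes = {requested_mib:.1f} MiB")
# prints: 23172408 bytes = 22.1 MiB
```

So the failing request really is a single allocation of about 22 MiB for one object column, made while copying the right-hand frame inside merge.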
Output of pd.show_versions()
This is not entirely available since it's running exclusively in the cloud, but the salient versions are:
python : 3.8.6.final.0
python-bits : 64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.2.1
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.3.3
setuptools : 50.3.1.post20201107
Cython : None
pandas_datareader: None
numexpr : None
pyarrow : 3.0.0