Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import tracemalloc
import numpy as np
import time
import gc

# Start memory tracking
tracemalloc.start()

iteration = 0
Row_Number = 20000

while iteration < 1000:
    test_lst = [*range(12)]
    for i in range(12):
        # Create a DataFrame with Row_Number rows
        df = pd.DataFrame({
            "A": np.arange(Row_Number),  # Sequential integers from 0 to Row_Number - 1
            "B": np.random.rand(Row_Number),  # Random floats between 0 and 1
            "C": np.random.randint(0, 100, size=Row_Number),  # Random integers between 0 and 99
            "D": np.random.choice(["apple", "banana", "cherry"], size=Row_Number),  # Random categories
            "E": np.random.randn(Row_Number),  # Normally distributed random numbers
        })
        test_lst[i] = df  # The bug also appears without storing df in the list
        del df  # Deleting df at the end of the loop doesn't affect the memory leak
    del test_lst  # Deleting the list at the end of the loop doesn't affect the memory leak
    time.sleep(0.01)
    iteration += 1

    # Check memory usage for 3rd party packages (% 1 means every iteration)
    if iteration % 1 == 0:
        snapshot = tracemalloc.take_snapshot()
        # Get memory statistics **without filtering** first
        top_stats = snapshot.statistics("lineno")
        print(f"\n[ Memory Snapshot at iteration {iteration} ]")
        for stat in top_stats[:5]:  # Show top memory-consuming locations
            print(stat)
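The top-5 listing ranks where the memory that is still alive at snapshot time was allocated. It does not directly show whether usage keeps growing from one iteration to the next. A leak should also appear as traced memory that keeps rising across iterations even after a full `gc.collect()`. The following is a minimal standard-library sketch of that check. `measure_traced_growth` is a hypothetical helper, not part of pandas or tracemalloc.

```python
import gc
import tracemalloc
from typing import Any, Callable, List


def measure_traced_growth(
    factory: Callable[[], Any], iterations: int = 20
) -> List[int]:
    """Return the traced memory (in bytes) sampled after each iteration.

    Each iteration builds an object with `factory` (hypothetical callback),
    drops the only local reference to it, and forces a full garbage
    collection before reading the current traced size. Memory that is still
    reachable afterwards (leaked or cached) is what gets counted.
    """
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    samples: List[int] = []
    try:
        for _ in range(iterations):
            obj = factory()
            del obj  # drop our reference so only external holders remain
            gc.collect()  # also break any reference cycles
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        # Only stop tracing if this helper was the one that started it.
        if not was_tracing:
            tracemalloc.stop()
    return samples
```

To apply this to the report, pass a zero-argument function that builds the 12 DataFrames from the example above. Then check whether `samples[-1] - samples[0]` keeps increasing as `iterations` grows. If the value plateaus after a few iterations, that would suggest caching or allocator reuse rather than a true leak.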
Issue Description
Using tracemalloc (Python's standard-library module for tracing memory allocations), I can see that pandas doesn't release memory when DataFrames are created inside a loop. The problem seems to come from pandas\core\internals\blocks around line 228. It would be great if someone could find a fix for this.
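A `"lineno"` statistic shows only the innermost frame of each allocation. Grouping by full traceback can reveal which call path leads to the allocations attributed to `pandas\core\internals\blocks`. One option is to take two snapshots around a batch of iterations and diff them. The following is a minimal sketch under these assumptions: `top_growth_sites` is a hypothetical helper, and `frames` only takes effect if tracemalloc is not already running.

```python
import gc
import tracemalloc
from typing import Any, Callable, List


def top_growth_sites(
    factory: Callable[[], Any],
    warmup: int = 3,
    iterations: int = 10,
    limit: int = 5,
    frames: int = 10,
) -> List[tracemalloc.StatisticDiff]:
    """Diff snapshots taken before and after `iterations` calls of `factory`.

    `warmup` calls run first so one-time caches are already in the baseline.
    The result is grouped by full traceback and sorted by the absolute size
    of the change, largest first (tracemalloc's own ordering).
    """
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start(frames)  # keep up to `frames` frames per allocation
    try:
        for _ in range(warmup):
            factory()
        gc.collect()
        before = tracemalloc.take_snapshot()
        for _ in range(iterations):
            factory()
        gc.collect()
        after = tracemalloc.take_snapshot()
    finally:
        if not was_tracing:
            tracemalloc.stop()
    # Exclude allocations made inside the tracemalloc module itself.
    filters = [tracemalloc.Filter(False, tracemalloc.__file__)]
    diffs = after.filter_traces(filters).compare_to(
        before.filter_traces(filters), "traceback"
    )
    return diffs[:limit]
```

Each returned `StatisticDiff` exposes `size_diff`, `count_diff` and `traceback.format()`. Printing those for the DataFrame-building loop would show whether the growth actually traces back to the block-construction code path.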
Expected Behavior
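Memory used by DataFrames created inside the loop should be released once they are no longer referenced, so total memory usage should not keep growing across iterations.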
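One way to make this expectation testable is to confirm that an object built inside the loop is actually freed once its last reference is dropped. The following is a minimal sketch using the standard-library `weakref` module. `is_collected` is a hypothetical helper, and it assumes the object supports weak references; for DataFrames, check this first with `weakref.ref(df)`.

```python
import gc
import weakref
from typing import Any, Callable


def is_collected(factory: Callable[[], Any]) -> bool:
    """Return True if the object built by `factory` is freed once this
    function drops its reference and a full gc pass has run.

    Assumes the object supports weak references.
    """
    obj = factory()
    ref = weakref.ref(obj)  # does not keep the object alive
    del obj
    gc.collect()  # also reclaims objects that are only kept alive by cycles
    return ref() is None
```

For example, `is_collected(lambda: pd.DataFrame({"A": np.arange(20000)}))` returning `True` would indicate that the DataFrame object itself is released. Any remaining growth would then have to come from memory held somewhere else.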
Installed Versions
INSTALLED VERSIONS
commit : 0691c5c
python : 3.13.1
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : Norwegian Bokmål_Norway.1252
pandas : 2.2.3
numpy : 2.2.2
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.2
Cython : None
sphinx : 8.1.3
IPython : 8.31.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.5
lxml.etree : None
matplotlib : 3.10.0
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 19.0.0
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.15.1
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2024.2
qtpy : 2.4.2
pyqt5 : None