Description
Describe the issue:
Process memory grows steadily until it consumes all available memory (and swap). Reproduced on Linux and on an M1 Mac. Note that with the default 'fork' start method for multiprocessing on Linux, the run fails immediately with Errno 12 (OOM), before sampling even begins.
PyMC version: 5.7.2
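To put numbers on "grows steadily", a small helper like the one below could be called at a few points in the script. This is only a sketch using the standard library `resource` module (Unix only); the function name `peak_rss_mb` is made up for illustration and is not part of PyMC.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return the peak resident set size of the current process in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print(f"peak RSS so far: {peak_rss_mb():.0f} MB")
```

This only covers the process that calls it. With multiprocess sampling the worker chains are separate processes, so a system tool such as `top` is still needed to see their usage.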
Linux system:
- Void Linux
- Kernel 6.3.12_1
- 64 GB DDR5 RAM
- 24 GB RTX 4090 GPU
- AMD Ryzen 9 7950X 16 core, 32 threads
Mac System:
- 16 GB memory
- 8 Cores
Dataset: ~161 MB total.
Reproducible code example:
#!/usr/bin/env python3
import numpy as np
import pandas as pd
import pymc as pm


def pymc_bayes(df: pd.DataFrame):
    a, b, c, i = df.a.values, df.b.values, df.c.values, df.i.values
    n_i = int(i.max() + 1)
    with pm.Model() as m:
        alpha = pm.Normal("alpha", 0, 1, shape=[n_i])
        beta_b = pm.HalfNormal("beta_b", 1)
        beta_c = pm.HalfNormal("beta_c", 1)
        beta_int = pm.Normal("beta_int", 0, 1)
        mu = pm.Deterministic(
            "mu", alpha[i] + beta_b * b + beta_c * c + beta_int * b * c
        )
        sigma = pm.Exponential("sigma", 1)
        a_hat = pm.Normal("a_hat", mu, sigma, observed=a)
        idata = pm.sample(mp_ctx="spawn")  # fork fails immediately with OOM
    idata.to_netcdf("pymc_bayes.nc")
    print("finished!")


if __name__ == "__main__":
    n, n_int = 2618018, 17  # to match the real dataset I care about
    df = pd.DataFrame(np.random.randn(n, 3), columns=['a', 'b', 'c'])
    df['i'] = np.random.randint(0, n_int, size=n)
    pymc_bayes(df)
Error message:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta_b, beta_c, beta_int, sigma]
███████████████████████---------------| 76.14% [6091/8000 18:42<05:51 Sampling 4 chains, 0 divergences]
Process worker_chain_2:
Process worker_chain_3:
Process worker_chain_0:
Process worker_chain_1:
Traceback (most recent call last):
Traceback (most recent call last):
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 122, in run
self._start_loop()
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 181, in _start_loop
msg = self._recv_msg()
^^^^^^^^^^^^^^^^
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 153, in _recv_msg
return self._msg_pipe.recv()
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
buf = self._recv_bytes()
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
buf = self._recv(4)
^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
chunk = read(handle, remaining)
^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 122, in run
self._start_loop()
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 181, in _start_loop
msg = self._recv_msg()
^^^^^^^^^^^^^^^^
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 153, in _recv_msg
return self._msg_pipe.recv()
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
buf = self._recv_bytes()
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
buf = self._recv(4)
^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
chunk = read(handle, remaining)
^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 122, in run
self._start_loop()
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 181, in _start_loop
msg = self._recv_msg()
^^^^^^^^^^^^^^^^
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 153, in _recv_msg
return self._msg_pipe.recv()
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
buf = self._recv_bytes()
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
buf = self._recv(4)
^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
chunk = read(handle, remaining)
^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 122, in run
self._start_loop()
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 181, in _start_loop
msg = self._recv_msg()
^^^^^^^^^^^^^^^^
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 153, in _recv_msg
return self._msg_pipe.recv()
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
buf = self._recv_bytes()
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
buf = self._recv(4)
^^^^^^^^^^^^^
File "/usr/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
chunk = read(handle, remaining)
^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 194, in _run_process
_Process(*args).run()
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 129, in run
self._msg_pipe.send(("error", e))
File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.11/multiprocessing/connection.py", line 410, in _send_bytes
self._send(header + buf)
File "/usr/lib/python3.11/multiprocessing/connection.py", line 367, in _send
n = write(self._handle, buf)
^^^^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [Errno 32] Broken pipe
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 194, in _run_process
_Process(*args).run()
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 129, in run
self._msg_pipe.send(("error", e))
File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.11/multiprocessing/connection.py", line 410, in _send_bytes
self._send(header + buf)
File "/usr/lib/python3.11/multiprocessing/connection.py", line 367, in _send
n = write(self._handle, buf)
^^^^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [Errno 32] Broken pipe
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 194, in _run_process
_Process(*args).run()
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 129, in run
self._msg_pipe.send(("error", e))
File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.11/multiprocessing/connection.py", line 410, in _send_bytes
self._send(header + buf)
File "/usr/lib/python3.11/multiprocessing/connection.py", line 367, in _send
n = write(self._handle, buf)
^^^^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [Errno 32] Broken pipe
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 194, in _run_process
_Process(*args).run()
File "/home/danj/.local/lib/python3.11/site-packages/pymc/sampling/parallel.py", line 129, in run
self._msg_pipe.send(("error", e))
File "/usr/lib/python3.11/multiprocessing/connection.py", line 205, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.11/multiprocessing/connection.py", line 410, in _send_bytes
self._send(header + buf)
File "/usr/lib/python3.11/multiprocessing/connection.py", line 367, in _send
n = write(self._handle, buf)
^^^^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [Errno 32] Broken pipe
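The interleaved tracebacks above come from the four worker processes. One plausible reading, not confirmed by the log itself, is that they are a symptom rather than the cause: the workers fail because the other end of their message pipe has gone away, for example because the main process was killed when memory ran out. The sketch below reproduces only the `BrokenPipeError` half of that pattern with a plain `multiprocessing.Pipe`; it does not use PyMC, and the function name is hypothetical.

```python
import multiprocessing as mp


def send_to_closed_peer() -> str:
    """Send on a pipe whose receiving end is already closed."""
    reader, writer = mp.Pipe(duplex=False)
    reader.close()  # simulate the receiving process going away
    try:
        writer.send(("error", "payload"))
    except BrokenPipeError as exc:
        return type(exc).__name__
    finally:
        writer.close()
    return "no error"


if __name__ == "__main__":
    print(send_to_closed_peer())
```

On Linux this should report `BrokenPipeError`, the same exception the workers raise at `self._msg_pipe.send(("error", e))`.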
PyMC version information:
PyMC 5.7.2
Aesara 2.9.1
PyTensor 2.14.2
uname -a: Linux ghost 6.3.13_1 #1 SMP PREEMPT_DYNAMIC Tue Jul 25 00:19:40 UTC 2023 x86_64 GNU/Linux
Context for the issue:
This is a simple linear model with an interaction term, although I couldn't get it to run without an OOM even with just two covariates.
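One possible contributor, which this report does not confirm, is the `mu` variable. It is wrapped in `pm.Deterministic`, so its value is recorded for every draw, and it has one entry per data row. The back-of-the-envelope estimate below assumes float64 values (8 bytes each) and takes the 8000 total iterations from the progress bar (4 chains at 2000 iterations each). Whether tuning iterations are held in memory the whole time is an assumption, so both cases are shown. The helper name is made up for illustration.

```python
def deterministic_storage_gb(n_obs: int, n_chains: int, n_iter: int,
                             bytes_per_value: int = 8) -> float:
    """Estimate memory (in GB) needed to store one per-row variable for every draw."""
    return n_obs * n_chains * n_iter * bytes_per_value / 1e9


n_obs = 2_618_018  # rows in the example dataset

# Assumption: tuning and posterior draws are both kept (2000 per chain).
with_tuning = deterministic_storage_gb(n_obs, n_chains=4, n_iter=2000)
# Assumption: only the 1000 posterior draws per chain are kept.
draws_only = deterministic_storage_gb(n_obs, n_chains=4, n_iter=1000)

print(f"with tuning: {with_tuning:.1f} GB")  # 167.6 GB
print(f"draws only:  {draws_only:.1f} GB")   # 83.8 GB
```

Under either assumption the estimate is larger than the 64 GB of RAM on the Linux machine, which would be consistent with steady growth followed by an OOM partway through sampling.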