### Describe the issue:
This has come up before (#6852, #4167) and has started cropping up again. Multiprocess sampling fails partway through a run with a `ConnectionResetError`. Most recently it has been happening to me on Linux (Fedora).
A workaround is to change the sampler's random seed; with a different seed the run usually completes.
Details below.
### Reproducible code example:
The failure seems to be stochastic, so it is hard to reproduce.
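This is not a reproducer, but the seed-change workaround described above could be sketched roughly as follows. `sample_with_seed_retry` is a hypothetical helper, not part of PyMC, and `fake_sample` is only a stand-in so the snippet runs without a model. The helper also returns the seeds that failed, which may help with building a reproducer later.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def sample_with_seed_retry(
    sample_fn: Callable[[int], T],
    seeds: Iterable[int],
) -> tuple[T, int, list[int]]:
    """Call sample_fn(seed) for each seed until one run completes.

    Returns (result, seed_that_worked, seeds_that_failed).
    Hypothetical helper for illustration; not part of PyMC.
    """
    failed: list[int] = []
    for seed in seeds:
        try:
            return sample_fn(seed), seed, failed
        except ConnectionResetError:
            # Remember the seed: it may reproduce the failure later.
            failed.append(seed)
    raise RuntimeError(f"sampling failed for every seed tried: {failed}")


if __name__ == "__main__":
    # Stand-in for pm.sample so the sketch runs without PyMC installed.
    def fake_sample(seed: int) -> str:
        if seed % 2 == 0:
            raise ConnectionResetError(104, "Connection reset by peer")
        return f"trace(seed={seed})"

    result, good_seed, bad_seeds = sample_with_seed_retry(fake_sample, [42, 44, 45])
    print(result, good_seed, bad_seeds)
```

With the model from the traceback below, `sample_fn` could be something like `lambda seed: pm.sample(100, chains=6, cores=4, random_seed=seed, model=ad_spend_model)`. Given how stochastic the failure is, a seed that fails once may not fail again, so the recorded seeds are only a starting point.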
### Error message:
---------------------------------------------------------------------------
ConnectionResetError                      Traceback (most recent call last)
Cell In[28], line 2
      1 with ad_spend_model:
----> 2     ptrace = pm.sample(100, chains=6, cores=4, random_seed=random_seed)

File ~/repos/pymc/pymc/sampling/mcmc.py:841, in sample(draws, tune, chains, cores, random_seed, progressbar, progressbar_theme, step, var_names, nuts_sampler, initvals, init, jitter_max_retries, n_init, trace, discard_tuned_samples, compute_convergence_checks, keep_warning_stat, return_inferencedata, idata_kwargs, nuts_sampler_kwargs, callback, mp_ctx, blas_cores, model, **kwargs)
    839 _print_step_hierarchy(step)
    840 try:
--> 841     _mp_sample(**sample_args, **parallel_args)
    842 except pickle.PickleError:
    843     _log.warning("Could not pickle model, sampling singlethreaded.")

File ~/repos/pymc/pymc/sampling/mcmc.py:1254, in _mp_sample(draws, tune, step, chains, cores, random_seed, start, progressbar, progressbar_theme, traces, model, callback, blas_cores, mp_ctx, **kwargs)
   1252 try:
   1253     with sampler:
-> 1254         for draw in sampler:
   1255             strace = traces[draw.chain]
   1256             strace.record(draw.point, draw.stats)

File ~/repos/pymc/pymc/sampling/parallel.py:471, in ParallelSampler.__iter__(self)
    464 task = progress.add_task(
    465     self._desc.format(self),
    466     completed=self._completed_draws,
    467     total=self._total_draws,
    468 )
    470 while self._active:
--> 471     draw = ProcessAdapter.recv_draw(self._active)
    472     proc, is_last, draw, tuning, stats = draw
    473     self._completed_draws += 1

File ~/repos/pymc/pymc/sampling/parallel.py:328, in ProcessAdapter.recv_draw(processes, timeout)
    326 idxs = {id(proc._msg_pipe): proc for proc in processes}
    327 proc = idxs[id(ready[0])]
--> 328 msg = ready[0].recv()
    330 if msg[0] == "error":
    331     old_error = msg[1]

File ~/miniforge3/envs/pymc_course/lib/python3.12/multiprocessing/connection.py:250, in _ConnectionBase.recv(self)
    248 self._check_closed()
    249 self._check_readable()
--> 250 buf = self._recv_bytes()
    251 return _ForkingPickler.loads(buf.getbuffer())

File ~/miniforge3/envs/pymc_course/lib/python3.12/multiprocessing/connection.py:430, in Connection._recv_bytes(self, maxsize)
    429 def _recv_bytes(self, maxsize=None):
--> 430     buf = self._recv(4)
    431     size, = struct.unpack("!i", buf.getvalue())
    432     if size == -1:

File ~/miniforge3/envs/pymc_course/lib/python3.12/multiprocessing/connection.py:395, in Connection._recv(self, size, read)
    393 remaining = size
    394 while remaining > 0:
--> 395     chunk = read(handle, remaining)
    396     n = len(chunk)
    397     if n == 0:

ConnectionResetError: [Errno 104] Connection reset by peer
### PyMC version information:
Python version : 3.12.3
pymc : 5.15.1+17.g508a1341f.dirty
pytensor : 2.22.1
### Context for the issue:
_No response_