Description
This definitely isn't a high-priority issue, but I'd love to understand what's going on if anyone has ideas! This might be an issue with my understanding of `sample_prior_predictive`, but the fact that the behavior for `Normal` and `MvNormal` is not consistent suggests that it actually is a bug. Basically, `MvNormal` doesn't seem to condition properly on the sampled variables that it depends on (probably something to do with `draw_values`, but I don't understand what happens under the hood well enough to know what!).
In the following example:
```python
import numpy as np
import pymc3 as pm

np.random.seed(42)

ndim = 50
with pm.Model() as model:
    a = pm.Normal("a", sd=100, shape=ndim)
    b = pm.Normal("b", mu=a, sd=1, shape=ndim)
    c = pm.MvNormal("c", mu=a, chol=np.linalg.cholesky(np.eye(ndim)), shape=ndim)
    d = pm.MvNormal("d", mu=a, cov=np.eye(ndim), shape=ndim)
    samples = pm.sample_prior_predictive(1000)

print(np.std(samples["a"]), np.std(samples["b"]), np.std(samples["c"]), np.std(samples["d"]))
print(np.std(samples["b"] - samples["a"]), np.std(samples["c"] - samples["a"]))
```
I get the following output:
```
100.01598664026606 100.01292464866555 99.96568032648382 nan
1.0016395711339057 141.20382229079829
```
In the first line, I'm surprised that the samples of `d` are all `nan`, because there doesn't seem to be anything wrong with the syntax; the other results all look right. But the real issue is the second line: I would expect it to return two numbers of order 1, but instead the second entry looks like the difference of two independent random variables, each with a sigma of 100 (141.2 ≈ √2 · 100). This means that the mean of the `MvNormal` is not being conditioned on the same samples of `a` that are actually being generated. The PGM looks fine, so I expect that the issue is in the sampling, not the model specification.
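For comparison, here is a plain NumPy sketch (my own reconstruction, not PyMC3 internals) of the two behaviors: ancestral sampling that conditions `c` on the same draw of `a`, versus filling the mean with an independent draw of `a`, which reproduces the ~141 figure:

```python
import numpy as np

np.random.seed(42)
ndim = 50
n_samples = 1000

# Draw a once per sample, as ancestral sampling should.
a = np.random.normal(0.0, 100.0, size=(n_samples, ndim))
eps = np.random.normal(0.0, 1.0, size=(n_samples, ndim))

# Correctly conditioned: c is centered on the same draw of a.
c_good = a + eps
print(np.std(c_good - a))  # ~1

# Suspected buggy behavior: the mean uses an independent draw of a.
a_indep = np.random.normal(0.0, 100.0, size=(n_samples, ndim))
c_bad = a_indep + eps
print(np.std(c_bad - a))  # ~sqrt(100**2 + 100**2 + 1) ≈ 141.4
```

The second printed value matches the 141.2 observed above, which is what makes me think `draw_values` is pairing `c` with a fresh draw of `a` rather than the one stored in `samples["a"]`.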
Let me know if you have any ideas about what's going on here!
Versions and main components
- PyMC3 Version: GitHub master (3.8)
- Theano Version: 1.0.4
- Python Version: 3.7.5
- Operating system: Mac
- How did you install PyMC3: pip