Commit 2f8072c
Merge branch 'main' into icar
2 parents e66e1d0 + d59a960


72 files changed (+2695, -1438 lines)

.github/workflows/devcontainer-docker-image.yml (+1, -1)

```diff
@@ -22,7 +22,7 @@ jobs:
         uses: actions/checkout@2541b1294d2704b0964813337f33b291d3f8596b

       - name: Setup Docker buildx
-        uses: docker/setup-buildx-action@v2.4.1
+        uses: docker/setup-buildx-action@v2.9.1

       - name: Prepare metadata
         id: meta
```

.github/workflows/tests.yml (+4, -3)

```diff
@@ -110,6 +110,7 @@ jobs:
           tests/logprob/test_composite_logprob.py
           tests/logprob/test_cumsum.py
           tests/logprob/test_mixture.py
+          tests/logprob/test_order.py
           tests/logprob/test_rewriting.py
           tests/logprob/test_scan.py
           tests/logprob/test_tensor.py
@@ -183,7 +184,7 @@ jobs:
       matrix:
         os: [windows-latest]
         floatx: [float64]
-        python-version: ["3.8"]
+        python-version: ["3.9"]
         test-subset:
           - tests/variational/test_approximations.py tests/variational/test_callbacks.py tests/variational/test_inference.py tests/variational/test_opvi.py tests/test_initial_point.py
           - tests/test_model.py tests/sampling/test_mcmc.py
@@ -259,7 +260,7 @@ jobs:
       matrix:
         os: [macos-latest]
         floatx: [float64]
-        python-version: ["3.9"]
+        python-version: ["3.10"]
         test-subset:
           - |
             tests/sampling/test_parallel.py
@@ -338,7 +339,7 @@ jobs:
       matrix:
         os: [ubuntu-20.04]
         floatx: [float64]
-        python-version: ["3.10"]
+        python-version: ["3.11"]
         test-subset:
           - tests/sampling/test_jax.py tests/sampling/test_mcmc_external.py
       fail-fast: false
```

.gitignore (+2)

```diff
@@ -47,3 +47,5 @@ pytestdebug.log
 # Codespaces
 pythonenv*
 env/
+venv/
+.venv/
```

.pre-commit-config.yaml (+2, -2)

```diff
@@ -29,12 +29,12 @@ repos:
       - id: isort
         name: isort
   - repo: https://github.com/asottile/pyupgrade
-    rev: v3.6.0
+    rev: v3.10.1
     hooks:
       - id: pyupgrade
         args: [--py37-plus]
   - repo: https://github.com/psf/black
-    rev: 23.3.0
+    rev: 23.7.0
     hooks:
       - id: black
       - id: black-jupyter
```

conda-envs/environment-dev.yml (+1, -1)

```diff
@@ -14,7 +14,7 @@ dependencies:
   - numpy>=1.15.0
   - pandas>=0.24.0
   - pip
-  - pytensor>=2.12.0,<2.13
+  - pytensor>=2.14.1,<2.15
   - python-graphviz
   - networkx
   - scipy>=1.4.1
```

conda-envs/environment-docs.yml (+1, -1)

```diff
@@ -12,7 +12,7 @@ dependencies:
   - numpy>=1.15.0
   - pandas>=0.24.0
   - pip
-  - pytensor>=2.12.0,<2.13
+  - pytensor>=2.14.1,<2.15
   - python-graphviz
   - scipy>=1.4.1
   - typing-extensions>=3.7.4
```

conda-envs/environment-test.yml (+1, -1)

```diff
@@ -17,7 +17,7 @@ dependencies:
   - numpy>=1.15.0
   - pandas>=0.24.0
   - pip
-  - pytensor>=2.12.0,<2.13
+  - pytensor>=2.14.1,<2.15
   - python-graphviz
   - networkx
   - scipy>=1.4.1
```

conda-envs/windows-environment-dev.yml (+1, -1)

```diff
@@ -14,7 +14,7 @@ dependencies:
   - numpy>=1.15.0
   - pandas>=0.24.0
   - pip
-  - pytensor>=2.12.0,<2.13
+  - pytensor>=2.14.1,<2.15
   - python-graphviz
   - networkx
   - scipy>=1.4.1
```

conda-envs/windows-environment-test.yml (+1, -1)

```diff
@@ -17,7 +17,7 @@ dependencies:
   - numpy>=1.15.0
   - pandas>=0.24.0
   - pip
-  - pytensor>=2.12.0,<2.13
+  - pytensor>=2.14.1,<2.15
   - python-graphviz
   - networkx
   - scipy>=1.4.1
```

docs/source/contributing/implementing_distribution.md (+26, -24)
````diff
@@ -1,11 +1,11 @@
 (implementing-a-distribution)=
 # Implementing a RandomVariable Distribution

-This guide provides an overview on how to implement a distribution for PyMC version `>=4.0.0`.
+This guide provides an overview of how to implement a distribution for PyMC.
 It is designed for developers who wish to add a new distribution to the library.
-Users will not be aware of all this complexity and should instead make use of helper methods such as `~pymc.DensityDist`.
+Users will not be aware of all this complexity and should instead make use of helper methods such as `~pymc.CustomDist`.

-PyMC {class}`~pymc.Distribution` builds on top of PyTensor's {class}`~pytensor.tensor.random.op.RandomVariable`, and implements `logp`, `logcdf` and `moment` methods as well as other initialization and validation helpers.
+PyMC {class}`~pymc.Distribution` builds on top of PyTensor's {class}`~pytensor.tensor.random.op.RandomVariable`, and implements `logp`, `logcdf`, `icdf` and `moment` methods as well as other initialization and validation helpers.
 Most notably `shape/dims/observed` kwargs, alternative parametrizations, and default `transform`.

 Here is a summary check-list of the steps needed to implement a new distribution.
````
````diff
@@ -14,14 +14,14 @@ Each section will be expanded below:
 1. Creating a new `RandomVariable` `Op`
 1. Implementing the corresponding `Distribution` class
 1. Adding tests for the new `RandomVariable`
-1. Adding tests for `logp` / `logcdf` and `moment` methods
+1. Adding tests for `logp` / `logcdf` / `icdf` and `moment` methods
 1. Documenting the new `Distribution`.

 This guide does not attempt to explain the rationale behind the `Distributions` current implementation, and details are provided only insofar as they help to implement new "standard" distributions.

 ## 1. Creating a new `RandomVariable` `Op`

-{class}`~pytensor.tensor.random.op.RandomVariable` are responsible for implementing the random sampling methods, which in version 3 of PyMC used to be one of the standard `Distribution` methods, alongside `logp` and `logcdf`.
+{class}`~pytensor.tensor.random.op.RandomVariable`s are responsible for implementing the random sampling methods.
 The `RandomVariable` is also responsible for parameter broadcasting and shape inference.

 Before creating a new `RandomVariable` make sure that it is not already offered in the {mod}`NumPy library <numpy.random>`.
````
````diff
@@ -68,8 +68,7 @@ class BlahRV(RandomVariable):
     # This is the Python code that produces samples. Its signature will always
     # start with a NumPy `RandomState` object, then the distribution
     # parameters, and, finally, the size.
-    #
-    # This is effectively the PyMC >=4.0 replacement for `Distribution.random`.
+
     @classmethod
     def rng_fn(
         cls,
````
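Stripped of the PyTensor machinery, the `rng_fn` pattern this hunk edits can be sketched with plain NumPy. The snippet below is a hypothetical stand-in (the names `blah_rng_fn`, `loc` and `scale` are illustrative, not PyMC API), shown to illustrate the rng-first signature and seed reproducibility:

```python
import numpy as np

# Hypothetical stand-in for a `rng_fn`: pure NumPy code whose signature
# starts with the rng, then the distribution parameters, then `size`.
def blah_rng_fn(rng, loc, scale, size):
    # `rng` is a NumPy Generator, so identically seeded calls
    # produce identical samples.
    return loc + scale * rng.uniform(size=size)

draws_a = blah_rng_fn(np.random.default_rng(1), 0.0, 2.0, size=(10, 2))
draws_b = blah_rng_fn(np.random.default_rng(1), 0.0, 2.0, size=(10, 2))
assert draws_a.shape == (10, 2)
assert np.array_equal(draws_a, draws_b)  # reproducible given the same seed
```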
````diff
@@ -87,19 +86,20 @@ blah = BlahRV()

 Some important things to keep in mind:

-1. Everything inside the `rng_fn` method is pure Python code (as are the inputs) and should not make use of other `PyTensor` symbolic ops. The random method should make use of the `rng` which is a NumPy {class}`~numpy.random.RandomState`, so that samples are reproducible.
+1. Everything inside the `rng_fn` method is pure Python code (as are the inputs) and should __not__ make use of other `PyTensor` symbolic ops. The random method should make use of the `rng` which is a NumPy {class}`~numpy.random.Generator`, so that samples are reproducible.
 1. Non-default `RandomVariable` dimensions will end up in the `rng_fn` via the `size` kwarg. The `rng_fn` will have to take this into consideration for correct output. `size` is the specification used by NumPy and SciPy and works like PyMC `shape` for univariate distributions, but is different for multivariate distributions. For multivariate distributions the __`size` excludes the `ndim_supp` support dimensions__, whereas the __`shape` of the resulting `TensorVariable` or `ndarray` includes the support dimensions__. For more context check {ref}`The dimensionality notebook <dimensionality>`.
-1. `PyTensor` tries to infer the output shape of the `RandomVariable` (given a user-specified size) by introspection of the `ndim_supp` and `ndim_params` attributes. However, the default method may not work for more complex distributions. In that case, custom `_supp_shape_from_params` (and less probably, `_infer_shape`) should also be implemented in the new `RandomVariable` class. One simple example is seen in the {class}`~pymc.DirichletMultinomialRV` where it was necessary to specify the `rep_param_idx` so that the `default_supp_shape_from_params` helper method can do its job. In more complex cases, it may not suffice to use this default helper. This could happen for instance if the argument values determined the support shape of the distribution, as happens in the `~pymc.distributions.multivarite._LKJCholeskyCovRV`.
+1. `PyTensor` can automatically infer the output shape of univariate `RandomVariable`s (`ndim_supp=0`). For multivariate distributions (`ndim_supp>=1`), the method `_supp_shape_from_params` must be implemented in the new `RandomVariable` class. This method returns the support dimensionality of an RV given its parameters. In some cases this can be derived from the shape of one of its parameters, in which case the helper {func}`pytensor.tensor.random.utils.supp_shape_from_ref_param_shape` can be used, as is done in {class}`~pymc.DirichletMultinomialRV`. In other cases the argument values (and not their shapes) may determine the support shape of the distribution, as happens in the `~pymc.distributions.multivariate._LKJCholeskyCovRV`. In simpler cases the support shape may be constant.
 1. It's okay to use the `rng_fn` `classmethods` of other PyTensor and PyMC `RandomVariables` inside the new `rng_fn`. For example if you are implementing a negative HalfNormal `RandomVariable`, your `rng_fn` can simply return `- halfnormal.rng_fn(rng, scale, size)`.

 *Note: In addition to `size`, the PyMC API also provides `shape`, `dims` and `observed` as alternatives to define a distribution dimensionality, but this is taken care of by {class}`~pymc.Distribution`, and should not require any extra changes.*

-For a quick test that your new `RandomVariable` `Op` is working, you can call the `Op` with the necessary parameters and then call `eval()` on the returned object:
+For a quick test that your new `RandomVariable` `Op` is working, you can call the `Op` with the necessary parameters and then call {func}`~pymc.draw` on the returned object:

 ```python

 # blah = pytensor.tensor.random.uniform in this example
-blah([0, 0], [1, 2], size=(10, 2)).eval()
+# multiple calls with the same seed should return the same values
+pm.draw(blah([0, 0], [1, 2], size=(10, 2)), random_seed=1)

 # array([[0.83674527, 0.76593773],
 #        [0.00958496, 1.85742402],
````
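The `size`-excludes-support-dimensions convention described in the note above can be observed with plain NumPy, using `numpy.random.Generator.dirichlet` as a stand-in multivariate sampler:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.ones(4)  # a 4-dimensional Dirichlet; its ndim_supp would be 1

# `size` excludes the support dimension, so asking for 3 draws of a
# 4-dimensional Dirichlet yields an array of shape (3, 4).
samples = rng.dirichlet(alpha, size=3)
assert samples.shape == (3, 4)
assert np.allclose(samples.sum(axis=-1), 1.0)  # each draw lies on the simplex
```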
````diff
@@ -117,10 +117,10 @@ blah([0, 0], [1, 2], size=(10, 2)).eval()
 ## 2. Inheriting from a PyMC base `Distribution` class

 After implementing the new `RandomVariable` `Op`, it's time to make use of it in a new PyMC {class}`~pymc.Distribution`.
-PyMC `>=4.0.0` works in a very {term}`functional <Functional Programming>` way, and the `distribution` classes are there mostly to facilitate porting the `PyMC3` v3.x code to PyMC `>=4.0.0`, add PyMC API features and keep related methods organized together.
+PyMC works in a very {term}`functional <Functional Programming>` way, and the `distribution` classes are there mostly to add PyMC API features and keep related methods organized together.
 In practice, they take care of:

-1. Linking ({term}`Dispatching`) an rv_op class with the corresponding `moment`, `logp` and `logcdf` methods.
+1. Linking ({term}`Dispatching`) an `rv_op` class with the corresponding `moment`, `logp`, `logcdf` and `icdf` methods.
 1. Defining a standard transformation (for continuous distributions) that converts a bounded variable domain (e.g., positive line) to an unbounded domain (i.e., the real line), which many samplers prefer.
 1. Validating the parametrization of a distribution and converting non-symbolic inputs (i.e., numeric literals or NumPy arrays) to symbolic variables.
 1. Converting multiple alternative parametrizations to the standard parametrization that the `RandomVariable` is defined in terms of.
````
````diff
@@ -130,7 +130,6 @@ Here is how the example continues:
 ```python

 import pytensor.tensor as pt
-from pymc.pytensorf import floatX, intX
 from pymc.distributions.continuous import PositiveContinuous
 from pymc.distributions.dist_math import check_parameters
 from pymc.distributions.shape_utils import rv_size_is_none
````
````diff
@@ -146,12 +145,12 @@ class Blah(PositiveContinuous):
     # We pass the standard parametrizations to super().dist
     @classmethod
     def dist(cls, param1, param2=None, alt_param2=None, **kwargs):
-        param1 = pt.as_tensor_variable(intX(param1))
+        param1 = pt.as_tensor_variable(param1)
         if param2 is not None and alt_param2 is not None:
             raise ValueError("Only one of param2 and alt_param2 is allowed.")
         if alt_param2 is not None:
             param2 = 1 / alt_param2
-        param2 = pt.as_tensor_variable(floatX(param2))
+        param2 = pt.as_tensor_variable(param2)

         # The first value-only argument should be a list of the parameters that
         # the rv_op needs in order to be instantiated
````
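The alternative-parametrization logic in the `dist` hunk above can be sketched in plain Python, with the `pt.as_tensor_variable` conversion dropped so it runs standalone (`resolve_params` is a hypothetical helper name, not part of the PyMC API):

```python
# Hypothetical sketch of Blah.dist's parametrization handling: at most one
# of `param2` / `alt_param2` may be given, and the alternative form is
# converted to the standard one before the rv_op is instantiated.
def resolve_params(param1, param2=None, alt_param2=None):
    if param2 is not None and alt_param2 is not None:
        raise ValueError("Only one of param2 and alt_param2 is allowed.")
    if alt_param2 is not None:
        # Convert the alternative parametrization to the standard one
        param2 = 1 / alt_param2
    return param1, param2

assert resolve_params(0.0, alt_param2=4.0) == (0.0, 0.25)
assert resolve_params(1.0, param2=0.5) == (1.0, 0.5)
```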
````diff
@@ -183,7 +182,7 @@ class Blah(PositiveContinuous):
         # Whenever a bound is invalidated, the returned expression raises an error
         # with the message defined in the optional `msg` keyword argument.
         return check_parameters(
-            logp_expression,
+            bounded_logp_expression,
             param2 >= 0,
             msg="param2 >= 0",
         )
````
````diff
@@ -193,15 +192,18 @@ class Blah(PositiveContinuous):
     def logcdf(value, param1, param2):
         ...

+    def icdf(value, param1, param2):
+        ...
+
 ```

 Some notes:

 1. A distribution should at the very least inherit from {class}`~pymc.Discrete` or {class}`~pymc.Continuous`. For the latter, more specific subclasses exist: `PositiveContinuous`, `UnitContinuous`, `BoundedContinuous`, `CircularContinuous`, `SimplexContinuous`, which specify default transformations for the variables. If you need to specify a one-time custom transform you can also create a `_default_transform` dispatch function as is done for the {class}`~pymc.distributions.multivariate.LKJCholeskyCov`.
 1. If a distribution does not have a corresponding `rng_fn` implementation, a `RandomVariable` should still be created to raise a `NotImplementedError`. This is, for example, the case in {class}`~pymc.distributions.continuous.Flat`. In this case it will be necessary to provide a `moment` method, because without a `rng_fn`, PyMC can't fall back to a random draw to use as an initial point for MCMC.
-1. As mentioned above, PyMC `>=4.0.0` works in a very {term}`functional <Functional Programming>` way, and all the information that is needed in the `logp` and `logcdf` methods is expected to be "carried" via the `RandomVariable` inputs. You may pass numerical arguments that are not strictly needed for the `rng_fn` method but are used in the `logp` and `logcdf` methods. Just keep in mind whether this affects the correct shape inference behavior of the `RandomVariable`. If specialized non-numeric information is needed you might need to define your custom`_logp` and `_logcdf` {term}`Dispatching` functions, but this should be done as a last resort.
-1. The `logcdf` method is not a requirement, but it's a nice plus!
-1. Currently, only one moment is supported in the `moment` method, and probably the "higher-order" one is the most useful (that is `mean` > `median` > `mode`)... You might need to truncate the moment if you are dealing with a discrete distribution.
+1. As mentioned above, PyMC works in a very {term}`functional <Functional Programming>` way, and all the information that is needed in the `logp`, `logcdf`, `icdf` and `moment` methods is expected to be "carried" via the `RandomVariable` inputs. You may pass numerical arguments that are not strictly needed for the `rng_fn` method but are used in those methods. Just keep in mind whether this affects the correct shape inference behavior of the `RandomVariable`.
+1. The `logcdf` and `icdf` methods are not a requirement, but they are a nice plus!
+1. Currently, only one moment is supported in the `moment` method, and probably the "higher-order" one is the most useful (that is `mean` > `median` > `mode`)... You might need to truncate the moment if you are dealing with a discrete distribution. `moment` should return a valid point for the random variable (i.e., it always has non-zero probability when evaluated at that point).
 1. When creating the `moment` method, be careful with `size != None` and broadcast properly also based on parameters that are not necessarily used to calculate the moment. For example, the `sigma` in `pm.Normal.dist(mu=0, sigma=np.arange(1, 6))` is irrelevant for the moment, but may nevertheless inform about the shape. In this case, the `moment` should return `[mu, mu, mu, mu, mu]`.

 For a quick check that things are working you can try the following:
````
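The moment-broadcasting rule from the last note above can be sketched with NumPy alone (`normal_moment` is a hypothetical helper, not the PyMC implementation):

```python
import numpy as np

# The moment of a Normal is `mu`, but it must be broadcast against every
# parameter, including `sigma`, which informs the shape but not the value.
def normal_moment(mu, sigma):
    mu, sigma = np.broadcast_arrays(np.asarray(mu), np.asarray(sigma))
    return mu

m = normal_moment(0.0, np.arange(1, 6))
assert m.shape == (5,)                 # shape comes from sigma
assert np.array_equal(m, np.zeros(5))  # values are all equal to mu
```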
````diff
@@ -215,7 +217,7 @@ from pymc.distributions.distribution import moment
 blah = pm.blah.dist(mu=0, sigma=1)

 # Test that the returned blah_op is still working fine
-blah.eval()
+pm.draw(blah, random_seed=1)
 # array(-1.01397228)

 # Test the moment method
````
````diff
@@ -306,10 +308,10 @@ Finally, when your `rng_fn` is doing something more than just calling a NumPy or
 You can find an example in {class}`~tests.distributions.test_continuous.TestWeibull`, whose `rng_fn` returns `beta * np.random.weibull(alpha, size=size)`.


-## 4. Adding tests for the `logp` / `logcdf` methods
+## 4. Adding tests for the `logp` / `logcdf` / `icdf` methods

-Tests for the `logp` and `logcdf` mostly make use of the helpers `check_logp`, `check_logcdf`, and
-`check_selfconsistency_discrete_logcdf` implemented in `~tests.distributions.util`
+Tests for the `logp`, `logcdf` and `icdf` mostly make use of the helpers `check_logp`, `check_logcdf`, `check_icdf` and
+`check_selfconsistency_discrete_logcdf` implemented in `~testing`.

 ```python

````
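As a sketch of the self-consistency property that a `check_selfconsistency_discrete_logcdf`-style helper verifies, here is a stdlib-only example with hand-written Geometric(p) reference formulas (these are illustrative, not the PyMC implementations): `exp(logcdf(k))` should match the running sum of `exp(logp(i))` for `i <= k`.

```python
import math

# Hand-written reference formulas for a Geometric(p) on {1, 2, ...}
def geom_logp(k, p):
    return (k - 1) * math.log1p(-p) + math.log(p)

def geom_logcdf(k, p):
    return math.log1p(-((1 - p) ** k))

# exp(logcdf(k)) should equal the running sum of exp(logp(i)) for i <= k
p = 0.3
for k in range(1, 10):
    cdf_from_pmf = sum(math.exp(geom_logp(i, p)) for i in range(1, k + 1))
    assert abs(cdf_from_pmf - math.exp(geom_logcdf(k, p))) < 1e-12
```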