documenting get_moment #5244
Changes from 2 commits
@@ -4,15 +4,15 @@ This guide provides an overview on how to implement a distribution for version 4
It is designed for developers who wish to add a new distribution to the library.
Users will not be aware of all this complexity and should instead make use of helper methods such as (TODO).

PyMC {class}`~pymc.distributions.Distribution` builds on top of Aesara's {class}`~aesara.tensor.random.op.RandomVariable`, and implements `logp` and `logcdf` methods as well as other initialization and validation helpers, most notably `shape/dims`, alternative parametrizations, and default `transforms`.
PyMC {class}`~pymc.distributions.Distribution` builds on top of Aesara's {class}`~aesara.tensor.random.op.RandomVariable`, and implements `logp`, `logcdf` and `get_moment` methods as well as other initialization and validation helpers, most notably `shape/dims`, alternative parametrizations, and default `transforms`.

Here is a summary check-list of the steps needed to implement a new distribution.
Each section will be expanded below:

1. Creating a new `RandomVariable` `Op`
1. Implementing the corresponding `Distribution` class
1. Adding tests for the new `RandomVariable`
1. Adding tests for the `logp` / `logcdf` methods
1. Adding tests for `logp` / `logcdf` and `get_moment` methods
1. Documenting the new `Distribution`.

This guide does not attempt to explain the rationale behind the current implementation of `Distributions`, and details are provided only insofar as they help to implement new "standard" distributions.

@@ -118,7 +118,7 @@ After implementing the new `RandomVariable` `Op`, it's time to make use of it in
PyMC 4.x works in a very {term}`functional <Functional Programming>` way, and the `distribution` classes are there mostly to facilitate porting the `PyMC3` v3.x code to the new `PyMC` v4.x version, add PyMC API features and keep related methods organized together.
In practice, they take care of:

1. Linking ({term}`Dispatching`) an `rv_op` class with the corresponding `logp` and `logcdf` methods.
1. Linking ({term}`Dispatching`) an `rv_op` class with the corresponding `get_moment`, `logp` and `logcdf` methods.
1. Defining a standard transformation (for continuous distributions) that converts a bounded variable domain (e.g., positive line) to an unbounded domain (i.e., the real line), which many samplers prefer.
1. Validating the parametrization of a distribution and converting non-symbolic inputs (i.e., numeric literals or numpy arrays) to symbolic variables.
1. Converting multiple alternative parametrizations to the standard parametrization that the `RandomVariable` is defined in terms of (see the sketch below).

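To picture the last two points, here is a small self-contained sketch. The helper name `normalize_params` and the parameters `param1`, `param2`, `alt_param2` are made up for illustration; actual `dist` classmethods rely on PyMC's own helpers for this, so treat it as a sketch of the idea only:

```python
import numpy as np
import aesara.tensor as at


def normalize_params(param1, param2=None, alt_param2=None):
    # Convert numeric literals or numpy arrays into symbolic variables.
    param1 = at.as_tensor_variable(param1)
    # Convert an alternative parametrization into the standard one
    # (here alt_param2 is assumed to be the reciprocal of param2).
    if param2 is None:
        param2 = 1.0 / at.as_tensor_variable(alt_param2)
    else:
        param2 = at.as_tensor_variable(param2)
    return param1, param2


# Both calls yield an equivalent symbolic (param1, param2) pair.
normalize_params(0.0, param2=np.array(2.0))
normalize_params(0.0, alt_param2=0.5)
```
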
@@ -153,6 +153,14 @@ class Blah(PositiveContinuous):
        # the rv_op needs in order to be instantiated
        return super().dist([param1, param2], **kwargs)

    # get_moment returns the stable moment from which to start sampling for the
    # distribution given the implicit `rv`, `size` and `param1` ... `paramN`
    def get_moment(rv, size, param1, param2):
        moment, _ = at.broadcast_arrays(param1, param2)
        if not rv_size_is_none(size):
            moment = at.full(size, moment)
        return moment

    # Logp returns a symbolic expression for the logp evaluation of the variable
    # given the `value` of the variable and the parameters `param1` ... `paramN`
    def logp(value, param1, param2):
@@ -188,12 +196,15 @@ Some notes:
   overriding `__new__`.
1. As mentioned above, `PyMC` v4.x works in a very {term}`functional <Functional Programming>` way, and all the information that is needed in the `logp` and `logcdf` methods is expected to be "carried" via the `RandomVariable` inputs. You may pass numerical arguments that are not strictly needed for the `rng_fn` method but are used in the `logp` and `logcdf` methods. Just keep in mind whether this affects the correct shape inference behavior of the `RandomVariable`. If specialized non-numeric information is needed you might need to define your custom `_logp` and `_logcdf` {term}`Dispatching` functions, but this should be done as a last resort.
1. The `logcdf` method is not a requirement, but it's a nice plus!
1. Currently only one moment is supported in the `get_moment` method, and probably the "higher-order" one is the most useful (that is, `mean` > `median` > `mode`). You might need to truncate the moment if you are dealing with a discrete distribution.
1. When creating the `get_moment` method, we have to be careful with `size != None` and broadcast properly, because parameters that are not used in the moment may nevertheless determine the shape of the distribution. E.g., `pm.Normal.dist(mu=0, sigma=np.arange(1, 6))` returns a moment of `[mu, mu, mu, mu, mu]`, as illustrated below.
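
The claim in the last point can be checked directly, assuming `get_moment` is importable from `pymc.distributions.distribution` as in the quick-check snippet that follows:

```python
import numpy as np
import pymc as pm
from pymc.distributions.distribution import get_moment

# Only sigma is a vector, but the moment still has one entry per sigma value.
x = pm.Normal.dist(mu=0, sigma=np.arange(1, 6))
get_moment(x).eval()
# array([0., 0., 0., 0., 0.])
```
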

For a quick check that things are working you can try the following:

```python
import pymc as pm
from pymc.distributions.distribution import get_moment

# pm.Blah = pm.Uniform in this example
blah = pm.Blah.dist([0, 0], [1, 2])

@@ -202,6 +213,10 @@ blah = pm.Blah.dist([0, 0], [1, 2])
blah.eval()
# array([0.62778803, 1.95165513])

# Test the get_moment
get_moment(blah).eval()
# array([0.5, 1. ])

(Inline review thread on the `get_moment` output above; some comments are truncated.)

Review comment: These would be ...

Review comment: This is what I'm getting from ... I did some tests and I think this is the right output. Considering that the first argument is location and second is scale (or lower and upper), then the first interval is ...

    import pymc as pm
    from pymc.distributions.distribution import get_moment
    from scipy.stats import uniform

    # PyMC implementation
    blah = pm.Uniform.dist([0, 0], [1, 2])
    get_moment(blah).eval()
    >> array([0.5, 1. ])

    # scipy implementation
    uniform.mean([0, 0], [1, 2])
    >> array([0.5, 1. ])

I see that there is an error in ... Thank you.

Review comment: Your comments & code here are correct. I think Ricardo referred to the example implementation above. The test case further down also assumes that ...

Review comment: Thanks for the clarification! Yes, I agree it makes sense to keep it uniform by changing this example from ...

# Test the logp
pm.logp(blah, [1.5, 1.5]).eval()
# array([ -inf, -0.69314718])
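
# A similar quick check for the logcdf (assuming `pm.logcdf` is available,
# mirroring `pm.logp` above, and that Blah behaves like Uniform here)
pm.logcdf(blah, [0.75, 1.5]).eval()
# array([-0.28768207, -0.28768207])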

@@ -350,8 +365,40 @@ def test_blah_logcdf(self):

```

## 5. Adding tests for the `get_moment` method

Tests for the `get_moment` method are contained in `pymc/tests/test_distributions_moments.py`, and make use of the function `assert_moment_is_expected`
which checks that:
1. Moments return the `expected` values
1. Moments have the expected size and shape

```python
import numpy as np
import pytest

from pymc import Model
from pymc.distributions import Blah

# `assert_moment_is_expected` is defined in pymc/tests/test_distributions_moments.py,
# the same file this test is added to.


@pytest.mark.parametrize(
    "param1, param2, size, expected",
    [
        (0, 1, None, 0),
        (0, np.ones(5), None, np.zeros(5)),
        (np.arange(5), 1, None, np.arange(5)),
        (np.arange(5), np.arange(1, 6), (2, 5), np.full((2, 5), np.arange(5))),
    ],
)
def test_blah_moment(param1, param2, size, expected):
    with Model() as model:
        Blah("x", param1=param1, param2=param2, size=size)
    assert_moment_is_expected(model, expected)
```

Here are some details worth keeping in mind:

1. In the case where you have to manually broadcast the parameters with each other, it's important to add test conditions that would fail if you did not do that. A straightforward way to do this is to make the used parameter a scalar, the unused one(s) a vector (one at a time), and `size=None`.
1. In other words, make sure to test different combinations of size and broadcasting to cover these cases.

## 5. Documenting the new `Distribution`
## 6. Documenting the new `Distribution`

New distributions should have a rich docstring, following the same format as that of previously implemented distributions.
It generally looks something like this:
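
As a rough, illustrative sketch only (the formula, table entries and parameter names below are placeholders rather than text from the guide), such a docstring typically combines a LaTeX formula, a small support/moments table and a numpydoc-style parameters section:

```python
class Blah(PositiveContinuous):
    r"""
    Blah log-likelihood.

    The pdf of this distribution is

    .. math::

       f(x \mid \text{param1}, \text{param2}) = \ldots

    ========  ===============================================
    Support   :math:`x \in [\text{param1}, \text{param2}]`
    Mean      :math:`(\text{param1} + \text{param2}) / 2`
    ========  ===============================================

    Parameters
    ----------
    param1 : float
        Description of param1.
    param2 : float
        Description of param2.
    """
```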