Skip to content

Update docstrings of pm.set_data and model.Data #6087

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 7, 2022

Conversation

bwengals
Copy link
Contributor

@bwengals bwengals commented Aug 31, 2022

Adds docstrings describing how to track changing shape of MutableData in the likelihood. Meant to address issue #5987. The recommended solution is then to set the shape=x.shape, like this:

import pymc as pm                                                                       
with pm.Model() as model:                                                               
    x = pm.MutableData('x', [1., 2., 3.])                                               
    y = pm.MutableData('y', [1., 2., 3.])                                               
    beta = pm.Normal('beta', 0, 1)                                                      
    obs = pm.Normal('obs', x * beta, 1, observed=y, shape=x.shape) # <--- here                      
    idata = pm.sample(1000, tune=1000)   

with model:                                                                             
    pm.set_data({'x': [5., 6., 9., 12., 15.]})                                               
    y_test = pm.sample_posterior_predictive(idata)                                      
y_test.posterior_predictive['obs'].mean(('chain', 'draw'))

Major / Breaking Changes

None

Bugfixes / New features

None

Docs / Maintenance

Add recommendation to docstrings relevant to prediction and posterior predictive sampling.

pymc/model.py Outdated
@@ -1849,6 +1849,13 @@ def point_logps(self, point=None, round_vals=2):
def set_data(new_data, model=None, *, coords=None):
"""Sets the value of one or more data container variables.

When doing prediction or posterior predictive sampling, make sure that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the difference between prediction and posterior predictive sampling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing.. wanted to catch the eye of someone skimming with either word. Maybe better phrasing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just drop the first prediction mention.

"make sure" is also a bit strong, as it's not required, it's more of a convenience. And maybe also say at the end that the benefit of this is that you only have to set_data() x, but not both.

Copy link
Member

@ricardoV94 ricardoV94 Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add an example of resizing y alone first. MutableData is not useful only in posterior predictive sampling with GLMs where this concern for having to resize both variables arises.

It's also useful to fit the same model to different datasets.

Would be strange for me as user seeing this being discussed in the only example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@twiecki that makes sense, I'll change it.

@ricardoV94 For an example with resizing y alone first, would y be a pm.MutableData object, and observed? Makes sense RE fitting the same model to different datasets. In the past I've just remade the model object but you could save on compiling time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@codecov
Copy link

codecov bot commented Aug 31, 2022

Codecov Report

Merging #6087 (163e2fa) into main (0b191ad) will increase coverage by 2.59%.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6087      +/-   ##
==========================================
+ Coverage   89.54%   92.14%   +2.59%     
==========================================
  Files          72       92      +20     
  Lines       12929    21061    +8132     
==========================================
+ Hits        11577    19406    +7829     
- Misses       1352     1655     +303     
Impacted Files Coverage Δ
pymc/data.py 80.08% <ø> (ø)
pymc/model.py 88.23% <ø> (ø)
pymc/variational/inference.py 84.97% <0.00%> (-1.13%) ⬇️
pymc/printing.py 86.08% <0.00%> (-0.58%) ⬇️
pymc/distributions/logprob.py 97.31% <0.00%> (-0.42%) ⬇️
pymc/distributions/shape_utils.py 98.95% <0.00%> (-0.33%) ⬇️
pymc/distributions/mixture.py 95.40% <0.00%> (-0.32%) ⬇️
pymc/distributions/simulator.py 87.23% <0.00%> (-0.27%) ⬇️
pymc/distributions/continuous.py 97.50% <0.00%> (-0.01%) ⬇️
... and 55 more

@junpenglao
Copy link
Member

Is this ready to merge?

pymc/data.py Outdated
@@ -594,6 +594,11 @@ def Data(
To set the value of the data container variable, check out
:func:`pymc.Model.set_data`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:func:`pymc.Model.set_data`.
:meth:`pymc.Model.set_data`.

pymc/data.py Outdated
When making predictions or doing posterior predictive sampling, the shape of the
registered data variable will most likely need to be changed. If you encounter an
Aesara shape mismatch error, refer to the documentation for
:func:`pymc.model.set_data`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:func:`pymc.model.set_data`.
:func:`pymc.set_data`.

The function is imported as pymc.set_data and since 4.0 we document objects in the same place they should be imported.

pymc/model.py Outdated
the random variable acting as the likelihood (`observed` is True) is dynamically
resized when the data is updated. Set the `shape` argument of the random
variable acting as the likelihood to the shape of the relevant
:class:`pymc.data.MutableData` object, as shown in the example below
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:class:`pymc.data.MutableData` object, as shown in the example below
:class:`pymc.MutableData` object, as shown in the example below

@bwengals
Copy link
Contributor Author

Took another crack at it, basically made the docstring more minimal and added an example as per your suggestion @ricardoV94.

Would it be nice to also have the syntax allow pm.set_data(y=y) as well as pm.set_data({'y': y})? a la xarray?

Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great

@ricardoV94
Copy link
Member

Would it be nice to also have the syntax allow pm.set_data(y=y) as well as pm.set_data({'y': y})? a la xarray?

I guess so ;)

@bwengals
Copy link
Contributor Author

One last question, if you don't mind, I can't seem to figure out why pre-commit isn't happy. Since this PR just changes docstrings, what's it saying about numpy, positional arguments?

@ricardoV94
Copy link
Member

That was fixed today. If you rebase from main it should go away

@michaelosthege michaelosthege merged commit faebc60 into pymc-devs:main Oct 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants