Description
Notebook proposal
Title: GLM-missing-numeric-values
New example for auto-imputation (that is, handling missing values), using a simple dataset and a full workflow.
I propose this as a present-day solution that should be generally applicable. If and when newer functionality
arrives, we can extend this notebook with a new Section 2 to demonstrate and compare it. Related further
discussion in pymc-devs/pymc#6626 and pymc-devs/pymc#7204
Why should this notebook be added to pymc-examples?
Our problem statement is that when faced with data with missing values, we want to:
- Infer the missing values for the in-sample dataset and sample full posterior parameters
- Predict the endogenous feature and the missing values for an out-of-sample dataset
This notebook takes the opportunity to:
- Demonstrate a general method using a `numpy.ma.masked_array`, often mentioned in `pymc` folklore but rarely demonstrated
- Demonstrate a reasonably complete Bayesian workflow {cite:p}`gelman2020bayesian`, including data creation
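As a rough illustration of the masked-array idea mentioned above, the sketch below builds a small synthetic feature matrix, removes some values at random, and wraps the result in a NumPy masked array. The dataset shape, the 20% missingness rate, and all variable names are assumptions made for this example and are not taken from the proposed notebook.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: 10 rows, 2 numeric features
n, k = 10, 2
x = rng.normal(size=(n, k))

# Assumption for illustration: each value is missing with probability 0.2,
# independently of everything else
is_missing = rng.random(size=(n, k)) < 0.2
x_with_nan = np.where(is_missing, np.nan, x)

# Mask every NaN entry; the masked entries are the ones a model would impute
x_ma = np.ma.masked_array(x_with_nan, mask=np.isnan(x_with_nan))

print(f"{x_ma.mask.sum()} of {x_ma.size} values are masked")
```

In PyMC, passing a masked array like `x_ma` as the `observed` argument of a distribution is the usual trigger for automatic imputation of the masked entries; the model itself is left out of this sketch.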
Suggested categories:
- Level: Intermediate
- Diataxis type: Reference
Related notebooks
This notebook is a partner to another `pymc-examples` notebook, `Missing_Data_Imputation.ipynb`, which goes into more detail on taxonomies and works through a much more complicated dataset in a tutorial style.
References
Related further discussion in pymc-devs/pymc#6626, and pymc-devs/pymc#7204
Also already in references.bib
@book{enders2022,
title = {Applied Missing Data Analysis},
author = {Enders, Craig K.},
year = {2022},
publisher = {The Guilford Press}
}
@Article{gelman2020bayesian,
title = {Bayesian workflow},
author = {Gelman, Andrew and Vehtari, Aki and Simpson, Daniel and Margossian, Charles C and Carpenter, Bob and Yao, Yuling and Kennedy, Lauren and Gabry, Jonah and B{\"u}rkner, Paul-Christian and Modr{\'a}k, Martin},
journal = {arXiv preprint arXiv:2011.01808},
year = {2020},
url = {https://arxiv.org/abs/2011.01808}
}