Description
With PyMC being the main user of ArviZ, I would like to coordinate on the ongoing refactor on the ArviZ side, as it brings a lot of breaking changes.
General idea
Split ArviZ into multiple smaller subpackages, so that it is no longer one huge monolithic block but something more modular. Each of these smaller libraries (arviz-base, arviz-stats and arviz-plots) depends only on the minimal set of packages strictly needed; anything that extends functionality, or that can be done through different alternatives (like the plotting backend or the idata I/O engine), is an optional dependency.
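To make the optional-dependency idea concrete, here is a minimal sketch of how code could check at runtime which plotting backends are installed. The helper names (`available_backends`, `pick_backend`) are hypothetical and are not part of any arviz-xyz package; the sketch relies only on the standard library.

```python
from importlib.util import find_spec

# Hypothetical helpers for illustration only, not the arviz-plots implementation.
PLOTTING_BACKENDS = ("matplotlib", "bokeh", "plotly")


def available_backends(candidates=PLOTTING_BACKENDS):
    """Return the candidate packages that can be imported in this environment."""
    return [name for name in candidates if find_spec(name) is not None]


def pick_backend(preferred=None, candidates=PLOTTING_BACKENDS):
    """Return `preferred` if it is installed, else the first installed candidate."""
    installed = available_backends(candidates)
    if preferred is not None:
        if preferred not in installed:
            raise ImportError(f"Plotting backend {preferred!r} is not installed")
        return preferred
    if not installed:
        raise ImportError(
            "No plotting backend installed; install one of: " + ", ".join(candidates)
        )
    return installed[0]
```

With something along these lines, importing the library never fails because of a missing backend; the error only shows up when a plot is actually requested.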
We still plan to have an arviz package which would install all 3 of them (it is unclear whether it would also pull in some "default" optional dependencies, to feel closer to what it is now) and which exposes the functions from all 3 libraries through a common namespace. But for people running a model in the cloud, for example, it might be best to install only pymc and arviz-base, save the output as zarr or netcdf and download it, then run convergence checks and analyze the results locally or on a smaller machine.
Module/library highlight of breaking changes
arviz-base
Uses DataTree instead of InferenceData. This will probably be the main pain point but also a source of nice new features.
New features: more I/O backends and support for nested hierarchies.
Potential pain points: idata[group] will be a DataTree instead of a Dataset even if there are no nested groups. DataTree is new, so it will probably have some rough edges for a while, and custom methods like .map or .extend won't exist anymore (there are alternatives such as merge, map_over_subtree...).
It is a bit more flexible in general, especially when it comes to groups: no more warnings for "unrecognized" groups, and things like that.
Small ask for help: DataTree supports nested groups, but I don't have an example of this, nor am I sure how nested groups should behave.
arviz-stats
Very unclear as of now; it is the last module to be worked on. For now it mostly has what we need for arviz-plots to work.
arviz-plots
The main focus on this end has actually been easing development and maintenance, but thanks to the refactor it is also more flexible when it comes to faceting and aesthetic mappings, and plotting backend support is more homogeneous: instead of nice matplotlib and barely working bokeh support, there is now support for matplotlib, bokeh and plotly.
Several plots have been renamed, such as plot_posterior -> plot_dist and plot_trace -> plot_trace_dist (plot_trace continues to exist but now plots only the traces). All plots return a new class defined in arviz-plots called PlotCollection, which contains the figure, axes and artist objects, in matplotlib lingo.
This is the most advanced of the 3 libraries in my opinion and it is ready to use, so it would be great to get people to test it out. My recommendation is to install arviz-plots from GitHub along with pymc+arviz; then you can pass arviz.InferenceData to arviz-plots functions. Useful docs: example gallery of updated plots (showing all 3 backends) and main intro notebook
Regarding PyMC itself. What would you like PyMC to depend on? And how would you like PyMC to behave?
For me, continuing to depend on arviz (provided it only installs the 3 arviz-xyz libraries, numpy, scipy and xarray) would probably be best, so that functionality stays the same, convergence checks continue to run by default, and stats and plots can continue to be exposed if desired (even if plotting won't work unless at least one of the plotting backends is installed).
And how would you coordinate updates in pymc to account for the breaking changes that will happen? Keep in mind that the timeline for arviz-xyz is still unclear, so I don't think this is urgent, and there is a lot of room to do things however we want on this end.