add ICARRV and ICAR #6831
Conversation
Codecov Report
Additional details and impacted files

@@ Coverage Diff @@
## main #6831 +/- ##
==========================================
- Coverage 92.05% 90.65% -1.40%
==========================================
Files 96 96
Lines 16369 16410 +41
==========================================
- Hits 15068 14877 -191
- Misses 1301 1533 +232
RE checking if W contains ints: if W doesn't contain ints and the code runs but gives wrong results, it'd definitely be good to check. If the code errors and says something informative, then there's no need to check, I think.
pymc/distributions/multivariate.py
Outdated
    _print_name = ("ICAR", "\\operatorname{ICAR}")

    def __call__(self, W, node1, node2, N, sigma, centering_strength, size=None, **kwargs):
        return super().__call__(W, node1, node2, N, sigma, centering_strength, size=size, **kwargs)
I think it'd be better to use a different word than "centering", since that's usually more about centered vs non-centered parameterizations. Maybe "mean_zero_strength" or something similar instead?
Okay, that sounds good. It looks like the code fails silently when users pass an adjacency matrix with an entry that isn't 0 or 1. People might be inclined to do this; some literature in the CAR area discusses using edge weights for more flexibility. But the pairwise difference formula cannot handle edge weights, so I'll add a check that requires every entry to be 0 or 1.
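The validation being discussed here can be sketched in plain NumPy. This is a hypothetical helper, not the actual PyMC code; it just shows the checks under discussion (2-D, square, symmetric, binary entries):

```python
import numpy as np

def validate_adjacency(W):
    """Hypothetical sketch of the adjacency checks discussed above:
    W must be a 2-D, square, symmetric matrix of 0s and 1s."""
    W = np.asarray(W)
    if W.ndim != 2 or W.shape[0] != W.shape[1]:
        raise ValueError("W must be a square 2-D matrix")
    if not np.array_equal(W, W.T):
        raise ValueError("W must be symmetric")
    # reject edge weights: the pairwise difference formula assumes 0/1 entries
    if not np.all(np.isin(W, [0, 1])):
        raise ValueError("W must contain only 0s and 1s")
    return W
```

A weighted matrix such as `[[0, 0.5], [0.5, 0]]` would then raise instead of failing silently.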
pymc/distributions/multivariate.py
Outdated
    # check that adjacency matrix is two dimensional,
    # square,
    # symmetrical
    # and composed of 1s or 0s.
this block of comments doesn't add much to the code doing the checks below; I don't think it's necessary
pymc/distributions/multivariate.py
Outdated
    N = pt.shape(W)[0]
    N = pt.as_tensor_variable(N)

    # check on sigma
this comment and the `# check on centering strength` comment are out of date now
pymc/distributions/multivariate.py
Outdated
    pairwise_difference = (-1 / (2 * sigma**2)) * pt.sum(
        pt.square(value[node1] - value[node2])
    )
    soft_center = (
same as before RE wording, "center" vs zero sum. I think the latter is better and is more consistent with zero_sum_strength
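The logp being reviewed in these threads (pairwise difference plus a soft zero-sum term) can be sketched in plain NumPy. This is a minimal illustration, not the PyMC implementation; the function and argument names (`icar_logp_sketch`, `zero_sum_stdev`) are assumptions for the example, and `node1`/`node2` list each undirected edge once:

```python
import numpy as np

def icar_logp_sketch(phi, node1, node2, sigma=1.0, zero_sum_stdev=0.001):
    """Sketch of the ICAR logp under discussion (not the actual PyMC code)."""
    phi = np.asarray(phi, dtype=float)
    N = phi.size
    # pairwise difference over the edges of the graph
    pairwise_difference = (-1 / (2 * sigma**2)) * np.sum(
        np.square(phi[node1] - phi[node2])
    )
    # soft zero-sum constraint: Normal(0, zero_sum_stdev * N) logpdf of sum(phi)
    scale = zero_sum_stdev * N
    zero_sum = (
        -0.5 * (np.sum(phi) / scale) ** 2
        - np.log(np.sqrt(2 * np.pi))
        - np.log(scale)
    )
    return pairwise_difference + zero_sum
```

With the tight scale, a vector whose entries sum to anything far from zero is penalized heavily, which is what enforces the (soft) zero-sum constraint.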
pymc/distributions/multivariate.py
Outdated
    f(\\phi| W,\\sigma) =
    -\frac{1}{2\\sigma^{2}} \\sum_{i\\sim j} (\\phi_{i} - \\phi_{j})^2 -
    \frac{1}{2}*\frac{\\sum_{i}{\\phi_{i}}}{0.001N}^{2} - \\ln{\\sqrt{2\\pi}} -
    \\ln{0.001N}
why are some backslashes twice and others once?
Not sure, I wrote them with single slashes. Maybe one of the pre-commit checks added them by mistake?
pymc/distributions/multivariate.py
Outdated
class ICAR(Continuous):
    r"""
    The intrinsic conditional autoregressive prior. It is primarily used to model
    covariance between neighboring areas on large datasets. It is a special case
on large datasets
Or small datasets!
pymc/distributions/multivariate.py
Outdated
    based on the square distance from each of its neighbors. The notation $i\\sim j$
    indicates a sum over all the neighbors of $\\phi_{i}$. The last three terms are the
    Normal log density function where the mean is zero and the standard deviation is
    $N * 0.001$ (where N is the length of the vector $\\phi$). This component imposed the zero-sum
- $N * 0.001$ (where N is the length of the vector $\\phi$). This component imposed the zero-sum
+ $N * 0.001$ (where N is the length of the vector $\\phi$). This component imposes a zero-sum
pymc/distributions/multivariate.py
Outdated
    ----------
    W : ndarray of int
        Symmetric adjacency matrix of 1s and 0s indicating adjacency between elements.
        Must pass either W or both node1 and node2.
- Must pass either W or both node1 and node2.
pymc/distributions/multivariate.py
Outdated
    W = np.array([[0,1,0,1],
                  [1,0,1,0],
                  [0,1,0,1],
                  [1,0,1,0]])

    # centered parameterization

    with pm.Model():
        sigma = pm.Exponential('sigma',1)
        phi = pm.ICAR('phi',W=W,sigma=sigma)

        mu = phi

    # non-centered parameterization

    with pm.Model():
        sigma = pm.Exponential('sigma',1)
        phi = pm.ICAR('phi',W=W)
Suggested change — before:

    W = np.array([[0,1,0,1],
                  [1,0,1,0],
                  [0,1,0,1],
                  [1,0,1,0]])

    # centered parameterization
    with pm.Model():
        sigma = pm.Exponential('sigma',1)
        phi = pm.ICAR('phi',W=W,sigma=sigma)
        mu = phi

    # non-centered parameterization
    with pm.Model():
        sigma = pm.Exponential('sigma',1)
        phi = pm.ICAR('phi',W=W)

after:

    W = np.array([
        [0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]
    ])

    # centered parameterization
    with pm.Model():
        sigma = pm.Exponential('sigma', 1)
        phi = pm.ICAR('phi', W=W, sigma=sigma)
        mu = phi

    # non-centered parameterization
    with pm.Model():
        sigma = pm.Exponential('sigma', 1)
        phi = pm.ICAR('phi', W=W)
        mu = sigma * phi
adjusted some formatting here, mostly putting a space after commas
pymc/distributions/multivariate.py
Outdated
        Controls how strongly to enforce the zero-sum constraint. It sets the
        standard deviation of a normal density function with mean zero.
Could add a bit of detail here, like: "It puts an additional normal prior on the sum of the phi, such that the sum is normally distributed with mean zero and a small standard deviation, whose value is `zero_sum_strength`." Maybe `zero_sum_stdev` is a clearer name then?
Looks good! mostly cosmetic stuff
Hi @ricardoV94, I've been working on adding a shape inference function since the change in #388. Continuous integration failed though because it couldn't import it. Is this related to releases? I assumed the continuous integration would run the latest pytensor, but now I'm not so sure. Otherwise, do you have suggestions for how I should handle shape inference?
Looks great!
Thanks, @daniel-saunders-phil !
This PR adds support for intrinsic conditional autoregressive (ICAR) models to PyMC. The ICAR is a special case of the CAR distribution that arises when the $\alpha$ parameter equals $1$ and we add a zero-sum constraint. There are two big advantages of ICAR over CAR.
First, computing the log-probability is much faster, especially on large datasets. The CAR distribution involves either inverting a large precision matrix or a variety of eigenvalue strategies. As the dataset gets larger, those operations become quite expensive, even with the sparsity tricks PyMC currently employs. In some benchmarking, I found it takes nearly 60 seconds to find the logp of a dataset with 6000 observations. ICAR instead exploits a pairwise difference calculation, which grows roughly linearly with the size of the dataset.
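To illustrate why the pairwise calculation scales with the number of edges rather than with a dense $N \times N$ matrix, here is a small sketch (the helper name is hypothetical, not part of the PyMC code) of converting an adjacency matrix into the `node1`/`node2` edge lists that the pairwise-difference term iterates over:

```python
import numpy as np

def adjacency_to_edges(W):
    """Hypothetical sketch: turn an adjacency matrix into edge lists.
    Only the upper triangle is scanned, so each undirected edge
    appears exactly once."""
    W = np.asarray(W)
    node1, node2 = np.nonzero(np.triu(W))
    return node1, node2
```

For a sparse spatial graph the number of edges is roughly proportional to $N$, so the pairwise sum over these lists stays cheap as the dataset grows.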
Second, the zero-sum constraint can also help improve the identifiability of autoregressive models. Suppose you're working on a spatial statistics problem with the 50 US states. You might want a global intercept in your model plus 50 spatial random effects. With CAR, this leaves you overparameterized. If the sampler increases the mean value of the 50 spatial effects, then it can decrease the value of the global intercept to get the same overall intercept. The vector of random effects in an ICAR distribution is forced to sum to zero, which allows the sampler to independently adjust the global intercept.
Finally, ICAR often serves as a sub-component of larger models, like the BYM model. Providing a convenient API can make it easy to build up to those more complex models.
More information on the design of this ICAR distribution can be found in a Stan case study.