Description
This blog post makes an important point about speed in cases where we want to factorize the A matrix once, then recycle it for repeated solves. The specific factorization used in this case is LU factorization -- this can be verified by looking at the function calls inside dgesv: first dgetrf is called to perform the LU factorization, then dgetrs (which uses dtrsm internally) to actually solve the system. In scipy, these correspond to linalg.lu_factor and linalg.lu_solve. Here's a gist showing a very simple speed test, suggesting nice speedups from computing the LU decomposition once and then recycling it many times for different right-hand sides b.
(Question: are other factorization schemes used in other solve cases, e.g. symmetric or positive-definite A?)
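For concreteness, here is a minimal sketch of the factor-once / solve-many pattern with scipy (random data, names of my own choosing -- not the gist itself):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
bs = [rng.standard_normal(n) for _ in range(50)]

# Factor A once: the O(n^3) dgetrf work is done a single time.
lu, piv = linalg.lu_factor(A)

# Each subsequent solve only pays for the O(n^2) triangular solves (dgetrs).
xs = [linalg.lu_solve((lu, piv), b) for b in bs]

# Results agree with calling linalg.solve from scratch each time.
assert np.allclose(xs[0], linalg.solve(A, bs[0]))
```

Timing the list comprehension against a loop of plain `linalg.solve(A, b)` calls reproduces the speedup, since the factorization cost is amortized over all the solves.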
All of this is relevant because of blockwise. This would be very low-hanging fruit for a rewrite in cases where a user has batch dimensions on the b matrix but not on A.
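As a sketch of what that rewrite could look like: when only b is batched, the batch can be folded into the right-hand-side column axis that lu_solve already accepts, so a single factorization serves the whole batch (shapes and names here are illustrative assumptions, not blockwise's actual API):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
n, batch = 50, 8
A = rng.standard_normal((n, n))
b = rng.standard_normal((batch, n))    # batch dimension on b only

lu, piv = linalg.lu_factor(A)          # one factorization for the whole batch

# lu_solve treats the columns of its RHS as independent systems,
# so the batch axis can be moved into the column axis.
x = linalg.lu_solve((lu, piv), b.T).T  # shape (batch, n)

# Matches solving each batched system independently.
for i in range(batch):
    assert np.allclose(x[i], linalg.solve(A, b[i]))
```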
In addition, gradient implementations exist for both functions in pytorch (lu_factor, lu_solve) and jax (lu_factor, lu_solve). I haven't found an academic reference that shows where their equations come from, though.
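For what it's worth, the reverse-mode rule for the solve itself can be derived directly by differentiating $Ax = b$; writing $\bar{x}$ for the incoming cotangent, the standard result (this is the form that appears in Giles' note on matrix derivative results for forward and reverse mode AD, which may be the reference being looked for) is:

```latex
% x = A^{-1} b, incoming cotangent \bar{x}
\bar{b} = A^{-\top} \bar{x}, \qquad \bar{A} = -\bar{b}\, x^{\top}
```

Note that $\bar{b}$ is itself a solve against $A^{\top}$, so the same LU factors can be recycled in the backward pass as well.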