Description
Should ndarray-linalg have a dedicated algorithm for computing the Moore-Penrose pseudoinverse? Python's numpy has pinv()
. LAPACK does not have a built-in pinv()
, although it has been much discussed. Writing a turnkey pseudoinverse method is challenging because the optimal implementation depends on the use case. Personally, I think having some reasonably-optimal pinv()
in ndarray-linalg would make the crate more user-friendly.
Currently it is possible to compute the pseudoinverse with ndarray-linalg by solving into an identity matrix, which calls LAPACK's dgesvd()
.
use ndarray::{Array, array};
use ndarray_linalg::{LeastSquaresResult, LeastSquaresSvdInto};
let x = array![[1., 7.], [4., 2.], [3., 8.]]; // skinny matrix
let LeastSquaresResult { solution: x_pinv, .. } = x.least_squares_into(Array::<f64, _>::eye(n))?;
println!("The pseudoinverse is:\n{:?}", x_pinv);
I propose a more optimal algorithm implemented here. The most common use case for computing the Moore-Penrose pseudoinverse is performing least-squares regression, where we would like to know the pseudoinverse of the design matrix x
. Most of the time x
will have full column rank, in which case we can compute the pseudoinverse using faster QR decomposition. If the diagonal elements of R are ill-conditioned we can fall back to singular value decomposition (using the newer dgesdd()
instead of dgesvd()
). We discover the matrix rank for free. We can have specialized methods for cases where the caller knows something about the matrix rank in advance. While there is surely room for further optimization, this algorithm already benchmarks 2-4 times faster than the code above.
Are the maintainers of ndarray-linalg interested in a pull request adding extension traits for pinv()
?