A more low level implementation of vectorize in numba #92
Conversation
layout = aryty.layout
return (data, shape, strides, layout)

# TODO I think this is better than the noalias attribute
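For context, here is a minimal llvmlite sketch of attaching the `noalias` attribute to a pointer argument, the mechanism being discussed. The function name and body are illustrative, not code from this PR:

```python
import llvmlite.ir as ir

# Build a tiny module with one function taking a double* argument.
mod = ir.Module(name="noalias_demo")
double = ir.DoubleType()
fnty = ir.FunctionType(ir.VoidType(), [ir.PointerType(double)])
fn = ir.Function(mod, fnty, name="write_one")

# Mark the pointer as noalias; LLVM can then assume no other argument
# aliases this memory, which helps the vectorizer.
fn.args[0].add_attribute("noalias")

builder = ir.IRBuilder(fn.append_basic_block("entry"))
builder.store(ir.Constant(double, 1.0), fn.args[0])
builder.ret_void()

print(mod)  # the printed IR carries "noalias" on the argument
```

Per-call-site aliasing metadata (what llvmlite#895 would enable) is finer-grained than this per-argument attribute, which is what the thread is weighing.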
Maybe we can ask for a release?
Would need to be merged first :-)
numba/llvmlite#895
Also, I don't think this is a major issue. The current noalias attribute for the output arrays should cover most (if not all) cases as well.
I think I finally got there, tests might just pass this time :-)
I'll try to review soon. In the meantime do you want to add 1 or 2 benchmark tests? #139
@aseyboldt Any more timing results?
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##              main      #92      +/-  ##
==========================================
- Coverage    79.98%   79.95%   -0.03%
==========================================
  Files          169      170       +1
  Lines        44607    44848     +241
  Branches      9426     9493      +67
==========================================
+ Hits         35678    35859     +181
- Misses        6738     6778      +40
- Partials      2191     2211      +20
@twiecki The above benchmark hasn't really changed. @ricardoV94 added the code from the description as benchmark.
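The benchmark code itself is not reproduced in this thread. As a hedged sketch, an elemwise-plus-reduction workload of the kind this PR targets might look like the following; all names, shapes, and the expression are assumptions, not the PR's actual benchmark:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(1000,))      # vector, broadcast against the matrix
y = rng.normal(size=(50, 1000))   # matrix

def logsumexp_elemwise(x, y):
    # An elemwise op (exp of a broadcast sum) followed by a reduction:
    # the fusion pattern the new lowering is meant to vectorize.
    return np.log(np.sum(np.exp(x + y), axis=1))

out = logsumexp_elemwise(x, y)
print(out.shape)  # one reduced value per row of y
```

With pytest-benchmark, wrapping the call in the `benchmark` fixture would give the timing numbers asked about above.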
@@ -0,0 +1,240 @@
from typing import Any, List, Optional, Tuple
I can guess the answer to this question... but do we need to go to this level? Can't we add some shape asserts in a vanilla numba function as a way to get the speedup, without us having to do all this?
Some shape asserts won't help I'm afraid. :-)
I think it is pretty clear that we want support for reductions during elemwise, and we also want to support multiple outputs. This means that the built-in numba vectorize won't do it.
So as far as I can see we have two options to get those:
- Build numba functions using string processing
- Use the llvm interface
I think working with strings is actually harder and more error-prone than working with the llvm code directly. (Have a look at the Reduction functions in this module; I think that's even more complicated.) Also, if we think about what happens if we do this, it would look like:
- We start with a properly typed graph
- We (conditionally) generate strings of numba code that have no type info
- The numba code gets evaluated and transformed to bytecode
- The bytecode is analysed by numba
- Numba tries to figure out types for all the variables (those types that we discarded earlier, hopefully)
- Numba uses code just like in the PR to build an llvm module, based on the bytecode
I don't see why we would want those in-between steps to happen? pytensor is a compiler, and I don't think we can reasonably expect to get around doing what compilers do: generate code. ;-)
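To make the string-processing option concrete, a minimal sketch of that round trip (the helper name and the generated source are hypothetical, not part of this PR): the types known at graph-build time are discarded, and numba would have to re-infer them from the generated function's bytecode before lowering to llvm anyway.

```python
def make_elemwise_source(op_expr: str) -> str:
    # Hypothetical codegen helper: emit untyped Python source for one
    # elemwise op. All type info from the pytensor graph is lost here.
    return f"def elemwise(a, b):\n    return {op_expr}\n"

src = make_elemwise_source("a + b")
namespace = {}
exec(compile(src, "<generated>", "exec"), namespace)
elemwise = namespace["elemwise"]

# numba.njit(elemwise) would now analyse the bytecode and re-infer the
# types we already knew, before finally building an llvm module much
# like this PR builds directly.
print(elemwise(2.0, 3.0))  # 5.0
```

Going through llvmlite directly skips the exec/bytecode/type-inference detour entirely, which is the argument made above.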
The long term problem I see here is maintenance. This is the backbone of the library, and I don't know how much time it would take me (or another dev) to sort out a bug in the future. I don't know how volatile these functions are, or how well documented they are.
Yeah, the move from "we build a stats library" to "let's build a compiler" is a bit sudden :-)
@@ -117,6 +117,25 @@ def test_Elemwise(inputs, input_vals, output_fn, exc):
    compare_numba_and_py(out_fg, input_vals)
def test_elemwise_speed(benchmark): |
Can you add a case with broadcasting?
That one is using broadcasting, isn't it?
Hmm I guess I wanted non-dimshuffled broadcasting because that's the one we've been thinking about, but should be the same?
Elemwise will always call dimshuffle in make_node anyway, so there shouldn't be a non-dimshuffled case?
By non-dimshuffled I mean matrix + row/col
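For reference, "matrix + row/col" broadcasting in NumPy terms (shapes chosen arbitrarily for illustration). As noted above, Elemwise's make_node inserts the DimShuffle that produces the explicit size-1 axes, so by the time the numba lowering runs, both operands already have the same number of dimensions:

```python
import numpy as np

matrix = np.arange(12.0).reshape(3, 4)
row = np.arange(4.0).reshape(1, 4)   # size-1 axis broadcast along axis 0
col = np.arange(3.0).reshape(3, 1)   # size-1 axis broadcast along axis 1

# Shapes are aligned from the right; size-1 axes are stretched.
print((matrix + row).shape)  # (3, 4)
print((matrix + col).shape)  # (3, 4)
```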
LGTM, though I'm not happy with having to write low-level code that seems as complex as, if not more complex than, our C implementation. Left some small questions/comments above.
Yeah, I'd also feel more comfortable if we demonstrated the benefit of adding this much complexity a bit more.
The docs failure looks like it is because of the Sphinx 6.0 release, which dropped py37 support: https://www.sphinx-doc.org/en/master/changes.html#id9
Does that post above convince you already?
So it's not just faster but actually allows us to do things we couldn't do before, i.e. reductions during elemwise. In that case I'm convinced.
@aseyboldt a rebase should fix the docs build |
Co-authored-by: Ricardo Vieira <[email protected]>
As discussed in #70, this rewrites the numba implementation using llvm intrinsics.
This way we can use broadcasting information, supply llvm with the aliasing information required for vectorization, support multiple outputs of elemwise, and prepare for ElemwiseSum-like optimizations (the llvm code already supports that).
Locally, there are still some test failures due to execution in python mode that I will have to work out.
This can lead to sizable performance improvements (with svml installed):