Skip to content

Make scan to be a cython extension #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 15, 2022
Merged

Make scan to be a cython extension #77

merged 6 commits into from
Dec 15, 2022

Conversation

ferrine
Copy link
Member

@ferrine ferrine commented Dec 2, 2022

Thank you for opening a PR!
Address #78 to make scan be a Cython extension

  • To install properly with pip I had to use pyproject.toml which is a go-to recommendation. With setup.cfg I struggled to specify build time dependencies

Before

  1. User installs Cython and a compiler at install time
  2. At first import time compiles Scan

After

  1. We build Scan at bdist_wheel time
  2. We create a bdist_wheel matrix on the github actions as done in e.g. scikit-learn/pandas
  3. User installs the compiled code or from sdist (compiles on his own during pip install), explained here

Here are a few important guidelines and requirements to check before your PR can be merged:

  • There is an informative high-level description of the changes.
  • The description and/or commit message(s) references the relevant GitHub issue(s).
  • pre-commit is installed and set up.
  • The commit messages follow these guidelines.
  • The commits correspond to relevant logical changes, and there are no commits that fix changes introduced by other commits in the same branch/BR.
  • There are tests covering the changes introduced in the PR.

Don't worry, your PR doesn't need to be in perfect order to submit it. As development progresses and/or reviewers request changes, you can always rewrite the history of your feature/PR branches.

If your PR is an ongoing effort and you would like to involve us in the process, simply make it a draft PR.

@ferrine ferrine marked this pull request as draft December 2, 2022 07:15
@ricardoV94
Copy link
Member

Will this be doing compilation all the time or is it cached?

@ferrine
Copy link
Member Author

ferrine commented Dec 2, 2022

Will this be doing compilation all the time or is it cached?

It will compile once on installation and it requires one of

  1. install time Cython dependency. We can omit scan_perform.c from git
  2. cythonize scan_perform and declare Extension via setuptools, no Cython Dependency

I like the Cython first approach more, less maintainance + a lot of libraries require Cython anyway.

The Cython install time dependency can be stated in pyproject.toml/setup.cfg so user never takes care of installing Cython before he installs pytensor

@twiecki
Copy link
Member

twiecki commented Dec 2, 2022

With hddm we supported both. If Cython was installed, we would cythonize to c, if not, the shipped c-extensions would be used: https://github.com/hddm-devs/hddm/blob/master/setup.py#L4

@ferrine
Copy link
Member Author

ferrine commented Dec 2, 2022

With hddm we supported both. If Cython was installed, we would cythonize to c, if not, the shipped c-extensions would be used: https://github.com/hddm-devs/hddm/blob/master/setup.py#L4

pyproject.toml allows to specify buld time dependencies so if you pip install without having some packages like numpy or Cython it will install them for you to run setup.py in an isolated mode. After installation they will not appear in the userspace. So you can only have Cython to compile the extension and be always sure it will be available

@codecov-commenter
Copy link

codecov-commenter commented Dec 2, 2022

Codecov Report

Merging #77 (73a4599) into main (5c63ee7) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
- Coverage   74.46%   74.45%   -0.01%     
==========================================
  Files         179      179              
  Lines       49282    49235      -47     
  Branches    10432    10426       -6     
==========================================
- Hits        36696    36660      -36     
+ Misses      10282    10275       -7     
+ Partials     2304     2300       -4     
Impacted Files Coverage Δ
pytensor/scan/op.py 85.14% <100.00%> (ø)
pytensor/scan/scan_perform_ext.py 100.00% <100.00%> (+18.75%) ⬆️
pytensor/typed_list/basic.py 88.55% <100.00%> (+0.11%) ⬆️
pytensor/link/c/cmodule.py 51.96% <0.00%> (-0.09%) ⬇️
pytensor/link/jax/dispatch/random.py 100.00% <0.00%> (ø)

@aseyboldt
Copy link
Member

Did you maybe forget to git add pytensor/scan/scan_perform.pyx?
I don't think I get it yet. Why do we want to compile some part of the scan code at package build time? We don't know what the scan ops are supposed to do at that time, right?

@ferrine
Copy link
Member Author

ferrine commented Dec 3, 2022

@aseyboldt the scan Op as it is written in the package accepts storage for outputs and a void function f()
It iteratively calls f(), checks outputs and returns. It does not seem to have any interaction outside its scope

@ferrine
Copy link
Member Author

ferrine commented Dec 3, 2022

Did you maybe forget to git add pytensor/scan/scan_perform.pyx?

It was already there, I think I better delete some generated C code instead and reorganize imports quite a bit

@ricardoV94
Copy link
Member

@ferrine can we get a clean git history for this one? Seems too important to just merge squash everything.

@ferrine ferrine force-pushed the scan-ext-mod branch 2 times, most recently from 08b99d1 to 8629a40 Compare December 5, 2022 13:30
@ferrine
Copy link
Member Author

ferrine commented Dec 5, 2022

@ricardoV94 I've cleaned the history and removed the code on that we are not going to depend:

  • The C generated code using Cython (Cython is installed by setuptools and we have the .pyx original file)
  • The wrapper code over scan_perform.c that checked if the extension is already compiled (a lot of in scan_perform_ext.py)

@ferrine
Copy link
Member Author

ferrine commented Dec 5, 2022

The last major question I think is important is:

"Who compiles scan now given it was compiled by user before?"

Available options

  • Users compile (old behaviour)
  • Users on Linux + python 3.7-3.10(11) do not compile, Windows/MacOS users do compile
  • Users on [Linux, Windows, MacOS(x86), MacOS(arm)] x [py3.7, py3.8, py3.9, py3.10, py3.11] do not compile (whatever we manage to provide)

@aseyboldt
Copy link
Member

@aseyboldt the scan Op as it is written in the package accepts storage for outputs and a void function f()
It iteratively calls f(), checks outputs and returns. It does not seem to have any interaction outside its scope

I still don't get it yet. Previously this was compiled at runtime, right? So the c compiler would see the function in the loop and could optimize the loop with that in mind. But if this is precompiled, it can no longer specialize at all. That doesn't sound like what we want to me, or am I missing something?

@ricardoV94
Copy link
Member

ricardoV94 commented Dec 6, 2022

I still don't get it yet. Previously this was compiled at runtime, right? So the c compiler would see the function in the loop and could optimize the loop with that in mind. But if this is precompiled, it can no longer specialize at all. That doesn't sound like what we want to me, or am I missing something?

My understanding is that the general Scan Cython code is frozen within a release. Only the inner function is compiled at runtime. I'll let @ferrine confirm that though.

@ferrine
Copy link
Member Author

ferrine commented Dec 6, 2022

I still don't get it yet. Previously this was compiled at runtime, right? So the c compiler would see the function in the loop and could optimize the loop with that in mind. But if this is precompiled, it can no longer specialize at all. That doesn't sound like what we want to me, or am I missing something?

My understanding is that the general Scan Cython code is frozen within a release. Only the inner function is compiled at runtime. I'll let @ferrine confirm that though.

Scan is not a COp unlike other C provided functions
https://github.com/pymc-devs/pytensor/blob/main/pytensor/scan/op.py#L640

Scan is a monolyth to call a thunk the inner graph.

It will compile inner function for you

https://github.com/pymc-devs/pytensor/blob/main/pytensor/scan/op.py#L1433

and will use the compiled extension if it has it:
https://github.com/pymc-devs/pytensor/blob/main/pytensor/scan/op.py#L1526

It will pass the inner graph fn to cython fn
https://github.com/pymc-devs/pytensor/blob/main/pytensor/scan/op.py#L1620

@twiecki
Copy link
Member

twiecki commented Dec 6, 2022

What's the benefit from this PR? We save on shipping the C code? Anything else?

@ferrine
Copy link
Member Author

ferrine commented Dec 6, 2022

What's the benefit from this PR? We save on shipping the C code? Anything else?

  1. We remove legacy code to support on demand compilation for scan which was a bit hackish
  2. We provide an example to write Cython extensions for future developers
  3. We provide all the requirements to add a custom Cython extension if any is needed to extend numba
  4. https://github.com/pymc-devs/pytensor/pull/46/files#diff-18626e6b1322e1b5dda375d2401417044a4e25d94041ca894059f1b2db878f14R140 gives the path to custom Cython extensions that interact with numba

I think this PR will make the code base slightly more structured which we would want it to be in the long term. Even if we remove Scan C code in the future there is a setup to support Cython extensions. If we prefer scan over generated numba code, we wrap it, if we prefer to drop Scan it is also fine.

@aseyboldt
Copy link
Member

I'm still somewhat hung up on precompilation before distribution. This complicates distribution, because pytensor is no longer a pure python package then. We have to build the extension module for different architectures and operating systems, and can't easily use more recent machine code instructions. Why would we want to do this in the first place?
I looked through the old source for scan, and that clearly doesn't seem ideal, where something might be compiled at import time, but I don't think moving that into the package creation time is helpful. I would really like to get rid of this, instead of move it to a different place.
I'd really love to get this a place where pytensor is just a pure python package, and we can move all the pain of having to deal with distribution of compiled libraries to numba or other backend packages. This seems like a step away from this, because it introduces distribution / compilation pain that we didn't have before. From the PR:

  # The job to build pypi wheels users can installed precompiled for them.
  # At the moment only linux wheels are build, Windows and MacOS will compile on installation
  # Before prebuilds all users had to compile code anyway.

This is true currently, but we want to get rid of that "users have to compile code anyway".

Building a few Cython extensions is more managable than all the other compiler shenanigans we have to deal with, so if there really is a long-term need for our own extension modules, I think we can manage, but I really don't see the need.

@ferrine
Copy link
Member Author

ferrine commented Dec 10, 2022

@ricardoV94 @aseyboldt Will Scan code be removed at some point?

@ricardoV94
Copy link
Member

@ricardoV94 @aseyboldt Will Scan code be removed at some point?

Only when we remove the C backend. I would imagine it will stay for a minimum of one year more.

@aseyboldt
Copy link
Member

Been thinking it over a bit..

Only when we remove the C backend. I would imagine it will stay for a minimum of one year more.

I think you are right, and if so I think this cython extension is probably a good think after all. I guess I've been a bit too optimistic about how fast we can remove the c backend...

So sorry @ferrine, I think you are right :-)

@ferrine
Copy link
Member Author

ferrine commented Dec 12, 2022

If we are not going to promote C backend that much I think it is easier to have linux only wheels and let other users just compile a thing they had compiled before the change. This will make our ci smaller.

@ferrine
Copy link
Member Author

ferrine commented Dec 12, 2022

Got merge conflicts resolved

@ferrine
Copy link
Member Author

ferrine commented Dec 13, 2022

Can we merge this?

@ricardoV94
Copy link
Member

Your first commit title seems to have a typo? "refresgithub actions"

And the last two commits should go together I think.

Otherwise I'll trust you because I am not familiar with these internals. Pinging @maresb in case he can spot anything.

Copy link
Member

@michaelosthege michaelosthege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you copied the changes to versioneer.py and we don't need to review these?

Comment on lines +143 to +152
"pytensor/assert_op.py",
"pytensor/graph/opt.py",
"pytensor/graph/opt_utils.py",
"pytensor/graph/optdb.py",
"pytensor/graph/kanren.py",
"pytensor/graph/unify.py",
"pytensor/link/jax/jax_linker.py",
"pytensor/link/jax/jax_dispatch.py",
"pytensor/graph/toolbox.py",
"pytensor/scalar/basic_scipy.py",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these get deleted in your other PR, right?

We should remember to update this after rebasing the other PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, one of these PRs goes first

@maresb
Copy link
Contributor

maresb commented Dec 14, 2022

Sorry, I'm just seeing this now. What did you want me to have a look at? Something in the run_mypy.py script? Or did you solve it? (I see that CI is green now.)

@michaelosthege
Copy link
Member

Sorry, I'm just seeing this now. What did you want me to have a look at? Something in the run_mypy.py script? Or did you solve it? (I see that CI is green now.)

If you could check the pyproject.toml and changes in GitHub actions that'd be great

@maresb
Copy link
Contributor

maresb commented Dec 14, 2022

DESCRIPTION.txt is RST, so probably better to rename the file so that it can be parsed properly.

@maresb
Copy link
Contributor

maresb commented Dec 14, 2022

☝️ fixed in #134

✔️ Thanks for the merge!

@ferrine ferrine merged commit e2ae99e into main Dec 15, 2022
@ferrine
Copy link
Member Author

ferrine commented Dec 15, 2022

I did rebase an merge since there were some major blocks of changes

@ferrine ferrine deleted the scan-ext-mod branch December 15, 2022 11:21
@ricardoV94
Copy link
Member

I did rebase an merge since there were some major blocks of changes

That should be the action in most of the PRs ;)

@ferrine ferrine mentioned this pull request Dec 15, 2022
6 tasks
@ricardoV94 ricardoV94 changed the title Make scan to be a cython extention Make scan to be a cython extension May 5, 2023
@ferrine ferrine mentioned this pull request Jun 29, 2023
@twiecki twiecki mentioned this pull request Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants