Skip to content

use getptr a lot less #618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

use getptr a lot less #618

wants to merge 17 commits into from

Conversation

cjdoris
Copy link
Collaborator

@cjdoris cjdoris commented May 28, 2025

Defines an unsafe_convert rule for Py to C.PyPtr so we can pass Py directly to C functions.

Also defines incref(::Py) so we don't have to operate on pointers.

Generalises our faked C functions in extras.jl to also allow ::Py arguments in place of ::PyPtr.

Fixes #617 - and more generally removes a lot of places where the Julia GC could have freed a Py while we still have a pointer to the Python object, causing a read-after-free error.

@MilesCranmer
Copy link
Contributor

Fixes some local segfaults I was getting. Nice!

What is left before merging? Let me know if I can help. Eager to see if this fixes some PySR segfaults people have been reporting.

@cjdoris
Copy link
Collaborator Author

cjdoris commented Jun 4, 2025

What is left before merging?

Fix the segfault on Mac 😞 - I've not got a Mac to hand right now so debugging will be painful...

@MilesCranmer
Copy link
Contributor

@cjdoris you should try https://github.com/mxschmitt/action-tmate. It's pretty useful for when you don't have the OS that is failing

@MilesCranmer
Copy link
Contributor

I don't see the segfault btw. It just looks like a notebook error?

_________________________ pytest/test_nb.ipynb::Cell 0 _________________________
Notebook cell execution failed
Cell 0: Timeout of 2000 seconds exceeded while executing cell. Failed to interrupt kernel in 5 seconds, so failing without traceback.

Input:
# NBVAL_IGNORE_OUTPUT
import numpy as np
from juliacall import Main as jl

@cjdoris
Copy link
Collaborator Author

cjdoris commented Jun 5, 2025

Yeah a more recent CI had the same issue. I suspect some bug is causing undefined behaviour which might cause segfaults or silently kill the kernel or ...

@cjdoris
Copy link
Collaborator Author

cjdoris commented Jun 5, 2025

Oh goodie it's non-deterministic. The Mac test just passed but Linux failed! But Mac did fail with Python 3.13.2 (was previously testing on latest Python 3.13.3). At least I might be able to reproduce this locally on Linux...

@cjdoris
Copy link
Collaborator Author

cjdoris commented Jun 5, 2025

I have completely reverted this branch (except for some trivial details in the CI workflow definition) and still get the errors, so these should be fixed first. The error seems to only occur on Mac on Python 3.13.

@cjdoris
Copy link
Collaborator Author

cjdoris commented Jun 5, 2025

Argh Heisenbug! I add tmate to be able to observe it and it goes away.

Does anyone have a Mac they can debug this locally with? I just want to run the tests under gdb or something.

@MilesCranmer
Copy link
Contributor

MilesCranmer commented Jun 5, 2025

I have a mac but don't see the issue myself - just tested. Perhaps it was an upstream dependency which is now fixed?

@MilesCranmer
Copy link
Contributor

Ah sorry let me try Python 3.13, i was using 3.12.

@MilesCranmer
Copy link
Contributor

Actually it does look like the macos test is failing? So you could ssh in now: https://github.com/JuliaPy/PythonCall.jl/actions/runs/15477354381/job/43576415671?pr=618

@MilesCranmer
Copy link
Contributor

I can reproduce the segfault: I see it on 3.13.0-3.13.2, but not on 3.13.3. So perhaps it was a Python bug that is now fixed?

@cjdoris
Copy link
Collaborator Author

cjdoris commented Jun 6, 2025

OK the current failures are due to JuliaLang/julia#58171 so I guess we have another reproducer for that.

@dpinol
Copy link
Contributor

dpinol commented Jun 6, 2025

I've run several long stress tests on linux with julia 1.11.5 & python 3.13.3/3.13.4, which used to consistently crash before this PR.
With the PR, I just get a consistent crash when finishing the stress test in one of the tests, and only on an ubuntu 24.4, but not on an ubuntu 25.4
I'll try to test it with 1.12 beta4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation fault when calling basic python functions
3 participants