[SYCL][RTC] Adopt recent changes from sycl-post-link #17447

Merged
merged 4 commits into from
Mar 14, 2025

@jopperm jopperm commented Mar 13, 2025

@jopperm jopperm self-assigned this Mar 13, 2025
@jopperm jopperm requested review from a team as code owners March 13, 2025 10:10
@jopperm jopperm requested a review from sergey-semenov March 13, 2025 10:10
@jopperm jopperm requested a review from cperkinsintel March 13, 2025 10:11
jopperm commented Mar 13, 2025

@jinge90 I believe that the new bfloat16 device library image mechanism breaks if device binaries are removed again from the program manager. The problem boils down to m_ExportedSymbolImages containing references to destroyed images the next time an image imports the bfloat16 functions.
I've added a workaround for SYCL-RTC in this PR (always loading and cleaning up the bfloat16 device library images), but it would be great if you could look at the problem in general at some point. CC @KseniyaTikhomirova who also had an interest in ProgramManager::removeImages, IIRC.
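
To illustrate the failure mode described above, here is a minimal toy model, not the actual ProgramManager code. `ToyRegistry`, `SymbolIndex`, `LiveImages`, and the `PurgeIndex` flag are hypothetical names invented for this sketch, and integer handles stand in for image pointers so the stale state can be inspected safely. It assumes only what the comment states: an index from exported symbols to images that is not cleaned up when an image is removed.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical handle type standing in for a pointer to a device image.
using ImageId = int;

// Toy stand-in for a registry that indexes images by exported symbol.
class ToyRegistry {
public:
  // Registers an image and records which symbols it exports.
  ImageId addImage(std::vector<std::string> ExportedSymbols) {
    ImageId Id = NextId++;
    for (const std::string &Sym : ExportedSymbols)
      SymbolIndex.emplace(Sym, Id);
    LiveImages.emplace(Id, std::move(ExportedSymbols));
    return Id;
  }

  // Destroys an image. With PurgeIndex == false the symbol index keeps
  // entries for the destroyed image, which models the reported problem.
  void removeImage(ImageId Id, bool PurgeIndex) {
    auto It = LiveImages.find(Id);
    if (It == LiveImages.end())
      return;
    if (PurgeIndex) {
      for (const std::string &Sym : It->second) {
        auto Range = SymbolIndex.equal_range(Sym);
        for (auto S = Range.first; S != Range.second;)
          S = (S->second == Id) ? SymbolIndex.erase(S) : std::next(S);
      }
    }
    LiveImages.erase(It);
  }

  // True if any index entry refers to an image that no longer exists.
  bool hasStaleEntries() const {
    for (const auto &Entry : SymbolIndex)
      if (LiveImages.count(Entry.second) == 0)
        return true;
    return false;
  }

private:
  ImageId NextId = 0;
  std::unordered_map<ImageId, std::vector<std::string>> LiveImages;
  std::unordered_multimap<std::string, ImageId> SymbolIndex;
};
```

In this model, removing an image without purging the index leaves `hasStaleEntries()` true, so a later lookup of an imported symbol would resolve to an image that is gone. Purging on removal restores the invariant.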

Signed-off-by: Julian Oppermann <[email protected]>
jinge90 commented Mar 14, 2025

Hi, @jopperm
I reproduced a crash with dlopen/dlclose code, which triggers removeImage; it looks like a bug in addImage. Here is a quick fix for the crash on my side: #17461. Could you try it on your side to see whether the issue is gone?
Thanks very much for pointing this out!

@sommerlukas sommerlukas merged commit 77e110d into intel:sycl Mar 14, 2025
23 checks passed