Skip to content

[clang][modules] Fix filesystem races in ModuleManager #131354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jansvoboda11
Copy link
Contributor

@jansvoboda11 jansvoboda11 commented Mar 14, 2025

The ModuleManager uses FileEntry objects to uniquely identify module files. This requires first consulting the FileManager (and therefore the file system) when loading PCM files. This is problematic, as this may load a different PCM file to what's already in the InMemoryModuleCache and fail the size and mtime checks.

This PR changes things so that module files are identified by their file system path. This removes the need of knowing the file entry at the start and allows us to fix the race. The downside is that we no longer get the FileManager inode-based uniquing, meaning symlinks in the module cache no longer work. I think this is fine, since Clang itself never creates symlinks in that directory. Moreover, we already had to work around filesystems recycling inode numbers, so this actually seems like a win to me (and resolves a long-standing FIXME-like comment).

Note that this change also requires that all places calling into ModuleManager from within a single CompilerInstance use consistent paths to refer to module files. This might be problematic if there are concurrent Clang processes operating on the same module cache directory but with different spellings of its path - the IMPORT records in PCM files will be valid and consistent within single process, but may break uniquing in another process. We might need to do some canonicalization of the module cache paths to avoid this issue.

@hahnjo
Copy link
Member

hahnjo commented Mar 18, 2025

Thanks for the PR, testing for the use case described in #130795 it seems to fix the issue!

tl;dr I'm building LLVM+Clang+libcxx with

-DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind"

and then use the built clang with libc++ to configure LLVM with LLVM_ENABLE_MODULES=ON:

-DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/path/to/clang -DCMAKE_CXX_COMPILER=/path/to/clang++ -DCMAKE_CXX_FLAGS="-Xclang -fno-implicit-modules-use-lock" -DLLVM_ENABLE_LIBCXX=ON -DLLVM_ENABLE_MODULES=ON

Using -Xclang -fno-implicit-modules-use-lock as described in #130795 (comment) instead of my manual changes still fails (almost) instantly with main, but passed two times the complete build with this PR 👏

jansvoboda11 and others added 2 commits March 19, 2025 10:04
The `ModuleManager` uses `FileEntry` objects to uniquely identify module files. This requires first consulting the `FileManager` (and therefore the file system) when loading PCM files. This is problematic, as this may load a different PCM file to what's already in the `InMemoryModuleCache` and fail the size and mtime checks.

This PR changes things so that module files are identified by their file system path. This removes the need of knowing the file entry at the start and allows us to fix the race. The downside is that we no longer get the `FileManager` inode-based uniquing, meaning symlinks in the module cache no longer work. I think this is fine, since Clang itself never creates symlinks in that directory. Moreover, we already had to work around filesystems recycling inode numbers, so this actually seems like a win to me (and resolves a long-standing FIXME-like comment).

Note that this change also requires that all places calling into `ModuleManager` from within a single `CompilerInstance` use consistent paths to refer to module files. This might be problematic if there are concurrent Clang processes operating on the same module cache directory but with different spellings - the `IMPORT` records in PCM files will be valid and consistent within single process, but may break uniquing in another process. We might need to do some canonicalization of the module cache paths to avoid this issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants