Title:
clang_getCursorExtent() crashes on TRANSLATION_UNIT cursor on macOS but works on Linux
Description:
When calling clang_getCursorExtent() on a CursorKind.TRANSLATION_UNIT cursor and then accessing .start or .end of the resulting CXSourceRange, a segmentation fault occurs on macOS.
This happens consistently across:
- Python versions: 3.11, 3.12, 3.13, and 3.14
- Clang versions: 18.x.y and 19.x.y (built from Homebrew or official sources)
- macOS versions: (tested on macOS 14.4+ Apple Silicon and Intel)
Reproducer (Python clang.cindex bindings):
from clang import cindex
cindex.Config.set_library_file("/opt/homebrew/opt/llvm/lib/libclang.dylib") # tried with different versions
index = cindex.Index.create()
tu = index.parse("example.cpp", args=[
    "-isystem/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include",
])
cursor = tu.cursor # this is of kind TRANSLATION_UNIT
# These work:
print(cursor.kind)
print(cursor.spelling)
print(cursor.location)
# This also works (struct is returned)
print(cursor.extent.ptr_data)
print(cursor.extent.begin_int_data)
print(cursor.extent.end_int_data)
# But this crashes:
print(cursor.extent.start) # or `.end` → causes a segmentation fault on macOS
On Linux, this behaves safely: extent.start.file may be None, but no crash occurs.
C-level Reproducer:
CXIndex index = clang_createIndex(0, 0);
CXTranslationUnit tu = clang_parseTranslationUnit(index, "example.cpp", NULL, 0, NULL, 0, CXTranslationUnit_None);
CXCursor cursor = clang_getTranslationUnitCursor(tu);
CXSourceRange range = clang_getCursorExtent(cursor);
CXSourceLocation start = clang_getRangeStart(range); // 💥 Segfaults on macOS
Expected Behavior
clang_getCursorExtent() should never return a CXSourceRange that causes clang_getRangeStart() or clang_getRangeEnd() to crash, even for synthetic cursors like TRANSLATION_UNIT.
If no valid extent exists, it should:
- return a dummy or sentinel range, or
- document that .extent is unsafe to access on certain kinds (not currently documented in libclang)
Notes
- This crash does not occur when calling .extent.start on normal entities like functions, structs, typedefs, etc.
- The bug only affects the TRANSLATION_UNIT cursor, which is returned by clang_getTranslationUnitCursor().
- The Python clang.cindex binding merely exposes the crash; the underlying issue is in clang_getRangeStart() accessing bad memory.
Suggested Fix
Either:
- Have clang_getCursorExtent() return a well-defined dummy CXSourceRange for TRANSLATION_UNIT, or
- Have clang_getRangeStart() gracefully reject invalid or synthetic ranges, or
- Document that TRANSLATION_UNIT has no valid range
EDIT 2025.05.20
It looks like the issues I am having are related to a likely inconsistent internal state of llvm/clang. If I access those attributes immediately after obtaining the translation unit, everything works. However, if I store a reference to the Python object in a dictionary or list and come back to it later, it becomes invalid.
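One possible explanation, not confirmed here, is an ownership problem: a stored object may not keep the object that owns its native memory alive. The sketch below illustrates that pattern in plain Python only. `Owner` and `Handle` are hypothetical stand-ins (not clang.cindex classes), and the "freeing" is simulated with a flag; it relies on CPython's reference counting to run `__del__` promptly.

```python
import gc


class Owner:
    """Hypothetical stand-in for an object that owns native memory."""

    def __init__(self):
        self.state = {"alive": True}

    def __del__(self):
        # Simulates the native memory being released with the owner.
        self.state["alive"] = False

    def make_handle(self):
        return Handle(self.state)


class Handle:
    """Hypothetical derived value that does NOT reference its owner."""

    def __init__(self, state):
        self._state = state

    def is_valid(self):
        return self._state["alive"]


def collect_handle_only():
    owner = Owner()
    return owner.make_handle()  # owner goes out of scope on return


def collect_with_owner():
    owner = Owner()
    return owner, owner.make_handle()  # owner is stored next to the handle


stale = collect_handle_only()
gc.collect()
kept_owner, fresh = collect_with_owner()
gc.collect()

print(stale.is_valid(), fresh.is_valid())
```

If this is what happens, storing the owning object (for example the translation unit and index) in the same dictionary or list as the derived objects would be a cheap way to rule it out.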
Why I need this: I am trying to build a full flat AST with all possible objects (Cursors, Types, Tokens) by returning all attributes and calling all possible functions (well, functions with no arguments, which still gets a lot done). A recursive traversal would not work, as it would run into cycles. The strategy was to save the objects for later processing, but although the Python objects are alive, the C++ objects behind the scenes somehow end up in an inconsistent state.
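A minimal sketch of the "save for later processing" strategy, written against a hypothetical `Node` class rather than the real bindings: a worklist plus a set of already-seen keys visits every reachable object once, even when the links form cycles. The `key` parameter is an assumption of this sketch; with real cursors a stable key would be needed instead of `id()`, since the bindings may hand back a new Python object on every attribute access.

```python
from collections import deque


class Node:
    """Hypothetical stand-in for a Cursor/Type/Token; not part of clang.cindex."""

    def __init__(self, name):
        self.name = name
        self.links = []  # objects reachable from this one (may form cycles)


def flatten(root, key=id):
    """Return every object reachable from root exactly once, breadth-first."""
    seen = {key(root)}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in node.links:
            k = key(nxt)
            if k not in seen:  # skip anything already queued or processed
                seen.add(k)
                queue.append(nxt)
    return order


# Small cyclic graph: tu -> fn -> int, with back-edges fn -> tu and int -> fn.
tu = Node("tu")
fn = Node("fn")
ty = Node("int")
tu.links = [fn]
fn.links = [ty, tu]
ty.links = [fn]

flat = flatten(tu)
print([n.name for n in flat])  # ['tu', 'fn', 'int']
```

The traversal terminates because a key is added to `seen` before the object is queued, so back-edges are ignored.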
BTW: I am aware that it is possible to generate a JSON representation of the AST, but that does not contain certain information.