cursor.extent or clang_getCursorExtent() crashes on TRANSLATION_UNIT cursor on macOS, but works on Linux #140241

Open
@zokrezyl

Title:

clang_getCursorExtent() crashes on TRANSLATION_UNIT cursor on macOS but works on Linux

Description:

When calling clang_getCursorExtent() on a CursorKind.TRANSLATION_UNIT cursor, and then accessing .start or .end of the resulting CXSourceRange, a segmentation fault occurs on macOS.

This happens consistently across:

  • Python versions: 3.11, 3.12, 3.13, and 3.14
  • Clang versions: 18.x.y and 19.x.y (built from Homebrew or official sources)
  • macOS versions: 14.4+ (tested on Apple Silicon and Intel)

Reproducer (Python clang.cindex bindings):

from clang import cindex

cindex.Config.set_library_file("/opt/homebrew/opt/llvm/lib/libclang.dylib")  # tried with different versions

index = cindex.Index.create()
tu = index.parse("example.cpp", args=[
    "-isystem/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include",
])

cursor = tu.cursor  # this is of kind TRANSLATION_UNIT

# These work:
print(cursor.kind)
print(cursor.spelling)
print(cursor.location)

# This also works (struct is returned)
print(cursor.extent.ptr_data)
print(cursor.extent.begin_int_data)
print(cursor.extent.end_int_data)

# But this crashes:
print(cursor.extent.start)  # or `.end` → causes a segmentation fault on macOS

On Linux, this behaves safely — extent.start.file may be None, but no crash occurs.

C-level Reproducer:

CXIndex index = clang_createIndex(0, 0);
CXTranslationUnit tu = clang_parseTranslationUnit(index, "example.cpp", NULL, 0, NULL, 0, CXTranslationUnit_None);
CXCursor cursor = clang_getTranslationUnitCursor(tu);

CXSourceRange range = clang_getCursorExtent(cursor);
CXSourceLocation start = clang_getRangeStart(range);  // 💥 Segfaults on macOS

Expected Behavior

clang_getCursorExtent() should never return a CXSourceRange that causes clang_getRangeStart() or clang_getRangeEnd() to crash, even for synthetic cursors like TRANSLATION_UNIT.

If no valid extent exists, it should:

  • return a dummy or sentinel range, or
  • document that .extent is unsafe to access on certain kinds (not currently documented in libclang)

Notes

  • This crash does not occur when calling .extent.start on normal entities like functions, structs, typedefs, etc.
  • The bug only affects the TRANSLATION_UNIT cursor, which is returned by clang_getTranslationUnitCursor().
  • The Python clang.cindex binding merely exposes the crash; the underlying issue is in clang_getRangeStart() accessing bad memory.

Suggested Fix

Either:

  • Have clang_getCursorExtent() return a well-defined dummy CXSourceRange for TRANSLATION_UNIT, or
  • Have clang_getRangeStart() gracefully reject invalid or synthetic ranges, or
  • Document that TRANSLATION_UNIT has no valid range

EDIT 2025.05.20

It looks like the issues I am having are caused by a likely inconsistent internal state of llvm/clang. If I access those attributes immediately after obtaining the translation unit, everything works. However, if I store a reference to the Python object in a dictionary or list and come back to it later, it becomes invalid.

Why I need this: I am trying to build a full flat AST with all possible objects (Cursors, Types, Tokens) by returning all attributes and calling all possible functions (well, functions with no arguments, which still gets a lot done). A recursive traversal would not work because it would get into cycles. The strategy was to save the objects for later processing, but although the Python objects are alive, the C++ objects behind the scenes somehow end up in an inconsistent state.
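To illustrate the "save for later processing" strategy, here is a minimal sketch of a worklist traversal that visits each object once even when the graph has cycles. It is pure Python and does not touch libclang: `flatten`, `neighbors`, `key`, and the toy `graph` are hypothetical names used only for this illustration. With clang.cindex, a key such as `cursor.hash` might play the role of `key`, but that is an assumption and not something tested here.

```python
from collections import deque


def flatten(root, neighbors, key=id):
    """Breadth-first walk that visits every reachable object exactly once.

    `neighbors(node)` returns the objects reachable from `node`;
    `key(node)` returns a hashable identity used to detect repeats.
    """
    seen = {key(root)}
    queue = deque([root])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in neighbors(node):
            k = key(nxt)
            if k not in seen:  # skip anything already queued or visited
                seen.add(k)
                queue.append(nxt)
    return order


# Toy stand-in for an AST with a cycle: "f" refers to "S" and "S" back to "f".
graph = {"tu": ["f", "S"], "f": ["S", "int"], "S": ["f"], "int": []}
flat = flatten("tu", lambda n: graph[n], key=lambda n: n)
print(flat)
```

Note that the returned list only keeps the Python wrappers alive. As described above, that alone does not appear to guarantee that the underlying C++ state stays valid.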

BTW: I am aware that it is possible to generate a JSON representation of the AST, but that does not contain certain information.
