[BUG]: In some situations, py::cast returns null instead of raising an exception #4099

Open
@ezyang

Problem description

Under certain conditions, calling py::cast on a custom smart pointer fails with the error: TypeError: Unregistered type : c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> >. Ordinarily, I expect this cast to work (and in other contexts, it does work), but sometimes it does not. Additionally, the documentation specifies that py::cast should always raise an exception upon cast failure, but I observe that it instead returns a null pointer and sets the Python error indicator, without actually raising an exception.

I wasn't able to extract a short repro. I do have a full repro, but it involves compiling a giant project; let me know if you're interested. The triggering code looks like:

      auto py_symint = py::cast(si.toSymIntNodeImpl()).release().ptr();
      if (!py_symint) throw python_error();

where toSymIntNodeImpl returns a c10::intrusive_ptr<c10::SymIntNodeImpl>. py_symint is null and a Python error is set after calling py::cast. Here is the backtrace at this point:

#0  pybind11::detail::type_caster_generic::src_and_type (src=0x7fffffffad88, 
    cast_type=..., rtti_type=0x0)
    at /data/users/ezyang/pytorch-tmp/cmake/../third_party/pybind11/include/pybind11/detail/type_caster_base.h:788
#1  0x00007fffdf7c16d9 in pybind11::detail::type_caster_base<c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> > >::src_and_type (src=0x7fffffffad88)
    at /data/users/ezyang/pytorch-tmp/cmake/../third_party/pybind11/include/pybind11/detail/type_caster_base.h:948
#2  0x00007fffdf7c15af in pybind11::detail::type_caster_base<c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> > >::cast (
    src=0x7fffffffad88, policy=pybind11::return_value_policy::move, parent=...)
    at /data/users/ezyang/pytorch-tmp/cmake/../third_party/pybind11/include/pybind11/detail/type_caster_base.h:952
#3  0x00007fffdf7c1570 in pybind11::detail::type_caster_base<c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> > >::cast (
    src=..., parent=...)
    at /data/users/ezyang/pytorch-tmp/cmake/../third_party/pybind11/include/pybind11/detail/type_caster_base.h:923
#4  0x00007fffdf7c0b7d in pybind11::cast<c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> >, 0> (value=..., 
    policy=pybind11::return_value_policy::move, parent=...)
    at /data/users/ezyang/pytorch-tmp/cmake/../third_party/pybind11/include/pybind11/cast.h:1067
#5  0x00007fffdfc609df in THPSize_NewFromSymSizes (self_=...)
    at /data/users/ezyang/pytorch-tmp/torch/csrc/Size.cpp:60

Stepping through the rest of the execution, the lack of type info means type_caster_generic::cast short-circuits:

pybind11::detail::type_caster_generic::cast (_src=0x0,
    policy=pybind11::return_value_policy::move, parent=..., tinfo=0x0,
    copy_constructor=0x7fffdf7c1770 <pybind11::detail::type_caster_base<c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> > >::make_copy_constructor<c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> >, void>(c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> > const*)::{lambda(void const*)#1}::__invoke(void const*)>,
    move_constructor=0x7fffdf7c19f0 <pybind11::detail::type_caster_base<c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> > >::make_move_constructor<c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> >, void>(c10::intrusive_ptr<c10::SymIntNodeImpl, c10::detail::intrusive_target_default_null_type<c10::SymIntNodeImpl> > const*)::{lambda(void const*)#1}::__invoke(void const*)>,
    existing_holder=0x0)
    at /data/users/ezyang/pytorch-tmp/cmake/../third_party/pybind11/include/pybind11/detail/type_caster_base.h:515
515             if (!tinfo) { // no type info: error will be set already
(gdb)
516                 return handle();
(gdb)

but then nothing seems to detect that the handle is empty, and so this null handle ends up being returned all the way back to the caller of py::cast.
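To make the control flow above concrete, here is a minimal standalone sketch of the failure mode. It is a toy model, not pybind11 code: Handle, ErrorState, pending_error, registered_types, generic_cast and checked_cast are all hypothetical names invented for illustration. It only shows the shape of the problem: a lookup miss records an error and returns an empty handle, and unless some layer checks for that, the empty handle reaches the caller.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_set>

// Toy stand-ins for illustration only; none of these names are pybind11 APIs.
struct Handle {
    const void* ptr = nullptr;  // null means "no object"
    Handle() = default;
    explicit Handle(const void* p) : ptr(p) {}
    explicit operator bool() const { return ptr != nullptr; }
};

struct ErrorState {
    bool set = false;
    std::string message;
};

// Plays the role of the interpreter's pending-error slot.
inline ErrorState& pending_error() {
    static ErrorState state;
    return state;
}

// Plays the role of the type registry.
inline std::unordered_set<std::type_index>& registered_types() {
    static std::unordered_set<std::type_index> types;
    return types;
}

// Mirrors the short circuit in the trace: an unregistered type records an
// error and yields an empty handle instead of throwing.
template <typename T>
Handle generic_cast(const T& value) {
    if (registered_types().count(std::type_index(typeid(T))) == 0) {
        pending_error().set = true;
        pending_error().message =
            std::string("Unregistered type : ") + typeid(T).name();
        return Handle();
    }
    return Handle(&value);
}

// The check that appears to be missing: turn an empty handle into an exception.
template <typename T>
Handle checked_cast(const T& value) {
    Handle h = generic_cast(value);
    if (!h) {
        assert(pending_error().set);  // an empty handle should come with an error
        ErrorState err = pending_error();
        pending_error() = ErrorState();  // the exception now carries the error
        throw std::runtime_error(err.message);
    }
    return h;
}
```

In this model, generic_cast on an unregistered type returns an empty Handle and leaves the error pending, which is the behaviour observed here; checked_cast is what I would have expected the outermost layer to do.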

c10::intrusive_ptr is a shared_ptr-like class that does intrusive refcounting. It was declared to be a holder type with:

torch/csrc/utils/pybind.h:PYBIND11_DECLARE_HOLDER_TYPE(T, c10::intrusive_ptr<T>, true);

I also interposed on the type info registration mechanism, and observed that SymIntNodeImpl was registered, but not c10::intrusive_ptr<SymIntNodeImpl>. The workaround, in this case, is to explicitly dereference the intrusive_ptr before passing it to cast, but this is error-prone and it would be nice to root-cause the issue.
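The registration observation and the workaround can be sketched with a toy registry keyed by static type. This is an assumption-laden illustration, not a description of pybind11 internals: Node, registry, register_type, lookup and lookup_pointee are hypothetical names, and std::shared_ptr stands in for c10::intrusive_ptr.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct Node { int value = 0; };  // hypothetical stand-in for SymIntNodeImpl

// Toy registry keyed by static C++ type (not pybind11's real data structure).
inline std::unordered_map<std::type_index, std::string>& registry() {
    static std::unordered_map<std::type_index, std::string> r;
    return r;
}

template <typename T>
void register_type(const std::string& python_name) {
    registry()[std::type_index(typeid(T))] = python_name;
}

// Looks up the static type of the argument; returns nullptr on a miss.
template <typename T>
const std::string* lookup(const T&) {
    auto it = registry().find(std::type_index(typeid(T)));
    return it == registry().end() ? nullptr : &it->second;
}

// The workaround in toy form: dereference the holder first, so the lookup
// sees the pointee type rather than the holder type.
template <typename T>
const std::string* lookup_pointee(const std::shared_ptr<T>& holder) {
    assert(holder != nullptr);  // dereferencing a null holder is undefined
    return lookup(*holder);
}
```

With only Node registered, lookup on a std::shared_ptr<Node> misses because the key is the holder type, while lookup_pointee on the same holder finds the entry. That matches what I see, but it does not explain why the holder path works in other contexts.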

This is on pybind11 commit aa304c9.

Reproducible example code

No response
