Description
The Problem
Using the FFI, without unsafe, it's possible to get a segfault or incorrect results from defined behaviors (see rust-lang/rfcs#2195 and #46123) around non-C-like enumerations. I have only tested this on Linux with an x86_64 processor. I have reproduced these problems with both gcc9 and clang8. The problems are exhibited both on a recent rustc master and on rustc stable 1.40.0.
I've written two tests against the rust-lang/rust repository that demonstrate the problems described later in this issue. They are here:
- This is the test for returning enumerations by value that segfaults
- This is the test for passing enumerations by value as arguments that fails assertions
Side note: the language reference states that non-C-like enumerations have unspecified layout, but I believe that this is no longer true after merging #46123 when specifying a repr for the enumeration.
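To make the side note concrete, here is a minimal sketch of one layout property that rust-lang/rfcs#2195 describes for a non-C-like enumeration with an explicit repr: the tag is stored first, so it can be read through a pointer to the value. The names `Demo` and `tag_of` are hypothetical and exist only for this illustration.

```rust
// Sketch only: `Demo` and `tag_of` are hypothetical names, not part of the
// tests linked above.
#[repr(C, u8)]
#[allow(dead_code)]
pub enum Demo {
    Value(u64),
    Empty,
}

/// Reads the tag byte of a `Demo` through a raw pointer.
pub fn tag_of(value: &Demo) -> u8 {
    // With `repr(C, u8)` the tag is the first field of the C-compatible
    // layout, so it sits at offset 0 and is one byte wide.
    unsafe { *(value as *const Demo as *const u8) }
}
```

With this definition, `tag_of(&Demo::Value(10))` evaluates to 0 and `tag_of(&Demo::Empty)` evaluates to 1, matching the declaration order of the variants.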
Reproducing a Segmentation Fault When Returning by Value
Given an enumeration like this:
#[repr(C,u8)]
pub enum OptionLikeType {
    OptionLikeSome(u64),
    OptionLikeNone,
}
And an FFI function like this:
#[no_mangle]
pub extern "C" fn option_like_type_new(value: u64) -> OptionLikeType {
    OptionLikeType::OptionLikeSome(value)
}
And an invocation from C like this:
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Types generated from OptionLikeType by cbindgen version 0.12.1
enum OptionLikeType_Tag {
    OptionLikeSome,
    OptionLikeNone,
};
typedef uint8_t OptionLikeType_Tag;

typedef struct {
    uint64_t _0;
} OptionLikeSome_Body;

typedef struct {
    OptionLikeType_Tag tag;
    union {
        OptionLikeSome_Body option_like_some;
    };
} OptionLikeType;

// Declaration of the Rust function defined above
OptionLikeType option_like_type_new(uint64_t value);

int main(int argc, char *argv[]) {
    (void)argc; (void)argv;
    printf("Create OptionLikeType by return value\n");
    OptionLikeType olt = option_like_type_new(10);
    assert(olt.tag == OptionLikeSome);
    assert(olt.option_like_some._0 == 10);
    return 0;
}
The compiled C file, linked against the Rust static library, causes a segmentation fault:
$ gcc9 -ggdb3 -Wall -o test_return_option_by_value.bin test_return_option_by_value.c -Ltarget/debug -lrepro -ldl -lpthread
$ ./test_return_option_by_value.bin
Create OptionLikeType by return value
Segmentation fault (core dumped)
Reproducing an Assertion Failure When Passing by Value
Given the same Rust OptionLikeType type from above, and the same C type representation, define a Rust function that adds two options like this:
#[no_mangle]
pub extern "C" fn option_like_type_add(a: OptionLikeType, b: OptionLikeType) -> u64 {
    use OptionLikeType::{OptionLikeSome, OptionLikeNone};
    match (a,b) {
        (OptionLikeSome(a), OptionLikeSome(b)) => a + b,
        (OptionLikeSome(a), OptionLikeNone) => a,
        (OptionLikeNone, OptionLikeSome(b)) => b,
        _ => 0,
    }
}
Then define a C function that exercises it like this:
// Uses the type definitions above; PRIu64 additionally requires <inttypes.h>
uint64_t option_like_type_add(OptionLikeType a, OptionLikeType b);

int main(int argc, char *argv[]) {
    (void)argc; (void)argv;
    printf("Add two OptionLikeType instances by value\n");
    OptionLikeType a = {.tag = OptionLikeSome, .option_like_some = { ._0 = 10 } };
    OptionLikeType b = {.tag = OptionLikeSome, .option_like_some = { ._0 = 20 } };
    uint64_t r = option_like_type_add(a, b);
    printf("a + b is %" PRIu64 ", and is expected to be 30\n", r);
    assert(r == 30);
    return 0;
}
When running this C code, we get an unexpected result:
Add two OptionLikeType instances by value
a + b is 4748609293, and is expected to be 30
test_add_option_by_value.bin: test_add_option_by_value.c:18: main: Assertion `r == 30' failed.
Aborted (core dumped)
Other Notes
Reproduction is not limited to the exact shapes above. For example, the primitive type used in the repr does not seem to affect outcomes. #[repr(C,u32)] and #[repr(C,u64)] both exhibit the bugs.
Some Analysis
I believe that Rust is internally consistent about how it passes these enumerations, but it seems to be in violation of the SystemV guidance on how to pass parameters. For example, calling the above extern functions from Rust does not exhibit the invalid behavior.
Furthermore, for enumerations with larger representations, the bugs are also not present. For example, using two u64 values in the OptionLikeSome definition prevents the crash or assertion failure from surfacing.
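The observation that Rust-to-Rust calls behave correctly can be sketched as follows. This is a minimal illustration rather than one of the linked tests: `add_from_rust` is a hypothetical helper, and `#[no_mangle]` is left out because nothing outside the crate needs to link to these functions here.

```rust
#[repr(C, u8)]
pub enum OptionLikeType {
    OptionLikeSome(u64),
    OptionLikeNone,
}

// Same bodies as the FFI functions above, minus `#[no_mangle]`.
pub extern "C" fn option_like_type_new(value: u64) -> OptionLikeType {
    OptionLikeType::OptionLikeSome(value)
}

pub extern "C" fn option_like_type_add(a: OptionLikeType, b: OptionLikeType) -> u64 {
    use OptionLikeType::{OptionLikeNone, OptionLikeSome};
    match (a, b) {
        (OptionLikeSome(a), OptionLikeSome(b)) => a + b,
        (OptionLikeSome(a), OptionLikeNone) => a,
        (OptionLikeNone, OptionLikeSome(b)) => b,
        _ => 0,
    }
}

/// Hypothetical helper: builds two values and adds them, entirely from Rust.
/// Caller and callee are both compiled by rustc, so they agree on how the
/// enumeration is passed and returned.
pub fn add_from_rust(a: u64, b: u64) -> u64 {
    option_like_type_add(option_like_type_new(a), option_like_type_new(b))
}
```

Calling `add_from_rust(10, 20)` returns 30, which is the value the C test expects but does not receive.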
SystemV Requirements
I received an enormous amount of help from @iximeow producing the following explanation.
What appears to be occurring is that rustc expects the caller to allocate space on the caller's stack for the return value, and then expects the caller to pass a pointer to that location in a register. gcc and clang both expect to pass smaller structures in registers. What's also interesting is that rustc does the "right thing" for structs that should have an identical layout.
Here's a comparison of the assembly generated for structs and enumerations that should have extremely similar layout: https://godbolt.org/z/Mo7cJ6. Notice how the initialization of an enumeration is being done on the caller's stack while the initialization of the struct is done entirely in registers.
Here's very similar code, but in C: https://godbolt.org/z/CCxigj. Notice that neither of the C functions uses the stack for initialization.
@iximeow and I believe that the proper handling for the enumeration according to SystemV can be described as follows:
- the enum is an aggregate of { u8, u64 }
- from psABI-x86_64 section 3.2.3, The classification of aggregate (structures and arrays) and union types works as follows
- Each field of an object is classified recursively so that always two fields are considered. The resulting class is calculated according to the classes of the fields in the eightbyte: ... "(d) If one of the classes is INTEGER, the result is the INTEGER."
- so the elements of this aggregate are both INTEGER, barring other constraints.
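The "aggregate of { u8, u64 }" view can be checked from Rust with a short sketch. It assumes an x86_64 target, where u64 is 8-byte aligned; `TagAndPayload`, `layout_of` and `eightbytes` are hypothetical names introduced only for this comparison.

```rust
#[repr(C, u8)]
#[allow(dead_code)]
pub enum OptionLikeType {
    OptionLikeSome(u64),
    OptionLikeNone,
}

// Hypothetical struct with the field types named in the list above.
#[repr(C)]
#[allow(dead_code)]
pub struct TagAndPayload {
    tag: u8,
    payload: u64,
}

/// Returns (size, alignment) of `T` in bytes.
pub fn layout_of<T>() -> (usize, usize) {
    (std::mem::size_of::<T>(), std::mem::align_of::<T>())
}

/// Number of eightbytes occupied by a value of type `T`.
pub fn eightbytes<T>() -> usize {
    (std::mem::size_of::<T>() + 7) / 8
}
```

On x86_64 both types report a size of 16 bytes and an alignment of 8, so each occupies two eightbytes: one holding the tag (plus padding) and one holding the u64.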
When passing this type as an argument:
- If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used
- this is contrary to rustc's usage, passing a pointer to the enum, rather than its items directly.
When returning this type:
- If the class is INTEGER, the next available register of the sequence %rax, %rdx is used.
- this is contrary to rustc's usage, passing a pointer to the enum as a hidden first parameter, then returning that pointer in rax.
- this may explain why a larger aggregate does not express this bug - with three or more INTEGER elements, the aggregate no longer fits in return registers, and becomes MEMORY with the hidden-pointer-parameter semantics rustc uses for the two-item aggregate
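The size threshold in the last point can be sketched as a deliberately simplified classifier. This is not the full psABI algorithm: it assumes a non-empty aggregate made only of naturally aligned integer fields with sizes of 1, 2, 4 or 8 bytes, and it ignores SSE, x87 and unaligned cases. `Class`, `c_struct_size` and `classify_integer_aggregate` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Class {
    Integer,
    Memory,
}

/// Size of a C struct whose integer fields are laid out in order, each at
/// its natural alignment (alignment == size; sizes must be 1, 2, 4 or 8).
pub fn c_struct_size(field_sizes: &[usize]) -> usize {
    let mut offset = 0;
    let mut max_align = 1;
    for &size in field_sizes {
        // Round the offset up to the field's alignment, then place the field.
        offset = (offset + size - 1) / size * size;
        offset += size;
        max_align = max_align.max(size);
    }
    // Round the total size up to the largest field alignment.
    (offset + max_align - 1) / max_align * max_align
}

/// Simplified rule: an all-integer aggregate that fits in two eightbytes
/// (16 bytes) is INTEGER; anything larger is MEMORY.
pub fn classify_integer_aggregate(field_sizes: &[usize]) -> Class {
    if c_struct_size(field_sizes) > 16 {
        Class::Memory
    } else {
        Class::Integer
    }
}
```

Under these assumptions, `classify_integer_aggregate(&[1, 8])` yields `Class::Integer` for the { u8, u64 } shape, while `classify_integer_aggregate(&[1, 8, 8])` yields `Class::Memory`, which is the case where the hidden-pointer convention is the expected one.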
Summary
Something seems to treat enumerations differently from similarly laid out structs, and treats the enumerations incorrectly when passing them across SystemV ABI boundaries.