
SystemV ABI Mismatch on x86 with a repr(C) enum for extern "C"/FFI functions. #68190

Closed
@sw17ch

The Problem

Using the FFI without any unsafe code, it's possible to get a segfault or incorrect results from behavior around non-C-like enumerations that should be well defined (see rust-lang/rfcs#2195 and #46123). I have only tested this on Linux with an x86_64 processor, and I have reproduced these problems with both gcc9 and clang8. The problems occur on both a recent rustc master and rustc stable 1.40.0.

I've written two tests against the rust-lang/rust repository that demonstrate the problems described later in this issue. They are here:

  1. This is the test for returning enumerations by value that segfaults
  2. This is the test for passing enumerations by value as arguments that fails assertions

Side note: the language reference states that non-C-like enumerations have an unspecified layout, but I believe this is no longer true for enumerations with an explicit repr since #46123 was merged.
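As a rough illustration of that layout guarantee, here is a minimal Rust sketch assuming the equivalence described in rust-lang/rfcs#2195: a #[repr(C, u8)] enum should be laid out like a #[repr(C)] struct containing a #[repr(u8)] tag followed by a #[repr(C)] union with one #[repr(C)] struct per variant. The helper names (OptionLikeTypeTag, OptionLikeSomeBody, OptionLikeNoneBody, OptionLikeTypePayload, OptionLikeTypeRepr, layouts_match, to_repr) are hypothetical. They were chosen to echo the cbindgen-generated C types in this issue.

```rust
use std::mem::{align_of, size_of};

/// The enum from this issue, with an explicit C-compatible representation.
#[allow(dead_code)]
#[repr(C, u8)]
pub enum OptionLikeType {
    OptionLikeSome(u64),
    OptionLikeNone,
}

// Hypothetical equivalent definition following RFC 2195 for `#[repr(C, u8)]`:
// a `#[repr(u8)]` tag, one `#[repr(C)]` struct per variant, a `#[repr(C)]`
// union of those structs, and a `#[repr(C)]` struct holding tag then union.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OptionLikeTypeTag {
    OptionLikeSome,
    OptionLikeNone,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct OptionLikeSomeBody(pub u64);

#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
pub struct OptionLikeNoneBody;

#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
pub union OptionLikeTypePayload {
    pub option_like_some: OptionLikeSomeBody,
    pub option_like_none: OptionLikeNoneBody,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct OptionLikeTypeRepr {
    pub tag: OptionLikeTypeTag,
    pub payload: OptionLikeTypePayload,
}

/// True if the enum and its RFC 2195 equivalent agree on size and alignment.
pub fn layouts_match() -> bool {
    size_of::<OptionLikeType>() == size_of::<OptionLikeTypeRepr>()
        && align_of::<OptionLikeType>() == align_of::<OptionLikeTypeRepr>()
}

/// Reinterprets the enum as its documented struct-of-tag-and-union equivalent.
pub fn to_repr(value: OptionLikeType) -> OptionLikeTypeRepr {
    // SAFETY: RFC 2195 specifies that these two types have the same layout,
    // so every valid `OptionLikeType` is a valid `OptionLikeTypeRepr`.
    unsafe { std::mem::transmute::<OptionLikeType, OptionLikeTypeRepr>(value) }
}
```

For example, to_repr(OptionLikeType::OptionLikeSome(10)) should yield the tag OptionLikeTypeTag::OptionLikeSome and a payload of 10. If this equivalence holds, the cbindgen C declaration faithfully mirrors the Rust type. That suggests the problem lies in the calling convention rather than the layout.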

Reproducing a Segmentation Fault When Returning by Value

Given an enumeration like this:

#[repr(C,u8)]
pub enum OptionLikeType {
	OptionLikeSome(u64),
	OptionLikeNone,
}

And an FFI function like this:

#[no_mangle]
pub extern "C" fn option_like_type_new(value: u64) -> OptionLikeType {
    OptionLikeType::OptionLikeSome(value)
}

And an invocation from C like this:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Types generated from OptionLikeType by cbindgen version 0.12.1
enum OptionLikeType_Tag {
  OptionLikeSome,
  OptionLikeNone,
};
typedef uint8_t OptionLikeType_Tag;

typedef struct {
  uint64_t _0;
} OptionLikeSome_Body;

typedef struct {
  OptionLikeType_Tag tag;
  union {
    OptionLikeSome_Body option_like_some;
  };
} OptionLikeType;

// Implemented in Rust by option_like_type_new
OptionLikeType option_like_type_new(uint64_t value);

int main(int argc, char *argv[]) {
  (void)argc; (void)argv;

  printf("Create OptionLikeType by return value\n");
  OptionLikeType olt = option_like_type_new(10);
  assert(olt.tag == OptionLikeSome);
  assert(olt.option_like_some._0 == 10);

  return 0;
}

Compiling the C file and linking it against the Rust static library produces a binary that crashes with a segmentation fault:

$ gcc9 -ggdb3 -Wall -o test_return_option_by_value.bin test_return_option_by_value.c -Ltarget/debug -lrepro -ldl -lpthread
$ ./test_return_option_by_value.bin
Create OptionLikeType by return value
Segmentation fault (core dumped)

Reproducing an Assertion Failure When Passing by Value

Given the same Rust OptionLikeType enum and C type representation from above, define a Rust function that adds two options like this:

#[no_mangle]
pub extern "C" fn option_like_type_add(a: OptionLikeType, b: OptionLikeType) -> u64 {
    use OptionLikeType::{OptionLikeSome, OptionLikeNone};
    match (a,b) {
        (OptionLikeSome(a), OptionLikeSome(b)) => a + b,
        (OptionLikeSome(a), OptionLikeNone) => a,
        (OptionLikeNone, OptionLikeSome(b)) => b,
        _ => 0,
    }
}

Then define a C function that exercises it like this:

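#include <inttypes.h> // for PRIu64; the other headers and the cbindgen types are the same as in the first example

// Implemented in Rust by option_like_type_add
uint64_t option_like_type_add(OptionLikeType a, OptionLikeType b);

int main(int argc, char *argv[]) {
  (void)argc; (void)argv;
  printf("Add two OptionLikeType instances by value\n");

  OptionLikeType a = {.tag = OptionLikeSome, .option_like_some = { ._0 = 10 } };
  OptionLikeType b = {.tag = OptionLikeSome, .option_like_some = { ._0 = 20 } };
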
  uint64_t r = option_like_type_add(a, b);
  printf("a + b is %" PRIu64 ", and is expected to be 30\n", r);
  assert(r == 30);

  return 0;
}

When running this C code, we get an unexpected result:

Add two OptionLikeType instances by value
a + b is 4748609293, and is expected to be 30
test_add_option_by_value.bin: test_add_option_by_value.c:18: main: Assertion `r == 30' failed.
Aborted (core dumped)

Other Notes

Reproduction is not limited to the exact shapes above. For example, the primitive type used in the repr does not seem to affect the outcome: #[repr(C,u32)] and #[repr(C,u64)] both exhibit the bugs.

Some Analysis

I believe that Rust is internally consistent about how it passes these enumerations, but that it violates the SystemV guidance on how to pass parameters. For example, calling the above extern functions from Rust does not exhibit the invalid behavior.

Furthermore, the bugs do not appear for enumerations with larger representations. For example, using two u64 values in the OptionLikeSome definition prevents the crash and the assertion failure from surfacing.
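One way to see why the tag width does not matter but the payload size does is to compare sizes. This sketch assumes an x86_64 target, where u64 is 8-byte aligned. The enum names TagU8, TagU32, TagU64 and TwoWords and the helpers enum_sizes and eightbytes are hypothetical. Every tag width is padded out to the payload's 8-byte alignment, so all three tag variants of the original enum should occupy two eightbytes (16 bytes). A payload of two u64 values should push the enum to three eightbytes (24 bytes).

```rust
use std::mem::size_of;

// The enum from this issue with three different tag widths (assumes x86_64).
#[allow(dead_code)]
#[repr(C, u8)]
pub enum TagU8 { OptionLikeSome(u64), OptionLikeNone }

#[allow(dead_code)]
#[repr(C, u32)]
pub enum TagU32 { OptionLikeSome(u64), OptionLikeNone }

#[allow(dead_code)]
#[repr(C, u64)]
pub enum TagU64 { OptionLikeSome(u64), OptionLikeNone }

// Hypothetical larger variant: two u64 values in the payload.
#[allow(dead_code)]
#[repr(C, u8)]
pub enum TwoWords { OptionLikeSome(u64, u64), OptionLikeNone }

/// Sizes in bytes. The tag is padded up to the payload's 8-byte alignment,
/// so only the payload size changes how many eightbytes the enum spans.
pub fn enum_sizes() -> [usize; 4] {
    [size_of::<TagU8>(), size_of::<TagU32>(), size_of::<TagU64>(), size_of::<TwoWords>()]
}

/// Number of eightbytes needed to hold `size` bytes.
pub fn eightbytes(size: usize) -> usize {
    (size + 7) / 8
}
```

On x86_64, enum_sizes() should return [16, 16, 16, 24]. The last value is the only one that spans three eightbytes.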

SystemV Requirements

I received an enormous amount of help from @iximeow producing the following explanation.

What appears to be happening is that rustc expects the caller to allocate space for the return value on the caller's stack and to pass a pointer to that location in a register. By contrast, gcc and clang both expect small structures like this one to be passed and returned in registers. Interestingly, rustc does the "right thing" for structs that should have an identical layout.

Here's a comparison of the assembly generated for structs and enumerations that should have nearly identical layouts: https://godbolt.org/z/Mo7cJ6. Notice that the enumeration is initialized on the caller's stack, while the struct is initialized entirely in registers.

Here's very similar code, but in C: https://godbolt.org/z/CCxigj. Notice that neither of the C functions use the stack for initialization.
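For comparison, here is a minimal sketch of a struct with the same layout as the enum, the kind of type the struct side of the comparison exercises. The names OptionLikeStruct and option_like_struct_new are hypothetical. The tag values 0 and 1 are an assumption that mirrors the enum's discriminant order. Per the observations above, rustc returns a struct like this in registers rather than through a caller-provided pointer.

```rust
use std::mem::size_of;

/// Hypothetical struct laid out like `#[repr(C, u8)] OptionLikeType`:
/// a 1-byte tag, 7 bytes of padding, then the 8-byte payload (on x86_64).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OptionLikeStruct {
    pub tag: u8,    // assumed: 0 plays the role of OptionLikeSome, 1 of OptionLikeNone
    pub value: u64, // only meaningful when tag == 0
}

/// Hypothetical counterpart of `option_like_type_new` that returns the struct.
pub extern "C" fn option_like_struct_new(value: u64) -> OptionLikeStruct {
    OptionLikeStruct { tag: 0, value }
}

/// Size in bytes, for comparison with the enum.
pub fn option_like_struct_size() -> usize {
    size_of::<OptionLikeStruct>()
}
```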

@iximeow and I believe that the proper handling for the enumeration according to SystemV can be described as follows:

  • the enum is an aggregate of { u8, u64 }
  • from psABI-x86_64 section 3.2.3: "The classification of aggregate (structures and arrays) and union types works as follows"
    • Each field of an object is classified recursively so that always two fields are considered. The resulting class is calculated according to the classes of the fields in the eightbyte: ... "(d) If one of the classes is INTEGER, the result is the INTEGER."
  • so both eightbytes of this aggregate (the first holding the u8 tag plus padding, the second holding the u64 payload) are classified INTEGER, barring other constraints.
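The classification steps above can be sketched as a small function. This is a simplified model, not rustc's or GCC's implementation. It assumes integer-only aggregates whose fields are naturally aligned and at most 8 bytes wide, and it omits the SSE, SSEUP and X87 classes. The names Class, Field, merge and classify are hypothetical.

```rust
/// The psABI classes needed here; SSE, SSEUP, X87 and friends are omitted.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Class {
    NoClass,
    Integer,
    Memory,
}

/// A scalar field of an aggregate: its byte offset and its own class.
/// Assumes the field is naturally aligned and at most 8 bytes wide.
pub struct Field {
    pub offset: usize,
    pub class: Class,
}

/// Merges two classes that share an eightbyte (psABI 3.2.3, rules (a) to (d)).
fn merge(a: Class, b: Class) -> Class {
    match (a, b) {
        (x, y) if x == y => x,                                      // (a) equal classes
        (Class::NoClass, other) | (other, Class::NoClass) => other, // (b) NO_CLASS yields
        (Class::Memory, _) | (_, Class::Memory) => Class::Memory,   // (c) MEMORY wins
        _ => Class::Integer,                                        // (d) INTEGER wins
    }
}

/// Classifies an integer-only aggregate of `size` bytes, one class per eightbyte.
pub fn classify(size: usize, fields: &[Field]) -> Vec<Class> {
    // Aggregates larger than two eightbytes (16 bytes) are passed in memory.
    if size > 16 {
        return vec![Class::Memory];
    }
    let mut classes = vec![Class::NoClass; (size + 7) / 8];
    for field in fields {
        let i = field.offset / 8;
        classes[i] = merge(classes[i], field.class);
    }
    // Post-merger cleanup: if any eightbyte is MEMORY, the whole aggregate is.
    if classes.contains(&Class::Memory) {
        return vec![Class::Memory];
    }
    classes
}
```

For the enum in this issue, classify(16, ...) with INTEGER fields at offsets 0 and 8 should return [Integer, Integer]. The 24-byte variant with two u64 values should come back as [Memory].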

When passing this type as an argument:

  • If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used
  • rustc instead passes a pointer to the enum, rather than passing the enum's eightbytes directly in registers.
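Applying the register rule above, here is a sketch of how a C compiler would be expected to assign registers to INTEGER eightbytes. It is again a simplified model, and the names ARG_REGS, ArgLocation and assign_args are hypothetical. It follows the psABI rule that if not every eightbyte of an argument fits in the remaining registers, the whole argument goes on the stack, while later arguments may still use the registers that are left.

```rust
/// Argument registers for INTEGER eightbytes, in psABI order.
const ARG_REGS: [&str; 6] = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"];

#[derive(Debug, PartialEq, Eq)]
pub enum ArgLocation {
    Registers(Vec<&'static str>),
    Stack,
}

/// Assigns arguments left to right. Each entry is the number of INTEGER
/// eightbytes in that argument, or `None` if the argument is MEMORY class.
pub fn assign_args(args: &[Option<usize>]) -> Vec<ArgLocation> {
    let mut next = 0;
    args.iter()
        .map(|&arg| match arg {
            // All eightbytes must fit in the remaining registers;
            // otherwise the whole argument is passed on the stack.
            Some(n) if next + n <= ARG_REGS.len() => {
                let regs = ARG_REGS[next..next + n].to_vec();
                next += n;
                ArgLocation::Registers(regs)
            }
            _ => ArgLocation::Stack,
        })
        .collect()
}
```

If the analysis above is correct, a C caller of option_like_type_add would pass a in %rdi and %rsi and b in %rdx and %rcx, as modeled by assign_args(&[Some(2), Some(2)]). That would explain the wrong sum if rustc instead reads each argument through a pointer.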

When returning this type:

  • If the class is INTEGER, the next available register of the sequence %rax, %rdx is used.
  • rustc instead expects the caller to pass a pointer to the return slot as a hidden first parameter, and then returns that pointer in %rax.
  • this may explain why a larger aggregate does not trigger this bug: once the aggregate exceeds two eightbytes (16 bytes), for example with three INTEGER eightbytes, it no longer fits in the return registers and is classified MEMORY. MEMORY uses the same hidden-pointer convention that rustc applies to the two-eightbyte aggregate.
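The return-value rule can be sketched the same way. This simplified model only covers aggregates made entirely of INTEGER fields. The names RETURN_REGS, ReturnConvention and return_convention are hypothetical.

```rust
/// Return registers for INTEGER eightbytes, in psABI order.
const RETURN_REGS: [&str; 2] = ["%rax", "%rdx"];

#[derive(Debug, PartialEq, Eq)]
pub enum ReturnConvention {
    /// Each INTEGER eightbyte comes back in the next register of %rax, %rdx.
    Registers(Vec<&'static str>),
    /// MEMORY class: the caller passes a hidden pointer to the return slot
    /// in %rdi, and the callee returns that same pointer in %rax.
    HiddenPointer,
}

/// Picks the return convention for an integer-only aggregate of `size` bytes.
pub fn return_convention(size: usize) -> ReturnConvention {
    if size > 16 {
        ReturnConvention::HiddenPointer
    } else {
        let eightbytes = (size + 7) / 8;
        ReturnConvention::Registers(RETURN_REGS[..eightbytes].to_vec())
    }
}
```

Under this model the 16-byte OptionLikeType should come back in %rax and %rdx, while the 24-byte two-u64 variant uses the hidden pointer. This matches the observation that only the smaller enum misbehaves.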

Summary

rustc seems to treat enumerations differently from similarly laid-out structs, and it handles the enumerations incorrectly when passing or returning them across SystemV ABI boundaries.

Labels

  • A-FFI (Area: Foreign function interface (FFI))
  • C-bug (Category: This is a bug.)
  • O-linux (Operating system: Linux)
  • O-macos (Operating system: macOS)
  • O-x86_32 (Target: x86 processors, 32 bit (like i686-*) (IA-32))
  • P-high (High priority)
  • T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
