Virtualised cargo fails with a "No such device" error n°19 likely because of MAP_SHARED mmap call on qemu + virtiofsd with --cache=never. #122262

Closed
@gl-yziquel

Problem

I believe this is essentially the same issue that was raised back in 2016:

rust-lang/cargo#2808

Context: a virtualised "cargo build" fails with "No such device" whenever the setup is qemu + virtiofsd with --cache=never. This is essentially the same issue that I just got fixed for gix. (The link below has more details and context.)

GitoxideLabs/gitoxide#1312

As the gix issue and the links referenced there show, Rust code by default uses MAP_SHARED for mmap system calls (as gix did before today's fix), whereas git uses MAP_PRIVATE. MAP_SHARED has more stringent coherence requirements than MAP_PRIVATE, but the bug does not materialise on a virtiofs filesystem unless caching is disabled with --cache=never (and I need to disable caching because, with the cache enabled, heavy IO workloads on the guest exhaust the file descriptors on the host).

To see the "No such device" error, one needs both MAP_SHARED in place of MAP_PRIVATE and --cache=never on virtiofsd.

mini-me@virtucon ~/h/s/my-project (master)> cargo build
   Compiling rand_core v0.6.4
   Compiling siphasher v0.3.11
   Compiling libc v0.2.153
   Compiling typenum v1.17.0
   Compiling memchr v2.7.1
   Compiling minimal-lexical v0.2.1
   Compiling tinyvec_macros v0.1.1
   Compiling version_check v0.9.4
   Compiling cfg-if v1.0.0
   Compiling smallvec v1.13.1
   Compiling bitflags v1.3.2
   Compiling fnv v1.0.7
   Compiling bitflags v2.4.2
   Compiling unicode-width v0.1.11
error[E0786]: found invalid metadata files for crate `core`
  |
  = note: failed to mmap file '/home/mini-me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-193cf992125ccd4c.rlib': No such device (os error 19)
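
A small probe may help confirm which mapping flag triggers the error on a given mount. The sketch below is not taken from cargo, rustc, or gix: it declares mmap and munmap by hand, with Linux constant values hard-coded so that no external crate is needed, and the function name `probe` is hypothetical. It assumes a 64-bit Linux target and Rust 1.82 or newer (for `unsafe extern`).

```rust
use std::ffi::{c_int, c_void};
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;
use std::path::Path;

// Linux values from <sys/mman.h>, hard-coded so the sketch needs no external crate.
pub const PROT_READ: c_int = 0x1;
pub const MAP_SHARED: c_int = 0x01;
pub const MAP_PRIVATE: c_int = 0x02;

// Assumes a 64-bit Linux target, where off_t is a 64-bit signed integer.
// `unsafe extern` needs Rust 1.82+; older toolchains use plain `extern "C"`.
unsafe extern "C" {
    fn mmap(
        addr: *mut c_void,
        len: usize,
        prot: c_int,
        flags: c_int,
        fd: c_int,
        offset: i64,
    ) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// Hypothetical helper: maps `path` read-only with the given sharing flag
/// (MAP_SHARED or MAP_PRIVATE) and returns the first byte of the file.
pub fn probe(path: &Path, flags: c_int) -> io::Result<u8> {
    let file = File::open(path)?;
    let len = file.metadata()?.len() as usize;
    if len == 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "cannot map an empty file",
        ));
    }
    // SAFETY: null address hint, valid descriptor, non-zero length;
    // the return value is checked against MAP_FAILED below.
    let ptr = unsafe { mmap(std::ptr::null_mut(), len, PROT_READ, flags, file.as_raw_fd(), 0) };
    if ptr as isize == -1 {
        // MAP_FAILED: errno carries the cause, e.g. ENODEV (19) as in the report.
        return Err(io::Error::last_os_error());
    }
    // SAFETY: the mapping is readable and at least one byte long.
    let first = unsafe { *(ptr as *const u8) };
    // SAFETY: `ptr` and `len` describe exactly the mapping created above.
    unsafe { munmap(ptr, len) };
    Ok(first)
}
```

If the diagnosis in this report is right, calling `probe` on the .rlib path from the error message on a virtiofs mount with --cache=never should fail with `raw_os_error() == Some(19)` for MAP_SHARED and succeed for MAP_PRIVATE. On an ordinary local filesystem both calls are expected to succeed.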

The fix is to use MAP_PRIVATE in Rust code, as was done for gix:

GitoxideLabs/gitoxide@88061a1

I believe this bug is pervasive in the wider Rust ecosystem when virtualised, because MAP_SHARED is used by default. It hits gix. It hits cargo.

Conclusion: either the Rust ecosystem was wrong to choose MAP_SHARED as a default, or virtiofs has a semi-broken memory-mapped file implementation. I assume the latter is true, but that doesn't change the fact that virtualised Rust code using mmap() system calls tends to be affected, as cargo is, by spurious "No such device" errors (Linux errno 19, ENODEV) in code that runs perfectly fine when not virtualised.

Debugging gix was easy with strace. Debugging cargo to provide nice reproducible strace dumps is harder for me, as I get lost in the subprocesses it spawns. Any guidance on how to get the relevant strace dumps would be appreciated. But the gix link shows that MAP_SHARED is indeed what triggers these spurious errors when virtualised, as in the old 2016 bug report linked at the top. Here is my own strace of cargo build:

mini-me@virtucon ~/h/s/my-project (master) [SIGINT]> strace cargo build
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7795163f1990, parent_tid=0x7795163f1990, exit_signal=0, stack=0x7795161f1000, stack_size=0x1ffbc0, tls=0x7795163f16c0} => {parent_tid=[43528]}, 88) = 43528
[...]
write(2, "error[E0786]", 12error[E0786])            = 12
write(2, ": found invalid metadata files f"..., 47: found invalid metadata files for crate `core`) = 47
[...]
write(2, "note", 4note)                     = 4
write(2, ": failed to mmap file '/home/min"..., 189: failed to mmap file '/home/mini-me/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-193cf992125ccd4c.rlib': No such device (os error 19)) = 189

The problem likely occurs behind the clone3() system call. I don't know how to strace it to provide clear-cut proof.

But MAP_SHARED inside the clone does seem to be something that occurred in the 2016 bug:

rust-lang/cargo#2808 (comment)

Steps

Virtualise an Ubuntu Mantic VM with a virtiofs mount backed by a ZFS share. Ensure virtiofsd is launched with --cache=never. Install cargo on that virtualised filesystem. Try to build some Rust code with "cargo build". Observe the "No such device" error 19.

This MAP_SHARED issue likely affects more virtualisation setups than the one I described. MAP_PRIVATE is arguably less safe, with ugly corner cases, but MAP_SHARED demands more guarantees than some virtualised filesystems provide.

Possible Solution(s)

Replace the MAP_SHARED mmap() system calls in the Rust stack underneath cargo with MAP_PRIVATE mmap() system calls, as gix / gitoxide just did.
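
To make the proposal concrete, here is a minimal sketch of a read-only MAP_PRIVATE mapping wrapped in an owning type. It is an illustration only, not the code used by rustc, cargo, or gix: the type `PrivateMap` is hypothetical, mmap and munmap are declared by hand with Linux constant values, and a 64-bit Linux target with Rust 1.82 or newer is assumed.

```rust
use std::ffi::{c_int, c_void};
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;

// Linux values from <sys/mman.h>, hard-coded to avoid an external crate.
const PROT_READ: c_int = 0x1;
const MAP_PRIVATE: c_int = 0x02;

// Assumes a 64-bit Linux target, where off_t is a 64-bit signed integer.
// `unsafe extern` needs Rust 1.82+; older toolchains use plain `extern "C"`.
unsafe extern "C" {
    fn mmap(
        addr: *mut c_void,
        len: usize,
        prot: c_int,
        flags: c_int,
        fd: c_int,
        offset: i64,
    ) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// Hypothetical read-only, private view of a whole file; unmapped on drop.
pub struct PrivateMap {
    ptr: *mut c_void,
    len: usize,
}

impl PrivateMap {
    /// Maps the entire file with PROT_READ and MAP_PRIVATE.
    pub fn open(file: &File) -> io::Result<Self> {
        let len = file.metadata()?.len() as usize;
        if len == 0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "cannot map an empty file",
            ));
        }
        // SAFETY: null address hint, valid descriptor, non-zero length;
        // the return value is checked against MAP_FAILED below.
        let ptr = unsafe {
            mmap(std::ptr::null_mut(), len, PROT_READ, MAP_PRIVATE, file.as_raw_fd(), 0)
        };
        if ptr as isize == -1 {
            return Err(io::Error::last_os_error());
        }
        Ok(Self { ptr, len })
    }

    /// Borrows the mapped bytes for as long as the mapping is alive.
    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to `len` readable bytes until `drop` runs.
        unsafe { std::slice::from_raw_parts(self.ptr as *const u8, self.len) }
    }
}

impl Drop for PrivateMap {
    fn drop(&mut self) {
        // SAFETY: `ptr` and `len` describe exactly the mapping created in `open`.
        unsafe {
            munmap(self.ptr, self.len);
        }
    }
}
```

Because the mapping is never written to, the copy-on-write behaviour of MAP_PRIVATE is never exercised. The trade-off is that changes made to the file after mapping are not guaranteed to be visible through it, which is one of the corner cases mentioned under Steps.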

Notes

No response

Version

cargo 1.75.0 (1d8b05cdd 2023-11-20)
release: 1.75.0
commit-hash: 1d8b05cdd1287c64467306cf3ca2c8ac60c11eb0
commit-date: 2023-11-20
host: x86_64-unknown-linux-gnu
libgit2: 1.7.1 (sys:0.18.1 vendored)
libcurl: 8.4.0-DEV (sys:0.4.68+curl-8.4.0 vendored ssl:OpenSSL/1.1.1u)
ssl: OpenSSL 1.1.1u  30 May 2023
os: Ubuntu 23.10 (mantic) [64-bit]

Metadata

Labels

C-feature-request (Category: A feature request, i.e: not implemented / a PR.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)