
I/O splicing #160

Closed
@dead-claudia

Proposal

Problem statement

std::io::copy provides a way to do a complete copy, but blocks indefinitely until it's completed. There's no way to portably do partial copies within the standard library, and it's not extensible to other types.

Motivation, use-cases

std::io::copy is very useful for quick-and-dirty copying from a readable to a writable. However, it does have some serious limitations:

  • You can intercept cases where writes would fail, but signal interrupts are ignored. This is especially bad when using file descriptor polling outside async runtimes, because it breaks signal handling.
  • The current source code is hard-coded to optimize only for native file handles and (as of RFC 2930) BufRead. It's not extensible to other types, and more importantly, userland types can't optimize for each other, either.
  • As it loops indefinitely until completed, it cannot be used in async runtimes.
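
The first limitation can be observed with a small experiment. This is a minimal sketch, assuming a hypothetical `InterruptedOnce` reader (not part of std) that reports a single `ErrorKind::Interrupted` before yielding its data:

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical reader that reports one `Interrupted` error before yielding data.
struct InterruptedOnce<'a> {
    data: &'a [u8],
    interrupts_seen: usize,
}

impl Read for InterruptedOnce<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.interrupts_seen == 0 {
            self.interrupts_seen += 1;
            return Err(io::Error::from(ErrorKind::Interrupted));
        }
        // Delegate to the slice's own `Read` impl, which advances `data`.
        self.data.read(buf)
    }
}

/// Runs `io::copy` over the flaky reader and reports what the caller observed:
/// the copy result, how many interruptions the reader raised, and the output.
fn copy_through_interrupt(input: &[u8]) -> (io::Result<u64>, usize, Vec<u8>) {
    let mut reader = InterruptedOnce { data: input, interrupts_seen: 0 };
    let mut out = Vec::new();
    let result = io::copy(&mut reader, &mut out);
    (result, reader.interrupts_seen, out)
}
```

Calling `copy_through_interrupt(b"hello")` yields a successful 5-byte copy even though the reader raised an interruption, so the caller never gets a chance to react to it.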

There are also a number of outstanding performance gaps:

  • Copying from a Vec<u8> or VecDeque<u8> to a file or socket currently involves an intermediate copy, one that's very easily avoidable.
  • VecDeque<u8> itself only tries to copy one half at a time, when in many cases it could just use a single vectored write. This would make it a lot more viable as a general byte buffer when paired with a &[u8] reader or &mut [u8] writer.
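
The single vectored write mentioned for VecDeque<u8> can be approximated today with existing std APIs. A minimal sketch, where `write_deque_vectored` is a hypothetical helper name and not something the proposal defines:

```rust
use std::collections::VecDeque;
use std::io::{self, IoSlice, Write};

/// Offers both halves of the ring buffer to the writer in one vectored call
/// and removes whatever the writer accepted. Returns the byte count written.
fn write_deque_vectored<W: Write>(
    deque: &mut VecDeque<u8>,
    dest: &mut W,
) -> io::Result<usize> {
    // `as_slices` exposes the two contiguous halves without copying.
    let (front, back) = deque.as_slices();
    let bufs = [IoSlice::new(front), IoSlice::new(back)];
    let written = dest.write_vectored(&bufs)?;
    // Drop only the bytes the writer reported as written.
    deque.drain(..written);
    Ok(written)
}
```

Writers that don't override `write_vectored` fall back to writing only the first non-empty slice, so a caller would still need to loop until the deque is empty.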

The RFC draft linked below goes into detail about potential use cases, including some sample code. It's admittedly a somewhat complex proposal.

Solution sketches

Option 1: a Splicer trait. This is the primary option, because it's unclear whether adding the needed "type" field to std::fs::File (and possibly std::net::TcpStream) is possible, or whether too many packages would break because something in the dependency chain transmutes between those types and raw file descriptors.

// Exported from `std::io`
pub trait Splicer<'a, R, W> {
    fn new(reader: &'a mut R, writer: &'a mut W) -> io::Result<Self>;
    fn splice(&mut self) -> io::Result<usize>;
    fn reader(&mut self) -> &'a mut R;
    fn writer(&mut self) -> &'a mut W;
}

pub trait Read {
    // ...
    type Splicer<'a, W: io::Write>: Splicer<'a, Self, W> = DefaultSplicer<'a, Self, W>;
    fn splicer_to<'a, W: io::Write>(
        &'a mut self,
        dest: &'a mut W
    ) -> io::Result<Self::Splicer<'a, W>> {
        Self::Splicer::new(self, dest)
    }
    fn can_efficiently_splice_to(&self) -> bool;
}

pub trait Write {
    // ...
    type Splicer<'a, R: io::Read>: Splicer<'a, R, Self> = R::Splicer<'a, Self>;
    fn splicer_from<'a, R: io::Read>(
        &'a mut self,
        src: &'a mut R
    ) -> io::Result<Self::Splicer<'a, R>> {
        Self::Splicer::new(src, self)
    }
    fn can_efficiently_splice_from(&self) -> bool;
}

pub struct DefaultSplicer<'a, R, W> { ... }
impl<'a, R: io::Read, W: io::Write> Splicer<'a, R, W> for DefaultSplicer<'a, R, W> { ... }

pub struct OptimalSplicer<'a, R, W> { ... }
impl<'a, R: io::Read, W: io::Write> Splicer<'a, R, W> for OptimalSplicer<'a, R, W> { ... }

pub fn splicer<'a, R: io::Read, W: io::Write>(reader: &'a mut R, writer: &'a mut W) -> OptimalSplicer<'a, R, W>;
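
To make the read → write fallback concrete, here is a minimal sketch of what a default splicer might look like on stable Rust. It is an assumption, not the proposal's implementation: the trait is a simplified stand-in (a `Sized` bound added, the `reader`/`writer` accessors omitted), and `BufferedSplicer` and its 8 KiB buffer are hypothetical.

```rust
use std::io::{self, Read, Write};

/// Simplified stand-in for the proposed trait; `Sized` is needed so that
/// `new` can return `Self` by value.
trait Splicer<'a, R, W>: Sized {
    fn new(reader: &'a mut R, writer: &'a mut W) -> io::Result<Self>;
    fn splice(&mut self) -> io::Result<usize>;
}

/// Hypothetical fallback splicer that moves data through a heap buffer.
struct BufferedSplicer<'a, R, W> {
    reader: &'a mut R,
    writer: &'a mut W,
    buf: Box<[u8]>,
}

impl<'a, R: Read, W: Write> Splicer<'a, R, W> for BufferedSplicer<'a, R, W> {
    fn new(reader: &'a mut R, writer: &'a mut W) -> io::Result<Self> {
        Ok(Self { reader, writer, buf: vec![0u8; 8192].into_boxed_slice() })
    }

    /// Performs one read followed by one `write_all`, then returns.
    /// `Ok(0)` signals end of input; read errors are returned, not retried.
    fn splice(&mut self) -> io::Result<usize> {
        let n = self.reader.read(&mut self.buf)?;
        self.writer.write_all(&self.buf[..n])?;
        Ok(n)
    }
}
```

Because each `splice` call does a bounded amount of work, the caller decides whether to loop, poll, or stop between calls.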

Option 2: splice_to/splice_from methods. This is preferred if adding those fields to std::fs::File (and possibly std::net::TcpStream) won't create compatibility concerns.

// Exported from `std::io`
trait Read {
    // ...
    fn splice_to<W: io::Write>(&mut self, dest: &mut W) -> io::Result<usize>;
    fn can_efficiently_splice_to(&self) -> bool;
}

trait Write {
    // ...
    fn splice_from<R: io::Read>(&mut self, src: &mut R) -> io::Result<usize> {
        src.splice_to(self)
    }
    fn can_efficiently_splice_from(&self) -> bool;
}

pub fn splice<R: io::Read, W: io::Write>(reader: &mut R, writer: &mut W) -> io::Result<usize>;

In either case, std::io::copy would simply splice in its loop instead of performing a complicated copy, with the read → write copy fallback being moved to the default splicer.
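
A rough sketch of that layering, using the Option 2 shape: the free function `splice` below is a hypothetical stand-in with the same signature as the proposed one, and `copy_by_splicing` is an illustrative name for what the loop inside std::io::copy could reduce to. Neither is taken from the RFC draft.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Hypothetical fallback splice: moves at most one buffer's worth of bytes.
/// Returns `Ok(0)` at end of input. A read-side `Interrupted` error is
/// returned to the caller; `write_all` still retries write-side ones itself.
fn splice<R: Read + ?Sized, W: Write + ?Sized>(
    reader: &mut R,
    writer: &mut W,
) -> io::Result<usize> {
    let mut buf = [0u8; 8192];
    let n = reader.read(&mut buf)?;
    writer.write_all(&buf[..n])?;
    Ok(n)
}

/// A complete copy expressed as a loop over partial splices, keeping the
/// retry-on-interrupt policy in the loop rather than in the splice step.
fn copy_by_splicing<R: Read + ?Sized, W: Write + ?Sized>(
    reader: &mut R,
    writer: &mut W,
) -> io::Result<u64> {
    let mut total = 0u64;
    loop {
        match splice(reader, writer) {
            Ok(0) => return Ok(total),
            Ok(n) => total += n as u64,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Callers that need signal handling or cooperative scheduling would call `splice` directly and apply their own policy between steps.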

Links and related work

Drafted an RFC here that goes into much gorier detail: https://github.com/dead-claudia/rust-rfcs/blob/io-splice/text/0000-io-splice.md

Decided to file it here first rather than go straight into the RFC process. It's complex enough it'll probably go through an RFC anyways just because of all the cross-cutting concerns.

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.
