Skip to content

Read::read_to_end time is quadratic for input over 8KB #45851

Closed
@mbrubeck

Description

@mbrubeck

The default implementation of read_to_end grows its buffer by at most DEFAULT_BUF_SIZE bytes (8KB) at a time. When the input is large, this takes O(n²) time due to repeated reallocation and copying. Calling read_to_end on a 100MB file with an empty buffer will resize and copy the buffer over 10,000 times.

When the input size is known ahead of time, callers can avoid this problem by passing in a pre-allocated buffer. It would be nice if this was not necessary in cases where the standard library could do this automatically.

  • read_to_end could be specialized to pre-reserve space for types that implement Seek.
  • Types like File that have other ways to determine the input size could provide their own implementations of read_to_end.

This still leaves potentially quadratic behavior for cases where the input size can't be determined up front. This could be fixed using exponential growth, at the cost of wasting more memory due to unused buffer space.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions