Skip to content

Explain that size_hint cannot be trusted #29684

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 8, 2015
Merged

Conversation

stepancheg
Copy link
Contributor

  • size_hint() cannot be relied upon to perform memory unsafe operations

Same applies to len() function of ExactSizeIterator trait.

@rust-highfive
Copy link
Contributor

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

/// # Performance
///
/// `size_hint()` should be cheap, it should work in `O(1)` time,
/// and should be cheaper than call to `next()`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we don't usually document these kinds of things. I'm not sure this is something we want to require. I'm not opposed, exactly, but I'd like to hear what @rust-lang/libs has to say

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveklabnik Vec::extend_desugared calls size_hint() in loop. I think either Iterator should declare to be cheap or Vec shouldn't rely on that. I think former is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheaper than a call to next doesn't sound necessary if you already say size_hint should be cheap.

Extend only calls size_hint when reallocating, and since vector allocation growth is exponential, size_hint is only called log(n) times for extending O(n) elements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bluss yes, you are right about log(n). I don't know how to say better.

  • just "cheap" is vague
  • O(n) is unacceptable
  • O(1) is probably two strict
  • O(n / log(n)) is too complicated

I should probably drop Performance section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a lot of interfaces which are intuitively expected to be cheap; it's not clear to me that mentioning this yields significant dividends. Although I do think that the documentation for size_hint should emphasize it's a very "opportunistic" thing. That is, if there isn't a simple way
to determine the size_hint, it's not worth overloading. In particular this is relevant in terms of how much you can expect a size_hint to be accurate given a concrete implementor. slice iterator? Probably good. Filter? Probably not.

Also, Iterator::next can take an unbounded amount of time in terms of its length with things like iterator adaptors (may execute arbitrary functions) and collections like HashMap (expected O(cap/len) per element).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gankro

it's not clear to me that mentioning this yields significant dividends

It answers the question: given arbitrary iterator, how often can I call size_hint without making iteration slower than linear?

@steveklabnik
Copy link
Member

Thanks @stepancheg ! Just a few grammar nits and a question for the libs team.

@stepancheg
Copy link
Contributor Author

Updated grammar, waiting to hear about performance requirements.

/// estimation of size must not break memory safety.
///
/// That said, the implementation should provide proper estimation,
/// because otherwize it may hurt performance or lead to a panic.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: otherwise

I'd like to use the phrase provide a correct estimation. It may hurt performance or break the trait's protocol.

Same applies to `len()` function of `ExactSizeIterator` trait.
@stepancheg
Copy link
Contributor Author

Updated the patch, dropped "Performance" section, renamed "Safety" to "Implementation notes", changed couple of phrases after bluss@ comments.

/// # Implementation notes
///
/// It is not enforced that an iterator implementation yields
/// declared number of elements. A buggy iterator may yield less
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the declared

@stepancheg stepancheg changed the title Document size_hint requirements and guarantees Explain that size_hint cannot be trusted Nov 8, 2015
/// than lower bound or more than upper bound of elements.
///
/// `size_hint()` can be used to optimize the fast path, and can be
/// used to panic if estimation is incorrect, but must not be used,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda get what you're saying here, but it's maybe trying to compact too much information. My suggestion:

size_hint is primarily intended to be used for optimizations such as reserving space for the elements of the iterator. One may assert that the bounds on the size_hint are accurate, but this is not advisable. It's incorrect to assume that size_hint is correct to e.g. omit bounds checks in unsafe code. Implementing size_hint incorrectly should not lead to memory safety violations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially applied your version.

@stepancheg stepancheg force-pushed the size-hint branch 2 times, most recently from c621b59 to 44ea32e Compare November 8, 2015 20:43
///
/// That said, the implementation should provide a correct
/// estimation, because otherwise it would be a violation of the
/// trait's protocol.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to clarify that the default is correct for all iterators?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

    /// The default implementation returns `(0, None)` which is correct
    /// for any iterator.

@Gankra
Copy link
Contributor

Gankra commented Nov 8, 2015

👍 this is really solid now.

@Gankra
Copy link
Contributor

Gankra commented Nov 8, 2015

@bors r+ rollup

Nice stuff!

@bors
Copy link
Collaborator

bors commented Nov 8, 2015

📌 Commit f6f99ce has been approved by Gankro

@bors
Copy link
Collaborator

bors commented Nov 8, 2015

⌛ Testing commit f6f99ce with merge 5b4986f...

bors added a commit that referenced this pull request Nov 8, 2015
* `size_hint()` cannot be relied upon to perform memory unsafe operations

Same applies to `len()` function of `ExactSizeIterator` trait.
@bors bors merged commit f6f99ce into rust-lang:master Nov 8, 2015
steveklabnik added a commit to steveklabnik/rust that referenced this pull request Nov 10, 2015
* wrap to 80 cols
* small grammar fix, missing 'the'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants