-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Explain that size_hint cannot be trusted #29684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
/// # Performance | ||
/// | ||
/// `size_hint()` should be cheap, it should work in `O(1)` time, | ||
/// and should be cheaper than call to `next()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we don't usually document these kinds of things. I'm not sure this is something we want to require. I'm not opposed, exactly, but I'd like to hear what @rust-lang/libs has to say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steveklabnik Vec::extend_desugared
calls size_hint()
in loop. I think either Iterator
should declare to be cheap or Vec
shouldn't rely on that. I think former is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheaper than a call to next doesn't sound necessary if you already say size_hint should be cheap.
Extend only calls size_hint when reallocating, and since vector allocation growth is exponential, size_hint is only called log(n) times for extending O(n) elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bluss yes, you are right about log(n)
. I don't know how to say better.
- just "cheap" is vague
O(n)
is unacceptableO(1)
is probably two strictO(n / log(n))
is too complicated
I should probably drop Performance section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a lot of interfaces which are intuitively expected to be cheap; it's not clear to me that mentioning this yields significant dividends. Although I do think that the documentation for size_hint should emphasize it's a very "opportunistic" thing. That is, if there isn't a simple way
to determine the size_hint, it's not worth overloading. In particular this is relevant in terms of how much you can expect a size_hint to be accurate given a concrete implementor. slice iterator? Probably good. Filter? Probably not.
Also, Iterator::next can take an unbounded amount of time in terms of its length with things like iterator adaptors (may execute arbitrary functions) and collections like HashMap (expected O(cap/len) per element).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not clear to me that mentioning this yields significant dividends
It answers the question: given arbitrary iterator, how often can I call size_hint
without making iteration slower than linear?
Thanks @stepancheg ! Just a few grammar nits and a question for the libs team. |
Updated grammar, waiting to hear about performance requirements. |
/// estimation of size must not break memory safety. | ||
/// | ||
/// That said, the implementation should provide proper estimation, | ||
/// because otherwize it may hurt performance or lead to a panic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling: otherwise
I'd like to use the phrase provide a correct estimation. It may hurt performance or break the trait's protocol.
Same applies to `len()` function of `ExactSizeIterator` trait.
Updated the patch, dropped "Performance" section, renamed "Safety" to "Implementation notes", changed couple of phrases after bluss@ comments. |
/// # Implementation notes | ||
/// | ||
/// It is not enforced that an iterator implementation yields | ||
/// declared number of elements. A buggy iterator may yield less |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the declared
/// than lower bound or more than upper bound of elements. | ||
/// | ||
/// `size_hint()` can be used to optimize the fast path, and can be | ||
/// used to panic if estimation is incorrect, but must not be used, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda get what you're saying here, but it's maybe trying to compact too much information. My suggestion:
size_hint is primarily intended to be used for optimizations such as reserving space for the elements of the iterator. One may assert that the bounds on the size_hint are accurate, but this is not advisable. It's incorrect to assume that size_hint is correct to e.g. omit bounds checks in unsafe code. Implementing size_hint incorrectly should not lead to memory safety violations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially applied your version.
c621b59
to
44ea32e
Compare
/// | ||
/// That said, the implementation should provide a correct | ||
/// estimation, because otherwise it would be a violation of the | ||
/// trait's protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to clarify that the default is correct for all iterators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
/// The default implementation returns `(0, None)` which is correct
/// for any iterator.
👍 this is really solid now. |
@bors r+ rollup Nice stuff! |
📌 Commit f6f99ce has been approved by |
* `size_hint()` cannot be relied upon to perform memory unsafe operations Same applies to `len()` function of `ExactSizeIterator` trait.
* wrap to 80 cols * small grammar fix, missing 'the'
size_hint()
cannot be relied upon to perform memory unsafe operationsSame applies to
len()
function ofExactSizeIterator
trait.