-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Avoid zero-length memcpy in formatting #85391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
r? @scottmcm (rust-highfive has picked a reviewer for you, use r? to override) |
@bors try @rust-timer queue It may be advantageous to consider guarding the write_str calls in the formatting machinery here with a nonzero length check too, or trying to rework formatting machinery to avoid empty pieces entirely, but those are less clear wins I suspect. |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 4bcad44febd1d5bed6d720c6fa314865bbf7e251 with merge 021776915c55787521fe6b579f88425c7d9bff19... |
☀️ Try build successful - checks-actions |
Queued 021776915c55787521fe6b579f88425c7d9bff19 with parent fe72845, future comparison URL. |
Finished benchmarking try commit (021776915c55787521fe6b579f88425c7d9bff19): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Nice performance win! It seems unfortunate to have to have a public function just because it lives in core and needs to be called from alloc. Is there any way we could avoid exposing that function? |
This avoids a zero-length write_str call, which boils down to a zero-length memmove and ultimately costs quite a few instructions on some workloads. This is approximately a 0.33% instruction count win on diesel-check.
4bcad44
to
c7c9336
Compare
I don't think exposing the function is necessarily avoidable, but it is worth noting there have been several requests to create/store @bors try @rust-timer queue -- added an is_empty guard on the push_str calls, we'll see if that helps or hurts here. Maybe that would be sufficient and we don't need the extra function at all, though I'd guess to_string() is a bit more efficient without going through the format machinery at all. |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit c7c9336 with merge 1dcdf9bc01bf15dfd254ba6643d4747b20b4cb16... |
☀️ Try build successful - checks-actions |
Queued 1dcdf9bc01bf15dfd254ba6643d4747b20b4cb16 with parent 44ec846, future comparison URL. |
Finished benchmarking try commit (1dcdf9bc01bf15dfd254ba6643d4747b20b4cb16): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Alright, that seemed to just make this more of an improvement. I'll try an intermediate version (i.e., just the second commit) later too, and then we can decide which pieces to land. |
@bors try @rust-timer queue |
⌛ Trying commit 42f56563152f3ce79b0dffe7b45fb531d6bce19e with merge 3caeb07804c31d2ad5de523df7e6348f65c4cf07... |
☀️ Try build successful - checks-actions |
Queued 3caeb07804c31d2ad5de523df7e6348f65c4cf07 with parent 9964284, future comparison URL. |
Finished benchmarking try commit (3caeb07804c31d2ad5de523df7e6348f65c4cf07): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
42f5656
to
c7c9336
Compare
This change seems reasonable, given the perf results, so Does make me wonder what it is about the |
📌 Commit c7c9336 has been approved by |
-doc builds are just generating documentation which is naturely more dominated by string formatting; this is likely to improve the performance of any string formatting which has a zero-length string in between or before the first interpolated (e.g., |
⌛ Testing commit c7c9336 with merge d8f7764a4ffcb67338b1152661f9f318c3e16b35... |
💔 Test failed - checks-actions |
MacOS builder lost networking. @bors retry |
☀️ Test successful - checks-actions |
This has two separate and somewhat orthogonal commits. The first change adjusts the ToString general impl for all types that implement Display; it no longer uses the full format machinery, rather directly falling onto a
std::fmt::Display::fmt
call. The second change directly adjusts the general core::fmt::write function which handles the production of format_args! to avoid zero-length push_str calls.Both changes target the fact that push_str will still call memmove internally (or a similar function), as it doesn't know the length of the passed string. For zero-length strings in particular, this is quite expensive, and even for very short (several bytes long) strings, this is also expensive. Future work in this area may wish to have us fallback to write_char or similar, which may be cheaper on the (typically) short strings between the interpolated pieces in format_args!.