Description
Hi, I'm the author of the `FastStr` crate. Recently I found a weird problem: the clone cost of `FastStr` is really high. For example, cloning an empty `FastStr` costs about 40ns on amd64, compared to about 4ns for a normal `String`.

`FastStr` itself is a newtype around the inner `Repr`, which previously had the following layout:
```rust
use bytes::Bytes;
use std::sync::Arc;

#[derive(Clone)]
pub struct FastStr(Repr);

#[cfg(all(test, target_pointer_width = "64"))]
mod size_asserts {
    static_assertions::assert_eq_size!(super::FastStr, [u8; 40]); // 40 bytes
}

const INLINE_CAP: usize = 38;

#[derive(Clone)]
enum Repr {
    Empty,
    Bytes(Bytes),
    ArcStr(Arc<str>),
    ArcString(Arc<String>),
    StaticStr(&'static str),
    Inline { len: u8, buf: [u8; INLINE_CAP] },
}
```
Playground link for old version
After some investigation, I found that the `Repr::Inline` variant has a large effect on performance. After I added padding to the `Repr::Inline` variant (changing the type of `len` from `u8` to `usize`), cloning a `Repr::Empty` (and every other variant) became about 9x faster, going from 40ns to 4ns. But the root cause is still not clear:
```rust
const INLINE_CAP: usize = 24; // Reduced from 38 so that the size of FastStr does not grow

#[derive(Clone)]
enum Repr {
    Empty,
    Bytes(Bytes),
    ArcStr(Arc<str>),
    ArcString(Arc<String>),
    StaticStr(&'static str),
    Inline { len: usize, buf: [u8; INLINE_CAP] },
}
```
Playground link for new version
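As a minimal std-only sketch of what the `u8` to `usize` change does to the `Inline` payload by itself, the snippet below compares the two field sets as standalone structs. `OldInline` and `NewInline` are hypothetical names: in `FastStr` these fields live directly in the enum variant, where the compiler may order them differently, so this only shows the payload's size and alignment, not the final enum layout.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-in for the old `Inline` payload: `len: u8` plus a 38-byte buffer.
#[allow(dead_code)]
struct OldInline {
    len: u8,
    buf: [u8; 38],
}

// Hypothetical stand-in for the new `Inline` payload: `len: usize` plus a 24-byte buffer.
#[allow(dead_code)]
struct NewInline {
    len: usize,
    buf: [u8; 24],
}

fn main() {
    // On a 64-bit target: old payload is 39 bytes with align 1, new payload is 32 bytes with align 8.
    println!("old: size={} align={}", size_of::<OldInline>(), align_of::<OldInline>());
    println!("new: size={} align={}", size_of::<NewInline>(), align_of::<NewInline>());
}
```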
A simple criterion benchmark for the old version:
```rust
use bytes::Bytes;
use std::sync::Arc;

use criterion::{black_box, criterion_group, criterion_main, Criterion};

const INLINE_CAP: usize = 38;

#[derive(Clone)]
enum Repr {
    Empty,
    Bytes(Bytes),
    ArcStr(Arc<str>),
    ArcString(Arc<String>),
    StaticStr(&'static str),
    Inline { len: u8, buf: [u8; INLINE_CAP] },
}

fn criterion_benchmark(c: &mut Criterion) {
    let s = Repr::Empty;
    c.bench_function("empty repr", |b| b.iter(|| black_box(s.clone())));
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```
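For a quick check without criterion or the `bytes` crate, a rough std-only sketch along the lines of the benchmark above could time both layouts with `std::time::Instant` and `std::hint::black_box`. Assumptions: the `Bytes` variant is dropped so the sketch has no dependencies (which may itself change the layout and therefore the timings), and `OldRepr`, `NewRepr` and `time_clones` are hypothetical names. Run it with `--release`; the numbers are machine-dependent and much noisier than criterion's.

```rust
use std::hint::black_box;
use std::sync::Arc;
use std::time::{Duration, Instant};

const OLD_CAP: usize = 38;
const NEW_CAP: usize = 24;

// Old layout without the `Bytes` variant (assumption: keeps the sketch std-only).
#[allow(dead_code)]
#[derive(Clone)]
enum OldRepr {
    Empty,
    ArcStr(Arc<str>),
    ArcString(Arc<String>),
    StaticStr(&'static str),
    Inline { len: u8, buf: [u8; OLD_CAP] },
}

// New layout (`len: usize`), also without the `Bytes` variant.
#[allow(dead_code)]
#[derive(Clone)]
enum NewRepr {
    Empty,
    ArcStr(Arc<str>),
    ArcString(Arc<String>),
    StaticStr(&'static str),
    Inline { len: usize, buf: [u8; NEW_CAP] },
}

// Clones `value` `iters` times and returns the total elapsed time.
// `black_box` on the input and on each clone discourages the optimizer from removing the work.
fn time_clones<T: Clone>(value: &T, iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(black_box(value).clone());
    }
    start.elapsed()
}

fn main() {
    let iters: u32 = 10_000_000;
    let old = time_clones(&OldRepr::Empty, iters);
    let new = time_clones(&NewRepr::Empty, iters);
    println!("old Repr::Empty clone: {:.2} ns", old.as_nanos() as f64 / f64::from(iters));
    println!("new Repr::Empty clone: {:.2} ns", new.as_nanos() as f64 / f64::from(iters));
}
```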
For a full benchmark, you may refer to: https://github.com/volo-rs/faststr/blob/main/benches/faststr.rs
Related PR: volo-rs/faststr#6
And commit: volo-rs/faststr@342bdc9
Furthermore, I've tried the following approaches, but none of them helped:
- only change `INLINE_CAP` to 24
- change `INLINE_CAP` to 22 and add padding to the `Inline` variant: `Inline { _pad: u64, len: u8, buf: [u8; INLINE_CAP] }`
- change `INLINE_CAP` to 22 and add a new struct `Inline` without the `_pad` field

Changing `INLINE_CAP` to 22 is only meant to keep the size of `FastStr` from growing when the extra padding is added, so the performance difference has nothing to do with that value.
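The payload sizes involved in these attempts can be checked at compile time with std alone, as an alternative to `static_assertions`. In this sketch the struct names are hypothetical stand-ins for the `Inline` payloads and a 64-bit target is assumed. It shows that the padded attempt has the same payload size (32 bytes) and alignment (8) as the `len: usize` version that did help, so payload size and alignment alone do not seem to separate the two; how the fields end up ordered inside the real enum variant is not checked here.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-ins for the `Inline` payloads; in FastStr these fields
// sit directly in the enum variant rather than in separate structs.
#[allow(dead_code)]
struct NewInline { len: usize, buf: [u8; 24] } // `len: usize` version that helped
#[allow(dead_code)]
struct OnlyCap24 { len: u8, buf: [u8; 24] } // attempt 1: only INLINE_CAP = 24
#[allow(dead_code)]
struct PaddedCap22 { _pad: u64, len: u8, buf: [u8; 22] } // attempt 2: `_pad` + INLINE_CAP = 22

// Compile-time checks using only std: the build fails if any of these is false.
#[cfg(target_pointer_width = "64")]
const _: () = {
    assert!(size_of::<PaddedCap22>() == size_of::<NewInline>()); // both 32 bytes
    assert!(align_of::<PaddedCap22>() == align_of::<NewInline>()); // both 8
    assert!(align_of::<OnlyCap24>() == 1); // a `u8` length keeps the alignment at 1
};

fn main() {
    println!("NewInline:   {} bytes, align {}", size_of::<NewInline>(), align_of::<NewInline>());
    println!("OnlyCap24:   {} bytes, align {}", size_of::<OnlyCap24>(), align_of::<OnlyCap24>());
    println!("PaddedCap22: {} bytes, align {}", size_of::<PaddedCap22>(), align_of::<PaddedCap22>());
}
```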
Edit: related discussions on users.rust-lang.org and reddit.