Document comment representation is inefficient

Imagine a doc comment like this:
```
/// aaa
/// `bbb`
/// ccc
```
This gets treated like this:
```
#[doc=" aaa"]
#[doc=" `bbb`"]
#[doc=" ccc"]
```
And each line gets represented with a `TokenStream` containing two elements: a `Eq` token and a `Lit` token. These end up in metadata, and the decoding cost is quite high. Because doc comments can be long, spanning many lines, it would be nice to join them together -- much like rustdoc's collapse-span pass does, but earlier -- to reduce the metadata decoding cost.

I wrote an experimental, not-quite-right patch that condensed to the following in the parser:
```
#[doc="/// aaa\n/// `bbb`\n/// ccc"]
```
It failed lots of rustdoc tests, but the perf results were great:
```
ssue-46449-check
        avg: -16.1%     min: -16.1%     max: -16.1%
helloworld-check
        avg: -12.6%     min: -12.6%     max: -12.6%
unify-linearly-check
        avg: -10.9%     min: -10.9%     max: -10.9%
deeply-nested-check
        avg: -8.7%      min: -8.7%      max: -8.7%
await-call-tree-check
        avg: -8.1%      min: -8.1%      max: -8.1%
regression-31157-check
        avg: -4.0%      min: -4.0%      max: -4.0%
ripgrep-check
        avg: -2.7%      min: -2.7%      max: -2.7%
encoding-check
        avg: -2.4%      min: -2.4%      max: -2.4%
regex-check
        avg: -2.0%      min: -2.0%      max: -2.0%
syn-check
        avg: -1.6%      min: -1.6%      max: -1.6%
futures-check
        avg: -1.5%      min: -1.5%      max: -1.5%
clap-rs-check
        avg: -0.9%      min: -0.9%      max: -0.9%
piston-image-check
        avg: -0.9%      min: -0.9%      max: -0.9%
coercions-check
        avg: -0.8%?     min: -0.8%?     max: -0.8%?
webrender-check
        avg: -0.6%      min: -0.6%      max: -0.6%
token-stream-stress-check
        avg: -0.6%      min: -0.6%      max: -0.6%
html5ever-check
        avg: -0.6%      min: -0.6%      max: -0.6%
inflate-check
        avg: -0.4%      min: -0.4%      max: -0.4%
cargo-check
        avg: -0.4%      min: -0.4%      max: -0.4%
cranelift-codegen-check
        avg: -0.4%      min: -0.4%      max: -0.4%
serde-check
        avg: -0.3%      min: -0.3%      max: -0.3%
wg-grammar-check
        avg: -0.3%      min: -0.3%      max: -0.3%
deep-vector-check
        avg: -0.2%      min: -0.2%      max: -0.2%
```
I then modified it to be more correct, like this:
```
#[doc="/// aaa\n `bbb`\n ccc"]
```
But then I get build failures -- the doc string appears to be tokenized somewhere, and it complains about the backticks not being valid tokens ("error: unknown start of token: \`") due to the lack of `///` after the first newline. Anyone know where that tokenization might arise from?

So, there is definitely room for improvement in the representation of doc comments, but some care will be needed to keep things working. I'd love to hear suggestions on the right way to do this.

cc @rust-lang/wg-compiler-performance 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document comment representation is inefficient #65714

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Document comment representation is inefficient #65714

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions