
Document comment representation is inefficient #65714

Closed
@nnethercote

Imagine a doc comment like this:

/// aaa
/// `bbb`
/// ccc

This is treated as:

#[doc=" aaa"]
#[doc=" `bbb`"]
#[doc=" ccc"]

Each line is represented by a TokenStream containing two elements: an Eq token and a Lit token. These end up in metadata, and the decoding cost is quite high. Because doc comments can be long, spanning many lines, it would be nice to join the lines together (much like rustdoc's collapse-docs pass does, but earlier) to reduce the metadata decoding cost.
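A minimal sketch of the joining idea, assuming a deliberately simplified attribute model: `Attr`, `collapse_docs` and `doc_token_count` are hypothetical names for illustration, not rustc's actual types or functions. Adjacent doc attributes are merged into one whose value is the per-line strings joined with `\n`, and the token count uses the two-tokens-per-attribute (Eq + Lit) figure described above.

```rust
/// Hypothetical, simplified attribute model (not rustc's real `Attribute`):
/// a doc attribute carries only its string value.
#[derive(Debug, Clone, PartialEq)]
enum Attr {
    Doc(String),
    Other(String),
}

/// Merge each run of adjacent doc attributes into a single doc attribute
/// whose value is the individual values joined with '\n'.
/// Non-doc attributes are kept as-is and end the current run.
fn collapse_docs(attrs: Vec<Attr>) -> Vec<Attr> {
    let mut out: Vec<Attr> = Vec::with_capacity(attrs.len());
    for attr in attrs {
        if let Attr::Doc(next) = &attr {
            // Append to the previous attribute if it is also a doc attribute.
            if let Some(Attr::Doc(prev)) = out.last_mut() {
                prev.push('\n');
                prev.push_str(next);
                continue;
            }
        }
        out.push(attr);
    }
    out
}

/// Assumption taken from the issue text: each doc attribute is stored as a
/// TokenStream of two tokens (an Eq token and a Lit token).
fn doc_token_count(attrs: &[Attr]) -> usize {
    attrs.iter().filter(|a| matches!(a, Attr::Doc(_))).count() * 2
}
```

For the three-line `aaa` / `bbb` / `ccc` example, this turns three doc attributes (six tokens) into one (two tokens). A non-doc attribute between two doc lines breaks the run, so the sketch never reorders attributes.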

I wrote an experimental, not-quite-right patch that condensed the comment into the following in the parser:

#[doc="/// aaa\n/// `bbb`\n/// ccc"]

It failed lots of rustdoc tests, but the perf results were great:

issue-46449-check
        avg: -16.1%     min: -16.1%     max: -16.1%
helloworld-check
        avg: -12.6%     min: -12.6%     max: -12.6%
unify-linearly-check
        avg: -10.9%     min: -10.9%     max: -10.9%
deeply-nested-check
        avg: -8.7%      min: -8.7%      max: -8.7%
await-call-tree-check
        avg: -8.1%      min: -8.1%      max: -8.1%
regression-31157-check
        avg: -4.0%      min: -4.0%      max: -4.0%
ripgrep-check
        avg: -2.7%      min: -2.7%      max: -2.7%
encoding-check
        avg: -2.4%      min: -2.4%      max: -2.4%
regex-check
        avg: -2.0%      min: -2.0%      max: -2.0%
syn-check
        avg: -1.6%      min: -1.6%      max: -1.6%
futures-check
        avg: -1.5%      min: -1.5%      max: -1.5%
clap-rs-check
        avg: -0.9%      min: -0.9%      max: -0.9%
piston-image-check
        avg: -0.9%      min: -0.9%      max: -0.9%
coercions-check
        avg: -0.8%?     min: -0.8%?     max: -0.8%?
webrender-check
        avg: -0.6%      min: -0.6%      max: -0.6%
token-stream-stress-check
        avg: -0.6%      min: -0.6%      max: -0.6%
html5ever-check
        avg: -0.6%      min: -0.6%      max: -0.6%
inflate-check
        avg: -0.4%      min: -0.4%      max: -0.4%
cargo-check
        avg: -0.4%      min: -0.4%      max: -0.4%
cranelift-codegen-check
        avg: -0.4%      min: -0.4%      max: -0.4%
serde-check
        avg: -0.3%      min: -0.3%      max: -0.3%
wg-grammar-check
        avg: -0.3%      min: -0.3%      max: -0.3%
deep-vector-check
        avg: -0.2%      min: -0.2%      max: -0.2%

I then modified it to be more correct, like this:

#[doc="/// aaa\n `bbb`\n ccc"]
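The difference between the two condensed forms is only whether the `///` marker is kept on every line or just the first. A small sketch of the second form, assuming the raw comment lines are available as strings; `condense_doc_lines` is a hypothetical helper, not part of the patch. (The first, not-quite-right form is simply the raw lines joined with `"\n"`.)

```rust
/// Hypothetical helper: condense the raw lines of a `///` doc comment into
/// one string, keeping the `///` marker only on the first line.
fn condense_doc_lines(lines: &[&str]) -> String {
    let mut out = String::new();
    for (i, &line) in lines.iter().enumerate() {
        if i == 0 {
            // First line keeps its `///` marker verbatim.
            out.push_str(line);
        } else {
            out.push('\n');
            // Later lines lose the marker; a line without one is left unchanged.
            out.push_str(line.strip_prefix("///").unwrap_or(line));
        }
    }
    out
}
```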

But then I got build failures: the doc string appears to be tokenized somewhere, and the compiler complains that the backticks are not valid tokens ("error: unknown start of token: `") because there is no /// after the first newline. Does anyone know where that tokenization might come from?
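The error message fits a simple explanation: if the condensed text is ever fed back through a lexer as source code, a `///` comment extends only to the end of its own line, so every later line without its own marker is lexed as ordinary tokens, and a backtick cannot start a Rust token. The toy model below illustrates that reading only; it is not rustc's lexer, it does not identify *where* the re-tokenization happens, and `lines_outside_comments` / `has_stray_backtick` are hypothetical names.

```rust
/// Toy model (not rustc's lexer): a `//` line comment only covers its own
/// line. Returns the lines of `text` that would instead be lexed as source.
fn lines_outside_comments(text: &str) -> Vec<&str> {
    text.lines()
        .filter(|line| !line.trim_start().starts_with("//"))
        .collect()
}

/// True if any line outside a comment contains a backtick, which is not a
/// valid start of a Rust token.
fn has_stray_backtick(text: &str) -> bool {
    lines_outside_comments(text)
        .iter()
        .any(|line| line.contains('`'))
}
```

Under this model the first condensed form survives re-lexing (every line is still a comment), while the second leaves `` `bbb` `` and `ccc` exposed, matching the "unknown start of token" error.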

So, there is definitely room for improvement in the representation of doc comments, but some care will be needed to keep things working. I'd love to hear suggestions on the right way to do this.

cc @rust-lang/wg-compiler-performance
