rustdoc percent-encodes ~ in URLs, breaks links #97125

Closed
@lilyball

~

When generating documentation with rustdoc, it appears to percent-encode ~ in link destinations. For an example, see moka 0.8.3. At the bottom of the crate documentation is a link with the title "hierarchical timer wheel". The href in the HTML is http://www.cs.columbia.edu/%7Enahum/w6998/papers/ton97-timing-wheels.pdf (note the %7E), whereas the source is:

//! [timer-wheel]: http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf

RFC 1738 declared ~ to be an "unsafe" character, but that classification was superseded 17 years ago by RFC 3986, which explicitly lists ~ as an unreserved character and says that unreserved characters should not be percent-encoded.
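To make the RFC 3986 rule concrete, here is a minimal sketch of an encoder that leaves the unreserved set (ALPHA, DIGIT, `-`, `.`, `_`, `~`) untouched. The names `is_unreserved` and `encode_segment` are hypothetical helpers for illustration and are not taken from rustdoc. The encoder is deliberately strict and encodes everything outside the unreserved set, which is more aggressive than a real path encoder needs to be.

```rust
/// RFC 3986 section 2.3 unreserved characters:
/// ALPHA / DIGIT / "-" / "." / "_" / "~".
fn is_unreserved(byte: u8) -> bool {
    byte.is_ascii_alphanumeric() || matches!(byte, b'-' | b'.' | b'_' | b'~')
}

/// Percent-encode one path segment, leaving unreserved characters alone.
/// Hypothetical helper for illustration; not rustdoc's implementation.
fn encode_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for &byte in segment.as_bytes() {
        if is_unreserved(byte) {
            // Unreserved bytes are always ASCII, so this cast is lossless.
            out.push(byte as char);
        } else {
            // Everything else becomes an uppercase-hex %XX escape.
            out.push_str(&format!("%{:02X}", byte));
        }
    }
    out
}
```

With this sketch, `encode_segment("~nahum")` returns the input unchanged, while a space still becomes `%20`.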

The fact that rustdoc encodes this is a problem because it actually breaks links. Case in point, the link from the motivating example here is broken by the percent-encoding. It shouldn't be, but not all servers percent-decode paths before interpreting them. If you click on http://www.cs.columbia.edu/%7Enahum/w6998/papers/ton97-timing-wheels.pdf you get a 404, but if you click on the originally-specified http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf it works.
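The two URLs above are equivalent under the percent-encoding normalization described in RFC 3986 §6.2.2.2, which decodes escapes that correspond to unreserved characters. The following is a rough sketch of that normalization, assuming hypothetical helper names (`hex_val`, `decode_unreserved`). A server that skips this step treats `%7E` and `~` as different paths, which would explain the 404.

```rust
/// RFC 3986 section 2.3 unreserved characters.
fn is_unreserved(byte: u8) -> bool {
    byte.is_ascii_alphanumeric() || matches!(byte, b'-' | b'.' | b'_' | b'~')
}

/// Value of a single ASCII hex digit, if it is one.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode only those %XX escapes that stand for unreserved characters;
/// all other escapes (and malformed ones) are copied through unchanged.
/// Hypothetical helper for illustration only.
fn decode_unreserved(url: &str) -> String {
    let bytes = url.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                let decoded = hi * 16 + lo;
                if is_unreserved(decoded) {
                    out.push(decoded);
                    i += 3;
                    continue;
                }
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    // Only ASCII escapes were replaced with ASCII bytes, so UTF-8 validity is preserved.
    String::from_utf8(out).expect("input was valid UTF-8 and only ASCII was rewritten")
}
```

Applied to the rustdoc-generated href, `decode_unreserved` gives back the URL as written in the source, while escapes such as `%20` or `%2F` are left in place.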

^ and other characters

I've also noticed that rustdoc percent-encodes ^, which is annoying when trying to use a link like https://docs.rs/parking_lot/^0.12/parking_lot/type.Mutex.html as it ends up looking ugly. RFC 3986 disallows ^ inside URLs, but the HTML5 spec extends the URL syntax to add ^ to the set of unreserved characters (along with other characters that RFC 3986 omitted). As such, rustdoc should target HTML5's notion of what constitutes a valid URL rather than RFC 3986's definition, as the URLs it produces will be parsed according to the HTML spec.

More generally, rustdoc should attempt to preserve the URL as it was written to the extent possible. This may in fact mean not adding any percent-encoding at all, as the URL is written directly in the markdown and RFC 3986 §2.4 specifies that under normal circumstances, percent-encoding should only be done when producing a URL from its component parts. As rustdoc is not producing a URL from component parts, it should probably just leave the URL alone.
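As a sketch of what "leave the URL alone" could look like, the function below escapes only the characters that are significant inside an HTML attribute and performs no percent-encoding at all. The names `escape_attr` and `render_link` are hypothetical, and this is an assumption about one possible approach, not a description of how rustdoc or its markdown renderer actually works.

```rust
/// Escape characters that are significant in HTML text and
/// double-quoted attribute values. No percent-encoding is applied.
/// Hypothetical helper for illustration only.
fn escape_attr(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Render a link whose destination is passed through as written,
/// apart from HTML escaping.
fn render_link(url: &str, text: &str) -> String {
    format!("<a href=\"{}\">{}</a>", escape_attr(url), escape_attr(text))
}
```

Under this approach, both `~` and `^` reach the browser exactly as the author typed them, and interpreting the URL is left to the browser's URL parser.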

Meta

This occurs both in rust 1.60.0 and in the unstable compiler used by docs.rs (currently 1.63.0-nightly (c52b9c10b 2022-05-16)).

Labels

C-bug (Category: This is a bug.)
T-rustdoc (Relevant to the rustdoc team, which will review and decide on the PR/issue.)