Description
Proposal (Updated on 2024-09-06)
Context
The JSON documentation for a crate often refers to items (e.g. functions, traits, types, etc.) defined in one or more of its dependencies, either direct or transitive.
This can happen in a variety of scenarios:
- A local function uses a foreign type as one of its input parameters or as its return type
- A foreign trait is implemented for a local type or used in a bound
- Re-exports
- Etc.
Problem
The JSON documentation for a crate doesn't provide many details when it comes to foreign items. If you want to dig deeper, you must generate the JSON documentation for the crate where they are defined and then combine the information from those two JSON documents to get a complete picture [1].
In other words, you need to walk the dependency graph to perform analyses where foreign items are in scope.
To walk the graph, you need to go from "I have this foreign item" to "here's the JSON documentation for crate X, where that item was defined".
This is the difficult part: the JSON documentation for a crate contains very little information about third party crates. In particular, it only reports their names and the root URLs to their docs.
In the simplest scenario, that's not a problem: the crate name is enough to generate the JSON documentation if there is only one instance of that crate in the dependency graph.
It becomes an issue when multiple versions of the same crate appear together in the same JSON document. There will be multiple entries in `external_crates` with the same name, one for each version of that duplicated dependency.
There is no bullet-proof mechanism to disambiguate between those entries and map them back to specific package entries in the dependency tree of the "root" crate [2]. That's the problem this MCP tries to solve.
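The ambiguity described above can be illustrated with a minimal Rust sketch. The `ExternalCrate` struct here is a simplified stand-in that carries only the two pieces of information mentioned earlier (name and docs root URL), and `ambiguous_names` is a hypothetical helper, not part of any existing tool.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an entry in `external_crates`: only the name and
/// the docs root URL are available today.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct ExternalCrate {
    name: String,
    html_root_url: Option<String>,
}

/// Hypothetical helper: returns the crate names that appear more than once
/// in `external_crates`, sorted alphabetically. Entries sharing one of these
/// names cannot be told apart from the name alone.
#[allow(dead_code)]
fn ambiguous_names(external_crates: &HashMap<u32, ExternalCrate>) -> Vec<String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for krate in external_crates.values() {
        *counts.entry(krate.name.as_str()).or_insert(0) += 1;
    }
    let mut names: Vec<String> = counts
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(name, _)| name.to_string())
        .collect();
    names.sort();
    names
}
```

With two `syn` entries and one `serde` entry, the helper reports `syn` as ambiguous: nothing in either entry says which version of `syn` it refers to.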
The proposed solution
For rustdoc
`rustdoc` will provide an additional field in `ExternalCrate`, named `orchestrator_id`.
`orchestrator_id` will be of type `Option<String>`.
`orchestrator_id` will be (optionally) provided by the tool orchestrating the overall build. This will be `cargo` in the most common case, but it may as well be an alternative orchestrator (`bazel`, `buck2`, etc.). If an orchestrator decides to provide an id for each build unit, it must guarantee that those ids uniquely identify that unit within the dependency graph.
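As a rough sketch of what this could look like for a consumer, assuming the field is added exactly as described: the struct below is a simplified stand-in for `ExternalCrate`, the helpers `external`, `ids_are_unique` and `find_by_id` are hypothetical, and any id values passed in (e.g. `syn@2.0.77`) are illustrative only, since the proposal does not fix the id format at the `rustdoc` level.

```rust
use std::collections::HashSet;

/// Simplified stand-in for `ExternalCrate` with the proposed field added.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct ExternalCrate {
    name: String,
    html_root_url: Option<String>,
    /// Proposed field: opaque id supplied by the build orchestrator, if any.
    orchestrator_id: Option<String>,
}

/// Hypothetical convenience constructor used for illustration.
#[allow(dead_code)]
fn external(name: &str, orchestrator_id: Option<&str>) -> ExternalCrate {
    ExternalCrate {
        name: name.to_string(),
        html_root_url: None,
        orchestrator_id: orchestrator_id.map(str::to_string),
    }
}

/// Checks the guarantee asked of the orchestrator: every id that is present
/// appears at most once. Entries without an id are ignored.
#[allow(dead_code)]
fn ids_are_unique(crates: &[ExternalCrate]) -> bool {
    let mut seen = HashSet::new();
    crates
        .iter()
        .filter_map(|c| c.orchestrator_id.as_deref())
        .all(|id| seen.insert(id))
}

/// Looks up the entry carrying a given orchestrator id, if there is one.
#[allow(dead_code)]
fn find_by_id<'a>(crates: &'a [ExternalCrate], id: &str) -> Option<&'a ExternalCrate> {
    crates
        .iter()
        .find(|c| c.orchestrator_id.as_deref() == Some(id))
}
```

Keeping the field optional means documents produced without an orchestrator (or by one that chooses not to supply ids) remain valid; consumers simply fall back to the crate name in that case.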
For cargo
`cargo` will populate the `orchestrator_id` field.
The id will be a valid string for the `--package` flag in `cargo` (e.g. `cargo rustdoc -p <build-id>` should always work). This will allow tool builders to walk the graph without ambiguity when working with JSON documentation.
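To make the "walk the graph" step concrete, here is a minimal sketch of how a tool might turn an id into a `cargo rustdoc` invocation. It only builds the command and does not run it. The function name is hypothetical, and the `+nightly` toolchain selector (which requires `rustup`) and the unstable `-Z unstable-options --output-format json` flags passed through to `rustdoc` are assumptions about the environment, not something this proposal mandates.

```rust
use std::process::Command;

/// Hypothetical helper: builds (but does not run) a `cargo rustdoc`
/// invocation that would generate JSON docs for the package identified by
/// `orchestrator_id`.
#[allow(dead_code)]
fn rustdoc_json_command(orchestrator_id: &str) -> Command {
    let mut cmd = Command::new("cargo");
    // `-p` receives the id verbatim: the proposal requires it to be a valid
    // `--package` value.
    cmd.args(["+nightly", "rustdoc", "-p", orchestrator_id, "--"])
        // Everything after `--` is forwarded to rustdoc itself.
        .args(["-Z", "unstable-options", "--output-format", "json"]);
    cmd
}
```

A tool would call this once per foreign crate it needs to inspect, using the `orchestrator_id` read from the corresponding `ExternalCrate` entry.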
Implementation strategy
How would `rustdoc` receive this `orchestrator_id`, in practice?
`rustdoc` looks at the `rmeta` file to gather information about third party crates. We want to include `orchestrator_id` in that file to get it from `cargo` to `rustdoc`. This requires `rustc` to cooperate.
Adding a new (unstable) `--orchestrator-id` option to `rustc` seems to be the simplest way forward.
This could then be stabilised as is or, if preferred, moved to a different "channel" (e.g. as part of `-Cmetadata`, its own `-C` flag, etc.).
Mentors or Reviewers
A mentor would definitely be appreciated: I have never touched any of the code involved here.
Process
The main points of the [Major Change Process][MCP] are as follows:
- File an issue describing the proposal.
- A compiler team member or contributor who is knowledgeable in the area can second by writing `@rustbot second`.
  - Finding a "second" suffices for internal changes. If however, you are proposing a new public-facing feature, such as a `-C flag`, then full team check-off is required.
  - Compiler team members can initiate a check-off via `@rfcbot fcp merge` on either the MCP or the PR.
- Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.
Notes
This proposal supersedes #622, incorporating the feedback and ideas that were surfaced in the associated Zulip stream.
Previous proposal, before first round of feedback
The problem
`rustdoc`'s JSON output for a crate must often refer to items (e.g. functions, traits, types, etc.) defined in one or more of its dependencies.
Very little information is captured about those dependencies: their name and the root URL to their docs.
This causes issues when multiple versions of the same crate appear as direct dependencies (e.g. via renames) of the crate we are documenting: for example, it becomes impossible to find out where items are coming from.
The proposed solution
Add a `--build-id` flag to `rustc`.
The value would be provided by `cargo` and then captured as part of the .rlib metadata, which would in turn make it available to `rustdoc`, as well as other parts of the compiler that might benefit from it (e.g. diagnostics, such as version mismatches, which currently limit themselves to `perhaps two different versions of CRATE_NAME are being used?`).
The provided `build-id` must:
- Uniquely identify the crate it points to within the current workspace;
- Be a valid string for the `--package` flag in `cargo` (e.g. `cargo rustdoc -p <build-id>` should always work). This allows toolmakers to fetch more information on-demand when working with `rustdoc`'s JSON output;
- Be as human-friendly as possible (e.g. use just the crate name if there is no ambiguity) in order to be used in diagnostics.
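One way these three requirements could be reconciled is sketched below: use the bare crate name when it is unique in the graph, and fall back to a `name@version` string otherwise. This is an assumption for illustration, not the scheme the proposal commits to. `assign_build_ids` is a hypothetical function, and `name@version` is used on the understanding that it is one of the package ID spec forms `cargo` accepts for `-p`. The sketch does not handle the same name and version coming from two different sources.

```rust
use std::collections::HashMap;

/// Hypothetical id assignment: given `(name, version)` pairs, returns one id
/// per package, in input order. The bare name is used when it is unique in
/// the list; otherwise the id is `name@version`.
#[allow(dead_code)]
fn assign_build_ids(packages: &[(&str, &str)]) -> Vec<String> {
    // Count how many packages share each name.
    let mut per_name: HashMap<&str, usize> = HashMap::new();
    for (name, _) in packages {
        *per_name.entry(*name).or_insert(0) += 1;
    }
    packages
        .iter()
        .map(|(name, version)| {
            if per_name[name] == 1 {
                // No ambiguity: keep the id as human-friendly as possible.
                name.to_string()
            } else {
                // Several versions in the graph: qualify with the version.
                format!("{name}@{version}")
            }
        })
        .collect()
}
```

For a graph containing one `serde` and two versions of `syn`, this yields a bare `serde` id and two version-qualified `syn` ids.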
Footnotes
1. This is an intentional design choice that's been discussed at length in the `rustdoc` team and that's been endorsed by all JSON-based tool builders that interface with the team.
2. See this Zulip thread for more details.