
Add a new --orchestrator-id flag to rustc #635

Closed
@LukeMathWalker

Proposal (Updated on 2024-09-06)

Context

The JSON documentation for a crate often refers to items (e.g. functions, traits, types, etc.) defined in one or more of its dependencies, either direct or transitive.
This can happen in a variety of scenarios:

  • A local function uses a foreign type as one of its input parameters or as its return type
  • A foreign trait is implemented for a local type or used in a bound
  • Re-exports
  • Etc.

Problem

The JSON documentation for a crate doesn't provide many details when it comes to foreign items. If you want to dig deeper, you must generate the JSON documentation for the crate where they are defined and then combine the information from those two JSON documents to get a complete picture [1].
In other words, you need to walk the dependency graph to perform analyses where foreign items are in scope.

To walk the graph, you need to go from "I have this foreign item" to "here's the JSON documentation for crate X, where that item was defined".
This is the difficult part: the JSON documentation for a crate contains very little information about third party crates. In particular, it only reports their names and the root URLs to their docs.
In the simplest scenario, that's not a problem: the crate name is enough to generate the JSON documentation if there is only one instance of that crate in the dependency graph.
It becomes an issue when multiple versions of the same crate appear together in the same JSON document. There will be multiple entries in external_crates with the same name, one for each version of that duplicated dependency.
There is no bullet-proof mechanism to disambiguate between those entries and map them back to specific package entries in the dependency tree of the "root" crate [2]. That's the problem this MCP tries to solve.
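
To make the ambiguity concrete, here is a minimal sketch. The `ExternalCrate` struct below is a simplified stand-in that only carries the two pieces of information mentioned above (name and docs root URL); the numeric ids, crate names, and the helper functions are hypothetical and chosen purely for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an entry in `external_crates`:
/// only the crate name and the root URL of its docs are known.
#[derive(Debug, Clone, PartialEq)]
struct ExternalCrate {
    name: String,
    html_root_url: Option<String>,
}

/// Returns the (sorted) ids of every entry whose name matches `name`.
fn candidates_by_name(external_crates: &HashMap<u32, ExternalCrate>, name: &str) -> Vec<u32> {
    let mut ids: Vec<u32> = external_crates
        .iter()
        .filter(|(_, krate)| krate.name == name)
        .map(|(id, _)| *id)
        .collect();
    ids.sort();
    ids
}

/// Hypothetical data: two versions of `http` end up in the same JSON document.
fn sample_external_crates() -> HashMap<u32, ExternalCrate> {
    let entry = |name: &str| ExternalCrate {
        name: name.to_string(),
        html_root_url: None,
    };
    HashMap::from([(1, entry("http")), (2, entry("http")), (3, entry("serde"))])
}

fn main() {
    let crates = sample_external_crates();
    let matches = candidates_by_name(&crates, "http");
    // More than one match means the name alone cannot tell us which
    // package in the dependency graph a foreign item comes from.
    println!("entries named `http`: {:?}", matches);
    for id in &matches {
        println!("  {} -> {:?}", id, crates[id].html_root_url);
    }
}
```

With a single `serde` in the graph the lookup is unambiguous; with two `http` entries there is nothing in the entry itself that says which package each one corresponds to.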

The proposed solution

For rustdoc

rustdoc will provide an additional field in ExternalCrate, named orchestrator_id.
orchestrator_id will be of type Option<String>.
orchestrator_id will be (optionally) provided by the tool orchestrating the overall build. This will be cargo in the most common case, but it may also be an alternative orchestrator (bazel, buck2, etc.). If an orchestrator decides to provide an id for each build unit, it must guarantee that those ids uniquely identify that unit within the dependency graph.
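
As a rough sketch of what this could look like from a consumer's point of view: the struct below is a simplified model of `ExternalCrate` with the proposed field added, not the actual rustdoc type, and `ids_are_unique` is a hypothetical helper a tool might use to check the uniqueness guarantee on the entries it sees.

```rust
use std::collections::HashSet;

/// Simplified model of `ExternalCrate` with the proposed field.
#[derive(Debug)]
struct ExternalCrate {
    name: String,
    /// `None` when the orchestrator did not provide an id.
    orchestrator_id: Option<String>,
}

/// Returns `true` if no two entries share the same orchestrator id.
/// Entries without an id are ignored.
fn ids_are_unique(crates: &[ExternalCrate]) -> bool {
    let mut seen = HashSet::new();
    crates
        .iter()
        .filter_map(|krate| krate.orchestrator_id.as_deref())
        .all(|id| seen.insert(id))
}

/// Small constructor to keep the examples short.
fn krate(name: &str, id: Option<&str>) -> ExternalCrate {
    ExternalCrate {
        name: name.to_string(),
        orchestrator_id: id.map(str::to_string),
    }
}

fn main() {
    // Hypothetical ids; the proposal does not prescribe their format.
    let crates = vec![
        krate("http", Some("http@0.2.12")),
        krate("http", Some("http@1.1.0")),
        krate("std", None),
    ];
    for c in &crates {
        println!("{} -> {:?}", c.name, c.orchestrator_id);
    }
    println!("unique: {}", ids_are_unique(&crates));
}
```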

For cargo

cargo will populate the orchestrator_id field.
The id will be a valid string for the --package flag in cargo (e.g. cargo rustdoc -p <build-id> should always work). This will allow tool builders to walk the graph without ambiguity when working with JSON documentation.
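
For illustration, a tool could turn each id into a cargo invocation along these lines. This is a sketch under assumptions: the function name is hypothetical, the example id is made up, and the flags after `--` reflect how rustdoc's JSON output is requested on nightly at the time of writing, which may change.

```rust
/// Builds the argument list for a `cargo` invocation that would generate
/// JSON docs for the package identified by `orchestrator_id`.
/// The command is only constructed here, never executed.
fn rustdoc_json_args(orchestrator_id: &str) -> Vec<String> {
    [
        "rustdoc",
        "-p",
        orchestrator_id,
        "--",
        // Unstable rustdoc flags (nightly only).
        "-Z",
        "unstable-options",
        "--output-format",
        "json",
    ]
    .iter()
    .map(|arg| arg.to_string())
    .collect()
}

fn main() {
    // Hypothetical id, as it might appear in `orchestrator_id`.
    let args = rustdoc_json_args("http@0.2.12");
    println!("cargo {}", args.join(" "));
}
```

Because the id is guaranteed to be accepted by `-p`, the tool never has to guess which of several same-named packages to document.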

Implementation strategy

How would rustdoc receive this orchestrator_id, in practice?
rustdoc looks at the rmeta file to gather information about third party crates. We want to include orchestrator_id in that file to get it from cargo to rustdoc. This requires rustc to cooperate.

Adding a new (unstable) --orchestrator-id option to rustc seems to be the simplest way forward.
This could then be stabilised as is or, if preferred, moved to a different "channel" (e.g. as part of -Cmetadata, its own -C flag, etc.).

Mentors or Reviewers

A mentor would definitely be appreciated; I have never touched any of the code involved here.

Process

The main points of the Major Change Process (MCP) are as follows:

  • File an issue describing the proposal.
  • A compiler team member or contributor who is knowledgeable in the area can second by writing @rustbot second.
    • Finding a "second" suffices for internal changes. If, however, you are proposing a new public-facing feature, such as a -C flag, then full team check-off is required.
    • Compiler team members can initiate a check-off via @rfcbot fcp merge on either the MCP or the PR.
  • Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

Notes

This proposal supersedes #622, incorporating the feedback and ideas that were surfaced in the associated Zulip stream.

Previous proposal, before first round of feedback

The problem

rustdoc's JSON output for a crate must often refer to items (e.g. functions, traits, types, etc.) defined in one or more of its dependencies.

Very little information is captured about those dependencies: their name and the root URL to their docs.

This causes issues when multiple versions of the same crate appear as direct dependencies (e.g. via renames) of the crate we are documenting: it becomes impossible to find out where items are coming from.

The proposed solution

Add a --build-id flag to rustc.
The value would be provided by cargo and then captured as part of the .rlib metadata, which would in turn make it available to rustdoc, as well as other parts of the compiler that might benefit from it (e.g. diagnostics, such as version mismatches, which currently limit themselves to "perhaps two different versions of CRATE_NAME are being used?").

The provided build-id must:

  • Uniquely identify the crate it points to within the current workspace;
  • Be a valid string for the --package flag in cargo (e.g. cargo rustdoc -p <build-id> should always work). This allows toolmakers to fetch more information on-demand when working with rustdoc's JSON output;
  • Be as human-friendly as possible (e.g. use just the crate name if there is no ambiguity) in order to be used in diagnostics.
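
One way these three requirements could be met is sketched below. This is not how cargo computes anything today; the `Package` struct and `assign_build_ids` are hypothetical, and the `name@version` form is assumed to be accepted by `-p` as a package ID spec in recent cargo versions.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal description of a package in the workspace graph.
struct Package {
    name: String,
    version: String,
}

/// Assigns an id to each package: the bare name when it is unambiguous,
/// `name@version` when several packages share the same name.
fn assign_build_ids(packages: &[Package]) -> Vec<String> {
    // Count how many packages use each name.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for p in packages {
        *counts.entry(p.name.as_str()).or_insert(0) += 1;
    }
    packages
        .iter()
        .map(|p| {
            if counts[p.name.as_str()] == 1 {
                p.name.clone()
            } else {
                format!("{}@{}", p.name, p.version)
            }
        })
        .collect()
}

/// Small constructor to keep the examples short.
fn pkg(name: &str, version: &str) -> Package {
    Package {
        name: name.to_string(),
        version: version.to_string(),
    }
}

fn main() {
    let packages = vec![pkg("serde", "1.0.0"), pkg("http", "0.2.12"), pkg("http", "1.1.0")];
    for id in assign_build_ids(&packages) {
        println!("{id}");
    }
}
```

Note that this sketch does not handle two packages with the same name and version coming from different sources; a real scheme would need a further fallback for that case.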

Footnotes

  1. This is an intentional design choice, that's been discussed at length in the rustdoc team and that's been endorsed by all JSON-based tool builders that interface with the team.

  2. See this Zulip thread for more details.

Metadata

Labels: T-compiler, disposition-merge, finished-final-comment-period, major-change, major-change-accepted, proposed-final-comment-period
