chars/bytes confusion in the error emitter #44080

Closed
@est31

src/librustc_errors/snippet.rs has a big comment saying that the column info is provided in characters, not in bytes. However, the error emitter ignores this and treats these columns as byte offsets all over the place. This leads to bugs like #44023 and #44078.

As an example, look at how span printing changes depending on the characters used:

Correct case:

12 |       "B   "";
   |  ___________^

Now add an emoji character:

12 |       "😊   "";
   |  ___________^

Note how it's off by one char now. This can stack up:

12 |       "😊😊😊😊   "";
   |  ______________^

If I didn't use any spaces at all, I'd run into #44078.
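
To make the underlying mismatch concrete, here is a minimal sketch of how a byte offset and a character column diverge on lines like the ones above. It is not the emitter's code and does not try to reproduce the exact drift in the rendered output; `char_col` and `last_quote_cols` are hypothetical helpers made up for this illustration.

```rust
/// Character column of the position at byte offset `byte_idx` in `line`.
/// Hypothetical helper; `byte_idx` must lie on a char boundary.
fn char_col(line: &str, byte_idx: usize) -> usize {
    line[..byte_idx].chars().count()
}

/// (byte offset, char column) of the last `"` in `line`, if there is one.
/// Hypothetical helper used only for this demonstration.
fn last_quote_cols(line: &str) -> Option<(usize, usize)> {
    let byte_idx = line.rfind('"')?;
    Some((byte_idx, char_col(line, byte_idx)))
}

fn main() {
    let lines = ["\"B   \"\";", "\"😊   \"\";", "\"😊😊😊😊   \"\";"];
    for line in lines.iter() {
        if let Some((bytes, chars)) = last_quote_cols(line) {
            // For ASCII-only lines the two numbers agree; each emoji is one
            // char but four bytes in UTF-8, so the gap grows with every emoji.
            println!("{:<16} byte offset = {:>2}, char column = {:>2}", line, bytes, chars);
        }
    }
}
```

Any code that mixes the two units, for example by using a char column to index into the line's bytes, is only correct when every character before the position is a single byte.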

Now, this can be fixed by going through the emitter code and looking for all places where the pos is used as a byte position. A much better fix, though, is to stop trusting that people read comments and to encode this in the type system. There is already a mechanism for that inside the compiler: libsyntax_pos::CharPos! Just convert the types of the start_col and end_col members of the MultilineAnnotation and Annotation structs to CharPos, or maybe to BytePos if that's preferred.
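
A rough sketch of what the type-based approach buys, under stated assumptions: the `CharPos`, `BytePos`, and `Annotation` types below are simplified local stand-ins, not the compiler's actual definitions, and `char_to_byte` and `annotated_text` are hypothetical helpers. The point is only that once the columns carry their unit in the type, any byte-level access has to go through an explicit conversion.

```rust
// Simplified stand-ins for illustration only; not the compiler's definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct CharPos(usize);

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct BytePos(usize);

// Hypothetical, trimmed-down annotation whose columns are typed as chars.
#[derive(Debug)]
struct Annotation {
    start_col: CharPos,
    end_col: CharPos,
}

/// Convert a character column into a byte offset within `line`.
/// A column equal to the char count maps to the end of the line;
/// anything beyond that yields `None`.
fn char_to_byte(line: &str, col: CharPos) -> Option<BytePos> {
    line.char_indices()
        .map(|(byte_idx, _)| byte_idx)
        .chain(std::iter::once(line.len()))
        .nth(col.0)
        .map(BytePos)
}

/// Slice out the annotated text, converting columns explicitly.
fn annotated_text<'a>(line: &'a str, ann: &Annotation) -> Option<&'a str> {
    let start = char_to_byte(line, ann.start_col)?;
    let end = char_to_byte(line, ann.end_col)?;
    // `line[ann.start_col..]` would not compile: a `CharPos` is not a `usize`,
    // so the unit confusion is caught by the type checker instead of a comment.
    line.get(start.0..end.0)
}

fn main() {
    let line = "\"😊   \"\";";
    let ann = Annotation { start_col: CharPos(1), end_col: CharPos(2) };
    println!("{:?} covers {:?}", ann, annotated_text(line, &ann));
}
```

Whether the fields end up as char or byte positions matters less than picking one and making the conversion explicit wherever the other unit is needed.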

Labels

A-diagnostics (Area: Messages for errors, warnings, and lints)
C-bug (Category: This is a bug.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)