Skip to content

Semantics of char comparison operations is not clear #85776

Open
@osa1

Description

@osa1

This is possibly me being silly.. I'm currently working on something where I have to convert chars to u32s for storage, and then convert back. My code is only correct if ∀ (c1 : char) (c2 : char) . c1.cmp(c2) == (c1 as u32).cmp(c2 as u32). In other words conversion from char to u32 must preserve the ordering relation of values. (there is probably a name given to this notion of relation-preserving mapping, but I don't know what it is.. anyone know?)

Looking at char documentation, it says char represents "Unicode scalar value"s, but other than that it doesn't say what does conversion to u32 do, or whether they're represented as u32 internally, and whether comparison is simple u32 comparison (i.e. comparison of unicode scalar values) or something more complicated. It's also not easy to learn about this by looking at the source, becuase PartialOrd implementation of char relies on >, < etc. implementations (1, 2), which are I think hard-coded somewhere in the compiler.

Simple u32 comparison is probably the only sensible implementation, but that's not entirely clear to me. When I click on the "unicode scalar value" link in the documentation it doesn't say how the values must be compared. Without knowing more about unicode or "unicode scalar value" I'm not sure if e.g. a character can have multiple unicode scalar values but those need to be equal, or something like that.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-UnicodeArea: UnicodeC-enhancementCategory: An issue proposing an enhancement or a PR with one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions