Skip to content

Optimise and bug-fix to_digit #475

Closed
@daniel-pfeiffer

Description

@daniel-pfeiffer

Proposal

Problem statement

Implementation of to_digit is convoluted and inefficiently does too much. Yet it still accepts radices 0 & 1. While there is no such system as nullary, unary is a totally different scheme, with only digit 1. That is not implemented by this function. All numbers Rust deals with in any base are in positional notation. There 0 is always the smallest digit. And incrementing the biggest digit gives 10. The smallest radix for which this is possible is two.

Motivating examples or use cases

It is inefficient to reassert the radix for each digit again and again. I propose to split this function into a wrapper to_digit that does its due diligence and a worker to_digit_unchecked. I have checked all callers of to_digit, to see where it is already safe, or can be made safe, to switch to to_digit_unchecked. Maybe each time a comment should be added to the place that guarantees a valid radix, to avert accidentally eliminating the guarantee in future:

bounds already checked outside of loop, so can switch:

  • library/core/src/num/mod.rs 2x

literal base in a variable, only valid values, so can switch:

  • compiler/rustc_session/src/errors.rs 1x
  • rustc_parse/src/lexer/mod.rs 1x

in a loop, should add bounds check and switch:

  • library/core/src/net/parser.rs 2x

literal radices, where the compiler can hopefully eliminate the asserts, so no need, but might do it to save compile time:

  • compiler/rustc_lexer/src/unescape.rs 4x
  • librustdoc/html/render/print_item.rs 6x
  • rustc_parse_format/src/lib.rs 1x
  • many test files

Solution sketch

I propose three variants to choose the style and assumed efficiency you see fit:

pub const fn to_digit(self, radix: u32) -> Option<u32> {
    assert!(radix >= 2, "to_digit: radix is too low (minimum 2)");
    assert!(radix <= 36, "to_digit: radix is too high (maximum 36)");
    self.to_digit_unchecked(radix)
}

/// ### Correctness
///
/// Callers of this function are responsible that this precondition is satisfied:
///
/// - The radix must be between 2 and 36 inclusive.
///
/// Failing that, the returned value cannot be relied on.
#[inline]
// semi-branchless variant
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
    let is_digit = (self <= '9') as u32;
    let digit =
        is_digit * // branchless if
            // If not a digit, a number greater than radix will be created.
            (self as u32).wrapping_sub('0' as u32)
        + (1 - is_digit) * // branchless else
            // Force the 6th bit to be set to ensure ascii is lower case.
            (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10);
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}

// conventional variant with match
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
    let digit =
        match self {
            // If not a digit, a number greater than radix will be created.
            ..='9' => (self as u32).wrapping_sub('0' as u32),

            // Force the 6th bit to be set to ensure ascii is lower case.
            _ => (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
        };
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}

// conventional variant with if
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
    let digit =
        if self <= '9' {
            // If not a digit, a number greater than radix will be created.
            (self as u32).wrapping_sub('0' as u32)
        } else {
            // Force the 6th bit to be set to ensure ascii is lower case.
            (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
        };
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}

Links and related work

rust-lang/rust#132428

This discussion claims that the new bounds check might break backwards compatibility. I doubt that code would rely on a broken implementation. But if that is a concern, the new assert could be activated with edition 2024. OTOH, with that rationale one could never fix bugs…

Metadata

Metadata

Assignees

No one assigned

    Labels

    T-libs-apiapi-change-proposalA proposal to add or alter unstable APIs in the standard libraries

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions