Description
Proposal
Problem statement
Implementation of to_digit
is convoluted and inefficiently does too much. Yet it still accepts radices 0
& 1
. While there is no such system as nullary, unary is a totally different scheme, with only digit 1
. That is not implemented by this function. All numbers Rust deals with in any base are in positional notation. There 0
is always the smallest digit. And incrementing the biggest digit gives 10. The smallest radix for which this is possible is two.
Motivating examples or use cases
It is inefficient to reassert the radix for each digit again and again. I propose to split this function into a wrapper to_digit
that does its due diligence and a worker to_digit_unchecked
. I have checked all callers of to_digit
, to see where it is already safe, or can be made safe, to switch to to_digit_unchecked
. Maybe each time a comment should be added to the place that guarantees a valid radix, to avert accidentally eliminating the guarantee in future:
bounds already checked outside of loop, so can switch:
- library/core/src/num/mod.rs 2x
literal base in a variable, only valid values, so can switch:
- compiler/rustc_session/src/errors.rs 1x
- rustc_parse/src/lexer/mod.rs 1x
in a loop, should add bounds check and switch:
- library/core/src/net/parser.rs 2x
literal radices, where the compiler can hopefully eliminate the assert
s, so no need, but might do it to save compile time:
- compiler/rustc_lexer/src/unescape.rs 4x
- librustdoc/html/render/print_item.rs 6x
- rustc_parse_format/src/lib.rs 1x
- many test files
Solution sketch
I propose three variants to choose the style and assumed efficiency you see fit:
pub const fn to_digit(self, radix: u32) -> Option<u32> {
assert!(radix >= 2, "to_digit: radix is too low (minimum 2)");
assert!(radix <= 36, "to_digit: radix is too high (maximum 36)");
self.to_digit_unchecked(radix)
}
/// ### Correctness
///
/// Callers of this function are responsible that this precondition is satisfied:
///
/// - The radix must be between 2 and 36 inclusive.
///
/// Failing that, the returned value cannot be relied on.
#[inline]
// semi-branchless variant
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
let is_digit = (self <= '9') as u32;
let digit =
is_digit * // branchless if
// If not a digit, a number greater than radix will be created.
(self as u32).wrapping_sub('0' as u32)
+ (1 - is_digit) * // branchless else
// Force the 6th bit to be set to ensure ascii is lower case.
(self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10);
// FIXME: once then_some is const fn, use it here
if digit < radix { Some(digit) } else { None }
}
// conventional variant with match
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
let digit =
match self {
// If not a digit, a number greater than radix will be created.
..='9' => (self as u32).wrapping_sub('0' as u32),
// Force the 6th bit to be set to ensure ascii is lower case.
_ => (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
};
// FIXME: once then_some is const fn, use it here
if digit < radix { Some(digit) } else { None }
}
// conventional variant with if
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
let digit =
if self <= '9' {
// If not a digit, a number greater than radix will be created.
(self as u32).wrapping_sub('0' as u32)
} else {
// Force the 6th bit to be set to ensure ascii is lower case.
(self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
};
// FIXME: once then_some is const fn, use it here
if digit < radix { Some(digit) } else { None }
}
Links and related work
This discussion claims that the new bounds check might break backwards compatibility. I doubt that code would rely on a broken implementation. But if that is a concern, the new assert
could be activated with edition 2024. OTOH, with that rationale one could never fix bugs…