Description
Summary
When the fast-path algorithm cannot be used (see #85198), Rust defaults back to a the Bellerophon algorithm, based off this paper. Examples of floats that can be correctly parsed via the Bellerophon algorithm include "9007199254740992.0"
(1 << 53
), while near-halfway cases such "9007199254740992992e-3"
must fall back to slower algorithms (just less than halfway of (1 << 53) + 1
). Unfortunately, the current implementation of the Bellerophon algorithm requires the use of arbitrary-precision arithmetic, which can lead to a 10,000x performance penalty.
Please see the "Sample Repository" below for the exact specifics, or in order to replicate these changes. This is an initial attempt as part of an ongoing effort to speed up float parsing in Rust, and aims to integrate algorithms I've implemented (currently used in nom and serde-json) back in the core library.
Issue
When parsing floating-point numbers, if the number cannot be exactly parsed by the fast-path algorithm, it falls back to an extended-precision representation (often consisting of 64-bits for the significant digits, or mantissa, and 16-bits for the exponent). For a more detailed description of halfway cases, see the halfway cases section below.
If this extended-precision algorithm can be unambiguously rounded to the nearest native float, by showing that the max error is less than the different to the nearest halfway case, then we have an accurate representation and can skip slower algorithms.
These slower algorithms make use of arbitrary-precision arithmetic to exactly represent the significant digits of the float, and therefore round to the nearest native float. The current implementation of Bellerophon, however, generates the significant digits from a big integer, which leads to significantly reduced performance.
By using a 64-bit representation of the significant digits parsed from the first 19-20 digits of the float, we can improve performance by orders of magnitude.
Halfway Cases
When parsing floats, the most significant problem is determining how to round the resulting value. The IEEE-754 standard specifies rounding to nearest, then tie even.
For example, using this rounding scheme to decimal numbers:
8.9
would round to9.0
.9.1
would round to9.0
.9.5
would round to10.0
.10.5
would round to10.0
.
With parsing from decimal strings to binary, fixed-width floating point numbers, we must round to the nearest float. This becomes tricky when values are near their halfway point. For example, with a single-precision float f32
, we would round as follows:
16777216.9
rounds to16777216.0
16777217.0
rounds to16777216.0
16777217.1
rounds to16777218.0
This is easier illustrated if we represent the float in binary. First, here's the layout of an IEEE-754 single-precision float as bits:
🟦🟩🟩🟩🟩🟩🟩🟩🟩🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪🟪
Where:
- 🟦 is the sign bit.
- 🟩 are the exponent bits.
- 🟪 are the mantissa, or significant digit, bits.
We'll ignore the exponent and sign bits right now, and only consider the mantissa, or significant digits. The lowest exponent bit, also called the hidden bit, is used as an implicit, extra bit of precision for normal floats, meaning we have 24-bits of precision. For 3 numbers, we would therefore have the following representations, where the last bit is truncated off:
16777216.0
=>100000000000000000000000 0
16777217.0
=>100000000000000000000000 1
16777218.0
=>100000000000000000000001 0
Therefore, 16777217.0
is exactly halfway between 16777216.0
and 16777218.0
. Although solving these halfway cases can superficially seem easy, simple algorithms will fail even when parsing the shortest, accurate decimal representation.
Binary Sizes
These were compiled on a target of x86_64-unknown-linux-gnu
, running kernel version 5.11.16-100
, on a Rust version of rustc 1.53.0-nightly (132b4e5d1 2021-04-13)
. The sizes reflect the binary sizes reported by ls -sh
, both before and after running the strip
command. The debug profile was used for opt-levels 0
and 1
, and was as follows:
[profile.dev]
opt-level = "..."
debug = true
lto = false
The release profile was used for opt-levels 2
, 3
, s
and z
and was as follows:
[profile.release]
opt-level = "..."
debug = false
debug-assertions = false
lto = true
core
These are the binary sizes prior to making changes.
opt-level | size | size(stripped) |
---|---|---|
0 | 3.6M | 360K |
1 | 3.5M | 316K |
2 | 1.3M | 236K |
3 | 1.3M | 248K |
s | 1.3M | 244K |
z | 1.3M | 248K |
moderate
These are the binary sizes after making changes to speed up the Bellerophon algorithm.
opt-level | size | size(stripped) |
---|---|---|
0 | 3.6M | 364K |
1 | 3.5M | 316K |
2 | 1.3M | 248K |
3 | 1.3M | 252K |
s | 1.3M | 244K |
z | 1.3M | 244K |
Performance
Overall, the changes to speed up Bellerophon algorithm led to a:
- ~-79% change in performance for the
MODERATE
float. - ~-99.7% change in performance for the
LARGE
float. - ~-99.94% change in performance for the
DENORMAL
float.
And it did not affect the performance of the fast-path algorithm.
These benchmarks were run on an i7-6560U CPU @ 2.20GHz
, on a target of x86_64-unknown-linux-gnu
, running kernel version 5.11.16-100
, on a Rust version of rustc 1.53.0-nightly (132b4e5d1 2021-04-13)
. The performance CPU governor was used for all benchmarks, and were run on A/C power with only tmux and Sublime Text open for all benchmarks. The floats that were parsed are as follows:
// Example fast-path value.
const FAST: &str = "1.2345e22";
// Example disguised fast-path value.
const DISGUISED: &str = "1.2345e30";
// Example moderate path value: clearly not halfway `1 << 53`.
const MODERATE: &str = "9007199254740992.0";
// Example exactly-halfway value `(1<<53) + 1`.
const HALFWAY: &str = "9007199254740993.0";
// Example large, near-halfway value.
const LARGE: &str = "8.988465674311580536566680e307";
// Example denormal, near-halfway value.
const DENORMAL: &str = "8.442911973260991817129021e-309";
core
These are the benchmarks prior to making changes.
float | speed |
---|---|
fast | 32.952ns |
disguised | 129.86ns |
moderate | 237.08ns |
halfway | 371.21ns |
large | 287.81us |
denormal | 122.36us |
moderate
These are the binary sizes after making changes to speed up the Bellerophon algorithm.
float | speed |
---|---|
fast | 26.668ns |
disguised | 34.599ns |
moderate | 49.378ns |
halfway | 224.81ns |
large | 796.34ns |
denormal | 63.763ns |
Correctness Concerns
There are a few correctness concerns, since this uses a potentially truncated representation of the significant digits for error calculation. I've therefore made the error detection stricter, so it rejects more halfway cases than before and correctly compounds error with truncated cases and non-normalized representations after multiplication.
In practice, this only rejects a handful of cases that would be normally accept by the algorithm, with a major benefit to overall performance.
I've also extended the powers-of-10 to handle denormal floats, as well as values that could lead to Infinity, and updated the internal logic to ensure correct rounding.
This passes all of Rust's float parsing tests, as well as carefully crafted examples to try to detect errors, and therefore is unlikely to have any correctness issues.
Sample Repository
I've created a simple, minimal repository tracking these changes on rust-dec2flt, which has a core branch that is identical to Rust's current implementation in the core library. The moderate branch contains the changes to improve parsing speeds for Bellerophon algorithm. This currently relies on changes made to infer binary exponents, however, can be trivially re-written to explicitly store them.