Description
Recently I have been working with floating-point value classification, and I found the function `f32::classify` useful. Regardless of instruction set architecture, this function is currently implemented like this in the standard library:
```rust
#[stable(feature = "rust1", since = "1.0.0")]
pub fn classify(self) -> FpCategory {
    const EXP_MASK: u32 = 0x7f800000;
    const MAN_MASK: u32 = 0x007fffff;
    let bits = self.to_bits();
    match (bits & MAN_MASK, bits & EXP_MASK) {
        (0, 0) => FpCategory::Zero,
        (_, 0) => FpCategory::Subnormal,
        (0, EXP_MASK) => FpCategory::Infinite,
        (_, EXP_MASK) => FpCategory::Nan,
        _ => FpCategory::Normal,
    }
}
```
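For reference, the match above can be exercised directly. The following standalone copy of the same mask-and-match logic (same constants as the standard library) agrees with `f32::classify` on representative inputs:

```rust
use std::num::FpCategory;

// Standalone copy of the standard library's mask-and-match logic,
// for demonstration on a free function taking f32.
fn classify_by_masks(x: f32) -> FpCategory {
    const EXP_MASK: u32 = 0x7f800000;
    const MAN_MASK: u32 = 0x007fffff;
    let bits = x.to_bits();
    match (bits & MAN_MASK, bits & EXP_MASK) {
        (0, 0) => FpCategory::Zero,
        (_, 0) => FpCategory::Subnormal,
        (0, EXP_MASK) => FpCategory::Infinite,
        (_, EXP_MASK) => FpCategory::Nan,
        _ => FpCategory::Normal,
    }
}

fn main() {
    // One value per category, plus the negative-zero edge case.
    for &x in &[0.0f32, -0.0, 1.0e-40, 1.5, f32::INFINITY, f32::NAN] {
        assert_eq!(classify_by_masks(x), x.classify());
    }
    println!("all categories agree");
}
```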
However, this standard-library function compiles to a long sequence of instructions. On RISC-V RV64GC, it compiles into the following (very long) assembly code:
```asm
example::classify_std:
        fmv.w.x ft0, a0
        fsw     ft0, -20(s0)
        lwu     a0, -20(s0)
        sd      a0, -48(s0)
        j       .LBB0_1
.LBB0_1:
        lui     a0, 2048
        addiw   a0, a0, -1
        ld      a1, -48(s0)
        and     a0, a0, a1
        lui     a2, 522240
        and     a2, a2, a1
        sw      a0, -32(s0)
        sw      a2, -28(s0)
        mv      a2, zero
        bne     a0, a2, .LBB0_3
        j       .LBB0_2
.LBB0_2:
        lw      a0, -28(s0)
        mv      a1, zero
        beq     a0, a1, .LBB0_7
        j       .LBB0_3
.LBB0_3:
        lwu     a0, -28(s0)
        mv      a1, zero
        sd      a0, -56(s0)
        beq     a0, a1, .LBB0_8
        j       .LBB0_4
.LBB0_4:
        lui     a0, 522240
        ld      a1, -56(s0)
        bne     a1, a0, .LBB0_6
        j       .LBB0_5
.LBB0_5:
        lw      a0, -32(s0)
        mv      a1, zero
        beq     a0, a1, .LBB0_9
        j       .LBB0_10
.LBB0_6:
        addi    a0, zero, 4
        sb      a0, -33(s0)
        j       .LBB0_11
.LBB0_7:
        addi    a0, zero, 2
        sb      a0, -33(s0)
        j       .LBB0_11
.LBB0_8:
        addi    a0, zero, 3
        sb      a0, -33(s0)
        j       .LBB0_11
.LBB0_9:
        addi    a0, zero, 1
        sb      a0, -33(s0)
        j       .LBB0_11
.LBB0_10:
        mv      a0, zero
        sb      a0, -33(s0)
        j       .LBB0_11
.LBB0_11:
        lb      a0, -33(s0)
        ret
```
To solve this problem, the RISC-V ISA provides the `fclass.{s|d|q}` instructions. According to Section 11.9 of the RISC-V specification, the instruction `fclass.s rd, rs1` examines `rs1` as a 32-bit floating-point number and writes a mask describing its class into `rd`. From `rd` we can then decide which enum variant of the Rust standard library's `FpCategory` to return. I'd like to express this procedure in Rust code. The new way looks like this:
```rust
pub fn classify_riscv_rvf(input: f32) -> FpCategory {
    let ans: usize;
    // step 1: map this f32 value into the RISC-V defined class mask;
    // this procedure could be built into the compiler as an intrinsic
    unsafe {
        llvm_asm!(
            "fclass.s a0, fa0"
            : "={a0}"(ans)
            : "{fa0}"(input)
        )
    };
    // step 2: convert from the returned flags to an FpCategory enum value
    if ans & 0b10000001 != 0 {
        return FpCategory::Infinite;
    }
    if ans & 0b01000010 != 0 {
        return FpCategory::Normal;
    }
    if ans & 0b00100100 != 0 {
        return FpCategory::Subnormal;
    }
    if ans & 0b00011000 != 0 {
        return FpCategory::Zero;
    }
    FpCategory::Nan
}
```
It compiles into the following assembly code, which is shorter and can execute faster:
```asm
example::classify_riscv_rvf:
        fclass.s a0, fa0
        andi    a2, a0, 129
        addi    a1, zero, 1
        beqz    a2, .LBB0_2
.LBB0_1:
        add     a0, zero, a1
        ret
.LBB0_2:
        andi    a2, a0, 66
        addi    a1, zero, 4
        bnez    a2, .LBB0_1
        andi    a2, a0, 36
        addi    a1, zero, 3
        bnez    a2, .LBB0_1
        andi    a0, a0, 24
        snez    a0, a0
        slli    a1, a0, 1
        add     a0, zero, a1
        ret
```
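The masks tested in step 2 follow the `fclass.s` result layout from the RISC-V specification: bit 0 is negative infinity, bits 1/2/3 are negative normal/subnormal/zero, bits 4/5/6/7 mirror them for positive values, and bits 8/9 are signaling/quiet NaN. A portable emulation in pure Rust (no assembly; an illustration, not the actual instruction) can sanity-check that each mask covers both sign variants of its class:

```rust
use std::num::FpCategory;

// Software emulation of the fclass.s one-hot result mask, using the
// bit layout from the RISC-V F-extension specification:
// bit 0: -inf, 1: -normal, 2: -subnormal, 3: -0,
// bit 4: +0, 5: +subnormal, 6: +normal, 7: +inf,
// bit 8: signaling NaN, 9: quiet NaN.
fn fclass_emulated(x: f32) -> usize {
    let neg = x.is_sign_negative();
    let bit = match x.classify() {
        FpCategory::Infinite => if neg { 0 } else { 7 },
        FpCategory::Normal => if neg { 1 } else { 6 },
        FpCategory::Subnormal => if neg { 2 } else { 5 },
        FpCategory::Zero => if neg { 3 } else { 4 },
        FpCategory::Nan => 9, // assumes a quiet NaN; an sNaN would set bit 8
    };
    1 << bit
}

fn main() {
    // Each mask from step 2 catches both the negative and positive case.
    assert_ne!(fclass_emulated(f32::INFINITY) & 0b10000001, 0);
    assert_ne!(fclass_emulated(f32::NEG_INFINITY) & 0b10000001, 0);
    assert_ne!(fclass_emulated(-1.5) & 0b01000010, 0);
    assert_ne!(fclass_emulated(1.0e-40) & 0b00100100, 0);
    assert_ne!(fclass_emulated(-0.0) & 0b00011000, 0);
    // NaN sets none of the first eight bits, so it falls through to Nan.
    assert_eq!(fclass_emulated(f32::NAN) & 0b11111111, 0);
    println!("mask coverage checks pass");
}
```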
For `f64` values, we can use the `fclass.d` instruction instead of `fclass.s`. If in the future we have a chance to introduce an `f128` primitive type, there is also a `fclass.q` instruction. Using these instructions improves the speed of this function on RISC-V platforms, an enhancement that is especially significant for embedded devices. I suggest changing the implementation of this function in the standard library. We may implement it in either of the following ways:
- Implement `fclassf32` and `fclassf64` intrinsic functions in `core::intrinsics` and call them in `f32::classify` and `f64::classify`. These intrinsics can be implemented with the special instruction, with a fallback on other platforms; or
- Use inline assembly directly in the standard library, adding a `#[cfg(..)]` so that it compiles only on RISC-V targets with floating-point extension `F` or `D` respectively, with a fallback on other platforms.
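The second option could be sketched roughly as follows. This is an illustration only, using the current stable `asm!` syntax rather than the `llvm_asm!` shown above; the exact `cfg` gates (here `target_arch = "riscv64"` plus `target_feature = "f"`) and the free-function shape are assumptions, not the actual standard-library patch:

```rust
use std::num::FpCategory;

/// RISC-V path: one fclass.s instruction plus mask tests.
#[cfg(all(target_arch = "riscv64", target_feature = "f"))]
fn classify_f32(x: f32) -> FpCategory {
    let flags: usize;
    // fclass.s writes a one-hot class mask into the destination register.
    unsafe {
        core::arch::asm!("fclass.s {0}, {1}", out(reg) flags, in(freg) x);
    }
    if flags & 0b10000001 != 0 {
        FpCategory::Infinite
    } else if flags & 0b01000010 != 0 {
        FpCategory::Normal
    } else if flags & 0b00100100 != 0 {
        FpCategory::Subnormal
    } else if flags & 0b00011000 != 0 {
        FpCategory::Zero
    } else {
        FpCategory::Nan
    }
}

/// Portable fallback: defer to the existing mask-and-match implementation.
#[cfg(not(all(target_arch = "riscv64", target_feature = "f")))]
fn classify_f32(x: f32) -> FpCategory {
    x.classify()
}

fn main() {
    assert_eq!(classify_f32(0.0), FpCategory::Zero);
    assert_eq!(classify_f32(1.0e-40), FpCategory::Subnormal);
    assert_eq!(classify_f32(f32::INFINITY), FpCategory::Infinite);
    assert_eq!(classify_f32(f32::NAN), FpCategory::Nan);
    println!("classify_f32 agrees on all tested categories");
}
```

On non-RISC-V hosts the `cfg` attribute removes the assembly entirely, so the sketch compiles everywhere while exercising the fast path only where `fclass.s` exists.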