Skip to content

AMDGPU][True16][CodeGen] FP_Round f64 to f16 in true16 #128911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 8, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 10 additions & 4 deletions llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3579,15 +3579,22 @@ SDValue AMDGPUTargetLowering::LowerFP_TO_FP16(SDValue Op, SelectionDAG &DAG) con
return SDValue();
}

assert(N0.getSimpleValueType() == MVT::f64);
return LowerF64ToF16Safe(N0, DL, DAG);
}

// return node in i32
SDValue AMDGPUTargetLowering::LowerF64ToF16Safe(SDValue Src, const SDLoc &DL,
SelectionDAG &DAG) const {
assert(Src.getSimpleValueType() == MVT::f64);

// f64 -> f16 conversion using round-to-nearest-even rounding mode.
// TODO: We can generate better code for True16.
const unsigned ExpMask = 0x7ff;
const unsigned ExpBiasf64 = 1023;
const unsigned ExpBiasf16 = 15;
SDValue Zero = DAG.getConstant(0, DL, MVT::i32);
SDValue One = DAG.getConstant(1, DL, MVT::i32);
SDValue U = DAG.getNode(ISD::BITCAST, DL, MVT::i64, N0);
SDValue U = DAG.getNode(ISD::BITCAST, DL, MVT::i64, Src);
SDValue UH = DAG.getNode(ISD::SRL, DL, MVT::i64, U,
DAG.getConstant(32, DL, MVT::i64));
UH = DAG.getZExtOrTrunc(UH, DL, MVT::i32);
Expand Down Expand Up @@ -3661,8 +3668,7 @@ SDValue AMDGPUTargetLowering::LowerFP_TO_FP16(SDValue Op, SelectionDAG &DAG) con
Sign = DAG.getNode(ISD::AND, DL, MVT::i32, Sign,
DAG.getConstant(0x8000, DL, MVT::i32));

V = DAG.getNode(ISD::OR, DL, MVT::i32, Sign, V);
return DAG.getZExtOrTrunc(V, DL, Op.getValueType());
return DAG.getNode(ISD::OR, DL, MVT::i32, Sign, V);
}

SDValue AMDGPUTargetLowering::LowerFP_TO_INT(const SDValue Op,
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,9 @@ class AMDGPUTargetLowering : public TargetLowering {
SDValue LowerFP_TO_FP16(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerF64ToF16Safe(SDValue Src, const SDLoc &DL,
SelectionDAG &DAG) const;

SDValue LowerSIGN_EXTEND_INREG(SDValue Op, SelectionDAG &DAG) const;

protected:
Expand Down
12 changes: 11 additions & 1 deletion llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6903,7 +6903,17 @@ SDValue SITargetLowering::lowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const {
if (Op.getOpcode() != ISD::FP_ROUND)
return Op;

SDValue FpToFp16 = DAG.getNode(ISD::FP_TO_FP16, DL, MVT::i32, Src);
if (!Subtarget->has16BitInsts()) {
SDValue FpToFp16 = DAG.getNode(ISD::FP_TO_FP16, DL, MVT::i32, Src);
SDValue Trunc = DAG.getNode(ISD::TRUNCATE, DL, MVT::i16, FpToFp16);
return DAG.getNode(ISD::BITCAST, DL, MVT::f16, Trunc);
}
Comment on lines +6906 to +6910
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this. FP_TO_FP16 does the right thing for an f64 source? Should the unsafe path be before this?

Copy link
Contributor Author

@broxigarchen broxigarchen Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Matt. Before this patch, FP_TO_FP16 is used for all targets for FP_ROUND f64 source so I think it suppose to do the right thing. However, I took a quick look at lit tests and we seems don't have any lit test that covers FP_ROUND f64 source in the old targets that do not support f16 types.

So I would say the FP_TO_FP16 here keeps the same code path for old targets. I think it's better to merge as it and If we want to add test coverage or any fixfor old targets we can do it in another patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Matt. Do you have any comment on this? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we seems don't have any lit test that covers FP_ROUND f64 source in the old targets that do not support f16 types.

This is covered, fptrunc.ll has it. fptrunc.f16.ll should also cover it, but that one brokenly only tests with -enable-unsafe-fp-math

It seems to work because we have explicit custom lowering f64 source. Seems like an unnecessary complication over the default expansion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I just realized SI is gfx6. Will create another patch to address these

if (getTargetMachine().Options.UnsafeFPMath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in a follow up should move to use the flags from the node

SDValue Flags = Op.getOperand(1);
SDValue Src32 = DAG.getNode(ISD::FP_ROUND, DL, MVT::f32, Src, Flags);
return DAG.getNode(ISD::FP_ROUND, DL, MVT::f16, Src32, Flags);
}
SDValue FpToFp16 = LowerF64ToF16Safe(Src, DL, DAG);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still think this is an awkward split

SDValue Trunc = DAG.getNode(ISD::TRUNCATE, DL, MVT::i16, FpToFp16);
return DAG.getNode(ISD::BITCAST, DL, MVT::f16, Trunc);
}
Expand Down
Loading