4 changes: 4 additions & 0 deletions llvm/include/llvm/ADT/APFloat.h
@@ -412,6 +412,10 @@ class APFloatBase {
/// format interpretation for llvm.convert.to.arbitrary.fp and
/// llvm.convert.from.arbitrary.fp intrinsics.
LLVM_ABI static bool isValidArbitraryFPFormat(StringRef Format);

/// Returns the fltSemantics for a given arbitrary FP format string,
/// or nullptr if invalid.
LLVM_ABI static const fltSemantics *getArbitraryFPSemantics(StringRef Format);
};

namespace detail {
6 changes: 6 additions & 0 deletions llvm/include/llvm/CodeGen/ISDOpcodes.h
@@ -1014,6 +1014,12 @@ enum NodeType {
STRICT_BF16_TO_FP,
STRICT_FP_TO_BF16,

/// CONVERT_FROM_ARBITRARY_FP - This operator converts from an arbitrary
/// floating-point represented as an integer to a native FP type.
/// The first operand is the integer containing the source FP bits.
/// The second operand is a constant indicating the source FP semantics.
CONVERT_FROM_ARBITRARY_FP,

/// Perform various unary floating-point operations inspired by libm. For
/// FPOWI, the result is undefined if the integer operand doesn't fit into
/// sizeof(int).
237 changes: 237 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -3528,6 +3528,243 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Results.push_back(Op);
break;
}
case ISD::CONVERT_FROM_ARBITRARY_FP: {
Contributor

  1. Can this be an LLVM-level rewrite (see something like LowerBufferFatPointers) instead? That would allow more optimization of what is already fairly complex bit manipulation. If there's a target-specific semantic you want to let through, I'd add TargetTransformInfo::isLegalArbitraryFpConversion.
  2. Especially if that gets moved out of DAG->DAG, it should account for fast-math.

@MrSidims (Contributor Author), Feb 19, 2026

I will also add target-specific lowering for the AMDGPU and SPIR-V backends in follow-up patches. In SPIR-V there is a public multi-vendor extension, SPV_EXT_float8, that introduces fp8 <-> IEEE float conversions. For AMDGPU I see quite a few entries in VOP3Instructions.td, but I need to do some research to see which instructions from there can be mapped to the llvm.convert.from.arbitrary.fp and llvm.convert.to.arbitrary.fp intrinsics (the latter will be added right after this PR is merged, as it reuses some of the utility functions).

I also believe there are NVPTX capabilities for this, and there is the SPV_INTEL_float4 extension, but I expect those will be covered by the appropriate folks.

Can this be a LLVM-level rewrite (see something like LowerBufferFatPointers) instead

That depends on where such a rewrite is placed. If it goes in the middle end, the pass would have to check the module's target triple in order to skip targets that have native support. I personally don't mind such a skip, but I know that some folks would object, saying that it's not the LLVM way.

@arsenm (Contributor), Feb 19, 2026

I have mixed views about IR expansions. I simultaneously think they're a hack, and that we'd be better off if we did more legalization on IR than in SelectionDAG/GISel. These cases do not require control flow, so they can follow SOP and do the DAG expansion. The downside of the DAG expansion is that the DAG combiner will always be worse than any IR optimizations.

IR expansions also add an unstructured ordering property to the "modular IR".

Contributor

Re the fp8 operations, I can advise on those; I did a lot of the plumbing in MLIR.

This also means we'll probably have an awkward impedance mismatch: the AMDGPU target really dislikes APIs that have the form <N x i8>, but that will be the best form for this sort of convert.from.arbitrary intrinsic to represent vectors.

(I'm also going to flag the interesting note that many of these operations come in vector flavors, stuff like "take 16 bits, treat that as 2 x fp8, and expand that to 2 x float" or, really, "take a specified half of 32 bits". That's all stuff that can be done in the optimizer, but it's a long-term hazard.)
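
For readers following the thread, the bit manipulation under discussion can be modelled on the host as ordinary integer code. The sketch below is not the PR's code: `SmallFPFormat`, `NonFinite`, `convertToFloat` and the `k*` constants are hypothetical names, the destination is fixed to IEEE binary32, and each bias is assumed to be 1 minus the format's minimum exponent, as the expansion computes it. It follows the same case split as the expansion (NaN, Inf, zero, denormal, normal).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical host-side model; these names are illustrative, not LLVM API.
enum class NonFinite { FiniteOnly, IEEE754, NanOnlyAllOnes };

struct SmallFPFormat {
  unsigned Bits;     // total width, including the sign bit
  unsigned MantBits; // stored (explicit) mantissa bits
  int Bias;          // exponent bias
  NonFinite NF;      // how Inf/NaN are encoded, if at all
};

// Parameters for three of the source formats the expansion supports.
const SmallFPFormat kE5M2{8, 2, 15, NonFinite::IEEE754};
const SmallFPFormat kE4M3FN{8, 3, 7, NonFinite::NanOnlyAllOnes};
const SmallFPFormat kE2M1FN{4, 1, 1, NonFinite::FiniteOnly};

inline float convertToFloat(uint32_t Raw, const SmallFPFormat &F) {
  const unsigned DstMant = 23; // binary32 stored mantissa bits
  const int DstBias = 127;     // binary32 exponent bias
  const unsigned ExpBits = F.Bits - F.MantBits - 1;
  const uint32_t MantMask = (1u << F.MantBits) - 1;
  const uint32_t ExpMask = (1u << ExpBits) - 1;

  // Extract the three fields; the sign is moved straight to bit 31.
  const uint32_t Mant = Raw & MantMask;
  const uint32_t Exp = (Raw >> F.MantBits) & ExpMask;
  const uint32_t Sign = ((Raw >> (F.Bits - 1)) & 1u) << 31;

  // Classify according to the format's non-finite behaviour.
  bool IsNaN = false, IsInf = false;
  if (F.NF == NonFinite::IEEE754) {
    IsNaN = Exp == ExpMask && Mant != 0;
    IsInf = Exp == ExpMask && Mant == 0;
  } else if (F.NF == NonFinite::NanOnlyAllOnes) {
    IsNaN = Exp == ExpMask && Mant == MantMask;
  }

  uint32_t Bits;
  if (IsNaN) {
    Bits = 0x7FC00000u; // canonical quiet NaN, sign cleared
  } else if (IsInf) {
    Bits = Sign | 0x7F800000u;
  } else if (Exp == 0 && Mant == 0) {
    Bits = Sign; // signed zero
  } else if (Exp == 0) {
    // Denormal: locate the leading one, which becomes the implicit bit.
    unsigned MSB = 0;
    while ((Mant >> (MSB + 1)) != 0)
      ++MSB;
    const int DstExp = int(MSB) + 1 - int(F.MantBits) - F.Bias + DstBias;
    const uint32_t Frac = Mant ^ (1u << MSB); // strip the leading one
    Bits = Sign | (uint32_t(DstExp) << DstMant) | (Frac << (DstMant - MSB));
  } else {
    // Normal: rebias the exponent and left-align the mantissa.
    const int DstExp = int(Exp) + DstBias - F.Bias;
    Bits = Sign | (uint32_t(DstExp) << DstMant) |
           (Mant << (DstMant - F.MantBits));
  }

  float Result;
  std::memcpy(&Result, &Bits, sizeof(Result)); // bit-cast without UB
  return Result;
}
```

Under these assumptions, `convertToFloat(0x7E, kE4M3FN)` evaluates to 448.0f, the largest finite E4M3FN value, and `convertToFloat(0x01, kE5M2)` to 2^-16, the smallest E5M2 denormal. The loop that finds the leading one stands in for the CTLZ node used by the DAG expansion.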

// Expand conversion from arbitrary FP format stored in an integer to a
// native IEEE float type using integer bit manipulation.
//
// TODO: currently only conversions from the FP4, FP6 and FP8 formats of the
// OCP specification are expanded. Remaining arbitrary FP types: Float8E4M3,
// Float8E3M4, Float8E5M2FNUZ, Float8E4M3FNUZ, Float8E4M3B11FNUZ,
// Float8E8M0FNU.
EVT DstVT = Node->getValueType(0);

SDValue IntVal = Node->getOperand(0);
const uint64_t SemEnum = Node->getConstantOperandVal(1);
const auto Sem = static_cast<APFloatBase::Semantics>(SemEnum);

// Supported source formats.
switch (Sem) {
case APFloatBase::S_Float8E5M2:
case APFloatBase::S_Float8E4M3FN:
case APFloatBase::S_Float6E3M2FN:
case APFloatBase::S_Float6E2M3FN:
case APFloatBase::S_Float4E2M1FN:
break;
default:
DAG.getContext()->emitError("CONVERT_FROM_ARBITRARY_FP: not implemented "
"source format (semantics enum " +
Twine(SemEnum) + ")");
Results.push_back(DAG.getPOISON(DstVT));
break;
}
if (!Results.empty())
break;

const fltSemantics &SrcSem = APFloatBase::EnumToSemantics(Sem);

const unsigned SrcBits = APFloat::getSizeInBits(SrcSem);
const unsigned SrcPrecision = APFloat::semanticsPrecision(SrcSem);
const unsigned SrcMant = SrcPrecision - 1;
const unsigned SrcExp = SrcBits - SrcMant - 1;
const int SrcBias = 1 - APFloat::semanticsMinExponent(SrcSem);

const fltNonfiniteBehavior NFBehavior = SrcSem.nonFiniteBehavior;
const fltNanEncoding NanEnc = SrcSem.nanEncoding;

// Destination format parameters.
const fltSemantics &DstSem = DstVT.getFltSemantics();

const unsigned DstBits = APFloat::getSizeInBits(DstSem);
const unsigned DstMant = APFloat::semanticsPrecision(DstSem) - 1;
const unsigned DstExpBits = DstBits - DstMant - 1;
const int DstMinExp = APFloat::semanticsMinExponent(DstSem);
const int DstBias = 1 - DstMinExp;
const uint64_t DstExpAllOnes = (1ULL << DstExpBits) - 1;

// Work in an integer type matching the destination float width.
// Use zero-extend to preserve the raw bit-pattern.
EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), DstBits);
SDValue Src = DAG.getZExtOrTrunc(IntVal, dl, IntVT);

EVT SetCCVT = getSetCCResultType(IntVT);

SDValue Zero = DAG.getConstant(0, dl, IntVT);
SDValue One = DAG.getConstant(1, dl, IntVT);

// Extract bit fields.
const uint64_t MantMask = (SrcMant > 0) ? ((1ULL << SrcMant) - 1) : 0;
const uint64_t ExpMask = (1ULL << SrcExp) - 1;

SDValue MantField = DAG.getNode(ISD::AND, dl, IntVT, Src,
DAG.getConstant(MantMask, dl, IntVT));

SDValue ExpField =
DAG.getNode(ISD::AND, dl, IntVT,
DAG.getNode(ISD::SRL, dl, IntVT, Src,
DAG.getShiftAmountConstant(SrcMant, IntVT, dl)),
DAG.getConstant(ExpMask, dl, IntVT));

SDValue SignBit =
DAG.getNode(ISD::SRL, dl, IntVT, Src,
DAG.getShiftAmountConstant(SrcBits - 1, IntVT, dl));

// Precompute sign shifted to MSB of destination.
SDValue SignShifted =
DAG.getNode(ISD::SHL, dl, IntVT, SignBit,
DAG.getShiftAmountConstant(DstBits - 1, IntVT, dl));

// Classify the input value based on compile-time format properties.
SDValue ExpAllOnes = DAG.getConstant(ExpMask, dl, IntVT);
SDValue IsExpAllOnes =
DAG.getSetCC(dl, SetCCVT, ExpField, ExpAllOnes, ISD::SETEQ);
SDValue IsExpZero = DAG.getSetCC(dl, SetCCVT, ExpField, Zero, ISD::SETEQ);
SDValue IsMantZero = DAG.getSetCC(dl, SetCCVT, MantField, Zero, ISD::SETEQ);
SDValue IsMantNonZero =
DAG.getSetCC(dl, SetCCVT, MantField, Zero, ISD::SETNE);

// NaN detection.
SDValue IsNaN;
if (NFBehavior == fltNonfiniteBehavior::FiniteOnly) {
// FiniteOnly formats (E2M1FN, E3M2FN, E2M3FN) never produce NaN.
IsNaN = DAG.getBoolConstant(false, dl, SetCCVT, IntVT);
} else if (NFBehavior == fltNonfiniteBehavior::IEEE754) {
// E5M2 produces NaN when exp == all-ones AND mantissa != 0.
IsNaN = DAG.getNode(ISD::AND, dl, SetCCVT, IsExpAllOnes, IsMantNonZero);
} else {
// NanOnly + AllOnes (E4M3FN): NaN when all exp and mantissa bits are 1.
assert(NanEnc == fltNanEncoding::AllOnes);
SDValue MantAllOnes = DAG.getConstant(MantMask, dl, IntVT);
SDValue IsMantAllOnes =
DAG.getSetCC(dl, SetCCVT, MantField, MantAllOnes, ISD::SETEQ);
IsNaN = DAG.getNode(ISD::AND, dl, SetCCVT, IsExpAllOnes, IsMantAllOnes);
}

// Inf detection.
SDValue IsInf;
if (NFBehavior == fltNonfiniteBehavior::IEEE754) {
// E5M2: Inf when exp == all-ones AND mantissa == 0.
IsInf = DAG.getNode(ISD::AND, dl, SetCCVT, IsExpAllOnes, IsMantZero);
} else {
// NanOnly and FiniteOnly formats have no Inf representation.
IsInf = DAG.getBoolConstant(false, dl, SetCCVT, IntVT);
}

// Zero detection.
SDValue IsZero = DAG.getNode(ISD::AND, dl, SetCCVT, IsExpZero, IsMantZero);

// Denorm detection: exp == 0 AND mant != 0.
SDValue IsDenorm =
DAG.getNode(ISD::AND, dl, SetCCVT, IsExpZero, IsMantNonZero);

// Normal value conversion.
// dst_exp = exp_field + (DstBias - SrcBias)
// dst_mant = mant << (DstMant - SrcMant)
const int BiasAdjust = DstBias - SrcBias;
SDValue NormDstExp = DAG.getNode(
ISD::ADD, dl, IntVT, ExpField,
DAG.getConstant(APInt(DstBits, BiasAdjust, true), dl, IntVT));

SDValue NormDstMant;
if (DstMant > SrcMant) {
SDValue NormDstMantShift =
DAG.getShiftAmountConstant(DstMant - SrcMant, IntVT, dl);
NormDstMant =
DAG.getNode(ISD::SHL, dl, IntVT, MantField, NormDstMantShift);
} else {
NormDstMant = MantField;
}

// Assemble normal result.
SDValue DstMantShift = DAG.getShiftAmountConstant(DstMant, IntVT, dl);
SDValue NormExpShifted =
DAG.getNode(ISD::SHL, dl, IntVT, NormDstExp, DstMantShift);
SDValue NormResult = DAG.getNode(
ISD::OR, dl, IntVT,
DAG.getNode(ISD::OR, dl, IntVT, SignShifted, NormExpShifted),
NormDstMant);

// Denormal value conversion.
// For a denormal source (exp_field == 0, mant != 0), normalize by finding
// the MSB position of mant using CTLZ, then compute the correct
// exponent and mantissa for the destination format.
SDValue DenormResult;
{
const unsigned IntVTBits = DstBits;
SDValue LeadingZeros =
DAG.getNode(ISD::CTLZ_ZERO_UNDEF, dl, IntVT, MantField);

// dst_exp_denorm = (IntVTBits + DstBias - SrcBias - SrcMant) -
// LeadingZeros
const int DenormExpConst =
(int)IntVTBits + DstBias - SrcBias - (int)SrcMant;
SDValue DenormDstExp = DAG.getNode(
ISD::SUB, dl, IntVT,
DAG.getConstant(APInt(DstBits, DenormExpConst, true), dl, IntVT),
LeadingZeros);

// MSB position of the mantissa (0-indexed from LSB).
SDValue MantMSB =
DAG.getNode(ISD::SUB, dl, IntVT,
DAG.getConstant(IntVTBits - 1, dl, IntVT), LeadingZeros);

// leading_one = 1 << MantMSB
SDValue LeadingOne = DAG.getNode(ISD::SHL, dl, IntVT, One, MantMSB);

// frac = mant XOR leading_one (strip the implicit 1)
SDValue Frac = DAG.getNode(ISD::XOR, dl, IntVT, MantField, LeadingOne);

// shift_amount = DstMant - MantMSB
// = DstMant - (IntVTBits - 1 - LeadingZeros)
// = LeadingZeros - (IntVTBits - 1 - DstMant)
const unsigned ShiftSub = IntVTBits - 1 - DstMant; // always >= 0
SDValue ShiftAmount = DAG.getNode(ISD::SUB, dl, IntVT, LeadingZeros,
DAG.getConstant(ShiftSub, dl, IntVT));

SDValue DenormDstMant =
DAG.getNode(ISD::SHL, dl, IntVT, Frac, ShiftAmount);

// Assemble denorm as sign | (denorm_dst_exp << DstMant) | denorm_dst_mant
SDValue DenormExpShifted =
DAG.getNode(ISD::SHL, dl, IntVT, DenormDstExp, DstMantShift);
DenormResult = DAG.getNode(
ISD::OR, dl, IntVT,
DAG.getNode(ISD::OR, dl, IntVT, SignShifted, DenormExpShifted),
DenormDstMant);
}

// Select between normal and denorm paths.
SDValue FiniteResult =
DAG.getSelect(dl, IntVT, IsDenorm, DenormResult, NormResult);

// Build special-value results.
// NaN -> canonical quiet NaN: sign=0, exp=all-ones, qNaN bit set.
// Encoding: (DstExpAllOnes << DstMant) | (1 << (DstMant - 1))
const uint64_t QNaNBit = (DstMant > 0) ? (1ULL << (DstMant - 1)) : 0;
SDValue NaNResult =
DAG.getConstant((DstExpAllOnes << DstMant) | QNaNBit, dl, IntVT);

// Inf -> destination Inf.
// sign | (DstExpAllOnes << DstMant)
SDValue InfResult =
DAG.getNode(ISD::OR, dl, IntVT, SignShifted,
DAG.getConstant(DstExpAllOnes << DstMant, dl, IntVT));

// Zero -> signed zero.
// Sign bit only.
SDValue ZeroResult = SignShifted;

// Final selection goes in order: NaN takes priority, then Inf, then Zero.
SDValue Result = FiniteResult;
Result = DAG.getSelect(dl, IntVT, IsZero, ZeroResult, Result);
Result = DAG.getSelect(dl, IntVT, IsInf, InfResult, Result);
Result = DAG.getSelect(dl, IntVT, IsNaN, NaNResult, Result);

// Bitcast integer result to destination float type.
Result = DAG.getNode(ISD::BITCAST, dl, DstVT, Result);

Results.push_back(Result);
break;
}
case ISD::FCANONICALIZE: {
// This implements llvm.canonicalize.f* by multiplication with 1.0, as
// suggested in
16 changes: 16 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
@@ -2763,6 +2763,9 @@ void DAGTypeLegalizer::SoftPromoteHalfResult(SDNode *N, unsigned ResNo) {
case ISD::STRICT_UINT_TO_FP:
case ISD::SINT_TO_FP:
case ISD::UINT_TO_FP: R = SoftPromoteHalfRes_XINT_TO_FP(N); break;
case ISD::CONVERT_FROM_ARBITRARY_FP:
R = SoftPromoteHalfRes_CONVERT_FROM_ARBITRARY_FP(N);
break;
case ISD::POISON:
case ISD::UNDEF: R = SoftPromoteHalfRes_UNDEF(N); break;
case ISD::ATOMIC_SWAP: R = BitcastToInt_ATOMIC_SWAP(N); break;
@@ -3050,6 +3053,19 @@ SDValue DAGTypeLegalizer::SoftPromoteHalfRes_XINT_TO_FP(SDNode *N) {
return DAG.getNode(GetPromotionOpcode(NVT, OVT), dl, MVT::i16, Res);
}

SDValue
DAGTypeLegalizer::SoftPromoteHalfRes_CONVERT_FROM_ARBITRARY_FP(SDNode *N) {
EVT OVT = N->getValueType(0);
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), OVT);
SDLoc dl(N);

SDValue Res = DAG.getNode(ISD::CONVERT_FROM_ARBITRARY_FP, dl, NVT,
N->getOperand(0), N->getOperand(1));

// Round the value to the softened type.
return DAG.getNode(GetPromotionOpcode(NVT, OVT), dl, MVT::i16, Res);
}

SDValue DAGTypeLegalizer::SoftPromoteHalfRes_UNDEF(SDNode *N) {
return DAG.getUNDEF(MVT::i16);
}
9 changes: 9 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -2076,6 +2076,9 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
case ISD::FP16_TO_FP:
case ISD::VP_UINT_TO_FP:
case ISD::UINT_TO_FP: Res = PromoteIntOp_UINT_TO_FP(N); break;
case ISD::CONVERT_FROM_ARBITRARY_FP:
Res = PromoteIntOp_CONVERT_FROM_ARBITRARY_FP(N);
break;
case ISD::STRICT_FP16_TO_FP:
case ISD::STRICT_UINT_TO_FP: Res = PromoteIntOp_STRICT_UINT_TO_FP(N); break;
case ISD::ZERO_EXTEND: Res = PromoteIntOp_ZERO_EXTEND(N); break;
@@ -2685,6 +2688,12 @@ SDValue DAGTypeLegalizer::PromoteIntOp_UINT_TO_FP(SDNode *N) {
ZExtPromotedInteger(N->getOperand(0))), 0);
}

SDValue DAGTypeLegalizer::PromoteIntOp_CONVERT_FROM_ARBITRARY_FP(SDNode *N) {
return SDValue(DAG.UpdateNodeOperands(N, GetPromotedInteger(N->getOperand(0)),
N->getOperand(1)),
0);
}

SDValue DAGTypeLegalizer::PromoteIntOp_STRICT_UINT_TO_FP(SDNode *N) {
return SDValue(DAG.UpdateNodeOperands(N, N->getOperand(0),
ZExtPromotedInteger(N->getOperand(1))), 0);
3 changes: 3 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -397,6 +397,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue PromoteIntOp_TRUNCATE(SDNode *N);
SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
SDValue PromoteIntOp_STRICT_UINT_TO_FP(SDNode *N);
SDValue PromoteIntOp_CONVERT_FROM_ARBITRARY_FP(SDNode *N);
SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
SDValue PromoteIntOp_VP_ZERO_EXTEND(SDNode *N);
SDValue PromoteIntOp_MSTORE(MaskedStoreSDNode *N, unsigned OpNo);
@@ -787,6 +788,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue SoftPromoteHalfRes_FNEG(SDNode *N);
SDValue SoftPromoteHalfRes_AssertNoFPClass(SDNode *N);
SDValue SoftPromoteHalfRes_XINT_TO_FP(SDNode *N);
SDValue SoftPromoteHalfRes_CONVERT_FROM_ARBITRARY_FP(SDNode *N);
SDValue SoftPromoteHalfRes_UNDEF(SDNode *N);
SDValue SoftPromoteHalfRes_VECREDUCE(SDNode *N);
SDValue SoftPromoteHalfRes_VECREDUCE_SEQ(SDNode *N);
@@ -838,6 +840,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue ScalarizeVecRes_BUILD_VECTOR(SDNode *N);
SDValue ScalarizeVecRes_EXTRACT_SUBVECTOR(SDNode *N);
SDValue ScalarizeVecRes_FP_ROUND(SDNode *N);
SDValue ScalarizeVecRes_CONVERT_FROM_ARBITRARY_FP(SDNode *N);
SDValue ScalarizeVecRes_UnaryOpWithExtraInput(SDNode *N);
SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N);
SDValue ScalarizeVecRes_LOAD(LoadSDNode *N);
1 change: 1 addition & 0 deletions llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -456,6 +456,7 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
case ISD::USUBO:
case ISD::SMULO:
case ISD::UMULO:
case ISD::CONVERT_FROM_ARBITRARY_FP:
case ISD::FCANONICALIZE:
case ISD::FFREXP:
case ISD::FMODF: