joyalbin commented Oct 27, 2025

This PR fixes the type of trD_compute_frag in the Xe epilogue so that it follows the type configured in cst_callbacks.

ElementOutput in LinearCombination and ElementOutput in CollectiveEpilogue are two different things.
ElementOutput in CollectiveEpilogue is always the same as ElementD, but ElementOutput in LinearCombination need not be the same as ElementD.

ElementOutput in LinearCombination represents the output of the LinearCombination operation (alpha * AB + beta * C).
ElementOutput/ElementD in CollectiveEpilogue, by contrast, represents the actual output that must be stored back.
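
The distinction above can be sketched in plain C++ without any CUTLASS dependency. This is only an illustration, not the code changed in this PR: `LinearCombinationSketch`, `EpilogueSketch`, `ComputeFragElement`, and `run_example` are hypothetical names, and `double`/`float` stand in for a wider fusion output type and a narrower stored type (standard C++ has no FP16). The assumption shown is that the compute fragment takes its element type from the fusion operation, and the conversion to ElementD happens only at the store.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a linear-combination fusion op.
// ElementOutput here is the type of alpha * acc + beta * src.
template <class ElementOutput_, class ElementCompute_>
struct LinearCombinationSketch {
  using ElementOutput = ElementOutput_;
  using ElementCompute = ElementCompute_;
  ElementCompute alpha;
  ElementCompute beta;

  ElementOutput operator()(ElementCompute acc, ElementCompute src) const {
    return static_cast<ElementOutput>(alpha * acc + beta * src);
  }
};

// Hypothetical stand-in for a collective epilogue.
// Its ElementOutput is always ElementD, the type written back to memory.
template <class FusionOp, class ElementD_>
struct EpilogueSketch {
  using ElementD = ElementD_;
  using ElementOutput = ElementD;
  // Assumption: the compute fragment uses the fusion op's output type,
  // not ElementD.
  using ComputeFragElement = typename FusionOp::ElementOutput;

  ElementD compute_and_store(FusionOp const& op,
                             typename FusionOp::ElementCompute acc,
                             typename FusionOp::ElementCompute src) const {
    ComputeFragElement frag = op(acc, src);  // fusion result, fusion type
    return static_cast<ElementD>(frag);      // converted only when stored
  }
};

using Fusion = LinearCombinationSketch<double, double>;
using Epilogue = EpilogueSketch<Fusion, float>;

// The epilogue's ElementOutput matches ElementD ...
static_assert(std::is_same<Epilogue::ElementOutput, Epilogue::ElementD>::value,
              "epilogue ElementOutput must equal ElementD");
// ... while the fusion op's ElementOutput is free to differ from it.
static_assert(!std::is_same<Fusion::ElementOutput, Epilogue::ElementD>::value,
              "fusion ElementOutput differs from ElementD in this sketch");

inline float run_example() {
  Fusion op{2.0, 0.5};  // alpha = 2, beta = 0.5
  Epilogue epi;
  return epi.compute_and_store(op, 3.0, 4.0);  // 2*3 + 0.5*4
}
```

Calling `run_example()` evaluates the linear combination in the fusion type and narrows the result once, at the store step.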

Antonyvance added the urgent label (PR requires urgent attention, for release or blocking another PR) Oct 27, 2025
Antonyvance added this to the 0.6 milestone Oct 27, 2025
tdeng5 commented Oct 28, 2025

Please add some unit tests (UTs) that cover enough scenarios, for example D = A x B + C with the type combinations {FP32, FP16} = {FP32, FP16} + {FP32, FP16}.

Please also combine #563 into this PR.

joyalbin force-pushed the fix_trD_compute_type branch from 9d2da71 to e60bf79 Oct 28, 2025 16:53
Antonyvance merged commit a0172fd into intel:main Oct 28, 2025
5 of 6 checks passed
5 participants