fix: correct _matrix_uv_float32 implementation in matrix_exp #76131

Fix matrix_exp precision issue for float32
PR Category: Performance Optimization
PR Types: Improvements
PR Overview
This PR fixes a critical bug in the `_matrix_uv_float32` function that causes precision issues in `paddle.linalg.matrix_exp` for float32 inputs.

Motivation and Context

For float32 inputs, `paddle.linalg.matrix_exp` had implementation defects in the `_matrix_uv_float32` helper, which computes the Padé approximation used by the scaling-and-squaring method. In particular, matrix powers were only formed up to `mat_a4`, but the Padé-7 approximant requires `mat_a6`. These issues produced numerical precision errors (~1e-6) that, while within acceptable float32 tolerances, were inconsistent with the float64 implementation and did not follow the standard Higham scaling-and-squaring algorithm.
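To see why `mat_a6` is needed: the degree-7 Padé approximant is exp(A) ≈ (V - U)^(-1) (V + U), where U collects the odd powers (up to A·A^6 = A^7) and V the even powers (up to A^6). Below is a minimal NumPy sketch, not Paddle's code; the names mirror the PR's `mat_a2`/`mat_a4`/`mat_a6`, and the coefficients are the standard Padé-7 values from [1]:

```python
import numpy as np

# Standard Pade-7 coefficients b0..b7 (Higham [1]).
PADE7_B = (17297280.0, 8648640.0, 1995840.0, 277200.0, 25200.0, 1512.0, 56.0, 1.0)

def pade7_uv(mat_a):
    """Return U, V with exp(A) ~= solve(V - U, V + U)."""
    mat_i = np.eye(mat_a.shape[-1], dtype=mat_a.dtype)
    mat_a2 = mat_a @ mat_a
    mat_a4 = mat_a2 @ mat_a2
    mat_a6 = mat_a4 @ mat_a2  # the power the old float32 path never formed
    b = PADE7_B
    u = mat_a @ (b[7] * mat_a6 + b[5] * mat_a4 + b[3] * mat_a2 + b[1] * mat_i)
    v = b[6] * mat_a6 + b[4] * mat_a4 + b[2] * mat_a2 + b[0] * mat_i
    return u, v
```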
Solution
Updated the `_matrix_uv_float32` function in `python/paddle/tensor/linalg.py` to:

- change `_matrix_mats(mat_a, 4, dtype)` to `_matrix_mats(mat_a, 6, dtype)` so that `mat_a6` is computed;
- pass `mat_i, mat_a2, mat_a4, mat_a6` to `_matrix_exp_pade7`;
- compute `u13, v13` for matrices requiring a higher-order approximation;
- add the threshold `3.925724783138660` to the conditions tuple;
- include `(u3, u5, u7, u13)` and `(v3, v5, v7, v13)` in the selection (see the sketch after this list).

This brings the float32 implementation in line with the float64 implementation and ensures correct behavior according to the Higham algorithm [1].
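A simplified sketch of how the threshold `3.925724783138660` drives scaling and squaring, assuming the single-precision θ bounds from Higham [1]; for brevity it always evaluates Padé-7 after scaling, rather than selecting among `(u3, u5, u7, u13)` as the PR does:

```python
import numpy as np

# Single-precision Pade error bounds (Higham [1]); 3.925724783138660 is the
# Pade-7 bound and the pivot for choosing the number of squarings.
THETA3_F32 = 4.258730016922831e-1
THETA5_F32 = 1.880152677804762
THETA7_F32 = 3.925724783138660

def matrix_exp_f32(mat_a):
    norm = np.linalg.norm(mat_a, 1)  # induced 1-norm drives order selection
    squarings = 0
    if norm > THETA7_F32:
        # Scale A by 2**-s so Pade-7 is accurate, then undo by repeated squaring.
        squarings = int(np.ceil(np.log2(norm / THETA7_F32)))
        mat_a = mat_a / (2.0**squarings)
    u, v = pade7_uv(mat_a)  # from the sketch above
    result = np.linalg.solve(v - u, v + u)
    for _ in range(squarings):
        result = result @ result
    return result
```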
Changes
Modified Files
`python/paddle/tensor/linalg.py`: `_matrix_uv_float32` (lines 5165-5193)
Key Changes
Testing
Unit Test
The fix has been validated against the existing test case in `test/legacy_test/test_linalg_matrix_exp.py`, which uses the tolerances:

```python
RTOL = {'float32': 1e-06, 'float64': 1e-13}
ATOL = {'float32': 1e-06, 'float64': 1e-13}
```

Manual Verification
Tested with the issue reproduction case:
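The original snippet is not reproduced here; a minimal check in the same spirit (hypothetical shape and seed, using SciPy's float64 `expm` as the reference) looks like:

```python
import numpy as np
import paddle
from scipy.linalg import expm

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8)).astype('float32')

got = paddle.linalg.matrix_exp(paddle.to_tensor(a)).numpy()
ref = expm(a.astype('float64')).astype('float32')  # float64 reference

np.testing.assert_allclose(got, ref, rtol=1e-6, atol=1e-6)
print("float32 matrix_exp matches the float64 reference within tolerance")
```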
Results
Checklist
References
[1] Nicholas J. Higham, "The scaling and squaring method for the matrix exponential revisited", SIAM Journal on Matrix Analysis and Applications, 2005.
Additional Notes
The remaining ~1e-6 difference from PyTorch's `torch.matrix_exp` is expected and acceptable for float32 computations, where rounding error accumulated across the algorithm's many matrix multiplies, solves, and squarings naturally reaches this magnitude. This is within Paddle's own testing standards and consistent with numerical computing best practices.
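For scale, every float32 operation carries roundoff of roughly machine epsilon, and `matrix_exp` chains many of them:

```python
import numpy as np

# float32 machine epsilon: per-operation rounding is ~1e-7, so a multi-step
# algorithm accumulating to ~1e-6 is expected behavior, not a bug.
print(np.finfo(np.float32).eps)  # 1.1920929e-07 (2**-23)
```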