workaround for a VC++ bug in VS 17.14 #24878

snnn · 2025-05-27T22:23:25Z

Symptom

The following unit tests failed when building ONNX Runtime with Visual Studio 17.14 in Release or RelWithDebInfo configuration.

SparseTensorConversionTests.TestDenseToSparseConversion
MeanVarianceNormalizationTest.AllAxes
MVNContribOpTest.MeanVarianceNormalizationCPUTest_Version1_TO_8

Minimal Reproducible Example

// ConsoleApplication1.cpp
#include <iostream>
#include <span> // Used for comparison/workaround
#include <vector>
#include <algorithm>
#include <numeric>
#include <functional>

#include <gsl/gsl> // Microsoft GSL library

// Function exhibiting the bug with gsl::span
std::vector<float> MeanStdev_gsl(gsl::span<const float> v, float mean) {
    std::vector<float> diff(v.size()); // diff is sized correctly
    // The following transform behaves incorrectly
    std::transform(v.begin(), v.end(), diff.begin(),
                   std::bind(std::minus<float>(), std::placeholders::_1, mean));
    return diff;
}

// Same function with std::span (works correctly)
std::vector<float> MeanStdev_std(std::span<const float> v, float mean) {
    std::vector<float> diff(v.size());
    std::transform(v.begin(), v.end(), diff.begin(),
                   std::bind(std::minus<float>(), std::placeholders::_1, mean));
    return diff;
}

int main() {
    std::vector<float> X{
        1.0f, 1.0f, 2.0f, 4.0f, 1.0f, 4.0f, 0.0f, 4.0f,
        -1.0f, 1.0f, 2.0f, -1.0f, -1.0f, -2.0f, 4.0f, -5.0f, // X[15] is -5.0f
        5.0f, -3.0f, -1.0f, 0.0f, 3.0f, 3.0f, 0.0f, 0.0f  // 24 elements total
    };
    float mean_val = 0.875f; // Example mean

    std::cout << "Testing with gsl::span (problematic):" << std::endl;
    const auto mean_stdev_gsl = MeanStdev_gsl(X, mean_val);
    for (size_t i = 0; i < mean_stdev_gsl.size(); ++i) {
        std::cout << mean_stdev_gsl[i] << (i == mean_stdev_gsl.size() - 1 ? "" : " ");
    }
    std::cout << std::endl;
    // Last non-zero element X[15] - mean = -5.0f - 0.875f = -5.875f
    // Output shows processing stops after 16 elements.

    std::cout << "\nTesting with std::span (correct):" << std::endl;
    const auto mean_stdev_std = MeanStdev_std(X, mean_val);
    for (size_t i = 0; i < mean_stdev_std.size(); ++i) {
        std::cout << mean_stdev_std[i] << (i == mean_stdev_std.size() - 1 ? "" : " ");
    }
    std::cout << std::endl;

    return 0;
}

Observed Behavior (with gsl::span):

When compiling the MeanStdev_gsl function (which uses gsl::span) in Release mode, the std::transform operation does not process all elements of the input span.
For the provided example X (24 elements), the output is:

4.125 -3.875 -1.875 -0.875 2.125 2.125 -0.875 -0.875 -1.875 0.125 1.125 -1.875 -1.875 -2.875 3.125 -5.875 0 0 0 0 0 0 0 0

Only the first 16 elements of the input vector X appear to be processed by std::transform. The remaining 8 elements in the diff vector retain their default-initialized value (0.0f). The calculation for diff[0] is also incorrect if based on X[0]; the first value 4.125 corresponds to X[16] - mean if the output was shifted, but that's less likely than premature termination. More accurately, the 16th value computed (-5.875) correctly corresponds to X[15] - mean

The correct output (achieved when using std::span) would be:

0.125 0.125 1.125 3.125 0.125 3.125 -0.875 3.125 -1.875 0.125 1.125 -1.875 -1.875 -2.875 3.125 -5.875 4.125 -3.875 -1.875 -0.875 2.125 2.125 -0.875 -0.875

Detailed Analysis of the Issue:

Vector Sizing: The std::vector<float> diff inside the MeanStdev_gsl function is correctly sized based on v.size(). If v is a span of 24 floats, diff is allocated to hold 24 floats. This indicates that v.size() is correctly read at this point.
Premature Loop Termination: The std::transform operation, when gsl::span is used, appears to terminate its loop prematurely. In the example with 24 elements, only 16 elements are processed.
Assembly Code Discrepancy:
- gsl::span version (Incorrect): The provided assembly shows that the compiler generates a scalar loop (using movss, subss) for std::transform. The loop control logic involves:
  - rcx: Input pointer, starting at v.data().
  - rdi: Holds the size of the span in bytes (e.g., 24 elements * 4 bytes/element = 96 bytes). This rdi is derived from v.size() which was also used for diff's allocation.
  - r8: Calculated as v.data() + rdi (i.e., v.data() + size_in_bytes), which should be the correct end pointer for the input span.
  - The loop condition is cmp rcx, r8 followed by jne (loop if not equal). Despite r8 seemingly being calculated for the full range (e.g., 24 elements), the loop effectively terminates as if the range were shorter (16 elements). This suggests that the compiler might be incorrectly optimizing or misinterpreting the loop's trip count or the value of rdi (or the span size it represents) specifically when setting up or executing the std::transform loop with gsl::span and std::bind.
- std::span version (Correct): The assembly for the std::span version typically shows the compiler successfully auto-vectorizing the loop (using movups, subps), and it processes all elements correctly.

Suspected Cause: The issue seems to be a subtle code generation bug in the MSVC optimizer when handling std::transform with gsl::span (a non-standard library type, though widely used) in conjunction with std::bind. The compiler correctly determines v.size() for memory allocation but appears to use an effectively different (and incorrect) size for controlling the iteration of the std::transform loop. This leads to processing fewer elements than expected. I believe the problem can be reproduced without using GSL, though I haven't found an example.

asm1.txt
asm2.txt

Resolution

This PR change the MeanStdev function to not using gsl::span as a workaround of the original issue. After this change, the two MVN related tests can pass.

a-akoval · 2025-05-28T12:52:16Z

We experienced the same problem with std::transform compiler optimization.

The differences are:

We used raw pointers to int32_t as a source range in std::transform, and vector<int64_t> as an output.
24 elements would be successfully processed in our case, unlike 23 or any other number that is not a multiple of 8. I.e. in our case the last 7 elements would overwrite the first 7 in the output.
The possible resolutions were:
- using std::span instead of raw pointers
- using named inline function instead of lambda passed to std::transform
I was able to reproduce the problem within a unit test in the project, but it is not trivially reproducible in a separate test application, even though the same compiler and linker options were applied.

Based on our investigation there is no premature loop termination. All elements are processed, but their processing is split into two blocks:

First block uses SIMD instructions (movups in your case for floats, pmovsxdq in our case for integers) and handles all elements except the remainder (amount is equal to remainder of division by 8 in our case, and 8 in your case for some reason).
Then the remaining elements are processed using scalar operations (movss in your case).

The problem is that the register with the destination address is not updated after the first block. Hence the last remaining elements overwrite the first ones in the resulting output, and the last elements in the result left default initialized (zero).

a-akoval · 2025-05-28T13:18:13Z

Another problem is that SparseTensorConversionTests.TestDenseToSparseConversion still fails on my local build after I applied the fix from this PR. The compiler version is 19.44.35207.1 for x64.
The problem in that case with another STL helper std::copy, which is used internally by Google protobuf's RepeatedField<Element>::Add(Iter begin, Iter end) and which behaves exactly like std::transform above when copying data: https://github.com/protocolbuffers/protobuf/blob/v3.21.12/src/google/protobuf/repeated_field.h#L726

The simplest solution is to replace Add(begin, end) with Add(element) in the test code.
This works:

        // tp.mutable_int32_data()->Add(values.cbegin(), values.cend());
        for (const auto& v : values) tp.mutable_int32_data()->Add(v);

Another solution is to upgrade the protobuf version used from outdated 3.21.12 to some recent version, as implementation of RepeatedField<Element>::Add(Iter begin, Iter end) has changed. That especially makes sense given that the recommended version in the onnx itself is 5.29.2.

vmoroz · 2025-05-30T22:00:57Z

@snnn, @a-akoval , I had similar issue in my project and reported it here:
https://developercommunity.visualstudio.com/t/Auto-vectorization-breaks-std::copy-in-V/10912564
Please consider to upvote it.

The following unit tests failed when building ONNX Runtime with Visual Studio 17.14 in Release or RelWithDebInfo configuration. - SparseTensorConversionTests.TestDenseToSparseConversion - MeanVarianceNormalizationTest.AllAxes - MVNContribOpTest.MeanVarianceNormalizationCPUTest_Version1_TO_8 This PR provides a workaround for the two MVN tests.

workaround for a VC++ bug in VS 17.14

0697b55

fs-eire approved these changes May 27, 2025

View reviewed changes

Merge remote-tracking branch 'origin/main' into snnn/fix_bug

1d28a40

snnn merged commit 9705b17 into main May 29, 2025
86 of 88 checks passed

snnn deleted the snnn/fix_bug branch May 29, 2025 21:24

snnn mentioned this pull request Jul 12, 2025

onnxruntime v1.22.1 conda-forge/onnxruntime-feedstock#150

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

workaround for a VC++ bug in VS 17.14 #24878

workaround for a VC++ bug in VS 17.14 #24878

Uh oh!

snnn commented May 27, 2025 •

edited

Loading

Uh oh!

a-akoval commented May 28, 2025 •

edited

Loading

Uh oh!

a-akoval commented May 28, 2025 •

edited

Loading

Uh oh!

Uh oh!

vmoroz commented May 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

workaround for a VC++ bug in VS 17.14 #24878

workaround for a VC++ bug in VS 17.14 #24878

Uh oh!

Conversation

snnn commented May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Symptom

Minimal Reproducible Example

Observed Behavior (with gsl::span):

Detailed Analysis of the Issue:

Resolution

Uh oh!

a-akoval commented May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

a-akoval commented May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

vmoroz commented May 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

snnn commented May 27, 2025 •

edited

Loading

a-akoval commented May 28, 2025 •

edited

Loading

a-akoval commented May 28, 2025 •

edited

Loading