
[BasicAA] Incorrect alias analysis causing miscompile in slp-vectorizer #98978

Closed
wlei-llvm opened this issue Jul 16, 2024 · 8 comments · Fixed by #100130
Comments

@wlei-llvm
Contributor

wlei-llvm commented Jul 16, 2024

Hi! We recently hit a miscompile where the slp-vectorizer turned two overlapping 8-byte load/store pairs into a single 16-byte load/store, and we further pinpointed it to an incorrect alias-analysis result.
Here is the reduced LLVM IR to reproduce: reduced.ll.txt
Cmd:

opt -S -passes=slp-vectorizer  < reduced.ll

Incorrect codegen result (note the load <2 x i64> / store <2 x i64>):

15:                                               ; preds = %15, %14
  %16 = phi ptr [ %0, %15 ], [ %13, %14 ]
  %17 = phi ptr [ null, %15 ], [ %8, %14 ]
  %18 = load <2 x i64>, ptr %16, align 1
  %19 = shufflevector <2 x i64> %18, <2 x i64> poison, <2 x i32> <i32 1, i32 0>
  store <2 x i64> %19, ptr %17, align 1
  br i1 false, label %15, label %.loopexit42.2
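
To see why this vectorization is unsound when the source and destination overlap, here is a small Python model (the offsets and the 8-byte overlap are illustrative, not taken from the repro) comparing the scalar and vectorized copy semantics:

```python
# Model the difference between two interleaved 8-byte load/store pairs
# and a single 16-byte vector load/store when src and dst overlap.

def scalar_copy(mem, src, dst):
    for off in (0, 8):
        chunk = bytes(mem[src + off:src + off + 8])  # load i64
        mem[dst + off:dst + off + 8] = chunk         # store i64 (may feed a later load)

def vector_copy(mem, src, dst):
    chunk = bytes(mem[src:src + 16])  # load <2 x i64> (both lanes read up front)
    mem[dst:dst + 16] = chunk         # store <2 x i64>

m1 = bytearray(range(32))
m2 = bytearray(range(32))
scalar_copy(m1, 0, 8)  # dst overlaps src by 8 bytes
vector_copy(m2, 0, 8)
print(m1 == m2)  # False: the scalar form's second load observes the first store
```

The scalar loop's second 8-byte load reads bytes written by the first store, while the vector form reads all 16 bytes before storing, so the two are only equivalent if AA can prove the regions do not overlap.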

It's a very tricky case; here are some explanations and diagrams to help illustrate the issue:

  1. To keep it simple, let's start with node %3 as the root. The node value is an alias check of the same value, alias(<%3, %3>) (this happens under MaybeCrossIteration in BasicAliasAnalysis.cpp, which checks whether it is no-alias across different iterations).

  2. Before recursively checking the dependencies along the def-use chain, it initializes the cache entry with NoAlias (IIUC, this is an assumption-based NoAlias, BasicAliasAnalysis.cpp#L1691-L1693).
    [diagram: demo1]

  3. It keeps checking aliases recursively, i.e. <%23, %23> --> <%7, %7> --> <%3, %3>, until it finds a cycle. Then it runs the code in BasicAliasAnalysis.cpp#L1693-L1702, which fetches the assumption-based initial value for %3 from the cache.

  4. The recursion then returns to %7. IIUC, %7 has now finished all its dependency checks, so it runs the code that updates its status to definitive. Since it depends on an assumption-based value (%3), it is pushed into AssumptionBasedResults (code); IIUC, if the assumption is later disproven, all results based on it will be removed.

  5. Continue returning to %23 and process %23's right child, %8.
    [diagram: demo2]

  6. (Bug occurs!) When processing %8's child %7: since %7 was previously marked definitive, %8 is not considered assumption-based (code) and is never pushed into AssumptionBasedResults. This is incorrect, because %7 actually is assumption-based (its dependency %3 is assumption-based and hasn't been proven yet), so <%8, %8> is missing from AssumptionBasedResults.

    [diagram: demo3]

  7. Return and process %3's right child, %99. Say <%99> turns out not to be NoAlias: the earlier NoAlias assumption for %3 is now disproven, so all the AssumptionBasedResults entries based on %3, i.e. <%23, %23> and <%7, %7>, are removed from the cache and need to be recomputed. But since <%8, %8> is missing from AssumptionBasedResults, it stays in the cache. Any later query that uses <%8, %8> is incorrect; in this case it produces the incorrect <%16, %17> alias result that leads to the miscompile.

In summary:
There appears to be a bug in this assumption-invalidation mechanism: an assumption-based result is marked as definitive once all of its dependencies are processed. However, this is problematic when the node is part of a cycle, because a "definitive" result may still be assumption-based: its dependencies' alias results may not all be definitive yet. A result computed from such an incorrectly-definitive result can then escape invalidation in the cache, i.e. when assumptions are correctly invalidated due to other aliasing on the dependencies, part of the chain fails to be invalidated because an assumption-based result was prematurely marked definitive.
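
The invalidation escape described above can be condensed into a small Python model (the cache shape and all names here are illustrative, not the actual BasicAA data structures):

```python
# Toy model of the assumption cache: entries recorded while an unproven
# assumption is live should be purged if that assumption is disproven.
cache = {}            # query -> (result, is_definitive)
assumption_based = [] # queries to purge if the root assumption fails

def record(query, result, depends_on_assumption):
    cache[query] = (result, not depends_on_assumption)
    if depends_on_assumption:
        assumption_based.append(query)

# %3 is assumed NoAlias; %7 depends on it, but once %7's own recursion
# finishes it is (incorrectly) upgraded to definitive.
record("(%3,%3)", "NoAlias", depends_on_assumption=True)
record("(%7,%7)", "NoAlias", depends_on_assumption=True)
cache["(%7,%7)"] = ("NoAlias", True)  # premature definitive upgrade (the bug)

# %8 consults the now-"definitive" %7, so it is never recorded as
# assumption-based, even though it transitively rests on %3's assumption.
depends = not cache["(%7,%7)"][1]     # False: %7 looks definitive
record("(%8,%8)", "NoAlias", depends_on_assumption=depends)

# The %3 assumption is disproven; purge the tracked entries.
for query in assumption_based:
    cache.pop(query, None)

print("(%8,%8)" in cache)  # True: the stale NoAlias survives the purge
```

After the purge, <%3, %3> and <%7, %7> are correctly recomputed, but the stale <%8, %8> entry remains and poisons any later query that consults it.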

Therefore, I'm posting here to ask for help on how to properly fix this. Thanks in advance!

cc @nikic @WenleiHe @fhahn @hiraditya @jdoerfert @preames

@WenleiHe
Member

@nikic @preames thoughts on this one? We hit this during an llvm-15 -> llvm-17 upgrade, where it caused a miscompile in the zstd compression library. The miscompile was bisected to https://reviews.llvm.org/D136175, but based on Lei's investigation, that particular patch may not be the root cause and we may have a lurking bug in basic AA.

@hiraditya
Collaborator

cc: @alexey-bataev if you can share your analysis, that'd be great!

@alexey-bataev
Member

> cc: @alexey-bataev if you can share your analysis, that'd be great!

Cannot add anything specific here. SLP relies on alias analysis to prove that it is safe to vectorize the code. It looks like AA says that store i64 %19, ptr %17, align 1 and the following load %21 do not alias, so it decides it is safe to vectorize.

@WenleiHe
Member

> cc: @alexey-bataev if you can share your analysis, that'd be great!
>
> Cannot add anything specific here. SLP relies on alias analysis to prove that it is safe to vectorize the code. It looks like AA says that store i64 %19, ptr %17, align 1 and the following load %21 do not alias, so it decides it is safe to vectorize.

Right, SLP isn't doing anything wrong. The bug is in basic AA.

@nikic nikic self-assigned this Jul 19, 2024
@nikic
Contributor

nikic commented Jul 19, 2024

I think we'll need to add an additional state between "assumption" and "definitive". c25fdb4 is an incomplete sketch that fixes the test case (but misses the handling to convert SemiDefinitive -> Definitive later, among other things). I'll try to look into this on Monday.

nikic added a commit to nikic/llvm-project that referenced this issue Jul 23, 2024
If a result is potentially based on a not yet proven assumption,
BasicAA will remember it inside AssumptionBasedResults and remove
the cache entry if an assumption higher up is later disproven.
However, we currently miss the case where another cache entry ends
up depending on such an AssumptionBased result.

Fix this by introducing an additional AssumptionBased state for
cache entries. If such a result is used, we'll still increment
AAQI.NumAssumptionUses, which means that the using entry will
also become AssumptionBased and be cleared if the assumption is
disproven.

At the end of the root query, convert remaining AssumptionBased
results into definitive results.

Fixes llvm#98978.
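
The fix described in the commit message above can be sketched as a toy Python model (names and shapes are illustrative simplifications, not the actual C++ from the patch):

```python
from enum import Enum, auto

class State(Enum):
    ASSUMPTION = auto()        # unproven placeholder for an in-flight query
    ASSUMPTION_BASED = auto()  # computed, but still resting on an assumption
    DEFINITIVE = auto()

cache = {}             # query -> (result, State)
assumption_based = []  # queries to purge if an assumption is disproven

def record(query, result, num_assumption_uses):
    state = State.ASSUMPTION_BASED if num_assumption_uses else State.DEFINITIVE
    cache[query] = (result, state)
    if state is State.ASSUMPTION_BASED:
        assumption_based.append(query)

def lookup(query):
    result, state = cache[query]
    # The key change: consuming an AssumptionBased entry also counts as an
    # assumption use, so the consumer becomes AssumptionBased as well.
    uses = 0 if state is State.DEFINITIVE else 1
    return result, uses

def finish_root_query():
    # Root query completed with no disproven assumptions: promote survivors.
    for query in assumption_based:
        if query in cache:
            cache[query] = (cache[query][0], State.DEFINITIVE)
    assumption_based.clear()

# %7 rests on the %3 assumption; %8 consults %7 and now inherits that state,
# so disproving the assumption would purge both entries.
record("(%7,%7)", "NoAlias", num_assumption_uses=1)
_, uses = lookup("(%7,%7)")
record("(%8,%8)", "NoAlias", num_assumption_uses=uses)
print(cache["(%8,%8)"][1])  # State.ASSUMPTION_BASED: it can be purged now
```

With the extra state, the dependent entry stays purgeable until the root query finishes, closing the escape hatch from the bug report.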
@nikic
Contributor

nikic commented Jul 23, 2024

I've put up #100130 with a candidate fix.

nikic added a commit to nikic/llvm-project that referenced this issue Jul 24, 2024
@nikic nikic closed this as completed in 9107338 Jul 25, 2024
@nikic nikic added this to the LLVM 19.X Release milestone Jul 25, 2024
@nikic
Contributor

nikic commented Jul 25, 2024

/cherry-pick 9107338

llvmbot pushed a commit to llvmbot/llvm-project that referenced this issue Jul 25, 2024
(cherry picked from commit 9107338)
@llvmbot
Collaborator

llvmbot commented Jul 25, 2024

/pull-request #100553

yuxuanchen1997 pushed a commit that referenced this issue Jul 25, 2024
Differential Revision: https://phabricator.intern.facebook.com/D60250771
tru pushed a commit to llvmbot/llvm-project that referenced this issue Jul 26, 2024
(cherry picked from commit 9107338)