Skip to content

[ThreadSync] Use Z3 for constraint equivalence checking#1760

Merged
LeiWang1999 merged 5 commits intotile-ai:mainfrom
LeiWang1999:feature/z3-constraint-equivalence-check
Jan 30, 2026
Merged

[ThreadSync] Use Z3 for constraint equivalence checking#1760
LeiWang1999 merged 5 commits intotile-ai:mainfrom
LeiWang1999:feature/z3-constraint-equivalence-check

Conversation

@LeiWang1999
Copy link
Member

@LeiWang1999 LeiWang1999 commented Jan 30, 2026

This PR replaces the const_int_bound based range comparison with Z3 SMT solver for checking thread constraint equivalence in FindConflict.

The previous implementation used const_int_bound to compare thread variable ranges, which only checks the min/max bounds without considering additional constraints (e.g., threadIdx_x % 4 == 0). This led to incorrect equivalence judgments when constraints narrowed the actual executing thread set.

Approach

We use Z3 to formally verify constraint set equivalence through bidirectional implication:

Let $P(t)$ denote the predicate for the previous access's constraint set and $C(t)$ denote the predicate for the current access's constraint set, where $t = (\texttt{threadIdx.x}, \texttt{threadIdx.y}, \texttt{threadIdx.z})$.

We check:

  1. $P(t) \Rightarrow C(t)$: Every thread executing prev also executes curr
  2. $C(t) \Rightarrow P(t)$: Every thread executing curr also executes prev

If both hold, then $P(t) \Leftrightarrow C(t)$, meaning the exact same set of threads execute both accesses. Combined with has_same_index (identical buffer index expressions), this guarantees each thread only accesses memory locations it wrote itself, eliminating cross-thread RAW/WAR conflicts.

Implementation

The implication $P \Rightarrow C$ is checked by proving $\neg P \vee C$ using the Z3 prover:

bool prev_implies_curr = analyzer.z3_prover.CanProve(tir::Or(tir::Not(prev_constr), curr_constr));
bool curr_implies_prev = analyzer.z3_prover.CanProve(tir::Or(tir::Not(curr_constr), prev_constr));

Changes

  • Added ToConjunction() method to ConstrSet for converting constraint sets to a single conjunction expression
  • Replaced const_int_bound range comparison with Z3-based equivalence checking in FindConflict

Summary by CodeRabbit

  • Refactor

    • Improved constraint comparison and storage-synchronization logic with a stronger equivalence check to reduce false conflicts.
  • API

    • Added a method to obtain a combined constraint expression from a constraint set.
  • Tests

    • Added a CUDA-targeted test for a modulo-index shared-memory pattern and tightened validation for synchronization placement in generated kernels.

✏️ Tip: You can customize this high-level summary in your review settings.

Replace const_int_bound based range comparison with Z3 SMT solver
for checking thread constraint equivalence in FindConflict.
@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 30, 2026

📝 Walkthrough

Walkthrough

Adds ConstrSet::ToConjunction() to produce a combined constraint expression; switches thread-storage synchronization to a Z3-based bidirectional implication check between previous and current constraint sets; and adds a CUDA test exercising a modulo-based thread-index condition to verify sync insertion.

Changes

Cohort / File(s) Summary
ConstrSet Extension
src/transform/common/constr_visitor.h
Added PrimExpr ToConjunction() const to return the logical conjunction (AND) of all constraints in a ConstrSet; returns true when empty.
Thread Storage Sync
src/transform/thread_storage_sync.cc
Replaced naive min/max range-equality short-circuit with a Z3-based equivalence check: build prev/curr constraint expressions, bind thread variables to domains, and verify bidirectional implication (P ⇒ C and C ⇒ P) to decide equivalence before short-circuiting.
Tests
testing/python/transform/test_tilelang_transform_thread_sync.py, testing/python/issue/test_tilelang_issue_1106.py
Added CUDA test test_sync_if_with_same_index_with_modulo_if using a modulo-based thread index condition; refined __syncthreads placement validation in issue test to ensure it's not inside a for-loop. Removed an unused from tvm import te import.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant ThreadSync as "ThreadStorageSync"
participant ConstrSet as "ConstrSet::ToConjunction()"
participant Z3 as "Z3 Solver"
participant Decision as "Equivalence Decision"

ThreadSync->>ConstrSet: build prev_constr (P) and curr_constr (C)
ConstrSet-->>ThreadSync: return PrimExpr P, PrimExpr C
ThreadSync->>Z3: bind thread vars to domains; check P ⇒ C
Z3-->>ThreadSync: result (unsat/sat)
ThreadSync->>Z3: bind domains; check C ⇒ P
Z3-->>ThreadSync: result (unsat/sat)
ThreadSync->>Decision: both implications hold?
Decision-->>ThreadSync: equivalent => no conflict / not equivalent => conflict

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • kurisu6912

Poem

🐰
I nibble constraints into one bright line,
Call Z3 to see if both directions sign,
If P and C both nod in tune,
Threads dance on — no sync too soon! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title '[ThreadSync] Use Z3 for constraint equivalence checking' accurately describes the main change: replacing naive range-based comparison with Z3-based equivalence checking in the thread storage sync logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Added comments to improve understanding of the logic determining whether constraints are equivalent or in conflict in the thread storage synchronization process.
@LeiWang1999
Copy link
Member Author

This is also a fix for #1758

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@testing/python/issue/test_tilelang_issue_1106.py`:
- Around line 43-45: The test harness call has been commented out; restore the
standard test entry so CI discovers this test by uncommenting or re-adding
tilelang.testing.main() in the if __name__ == "__main__": block and avoid
calling test_issue_1106() directly; ensure the file uses the guard with
tilelang.testing.main() while leaving direct test calls removed or conditional
for local debugging so the test runner invokes the suite normally.
- Around line 36-40: The current string-based checks using for_start and for_end
are fragile and miss not-found cases; update the test to first validate both
patterns exist (ensure source.find("for (int i = 0;") and
source.find("__syncthreads") return >= 0), then rename for_end to
syncthreads_idx for clarity, locate the opening brace of the for loop (e.g.,
find '{' after for_start), compute the matching closing brace by scanning and
counting braces (brace depth) to get loop_close_idx, and assert syncthreads_idx
> loop_close_idx to guarantee __syncthreads is after the outer loop instead of
relying on fixed slices or exact whitespace.

Comment on lines +36 to +40
for_start = source.find("for (int i = 0;")
for_end = source.find("__syncthreads")
assert for_end > for_start, "__syncthreads should be after the for loop, not inside it"
# Check that __syncthreads appears after the closing brace of the outer for loop
assert source[for_end - 4 : for_end - 2] == "}\n", "__syncthreads should not be inside any for loop"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

String-based validation is fragile and lacks robustness checks.

Several concerns with the validation logic:

  1. Missing not-found handling: find() returns -1 when the pattern isn't found. If for_start is -1 and for_end is a positive index, the assertion for_end > for_start passes incorrectly.

  2. Misleading variable name: for_end actually stores the position of __syncthreads, not the end of the for loop.

  3. Brittle slice check: source[for_end - 4 : for_end - 2] == "}\n" depends on exact whitespace/formatting and could break if code generation changes indentation or spacing.

🛠️ Proposed fix with explicit validation
 def test_issue_1106():
     m = 200
     kernel = get_kernel(m)
     source = kernel.get_kernel_source()
     # Ensure __syncthreads is not inside the for loop
     for_start = source.find("for (int i = 0;")
-    for_end = source.find("__syncthreads")
-    assert for_end > for_start, "__syncthreads should be after the for loop, not inside it"
-    # Check that __syncthreads appears after the closing brace of the outer for loop
-    assert source[for_end - 4 : for_end - 2] == "}\n", "__syncthreads should not be inside any for loop"
+    syncthreads_pos = source.find("__syncthreads")
+    assert for_start != -1, "Expected for loop not found in generated source"
+    assert syncthreads_pos != -1, "__syncthreads not found in generated source"
+    assert syncthreads_pos > for_start, "__syncthreads should be after the for loop, not inside it"
+    # Check that __syncthreads appears after the closing brace of the outer for loop
+    preceding_content = source[:syncthreads_pos].rstrip()
+    assert preceding_content.endswith("}"), "__syncthreads should not be inside any for loop"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for_start = source.find("for (int i = 0;")
for_end = source.find("__syncthreads")
assert for_end > for_start, "__syncthreads should be after the for loop, not inside it"
# Check that __syncthreads appears after the closing brace of the outer for loop
assert source[for_end - 4 : for_end - 2] == "}\n", "__syncthreads should not be inside any for loop"
for_start = source.find("for (int i = 0;")
syncthreads_pos = source.find("__syncthreads")
assert for_start != -1, "Expected for loop not found in generated source"
assert syncthreads_pos != -1, "__syncthreads not found in generated source"
assert syncthreads_pos > for_start, "__syncthreads should be after the for loop, not inside it"
# Check that __syncthreads appears after the closing brace of the outer for loop
preceding_content = source[:syncthreads_pos].rstrip()
assert preceding_content.endswith("}"), "__syncthreads should not be inside any for loop"
🤖 Prompt for AI Agents
In `@testing/python/issue/test_tilelang_issue_1106.py` around lines 36 - 40, The
current string-based checks using for_start and for_end are fragile and miss
not-found cases; update the test to first validate both patterns exist (ensure
source.find("for (int i = 0;") and source.find("__syncthreads") return >= 0),
then rename for_end to syncthreads_idx for clarity, locate the opening brace of
the for loop (e.g., find '{' after for_start), compute the matching closing
brace by scanning and counting braces (brace depth) to get loop_close_idx, and
assert syncthreads_idx > loop_close_idx to guarantee __syncthreads is after the
outer loop instead of relying on fixed slices or exact whitespace.

Comment on lines 43 to +45
if __name__ == "__main__":
tilelang.testing.main()
# tilelang.testing.main()
test_issue_1106()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test harness is commented out.

Commenting out tilelang.testing.main() and calling test_issue_1106() directly may prevent this test from being discovered and run by CI/CD test frameworks that rely on the standard harness.

If this is intentional for debugging, consider restoring the harness before merging.

🔧 Suggested fix
 if __name__ == "__main__":
-    # tilelang.testing.main()
-    test_issue_1106()
+    tilelang.testing.main()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if __name__ == "__main__":
tilelang.testing.main()
# tilelang.testing.main()
test_issue_1106()
if __name__ == "__main__":
tilelang.testing.main()
🤖 Prompt for AI Agents
In `@testing/python/issue/test_tilelang_issue_1106.py` around lines 43 - 45, The
test harness call has been commented out; restore the standard test entry so CI
discovers this test by uncommenting or re-adding tilelang.testing.main() in the
if __name__ == "__main__": block and avoid calling test_issue_1106() directly;
ensure the file uses the guard with tilelang.testing.main() while leaving direct
test calls removed or conditional for local debugging so the test runner invokes
the suite normally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants