Contributor

@oraluben oraluben commented Oct 5, 2025

Resolves #833
Closes #756

pip install . -v  # CUDA / Metal build on Linux / Mac
USE_ROCM=ON pip install . -v  # ROCm build
USE_CUDA=OFF pip install . -v  # CPU build

Detailed information can be found in the docs.

This PR migrates to a fully CMake-based build system and eliminates the Python-based build process, which simplifies building and installing.

Users and developers of tilelang can now install directly from source, without setting PYTHONPATH and so on.

  • Use one wheel for different Python versions via the stable ABI
    cp38-abi3 wheels for Python >= 3.8 (needs "Workaround limit api too high in tvm", tvm#12)
  • CUDA wheel
    • ROCm wheel
    • Unify CUDA and ROCm wheel? Is that possible?
  • Metal wheel
  • Build cython ext
    • Fix Remove in-tree jit compile
  • git hash and cuda/rocm/metal extension in version. Support build-time toolchain info for CUDA. (e.g. in the artifacts)
    Now tilelang reads version (for cache, etc.) from the package version.
  • Validation
  • Cleanup
    • Remove tox-based build scripts and refactor the scripts to build locally. Currently we may have three host platforms (linux+{x86,aarch64} and metal), and each platform needs only one wheel. That makes tox unnecessary for building against different Python versions and insufficient for building against different platforms.
  • CI related fixes
    • auditwheel and delocate for linux / darwin wheels
  • Update doc
    • Build wheel
    • Editable install
    • Use custom tvm
    • ...
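
The task list above notes that tilelang now reads its version from the installed package version. A minimal sketch of such a lookup with the standard importlib.metadata module is shown below. The helper name installed_version and the fallback string are illustrative assumptions, not tilelang's actual code.

```python
from importlib import metadata


def installed_version(dist_name: str, fallback: str = "0.0.0+unknown") -> str:
    """Return the version recorded in an installed distribution's metadata."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed, e.g. when running from a bare
        # source checkout without `pip install .`.
        return fallback


if __name__ == "__main__":
    # Prints the installed tilelang version, or the fallback if it is absent.
    print(installed_version("tilelang"))
```

Because the value comes from package metadata, it is only meaningful after the package has been installed (a regular or editable install).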

Summary by CodeRabbit

  • New Features

    • Added a scheduled cross-platform CI workflow for daily and release wheel builds; introduced dynamic build-time version metadata.
  • Documentation

    • Installation guide updated: raised Python minimum, tightened CUDA requirements, clarified bundled vs external backend and nightly guidance.
  • Refactor

    • Modernized CMake and packaging flow; unified backend selection and simplified native/Python integration and library discovery.
  • Chores

    • Removed legacy install/build scripts, multi-version tox/docker workflows, and replaced monolithic packaging script with modern toolchain.


github-actions bot commented Oct 5, 2025

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

Contributor

coderabbitai bot commented Oct 5, 2025

Walkthrough

Migrates packaging to scikit-build-core, modernizes CMake (≥3.26) and TVM integration, removes legacy setup/tox/install scripts, requires a prebuilt cython_wrapper at import, centralizes third‑party discovery and dynamic version metadata, updates CI/Docker to uv/cibuildwheel flows, and adds a Dist GitHub Actions workflow.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| New CI distribution workflow: .github/workflows/dist.yml | Adds Dist workflow using cibuildwheel matrix (ubuntu/macOS/arm), captures built wheel name and uploads artifact; sets concurrency/cancel, PYTHON_VERSION=3.12, CUDA_VERSION matrix, and NO_VERSION_LABEL behavior. |
| CMake & scikit-build migration: CMakeLists.txt, cmake/load_tvm.cmake, pyproject.toml, version_provider.py | Raises CMake min to 3.26, enables modern defaults and ccache, centralizes TVM loading (cmake/load_tvm.cmake), introduces object/shared targets and cython_wrapper wiring, switches build backend to scikit-build-core, and implements dynamic version metadata provider. |
| Packaging & distribution scripts: maint/scripts/pypi_distribution.sh, maint/scripts/local_distribution.sh, maint/scripts/docker_local_distribute.sh, maint/scripts/docker_pypi_distribute.sh, maint/scripts/pypi.manylinux.Dockerfile | Replace ad-hoc flows with strict set -eux scripts, adopt uv pip/venv handling, generate sdist/wheel via modern tooling, and run wheel repair (auditwheel/delocate) with multi-arch builder logic. |
| Large removals, legacy orchestration: setup.py, tox.ini, install_*.sh (install_cpu.sh, install_cuda.sh, install_rocm.sh, install_metal.sh), maint/scripts/*tox*.sh, maint/scripts/pypi.Dockerfile | Remove monolithic setup.py, legacy installers and tox-based multi-Python builders, and older Dockerfile; replaced by scikit-build/CI-driven workflows and scripts. |
| Env & library discovery refactor: tilelang/env.py, tilelang/libinfo.py, tilelang/__init__.py, tilelang/version.py (removed) | Add SITE_PACKAGES/THIRD_PARTY_ROOT/TL_LIBS and prepend_pythonpath; switch lib lookup to TL_LIBS with new find_lib_path(name: str, py_ext=False); move __version__ sourcing to importlib.metadata and remove tilelang/version.py. |
| Cython/JIT adapter simplification: tilelang/jit/adapter/cython/adapter.py | Remove dynamic runtime Cython compilation/cache logic; require/import prebuilt cython_wrapper, raising on ImportError. |
| Call-site import updates: tilelang/autotuner/tuner.py, tilelang/cache/kernel_cache.py | Update imports to take __version__ from tilelang instead of tilelang.version. |
| Lib loading API change: tilelang/libinfo.py | Replace DLL candidate discovery with TL_LIBS-driven search and platform-aware filename selection; change find_lib_path signature to find_lib_path(name: str, py_ext=False). |
| Requirements, .gitignore & deps cleanup: .gitignore, requirements-build.txt, requirements-dev.txt, requirements.txt | Rework build/dev deps for scikit-build/CMake flow (add scikit-build-core, uv, auditwheel/delocate), adjust runtime deps (add ml_dtypes), broaden *dist/ ignore, and remove many legacy entries. |
| Docs & CI tweaks: docs/get_started/Installation.md, .github/workflows/{cuda-ci.yml,metal-ci.yml,rocm-ci.yml} | Tighten prerequisites (Python/CUDA), update Docker instructions, migrate CI to uv-based Python flows, refine submodule handling and build environment variables. |
| Script hygiene & small tooling changes: maint/scripts/* | Add strict shell options, centralize docker run to call scripts, remove obsolete GPU flags, and simplify build orchestration steps/messages. |
| Third-party update: 3rdparty/tvm | Bump TVM submodule pointer to a newer commit. |
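
The library lookup described in the table (search a list of library directories and pick a platform-specific filename) could look roughly like the sketch below. It is a simplified illustration and not the real tilelang/libinfo.py: the search_dirs parameter stands in for TL_LIBS, and the function names and naming rules are assumptions made for this example.

```python
import sys
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path
from typing import Iterable, List, Optional


def candidate_filenames(name: str, py_ext: bool = False,
                        platform: str = sys.platform) -> List[str]:
    """List the file names a library called `name` may have on `platform`."""
    if py_ext:
        # Python extension modules: try every suffix this interpreter accepts
        # (on Linux this typically includes an abi3 suffix and plain ".so").
        return [f"{name}{suffix}" for suffix in EXTENSION_SUFFIXES]
    if platform.startswith("win"):
        return [f"{name}.dll"]
    if platform == "darwin":
        return [f"lib{name}.dylib"]
    return [f"lib{name}.so"]


def find_lib(name: str, search_dirs: Iterable[str],
             py_ext: bool = False) -> Optional[Path]:
    """Return the first matching library file in `search_dirs`, or None."""
    for directory in search_dirs:
        for filename in candidate_filenames(name, py_ext):
            candidate = Path(directory) / filename
            if candidate.is_file():
                return candidate
    return None
```

Searching an explicit, ordered list of directories keeps the lookup predictable once all libraries live under site-packages, instead of probing build directories or environment variables.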

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Dev as Developer
  participant GH as GitHub Actions
  participant Repo as Repository
  participant UV as uv
  participant CIBW as cibuildwheel / scikit-build-core
  participant CMake as CMake (>=3.26)
  participant TVM as TVM (load_tvm.cmake)
  participant Repair as auditwheel/delocate
  participant Store as Artifact Store

  Dev->>GH: push / release
  GH->>Repo: checkout + submodules
  GH->>UV: setup Python env & install build deps via uv
  GH->>CIBW: run cibuildwheel (matrix)
  CIBW->>CMake: configure & build (load_tvm.cmake, cython_wrapper)
  CMake->>TVM: resolve TVM_SOURCE / INCLUDES
  CMake-->>CIBW: produce raw wheel(s)
  CIBW->>Repair: repair wheel(s)
  GH->>Store: upload artifact(s)
sequenceDiagram
  autonumber
  participant User as pip / installer
  participant SB as scikit-build-core
  participant CMake as CMake
  participant TVM as TVM

  User->>SB: pip install -v .
  SB->>CMake: configure & build targets (tilelang_module, cython_wrapper, ...)
  CMake->>TVM: load via cmake/load_tvm.cmake
  TVM-->>CMake: provide headers/targets
  CMake-->>SB: built artifacts
  SB-->>User: package installed / wheel produced

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • LeiWang1999

Poem

I nibble at CMake, hop through each line,
scikit-build hums while wheels align.
UV brews a venv, auditwheel sews seams,
TVM paths tidy, and version adds gleams.
A rabbit ships wheels — hippity-hop, build-time dreams! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title “[Build] Migrate to scikit-build-core” succinctly captures the primary objective of replacing the existing Python-centric build workflow with scikit-build-core integration, matching the PR’s core change without unnecessary detail or ambiguity. |
| Linked Issues Check | ✅ Passed | The PR fully migrates the build system to scikit-build-core by removing setup.py and Python-centric packaging code, introducing CMake-based CMakeLists, updating pyproject.toml with scikit-build-core tooling, and implementing dynamic version metadata as specified in issue #833, and it adds ARM (aarch64) wheel support via the ubuntu-22.04-arm build matrix in CI as required by issue #756. |
| Out of Scope Changes Check | ✅ Passed | All code modifications in this PR are directly related to migrating the build and packaging workflow to scikit-build-core and supporting installation and runtime use from site-packages, with no unrelated or extraneous changes outside the scope of the linked issues. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf0edde and 44e9644.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

20-20: label "macos-16" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format-check

@oraluben oraluben mentioned this pull request Oct 5, 2025
8 tasks
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54b40fa and 5d3415f.

📒 Files selected for processing (3)
  • .github/workflows/dist.yml (1 hunks)
  • 3rdparty/tvm (1 hunks)
  • pyproject.toml (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • 3rdparty/tvm
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

52-52: property "repair" is not defined in object type {ls-whl: {conclusion: string; outcome: string; outputs: {string => string}}}

(expression)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-test-metal
  • GitHub Check: build-test-amd
  • GitHub Check: build-wheels (ubuntu-22.04-arm)
  • GitHub Check: build-wheels (ubuntu-22.04)
  • GitHub Check: build-wheels (macos-14)

Contributor Author

oraluben commented Oct 12, 2025

This should be ready for review.

Main parts that this PR touched:

  1. Build system
    1. The project is now built entirely by CMake, driven by scikit-build-core. pip install . is supposed to cover all use cases.
    2. The build dir is intentionally set to ./build to help IDE indexing, but libs in the build dir are not supposed to be used directly.
    3. All libs are installed into Python's site-packages, regardless of whether it is an editable install.
    4. Added a patch in tile-ai/tvm ("Workaround limit api too high in tvm", tvm#12) so that tvm's Cython ext can be used with Python >= 3.8. But upstream tvm seems to have something new? cc @Hzfengsy
    5. Thanks to @XuehaiPan, we're using a pure manylinux2014 builder for x86_64, which makes it easier to support torch<2.6. (But manylinux2014 only has CUDA up to 12.4. Does that mean we need another wheel for CUDA 13?)
    6. Use ccache if possible.
  2. Runtime compile and cache
    1. cython_wrapper.so is compiled at build time and installed directly under site-packages, to support import cython_wrapper. This is not the best practice, but I think we plan to migrate to tvm-ffi soon?
    2. I then removed the runtime compilation logic for cython_wrapper.so. Developers who want to modify that file need to use an editable install and re-install with pip to make sure the file is regenerated. This takes ~30 seconds each time, because pip has to pack and unpack on every install. Does this look good to you @LeiWang1999, or shall we restore the old logic?
    3. tilelang now uses the package version (e.g. 0.1.6.post1+cu121.gitxxxxxxxx) as the version in its cache folder.
  3. CI and releases
    1. Added a CI job to generate wheels for linux+{x86,aarch64} and mac. I plan to run it as a daily cron job and on new releases, so that nightly users can use the wheels and maintainers can upload them to PyPI. I'm not sure whether this aligns with your workflow. @LeiWang1999
    2. Do we need different releases for CUDA 12/13?
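
Item 2.3 above says the cache folder is keyed by the full package version. The sketch below shows how such a version string splits into its public part and its local build labels, and how a per-version cache directory could be derived from it. The function names, the cache root, and the directory layout are hypothetical; only the example version string is taken from the comment above.

```python
from pathlib import Path
from typing import List, Tuple


def split_version(version: str) -> Tuple[str, List[str]]:
    """Split 'public+local' into the public version and its local labels."""
    public, _, local = version.partition("+")
    labels = local.split(".") if local else []
    return public, labels


def versioned_cache_dir(cache_root: Path, version: str) -> Path:
    """Return a cache directory that is unique to one exact build."""
    # Using the full version string (backend label and git hash included)
    # keeps artifacts from different builds in separate directories.
    return cache_root / version


if __name__ == "__main__":
    public, labels = split_version("0.1.6.post1+cu121.gitxxxxxxxx")
    print(public, labels)
    print(versioned_cache_dir(Path("cache"), "0.1.6.post1+cu121.gitxxxxxxxx"))
```

Keying the cache on the full string means that switching backends or rebuilding from a different commit does not reuse stale cached kernels.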

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d3415f and 96030a8.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

53-53: property "repair" is not defined in object type {ls-whl: {conclusion: string; outcome: string; outputs: {string => string}}}

(expression)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 110e27a and cf0edde.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/dist.yml

21-21: label "macos-16" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-metal
  • GitHub Check: build-wheels (ubuntu-22.04-arm)
  • GitHub Check: build-wheels (ubuntu-22.04)

Member

@LeiWang1999 LeiWang1999 left a comment

LGTM

@LeiWang1999
Member

  1. I think it’s currently difficult to have a single wheel that supports both CUDA 12 and CUDA 13, and it seems that there aren’t many users on CUDA 13 yet. We will hold off on providing a CUDA 13 release until there is clear demand from the community.
  2. We do have a plan to migrate to tvm_ffi; see issue #970, "Plan for TileLang integration with TVM-FFI".
  3. Phasing out the JIT compilation of Cython sounds good to me: recompiling Cython on the fly is very rarely needed during development, let alone by users as opposed to developers.
  4. The daily build CI looks very helpful.
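
With runtime Cython compilation phased out, the adapter only has to import the prebuilt extension and fail with a clear message when it is missing. A minimal sketch of that pattern follows. The helper name import_prebuilt and the error text are illustrative assumptions, not the code in tilelang/jit/adapter/cython/adapter.py.

```python
import importlib
from types import ModuleType


def import_prebuilt(module_name: str) -> ModuleType:
    """Import an extension module that must have been built at install time."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # No fallback to on-the-fly compilation: point at a reinstall instead.
        raise ImportError(
            f"Prebuilt extension '{module_name}' was not found; "
            "reinstall the package (for example with 'pip install .') to build it."
        ) from exc
```

A caller would use it as import_prebuilt("cython_wrapper"), which succeeds only when the wheel or editable install has placed the compiled module on the import path.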

@LeiWang1999 LeiWang1999 merged commit d89ba5b into tile-ai:main Oct 13, 2025
6 of 8 checks passed
@oraluben oraluben deleted the sk-build-core branch October 23, 2025 03:58
RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025
* cleanup

* init

* build first wheel that may not work

* build cython ext

* fix tvm build

* use sabi

* update rpath to support auditwheel

* pass editible build

* update ci

* fix warnings

* do not use ccache in self host runner

* test local uv cache

* test pip index

* update lib search to respect new lib location

* fix

* update ci

* enable cuda by default

* update src map

* fix

* fix

* fix

* Generate version with backend and git information at build time

* copy tvm_cython to wheels

* fix tvm lib search

* fmt

* remove unused

* auto detect ccache

* add back backend-related files

* remove jit cython adaptor to simplify code

* fmt

* fix ci

* ci fix 2

* ci fix 3

* workaround metal

* ci fix 4

* fmt

* fmt

* Revert "ci fix 4"

This reverts commit d1de829.

* tmp

* fix metal

* trivial cleanup

* add detailed build-time version for cuda

* add back mlc

* Restore wheel info and other trivial updates

* update

* fix cuda

* upd

* fix metal ci

* test for ga build

* test for nvidia/cuda

* test ubuntu 20

* fix

* fix

* Do not use `uv build`

* fix

* fix

* log toolchain version

* merge wheel

* update

* debug

* fix

* update

* skip rocm

* update artifacts each

* fix

* fix

* add mac

* fix cache

* fix cache

* fix cache

* reset and add comment

* upd

* fix git version

* update deps

* trivial update

* use in-tree build dir and install to src to speedup editable build

* Revert "use in-tree build dir and install to src to speedup editable build"

This reverts commit 6ab87b0.

* add build-dir

* update docs

* remove old scrips

* [1/n] cleanup scripts

* [Lint]: [pre-commit.ci] auto fixes [...]

* fix and update

* wait for tvm fix

* revert some tmp fix

* fix

* fix

* spell

* doc update

* test cibuildwheel

* fix and test macos on ci

* Update .github/workflows/dist.yml

Co-authored-by: Xuehai Pan

* fix

* test ga event

* cleanup

* bump tvm to support api3

* test final version

* add cron

* Update .github/workflows/dist.yml

Co-authored-by: Xuehai Pan

* fix

* test ccache for metal cibuildwheel

* test newer macos

* finish

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuehai Pan
Development

Successfully merging this pull request may close these issues.

[Feature Request] Migrate to scikit-build-core ARM wheels

3 participants