
[Feature]: Add llama.cpp integration to TheRock #1652

Open
catan2001 wants to merge 7 commits into ROCm:main from catan2001:llamacpp-integration

catan2001 (Contributor) commented Oct 1, 2025

Motivation

Add llama.cpp as an external project to TheRock.

llama.cpp offers a lightweight, high-performance implementation for LLM inference, capable of running efficiently on both CPUs and ROCm-enabled GPUs. Including it in TheRock makes it easier for clients to experiment with and deploy language models, accelerating prototyping and evaluation.

Related to: #1449

Technical Details

llama.cpp is managed by the repo_management.py, llamacpp_repo.py and llamacpp_build.py scripts, similar to how PyTorch is handled as an external project. To fetch and build it, run:

python llamacpp_repo.py
python llamacpp_build.py --rocm-dir <by default /opt/rocm> --gpu-targets <e.g. gfx1030,gfx1100>

For a detailed explanation of the workflow, refer to README.md. The project is currently built with Python scripts that clone the upstream repository.
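The `--gpu-targets` value above is a comma-separated list. As a minimal sketch of how such a value could be normalized before it reaches the build, assuming every target name starts with `gfx`: `parse_gpu_targets` is a hypothetical helper for illustration and is not taken from llamacpp_build.py.

```python
# Hypothetical helper; not the actual code in llamacpp_build.py.
def parse_gpu_targets(value: str) -> list[str]:
    """Split a comma-separated string such as "gfx1030,gfx1100" into a list.

    Whitespace is trimmed, empty entries are ignored, duplicates are dropped
    (first occurrence wins), and names are lowercased.
    """
    targets: list[str] = []
    for raw in value.split(","):
        name = raw.strip().lower()
        if not name:
            continue
        # Assumption: all valid target names begin with "gfx".
        if not name.startswith("gfx"):
            raise ValueError(f"unrecognized GPU target: {raw!r}")
        if name not in targets:
            targets.append(name)
    if not targets:
        raise ValueError("no GPU targets given")
    return targets
```

Rejecting an empty or malformed list early gives a clearer error than a failure deep inside the compiler invocation.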

Test Plan

  • Built TheRock with llama.cpp included.
  • Ran a subset of llama.cpp test binaries to confirm successful compilation on Linux.
  • TODO: create smoke-tests and run full test suite.

Test Result

  • llama.cpp binaries compiled successfully on Linux.
  • Basic functionality verified via sample test runs.
  • Full validation and benchmarking still pending.

Submission Checklist

catan2001 changed the title from "[FEATURE]: Add llama.cpp integration to TheRock" to "[Feature]: Add llama.cpp integration to TheRock" on Oct 2, 2025
catan2001 marked this pull request as draft on October 2, 2025, 12:27
catan2001 (Contributor, Author) commented

I changed the PR to Draft because llama.cpp is currently integrated as a Git submodule rather than being managed via Python scripts.
I will reopen it for review once the script-based management is in place.

catan2001 force-pushed the llamacpp-integration branch 2 times, most recently from f6f2acf to bf5ecf4, on October 13, 2025, 12:59
catan2001 force-pushed the llamacpp-integration branch from 0cd9505 to 54c91cb on October 23, 2025, 07:52

catan2001 commented Oct 23, 2025

I've updated the test scripts and verified that they work correctly on Linux. As part of this, I created a Python script that automates running the integrated llama.cpp tests, which could be useful for CI/CD workflows. The script currently skips test-backend-ops because of a floating-point exception on GFX1030 (tested using GFX1031 via HSA_OVERRIDE_GFX_VERSION), but everything runs smoothly on GFX1201 and GFX1100. It also skips every test that requires additional downloads, since those would slow down the CI/CD pipeline.
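The skip logic described in this comment can be sketched roughly as follows. This is an illustration only: `select_tests` and `SKIPPED_TESTS` are hypothetical names, and llamacpp_test.py may organize the selection differently. Only `test-backend-ops` comes from the comment; the other test names in the usage example are placeholders.

```python
# Hypothetical sketch; llamacpp_test.py may implement this differently.
# test-backend-ops is skipped because of the floating-point exception
# reported on GFX1030 in the comment above.
SKIPPED_TESTS = frozenset({"test-backend-ops"})


def select_tests(available, skip=SKIPPED_TESTS, needs_download=()):
    """Partition test names into (to_run, skipped), preserving input order.

    `skip` holds tests excluded for known failures; `needs_download` holds
    tests excluded because they fetch extra files at run time.
    """
    excluded = set(skip) | set(needs_download)
    to_run = [name for name in available if name not in excluded]
    skipped = [name for name in available if name in excluded]
    return to_run, skipped


# Usage with placeholder test names:
to_run, skipped = select_tests(
    ["test-example-a", "test-backend-ops", "test-example-b"],
    needs_download=["test-example-b"],
)
```

Returning the skipped names alongside the selected ones lets a CI job report what was left out instead of silently dropping it.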

I'm still unsure whether we need additional sanity checks. My initial thought was to verify that the built binaries are linked against HIP/ROCm, but that might be overkill; running a minimal binary to confirm functionality may be sufficient. I've also switched to the upstream llama.cpp repo instead of ROCm's fork, which was based on a much older commit. It would be helpful if someone could review what I have done so far.
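For reference, the linkage check mentioned in this comment could look something like the sketch below on Linux. It assumes the HIP runtime shared library is named `libamdhip64` and that `ldd` is available; both function names are hypothetical and not part of the PR.

```python
import subprocess

# Assumption: the HIP runtime shared library is named libamdhip64 on Linux.
HIP_RUNTIME_MARKER = "libamdhip64"


def links_against_hip(ldd_output: str) -> bool:
    """Return True if any dependency listed by `ldd` is the HIP runtime."""
    for line in ldd_output.splitlines():
        # Typical ldd line: "\tlibfoo.so.1 => /path/libfoo.so.1 (0x...)"
        library = line.strip().split(" ", 1)[0]
        if library.startswith(HIP_RUNTIME_MARKER):
            return True
    return False


def binary_links_against_hip(binary_path: str) -> bool:
    """Run `ldd` on a built binary and check its dependency list (Linux only)."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=True
    )
    return links_against_hip(result.stdout)
```

Keeping the parsing separate from the `ldd` call makes the check testable without a ROCm installation. Because `ldd` lists transitive dependencies, the check would also pass when the HIP runtime is pulled in through an intermediate shared library.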

catan2001 marked this pull request as ready for review on October 23, 2025, 13:20
catan2001 commented

I made some changes to the llama.cpp PR. When someone has a chance, I'd appreciate a review.

Thanks in advance!

- Solved the libomp.so linking issue by setting CMake RPATHs:
    -DCMAKE_BUILD_RPATH=\"${THEROCK_BINARY_DIR}/external-builds/llama.cpp/llamacpp/dist/lib/llvm/lib\"
    -DCMAKE_INSTALL_RPATH=\"${THEROCK_BINARY_DIR}/external-builds/llama.cpp/llamacpp/dist/lib/llvm/lib\"

- Removed llama.cpp as a Git submodule.

- Integrated `llamacpp_repo.py` and `repo_management.py` scripts to manage llama.cpp.
…ists integration into TheRock.

- Removed CMakeLists.txt and artifact config for llama.cpp
- Added new Python-based build script: llamacpp_build.py
- Migrated and renamed repo management scripts:
  - llamacpp_repo.py
  - repo_management.py
- Restored root CMakeLists.txt to reflect new integration approach
…ma.cpp

- Modify llamacpp_build to support GPU builds, reorder llvm/amdgcn, and add an option to build for multiple GPUs plus a clean argument
- Modify llamacpp_repo to clone upstream llama.cpp by default
- Add logic to support Windows installation.
- Switch ROCm discovery mechanism in the build script to use `rocm-sdk`.
- Update ROCm target detection to also rely on `rocm-sdk`.
- Enforce the use of Ninja as the build system on Windows.
- Introduce the `LLAMACPP_BUILD_DIR` environment variable for `llamacpp_test.py`
  to locate builds placed outside the default directory.
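
The `LLAMACPP_BUILD_DIR` override described in the last commit message can be sketched as follows. This is a minimal illustration, assuming the variable simply replaces a default path when it is set and non-empty; `resolve_build_dir` is a hypothetical name, and the actual lookup in llamacpp_test.py may differ.

```python
import os
from pathlib import Path


def resolve_build_dir(default_dir, environ=None):
    """Return the build directory, preferring LLAMACPP_BUILD_DIR when set.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to test. An empty or whitespace-only value is treated
    as unset (an assumption of this sketch).
    """
    env = os.environ if environ is None else environ
    override = env.get("LLAMACPP_BUILD_DIR", "").strip()
    return Path(override) if override else Path(default_dir)
```

With this shape, a caller would pass its default build location as `default_dir`, and a build placed elsewhere is picked up by exporting `LLAMACPP_BUILD_DIR` before running the test script.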
