[None][feat] KV Cache Connector API #6488
Closed
55 commits
da1cbe4 Initial cpp binding stuff (jthomson04)
4bd4df4 Basic connector tests (jthomson04)
f83f0ed Hook into torch runtime and py executor (jthomson04)
d4e5178 Expose block pools as torch tensor (jthomson04)
0c9fa7a Little fixes (jthomson04)
229e6c5 Scheduler Output bindings (jthomson04)
614cb01 more little fixes - dont instantiate twice (jthomson04)
6c26369 MEGA REFACTOR, move scheduler and worker into their own class, do ini… (jthomson04)
d545a5d Get num new matched tokens (jthomson04)
9a1ba68 Suspend requests for async onboard (jthomson04)
1f0a35b async load and resume (jthomson04)
e75e6c4 Little cleanup (jthomson04)
66aa5a7 scheduler output for build_connector_meta (jthomson04)
2f80a23 Worker-side hooks (jthomson04)
a521b6a Move a ton of stuff out of c++ into python (jthomson04)
b80ed13 small refactorings and docs (jthomson04)
50bcec3 A whole bunch of unit tests (jthomson04)
e305010 precommit (jthomson04)
fe45192 Fix wait_for_save (jthomson04)
7ca84a2 start on integration tests (jthomson04)
b85f749 Integration tests for async save and load (jthomson04)
e16a38d Simplify add token stuff (jthomson04)
7081fe7 Tests for scheduler metadata (jthomson04)
7d7dabe Chunked prefill tests (jthomson04)
65f58a4 simplify register_kv_caches handling (jthomson04)
4140d52 remove changes to add token and update token (jthomson04)
812fcf4 add support for the overlap scheduler + little refactoring (jthomson04)
7b3795f little cleanup (jthomson04)
48e08ed Little refactor, provide kv cache as a single contiguous tensor (jthomson04)
1c3fe6f Gate cuda graph support (jthomson04)
914b34b Include cache block ids in request_finished (jthomson04)
5a5ea47 Little bugfixes and implement a basic example (jthomson04)
2056e70 Address reviewer comments (jthomson04)
96b71c4 more improvements + refactoring + docstrings (jthomson04)
d0ad8a6 Nanobind support (finally) (jthomson04)
c6afb96 Merge remote-tracking branch 'origin/main' into jthomson04/connector-api (jthomson04)
b7d2ee6 coderabbit + refactor (jthomson04)
03fa470 CI Integration, only support guarantee no evict, various coderabbit s… (jthomson04)
921dd94 update state after alloc (jthomson04)
35eeb02 Fix scheduler output (jthomson04)
0fa51a7 Merge branch 'main' into jthomson04/connector-api (Tabrizian)
b142dde fix license headers (jthomson04)
36b1d0b Merge branch 'main' into jthomson04/connector-api (Tabrizian)
0b210f0 fix tests and test list (jthomson04)
209052a Dont pass connector manager through add_sequence (jthomson04)
5059269 Merge remote-tracking branch 'origin/main' into jthomson04/connector-api (jthomson04)
6018fc9 Merge remote-tracking branch 'origin/main' into jthomson04/connector-api (jthomson04)
df6350d Init scheduler and worker concurrently (jthomson04)
f24eff2 Merge branch 'main' into jthomson04/connector-api (jthomson04)
d5f7f1d maybe fix CI (jthomson04)
ebfe401 Add fix for llm stability (jthomson04)
f9a3960 Merge remote-tracking branch 'origin/main' into jthomson04/connector-api (jthomson04)
60b3ad9 Merge branch 'main' into jthomson04/connector-api (jthomson04)
a383d03 Dont call request_finished unless request has already been scheduled (jthomson04)
19b03c4 Merge remote-tracking branch 'origin/main' into jthomson04/connector-api (jthomson04)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| /* | ||
| * Copyright (c) 2022-2024, NVIDIA CORPORATION. All rights reserved. | ||
| * | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| | ||
| #pragma once | ||
| | ||
| #include "tensorrt_llm/batch_manager/common.h" | ||
| #include "tensorrt_llm/batch_manager/llmRequest.h" | ||
| #include "tensorrt_llm/runtime/common.h" | ||
| | ||
| #include <utility> | ||
| #include <vector> | ||
| | ||
| using SizeType32 = tensorrt_llm::runtime::SizeType32; | ||
| using RequestIdType = tensorrt_llm::batch_manager::LlmRequest::RequestIdType; | ||
| | ||
| /// See tensorrt_llm/_torch/pyexecutor/connector.py for details on the Connector API. | ||
| | ||
| namespace tensorrt_llm::batch_manager::kv_connector | ||
| { | ||
| | ||
| /// @brief The KV connector manager. This is passed into the C++ KV Cache Manager when adding sequences. | ||
| class KvCacheConnectorManager | ||
| { | ||
| public: | ||
| KvCacheConnectorManager() = default; | ||
| virtual ~KvCacheConnectorManager() = default; | ||
| | ||
| /// @brief Handle the getNumNewMatchedTokens call inside the C++ KV Cache Manager. | ||
| /// @return The number of tokens that can be loaded from remote KV cache. | ||
| virtual SizeType32 getNumNewMatchedTokens(LlmRequest const& request, SizeType32 numComputedTokens) = 0; | ||
| }; | ||
| | ||
| } // namespace tensorrt_llm::batch_manager::kv_connector | ||
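To make the contract of `getNumNewMatchedTokens` concrete, here is a minimal sketch of a concrete subclass. It does not use the real TensorRT-LLM headers: `LlmRequest`, `SizeType32` and the base class are simplified stand-ins, and `TableConnectorManager` and `setRemoteTokens` are hypothetical names invented for illustration. It also assumes that "new" matched tokens means tokens beyond `numComputedTokens`; the diff itself only says the return value is the number of tokens that can be loaded from the remote KV cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-ins for the real TensorRT-LLM types (assumptions, illustration only).
using SizeType32 = std::int32_t;

struct LlmRequest
{
    std::uint64_t requestId;
    std::vector<std::int32_t> tokens; // prompt tokens
};

// Mirrors the shape of the interface added in the diff.
class KvCacheConnectorManager
{
public:
    KvCacheConnectorManager() = default;
    virtual ~KvCacheConnectorManager() = default;

    virtual SizeType32 getNumNewMatchedTokens(LlmRequest const& request, SizeType32 numComputedTokens) = 0;
};

// Hypothetical connector: pretends a remote store holds a known number of
// prompt tokens for each request id.
class TableConnectorManager : public KvCacheConnectorManager
{
public:
    void setRemoteTokens(std::uint64_t requestId, SizeType32 numTokens)
    {
        mRemoteTokens[requestId] = numTokens;
    }

    SizeType32 getNumNewMatchedTokens(LlmRequest const& request, SizeType32 numComputedTokens) override
    {
        assert(numComputedTokens >= 0);
        auto const it = mRemoteTokens.find(request.requestId);
        if (it == mRemoteTokens.end())
        {
            return 0; // nothing stored remotely for this request
        }
        auto const promptLen = static_cast<SizeType32>(request.tokens.size());
        auto const available = std::min(it->second, promptLen);
        // Assumption: only tokens beyond the locally computed prefix count as "new".
        return std::max<SizeType32>(0, available - numComputedTokens);
    }

private:
    std::unordered_map<std::uint64_t, SizeType32> mRemoteTokens;
};
```

With 6 tokens stored remotely for a request whose prompt has 8 tokens, a call with `numComputedTokens = 2` returns 4 under these assumptions, and a request that the table does not know returns 0.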
48 changes: 48 additions & 0 deletions
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheConnector.cpp
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| /* | ||
| * SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| * | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| | ||
| #include "tensorrt_llm/nanobind/batch_manager/kvCacheConnector.h" | ||
| | ||
| #include <nanobind/trampoline.h> | ||
| #include <torch/extension.h> | ||
| | ||
| namespace | ||
| { | ||
| using KvCacheConnectorManager = tensorrt_llm::batch_manager::kv_connector::KvCacheConnectorManager; | ||
| | ||
| namespace tb = tensorrt_llm::batch_manager; | ||
| | ||
| class PyKvCacheConnectorManager : KvCacheConnectorManager | ||
| { | ||
| public: | ||
| NB_TRAMPOLINE(KvCacheConnectorManager, 1); | ||
| | ||
| SizeType32 getNumNewMatchedTokens(tb::LlmRequest const& request, SizeType32 numComputedTokens) override | ||
| { | ||
| NB_OVERRIDE_PURE_NAME("get_num_new_matched_tokens", getNumNewMatchedTokens, request, numComputedTokens); | ||
| } | ||
| }; | ||
| | ||
| } // namespace | ||
| | ||
| void tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManagerConnectorBindings::initBindings(nb::module_& m) | ||
| { | ||
| nb::class_<tb::kv_connector::KvCacheConnectorManager, PyKvCacheConnectorManager>(m, "KvCacheConnectorManager") | ||
| .def(nb::init<>()) | ||
| .def("get_num_new_matched_tokens", &tb::kv_connector::KvCacheConnectorManager::getNumNewMatchedTokens, | ||
| nb::arg("request"), nb::arg("num_computed_tokens")); | ||
| } | ||
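The trampoline in this file forwards the C++ virtual call `getNumNewMatchedTokens` to a Python method named `get_num_new_matched_tokens`. As a loose analogy that compiles without nanobind or Python, the sketch below uses a `std::function` in place of the Python override. Everything in it is a stand-in: the simplified `LlmRequest`, the hypothetical `CallbackConnectorManager` and `queryConnector`, and the error path, which only imitates the idea that a pure virtual with no override has nothing to dispatch to. It is not how nanobind implements `NB_OVERRIDE_PURE_NAME`.

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>

// Stand-ins for the real TensorRT-LLM types (assumptions, illustration only).
using SizeType32 = std::int32_t;

struct LlmRequest
{
    std::uint64_t requestId;
};

class KvCacheConnectorManager
{
public:
    virtual ~KvCacheConnectorManager() = default;

    virtual SizeType32 getNumNewMatchedTokens(LlmRequest const& request, SizeType32 numComputedTokens) = 0;
};

// Hypothetical stand-in for a trampoline: the std::function plays the role of
// the Python method that a real trampoline would look up and call.
class CallbackConnectorManager : public KvCacheConnectorManager
{
public:
    using Callback = std::function<SizeType32(LlmRequest const&, SizeType32)>;

    explicit CallbackConnectorManager(Callback callback)
        : mCallback(std::move(callback))
    {
    }

    SizeType32 getNumNewMatchedTokens(LlmRequest const& request, SizeType32 numComputedTokens) override
    {
        if (!mCallback)
        {
            // No override was supplied, so there is nothing to forward to.
            throw std::runtime_error("get_num_new_matched_tokens is not overridden");
        }
        return mCallback(request, numComputedTokens);
    }

private:
    Callback mCallback;
};

// C++ callers see only the base interface, much as the KV cache manager would.
inline SizeType32 queryConnector(
    KvCacheConnectorManager& manager, LlmRequest const& request, SizeType32 numComputedTokens)
{
    return manager.getNumNewMatchedTokens(request, numComputedTokens);
}
```

The point of the pattern is that `queryConnector` never needs to know where the override lives. In the actual binding, the same decoupling lets connector logic be written in Python while the C++ side keeps calling a plain virtual function.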
39 changes: 39 additions & 0 deletions
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheConnector.h
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| /* | ||
| * SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| * | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| | ||
| #pragma once | ||
| | ||
| #include "tensorrt_llm/batch_manager/kvCacheConnector.h" | ||
| #include <nanobind/nanobind.h> | ||
| | ||
| namespace nb = nanobind; | ||
| | ||
| namespace tensorrt_llm::batch_manager::kv_cache_manager | ||
| { | ||
| class KVCacheManagerConnectorBindings | ||
| { | ||
| public: | ||
| static void initBindings(nb::module_& m); | ||
| }; | ||
| } // namespace tensorrt_llm::batch_manager::kv_cache_manager | ||
| | ||
| namespace tensorrt_llm::pybind::batch_manager::kv_connector | ||
| { | ||
| | ||
| using namespace tensorrt_llm::batch_manager::kv_connector; | ||
| | ||
| } // namespace tensorrt_llm::pybind::batch_manager::kv_connector | ||
nit: Copyright looks incorrect