This repository was archived by the owner on Jun 3, 2025. It is now read-only.
[Text Generation][V2] NonKVCachePipeline #1483
Merged
72 commits
3e00175 Pipelines Refactor - Initial Impl (#1287) (bfineran)
224e116 [Pipeline Refactor] Additional functionality, engine operator, linear… (dsikka)
58b0758 [v2] EngineOperator updates to make continuous batching easier (#1371) (bfineran)
e1ff108 [Pipeline Refactor] Update routes, text generation initial functional… (dsikka)
59457b7 [Pipeline Refactor] Additional Operators, Route update and completed … (dsikka)
f18d5f3 add split/join functionality (dsikka)
2c4d231 update router to include split/join in parent class, refactor pipelin… (dsikka)
672ca20 process multiple generations (dsikka)
304eb35 initial commit (dbogunowicz)
71515ac fix error (dbogunowicz)
6f1b175 Merge remote-tracking branch 'origin/features/v2/run_multiple' into f… (dbogunowicz)
041174b [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1… (dsikka)
a508342 unit testing for text generation operators (dsikka)
cbb0e86 additional changes (dsikka)
2541581 unit testing completion (dsikka)
8c8989d remove debug (dsikka)
f8d75e3 fix (dsikka)
fd1e466 add todo (dsikka)
64c0552 more clean-up (dsikka)
913665a fix test (dsikka)
e15521f add docstrings/comments (dsikka)
379481e break out tests to individual unit test files; add conftest and make … (dsikka)
a90a20a Merge remote-tracking branch 'origin/features/v2/unit_testing' into f… (dbogunowicz)
0a50d1d [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392) (dsikka)
c0c4240 Merge branch 'v2' into feature/damian/v2/factor_out_transformation_utils (dbogunowicz)
4f248dd Delete tests/deepsparse/v2/unit/text_generation/test_msic.py (dbogunowicz)
20980a7 [Continuous Batching] Queue Implementation to support batching groupi… (bfineran)
d81012d [Continuous Batching] Executor thread for running continuous batching… (bfineran)
5c48505 [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375) (bfineran)
e1b7f37 [continuous batching] singleton pattern for scheduler (#1391) (bfineran)
98f7a6d Merge branch 'v2' into feature/damian/v2/factor_out_transformation_utils (dbogunowicz)
bbd534d [Pipeline Refactor][Text-Generation] Create a helper function for cre… (dbogunowicz)
d1683b4 Merge branch 'v2' into feature/damian/v2/factor_out_transformation_utils (dbogunowicz)
51c4ee6 pipeline runs, but incorrectly (dbogunowicz)
fa96efb it works for a single sequence (dbogunowicz)
e41ddf8 cleanup. now lets figure out how to run multiple sequences (dbogunowicz)
b80a417 [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers … (dbogunowicz)
1b9238a [Text Generation][V2] End-to-end tests (#1402) (dbogunowicz)
89f11e5 Merge remote-tracking branch 'origin/v2' into feature/damian/no_kv_cache (dbogunowicz)
9b441f5 integration tests pass (dbogunowicz)
c858b1f [Pipeline Refactor][Text Generation][Continuous Batching] Integration… (dsikka)
bb3ff41 [Pipeline Refactor] Operator Registry (#1420) (dsikka)
19434e7 Merge remote-tracking branch 'origin/v2' into feature/damian/no_kv_cache (dbogunowicz)
90de2b3 fix tricky rebase (dbogunowicz)
66ca295 one more cleanup (dbogunowicz)
dcded1d got tests to work after rebase. implementing SPLIT and JOIN in linear… (dbogunowicz)
127aa00 pipeline working, with GraphRouter. Needs some more testing (dbogunowicz)
af57698 ready for review (dbogunowicz)
4397c80 cleanup (dbogunowicz)
105b1d5 simplify after PR review round (dbogunowicz)
e15a24b [Pipeline Refactor] Fix Operator scheduling to fix issue with slow ex… (dsikka)
36f742b [Pipeline Refactor] Add `Pipeline.create` method to initialize pipeli… (dsikka)
c0267d9 [Pipeline Refactor] async (#1380) (dsikka)
cfa61b7 Merge branch 'main' into v2 (dsikka)
2d9b0a1 rebase fixes (#1458) (dsikka)
a2aaa51 more fixes (#1459) (dsikka)
39be9a0 Merge remote-tracking branch 'origin/v2' into feature/damian/no_kv_cache (dbogunowicz)
dcab3f9 bring back functionalities that were lost in v2 during rebasing (dbogunowicz)
e0a9dee Merge remote-tracking branch 'origin/main' into feature/damian/no_kv_… (dbogunowicz)
bc1b11e Merge remote-tracking branch 'origin/main' into feature/damian/no_kv_… (dbogunowicz)
e5d2f39 Update src/deepsparse/transformers/helpers.py (dbogunowicz)
9ed5b06 ready for review (dbogunowicz)
1ac1f5c bring tests back" (dbogunowicz)
a734459 quality (dbogunowicz)
60fa00f original readme (dbogunowicz)
14b0dc0 Merge branch 'main' into feature/damian/no_kv_cache_retrieve (dbogunowicz)
9371990 addressing Dipikas comments (dbogunowicz)
4eed463 Update src/deepsparse/transformers/pipelines/text_generation/pipeline… (dbogunowicz)
0b17bd8 Merge branch 'main' into feature/damian/no_kv_cache_retrieve (dbogunowicz)
111d533 addressing PR review (dbogunowicz)
4370c52 Merge branch 'main' into feature/damian/no_kv_cache_retrieve (dbogunowicz)
8d352fc Merge branch 'main' into feature/damian/no_kv_cache_retrieve (dbogunowicz)
67 changes: 67 additions & 0 deletions
src/deepsparse/transformers/pipelines/text_generation/nl_engine_operator_no_kv_cache.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| # Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| | ||
| from typing import Any | ||
| | ||
| import numpy | ||
| from pydantic import BaseModel | ||
| | ||
| from deepsparse.operators.engine_operator import EngineOperator, EngineOperatorInputs | ||
| from deepsparse.transformers.helpers import overwrite_transformer_onnx_model_inputs | ||
| | ||
| | ||
| __all__ = [ | ||
| "NLEngineOperatorNoCache", | ||
| "NLEngineInputsNoCache", | ||
| ] | ||
| | ||
| | ||
| class NLEngineInputsNoCache(BaseModel): | ||
| input_ids: Any | ||
| attention_mask: Any | ||
| | ||
| | ||
| class NLEngineOperatorNoCache(EngineOperator): | ||
| """ | ||
| Operator for the Natural Language Engine that operates without | ||
| a KV cache. This means that this operator merely maps input_ids | ||
| and attention_mask to logits. | ||
| """ | ||
| | ||
| input_schema = NLEngineInputsNoCache | ||
| output_schema = None | ||
| | ||
| def __init__(self, sequence_length: int, **kwargs): | ||
| overwrite_transformer_onnx_model_inputs( | ||
| path=kwargs.get("model_path"), | ||
| batch_size=kwargs.get("batch_size", 1), | ||
| max_length=sequence_length, | ||
| ) | ||
| super().__init__(**kwargs) | ||
| | ||
| def run(self, inp: NLEngineInputsNoCache, **kwargs) -> Any: | ||
| engine_inputs = [inp.input_ids, inp.attention_mask] | ||
| logits = ( | ||
| super() | ||
| .run(EngineOperatorInputs(engine_inputs=engine_inputs), **kwargs) | ||
| .get("engine_outputs") | ||
| ) | ||
| | ||
| # By default, the engine outputs logits for all tokens in the sequence. | ||
| # Let's filter out the logits for the padding tokens. | ||
| logits = numpy.compress(inp.attention_mask.flatten(), logits[0], axis=1) | ||
| | ||
| return {"logits": [logits], "kv_cache": None, "tokens": None}, { | ||
| "prompt_logits": [logits] | ||
| } | ||
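The padding-filter step in `run` can be tried in isolation. The following is a minimal sketch, not code from this PR: the shapes, the left-padded mask layout, and the `arange`-based stand-in for the engine output are all assumptions made for illustration. It only exercises the same `numpy.compress` call on a batch of one, which is the case where the flattened mask has the same length as the sequence axis.

```python
import numpy

# Hypothetical sizes for illustration: batch of 1, 5 positions, vocabulary of 3.
sequence_length, vocab_size = 5, 3

# Assumed left-padded mask: two padding positions followed by three real tokens.
attention_mask = numpy.array([[0, 0, 1, 1, 1]])

# Stand-in for the engine output: one row of logits per sequence position.
logits = numpy.arange(sequence_length * vocab_size, dtype=numpy.float32).reshape(
    1, sequence_length, vocab_size
)

# Same call shape as in the operator: keep only the positions along axis 1
# (the sequence axis) where the flattened mask is non-zero.
filtered = numpy.compress(attention_mask.flatten(), logits, axis=1)

print(filtered.shape)
```

With these inputs the two padded positions are dropped and the logits of the three real tokens are kept in their original order.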