This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Conversation

@dsikka
Contributor

@dsikka dsikka commented Nov 2, 2023

Summary

  • Note: this PR has been updated to use the new operator scheduling
  • Update pipeline to add a run_async function to be used by the deepsparse server. This introduces an asyncio loop to execute each operator and allows us to await operation completion, such that multiple requests can be accepted without blocking
  • Update to ensure run_async can handle multiple prompts and works with split/join
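
The non-blocking pattern described above can be sketched as follows. This is a hedged illustration, not the PR's actual implementation: `run_operator`, `run_async`, and the thread-pool "scheduler" are stand-ins for the pipeline's operators and operator scheduler.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_operator(value):
    # Stand-in for a compute-bound operator executed by the scheduler.
    return value * 2


async def run_async(value, scheduler):
    loop = asyncio.get_running_loop()
    # Submitting to an executor yields an awaitable future, so the event
    # loop stays free to accept other requests while the operator runs.
    return await loop.run_in_executor(scheduler, run_operator, value)


async def main():
    with ThreadPoolExecutor(max_workers=4) as scheduler:
        return await asyncio.gather(*(run_async(i, scheduler) for i in range(3)))
```

Awaiting the executor future is what lets several requests interleave on one event loop instead of blocking each other.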

Testing

The following script makes multiple calls (with different numbers of prompts) using the run_async function:

import asyncio

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline
from deepsparse.v2.utils import InferenceState


model_path = "hf:mgoin/TinyStories-1M-deepsparse"
pipeline = TextGenerationPipeline(model_path, prompt_sequence_length=3)

prompts = [["Hello there!", "The sun shined bright"], ["The dog barked"]]


async def func(index):
    print("Hello World", index)
    inference_state = InferenceState()
    inference_state.create_state({})
    pipeline_state = pipeline.pipeline_state

    input_value = TextGenerationInput(
        prompt=prompts[index], generation_kwargs={"max_length": 10}
    )
    return await pipeline.run_async(
        input_value,
        pipeline_state=pipeline_state,
        inference_state=inference_state
    )

async def main():
    print(await asyncio.gather(*[func(i) for i in range(len(prompts))]))

asyncio.run(main())

Output:

Hello World 0
Hello World 1
[TextGenerationOutput(created=datetime.datetime(2023, 11, 10, 14, 27, 43, 825126), prompts=['Hello there!', 'The sun shined bright'], generations=[GeneratedText(text='”\n\nThe little girl was so excited', score=None, finished=True, finished_reason='length'), GeneratedText(text=' and the sun was shining brightly.\n\nThe', score=None, finished=True, finished_reason='length')]), TextGenerationOutput(created=datetime.datetime(2023, 11, 10, 14, 27, 43, 825809), prompts=['The dog barked'], generations=[GeneratedText(text=' and ran away. He was so happy that he', score=None, finished=True, finished_reason='length')])]

Base automatically changed from features/v2/generation to v2 November 3, 2023 15:15
Contributor

@tdg5 tdg5 left a comment

I made some suggestions and nits, but overall looks good to me.

Also, in case you still had any doubts, I convinced myself that this code is not only non-blocking, but also runs concurrently by increasing OperatorScheduler's max_workers to 4 and adding a few more prompts which yielded some decently shuffled finishing orders such as

1
2
3
0
5
4
6
8
9
7
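
For intuition, completion order diverging from submission order is exactly what asyncio produces once tasks have independent runtimes. A minimal illustration (not the reviewer's actual test script):

```python
import asyncio
import random


async def task(i):
    # Random runtimes stand in for prompts of different lengths.
    await asyncio.sleep(random.uniform(0, 0.01))
    return i


async def finishing_order(n=10):
    order = []
    # as_completed yields futures in the order they finish, not the
    # order they were submitted.
    for fut in asyncio.as_completed([task(i) for i in range(n)]):
        order.append(await fut)
    return order
```

Running this repeatedly produces shuffled orders like the one above, though every index still completes exactly once.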

I do have some bad news though, I hit an Assertion failure in the engine a few times (~ 1 in 10 times maybe? As far as I can tell, only when I reconfigured max_workers=4) when test driving your code. I leave it to you to figure out if the problem is with this code, with the engine, or with the model maybe? I'd guess the engine, but what do I know 🙈

Assertion at src/lib/engine/units/cached_concat_gemm.cpp:593

Backtrace:
 0# wand::detail::assert_fail(char const*, char const*, int) in /home/tdg5/src/nm/deepsparse/src/deepsparse/avx2/libonnxruntime.so.1.15.1
 1# 0x00007f441a52d2ca:
    [5b5d415c415d415ec3ba51020000488d3551e927fe488d3d294619fee815b7a2]
    [01ba17000000488d3539e927fe488d3d8a451afee8fdb6a2014989c4e941ffff]
 2# 0x00007f441a2f02cd:
    [cccccc4883c718e9d7162400cccccccccccccc41544883c6184989fce882cf23]
    [004c89e0415cc3cccccccccccccccccccccccc4883c718e9c71b2500cccccccc]
 3# 0x00007f441a620460:
    [c4185b5dc3cccccccccccccccccccccc4154488b064989fc488b30488b06ff50]
    [204c89e0415cc3cccccccccccccccccc534883ec10488b1f480fbe43403cff74]
 4# 0x00007f441a6d1c2b:
    [49837f10007460488b5c24084c8d4424104c89e94c89e24c89fe4889df41ff57]
    [184883c4284889d85b5d415c415d415e415fc366904829c64c89efe8c5fcffff]
 5# 0x00007f441a6e49e3:
    [4d184c89ea4c89ff488d73084c8d4440fd49c1e0034c01c14c034530e8bcd0fe]
    [ff488b13488b4500488d04d0488b54240848c744240800000000488b38488910]
 6# 0x00007f441a6e05b8:
    [4883ec304c8b37498b7e18e8c871ffff498b5618498b7638498b7e28e8974300]
    [00498b5638488d442408488944241848895424104889542420498b1648894424]
 7# 0x00007f4419cc0d43:
    [800000000049837e28007478488d942480000000488d742460498d7e1841ff56]
    [30e927fdffff0f1f8000000000b8fffffffff00fc14508e9e2fcfffff0834008]
 8# 0x00007f4418f9f667:
    [2f0303000f855803000083400801488d8c24800000004c89f24c89fee82810d2]
    [00488bbc24880000004885ff7405e84603ecff488b5c2468488b6c24604839eb]
 9# 0x00007f4418fa97e2:
    [00834008014c8b442410488d7c2450488d542470488d8c2490000000e80d5aff]
    [ff4c8b6424684d85e47426488b1d7c8d02034885db0f8593020000418b442408]
10# 0x00007f44198092cd:
    [380f84f9020000488d4c2440488d5424384c89e7498d71204c8d44246041ff51]
    [38488d65d84c89e05b415c415d415e415f5dc34c8db424d00000004c8d6c2460]
11# 0x00007f441988ddaf:
    [020000000000000f850c0b0000498b06488dbc24280100004c89ea4c89f6ff50]
    [10488b8424280100004c8ba4241001000048c784242801000000000000488984]
12# 0x00007f4419886a4c:
    [cccccccc41544989f24c89c84889d6498b52084d89c14989c04989fce8037000]
    [00488b44241049833c24000f94004c89e0415cc341544989fc488b76104889d7]
13# 0x00007f44198b2194:
    [ff48c78528ffffff00000000488b00488b34d8488b0652488b9518ffffffff50]
    [10584c8b8528ffffff488b8530ffffff5a48c78530ffffff0000000048898528]
14# 0x00007f441988c3ca:
    [10c5f97f8dc0fbffff4d85ff0f85e4feffff4c8db5b0fbffff4c89f7e885c6ff]
    [ffe93cffffff488bbd78fbffffe8645002004c8bbd78fbffff4c89ffe8255002]
15# 0x00007f44198d2911:
    [b550feffffff7520ff7530ff7528ff7510ffb530feffff524c89d250e87e8cfb]
    [ff488b9d30ffffff4883c4504885db0f84ba040000bf28000000e82011600248]
16# 0x00007f44198d5c34:
    [78ffffff0100000048c7458000000000c745900000803fc5fa7f4598e8bbc7ff]
    [ff488b45804883c4404889c74885c0752eeb4a660f1f840000000000488d7f10]
17# 0x00007f44198d629e:
    [8b8424f800000050488b4c24504c8b442458488b542448488b742440e8a1f1ff]
    [ff4881c4c80000004c89e05b5d415c415d415e415fc34889c5e915feffffcccc]
18# 0x00007f4418e8e213:
    [ffff488bb0800600008b80c8010000ffb580faffffffb5c0faffff50e8ec7ea4]
    [00488b8508fbffff488bbdf8faffff4883c42048c78508fbffff000000004889]
19# 0x00007f4418e65899:
    [488bb538feffff6a00488dbd88feffff4c89e1524c89ea5041534152e8367802]
    [00488b8588feffff488bbd80feffff4883c43048c78588feffff000000004889]
20# deepsparse::ort_engine::ort_execute(bool, std::vector<Ort::Value, std::allocator<Ort::Value> > const&, std::vector<Ort::Value, std::allocator<Ort::Value> >&) in /home/tdg5/src/nm/deepsparse/src/deepsparse/avx2/libdeepsparse.so                                                                                                                                 
21# deepsparse::ort_engine::execute_common(bool, std::vector<Ort::Value, std::allocator<Ort::Value> > const&, std::vector<Ort::Value, std::allocator<Ort::Value> >&, std::vector<std::unique_ptr<wand::engine::bench::benchmark_info, std::default_delete<wand::engine::bench::benchmark_info> >, std::allocator<std::unique_ptr<wand::engine::bench::benchmark_info, st
d::default_delete<wand::engine::bench::benchmark_info> > > >&, std::shared_ptr<wand::kv_cache_t>, bool) in /home/tdg5/src/nm/deepsparse/src/deepsparse/avx2/libdeepsparse.so
22# deepsparse::ort_engine::execute(std::vector<Ort::Value, std::allocator<Ort::Value> > const&, std::vector<std::unique_ptr<wand::engine::bench::benchmark_info, std::default_delete<wand::engine::bench::benchmark_info> >, std::allocator<std::unique_ptr<wand::engine::bench::benchmark_info, std::default_delete<wand::engine::bench::benchmark_info> > > >&, std::
shared_ptr<wand::kv_cache_t>) in /home/tdg5/src/nm/deepsparse/src/deepsparse/avx2/libdeepsparse.so
23# 0x00007f44381b4a87:
    [0000004c8dac24d0000000488b5424204889ef48894424184d89e84889c141ff]
    [d1488b4c24704c8b642478c5f9efc0488b842480000000c5f96fa424f0000000]
24# 0x00007f44381b525b:
    [741248833d33e10c00000f855501000083400801488d7c24104889e1e8e4f6ff]
    [ff488b442410488b6c241848c744241000000000498904244885ed0f84a60000]
25# 0x00007f4438197996:
    [89e6c5f9efc04c89f14c89ea4889ee488b40384c89e7c5f97f0c24c5fa7f03ff]
    [d0488b7c24084885ff7405e8fa1a0e004883c4204c89e05b5d415c415d415ec3]
26# 0x00007f44381f14ea:
    [1e090000755083400801488b742460488d7c2408488d4c2410488d54244841ff]
    [d0488b7c24184885ff7405e8a67f08004c8b6424084d85e40f840bffffff4983]
27# 0x00007f44381bd7ab:
    [8db424600100004889742438e844bf0b004c8db424a00100004c89f741ff5424]
    [30488b7c24384889c34889442478e8b24dffff4883fb010f85cb1d000048837c]
28# 0x0000000000548e53:
    [010f845dbef3ffa8204989fc4c8b41080f8543bef3ffa802488b7f18742041ff]
    [d04889c34885db488b5558741e4885d20f853cbef3ff4889d85b5d415cc34885]
29# _PyObject_MakeTpCall in python
30# 0x000000000042381d:
    [48b8ffffffffffffff7f4989c84889fa4c21f04889ee4c89e74889c1e862671d]
    [004989c6e993540e00488d5424284531c0b9010000004889ee4c89e7e842671d]

@tdg5
Contributor

tdg5 commented Nov 13, 2023

Re: The assertion error I hit earlier... I ran 10 prompts 700 times w/ max_workers=1 and I never hit the assertion error. So the related bug has definitely got something to do with concurrency, but I can't offer any more insight into whether it is an issue with the engine or with this code when run with concurrency.
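
A stress loop like the one described can be sketched as below. `run_one` is a hypothetical stand-in for the actual pipeline.run_async call, so this only shows the shape of the repro, not the engine failure itself.

```python
import asyncio


async def run_one(index):
    # Hypothetical stand-in for pipeline.run_async(...) on one input.
    await asyncio.sleep(0)
    return index


async def stress(iterations, n_prompts=10):
    # Repeat a concurrent batch many times; any failure surfaces as an
    # exception or a mismatched result on some iteration.
    for _ in range(iterations):
        results = await asyncio.gather(*(run_one(i) for i in range(n_prompts)))
        assert results == list(range(n_prompts))
    return iterations
```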

@dsikka
Contributor Author

dsikka commented Nov 21, 2023

Since the way we're scheduling operators has changed, we need to reassess the async functionality.

@dsikka dsikka requested review from bfineran and tdg5 November 27, 2023 21:45
@dsikka
Contributor Author

dsikka commented Nov 27, 2023

This PR has been updated to use the new operator scheduling with the run_async function.

loop = asyncio.get_running_loop()

next_step = self.router.START_ROUTE
operator_output = None
Contributor

I'm not sure if there's any reason for an operator to have an output of None, but if so, you might consider using another sentinel value here and in the check below 🤷🏻
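
A module-level sentinel is the usual way to make that distinction, since `object()` creates a value no operator could ever return. A minimal sketch (names are illustrative, not from the PR):

```python
_NO_OUTPUT = object()  # unique sentinel; cannot collide with any real operator output


def latest_output(outputs):
    """Return the last produced output, treating None as a legitimate value."""
    result = _NO_OUTPUT
    for value in outputs:
        result = value
    # Identity check against the sentinel, never equality against None.
    if result is _NO_OUTPUT:
        raise ValueError("no operator has produced an output yet")
    return result
```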

Contributor

I guess you could also consider adding another local variable to avoid overloading the use of operator_output, something like:

Suggested change (adds a flag after the existing line):
operator_output = None
processed_start_route = False

operator_output = None

while next_step != self.router.END_ROUTE:
# Either a dictionary key or valid index
Contributor

I think this comment refers to next_step, maybe? If so, I'd suggest moving the comment to where next_step is first declared.

while next_step != self.router.END_ROUTE:
# Either a dictionary key or valid index

if next_step == self.router.SPLIT_ROUTE:
Contributor

This code flow is a bit funky; what do you think about making this an elif in the same block as if next_step == self.router.START_ROUTE: and adding another predicate like this one when deciding next_step?

I think that would look like this:

    async def run_async(self, *args, inference_state: InferenceState, **kwargs):
        """
        Run through the operators using the provided router and scheduler.
        The input to a given operator is the output of the previous operator.

        :param inference_state: inference_state for the pipeline.
        :param pipeline_state: pipeline_state for the pipeline. The values in the state
            are created during pipeline creation and are read-only during inference.
        """
        loop = asyncio.get_running_loop()

        next_step = self.router.START_ROUTE
        operator_output = None

        while next_step != self.router.END_ROUTE:
            # Either a dictionary key or valid index

            if next_step == self.router.START_ROUTE:
                outputs = run_func(
                    *args,
                    func=self._scheduler_group.submit,
                    operator=self.ops[next_step],
                    inference_state=inference_state,
                    pipeline_state=self.pipeline_state,
                    loop=loop,
                    **kwargs,
                )
                await outputs
                operator_output = outputs.result()
            elif next_step == self.router.SPLIT_ROUTE:
                if operator_output is None:
                    raise ValueError(
                        f"{self.router.SPLIT_ROUTE} should appear after "
                        f"{self.router.START_ROUTE}"
                    )

                operator_output = await self._apply_split(
                    operator_output, inference_state, loop=loop
                )
            else:
                outputs = self._run_next(
                    inp=operator_output,
                    next_step=next_step,
                    inference_state=inference_state,
                    loop=loop,
                )
                await outputs
                operator_output = outputs.result()
         
            if next_step == self.router.SPLIT_ROUTE:
                next_step = self.router.route[self.router.JOIN_ROUTE]
                continue

            state_update = None
            if isinstance(operator_output, tuple):
                state_update = operator_output[-1]
                operator_output = operator_output[0]
            next_step = self.router.next(next_step, self.ops, operator_output)
            if state_update:
                inference_state.update_state(state_update)
        return operator_output

Maybe a little easier to reason about, but maybe not.

Contributor

@tdg5 tdg5 left a comment

LGTM

@dsikka dsikka merged commit c0267d9 into v2 Dec 5, 2023
@dsikka dsikka deleted the features/v2/async branch December 5, 2023 19:24
bfineran added a commit that referenced this pull request Dec 6, 2023
* Pipelines Refactor - Initial Impl (#1287)

* [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix

* [v2] EngineOperator updates to make continuous batching easier (#1371)

* [v2] EngineOperator updates to make continuous batching easier

* test fixes

* [Pipeline Refactor] Update routes, text generation initial functionality (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes

* [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* move map to base class

* [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392)

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* fix name

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373)

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization

* has_key method

* thread safety

* add blocking option for pop_batch

* update docstring

* allow mutex to be shared across continuous batching objects

* revert last commit

* [Continuous Batching] Executor thread for running continuous batching (#1374)

* [Continuous Batching] Executor thread for running continuous batching

* quality

* ensure that executor stops when main thread does - clean up test hack

* [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375)

* [ContinuousBatching] ContinuousBatchingScheduler Implementation

* cleanup unnecessary stop condition

* [continuous batching] singleton pattern for scheduler (#1391)

* [continuous batching] singleton pattern for scheduler

* catch from review

* [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364)

* rebasing off my initial commit

* cleanups

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

---------

Co-authored-by: Dipika Sikka <[email protected]>

* [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* pipeline runs, but incorrectly

* Revert "pipeline runs, but incorrectly"

This reverts commit 51c4ee6.

* PR review comments

---------

Co-authored-by: Dipika Sikka <[email protected]>

* [Text Generation][V2] End-to-end tests (#1402)

* initial commit

* initial commit

* its working now

* beautification

* thank you Dipika <3

* ready to review

* [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409)

* update split/join

* use map

* update

* run end-to-end

* clean-up

* fix bug with batch size, introduce SplitRoute dataclass

* update tests to use new inputs/outputs

* use the normal scheduler for internal kv_cache

* add pipeline inpuits

* clean-up

* change engine type, update docstrings, update override function to be more generic

* move subgraph functionality to its own function; clean-up cont batching in text gen pipeline

* update linear pathway to also use subgraph execution

* rebase fix

* fix tests

* [Pipeline Refactor] Operator Registry (#1420)

* initial registry functionality

* use sparsezoo mixin

* [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution  (#1453)

* fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type

* fix warning message

* [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457)

* add pipeline create method for pipeline creation using the operator registry

* add instance check

* [Pipeline Refactor] async (#1380)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* async initial functionality

* fix capacity settting again

* add blocking

* more testing

* update to use split/join

* fix

* rebase fix

* remove index

* change event loop

* rebase fix

* update async run to use new operator scheduling properly

* rebase fixes (#1458)

* more fixes (#1459)

---------

Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
dbogunowicz added a commit that referenced this pull request Jan 2, 2024
* Pipelines Refactor - Initial Impl (#1287)

* [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix

* [v2] EngineOperator updates to make continuous batching easier (#1371)

* [v2] EngineOperator updates to make continuous batching easier

* test fixes

* [Pipeline Refactor] Update routes, text generation initial functionality (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* move map to base class

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392)

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* fix name

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373)

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization

* has_key method

* thread safety

* add blocking option for pop_batch

* update docstring

* allow mutex to be shared across continuous batching objects

* revert last commit

* [Continuous Batching] Executor thread for running continuous batching (#1374)

* [Continuous Batching] Executor thread for running continuous batching

* quality

* ensure that executor stops when main thread does - clean up test hack

* [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375)

* [ContinuousBatching] ContinuousBatchingScheduler Implementation

* cleanup unnecessary stop condition

* [continuous batching] singleton pattern for scheduler (#1391)

* [continuous batching] singleton pattern for scheduler

* catch from review

* [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364)

* rebasing off my initial commit

* cleanups

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

---------

Co-authored-by: Dipika Sikka <[email protected]>

* pipeline runs, but incorrectly

* it works for a single sequence

* cleanup. now lets figure out how to run multiple sequences

* [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* pipeline runs, but incorrectly

* Revert "pipeline runs, but incorrectly"

This reverts commit 51c4ee6.

* PR review comments

---------

Co-authored-by: Dipika Sikka <[email protected]>

* [Text Generation][V2] End-to-end tests (#1402)

* initial commit

* initial commit

* its working now

* beautification

* thank you Dipika <3

* ready to review

* integration tests pass

* [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409)

* update split/join

* use map

* update

* run end-to-end

* clean-up

* fix bug with batch size, introduce SplitRoute dataclass

* update tests to use new inputs/outputs

* use the normal scheduler for internal kv_cache

* add pipeline inpuits

* clean-up

* change engine type, update docstrings, update override function to be more generic

* move subgraph functionality to its own function; clean-up cont batching in text gen pipeline

* update linear pathway to also use subgraph execution

* rebase fix

* fix tests

* [Pipeline Refactor] Operator Registry (#1420)

* initial registry functionality

* use sparsezoo mixin

* fix tricky rebase

* one more cleanup

* got tests to work after rebase. implementing SPLIT and JOIN in linearouter now

* pipeline working, with GraphRouter. Needs some more testing

* ready for review

* cleanup

* simplify after PR review round

* [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution  (#1453)

* fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type

* fix warning message

* [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457)

* add pipeline create method for pipeline creation using the operator registry

* add instance check

* [Pipeline Refactor] async (#1380)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* async initial functionality

* fix capacity settting again

* add blocking

* more testing

* update to use split/join

* fix

* rebase fix

* remove index

* change event loop

* rebase fix

* update async run to use new operator scheduling properly

* rebase fixes (#1458)

* more fixes (#1459)

* bring back functionalities that were lost in v2 during rebasing

* Update src/deepsparse/transformers/helpers.py

* ready for review

* bring tests back"

* quality

* original readme

* addressing Dipikas comments

* Update src/deepsparse/transformers/pipelines/text_generation/pipeline_no_kv_cache.py

* addressing PR review

---------

Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>