This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Conversation

@dsikka
Contributor

@dsikka dsikka commented Nov 2, 2023

Summary

  • Note: this PR has been updated to use the new operator scheduling
  • Update pipeline to add a run_async function to be used by the deepsparse server. This introduces an asyncio loop to execute each operator and allows us to await operation completion, such that multiple requests can be accepted without blocking
  • Update to ensure run_async can handle multiple prompts and works with split/join
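
The non-blocking pattern described above can be sketched as follows. This is a hedged illustration, not the PR's actual implementation: `run_operator`, `run_async`, and the thread-pool "scheduler" are stand-ins for the pipeline's operators and operator scheduler.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_operator(value):
    # Stand-in for a compute-bound operator executed by the scheduler.
    return value * 2


async def run_async(value, scheduler):
    loop = asyncio.get_running_loop()
    # Submitting to an executor yields an awaitable future, so the event
    # loop stays free to accept other requests while the operator runs.
    return await loop.run_in_executor(scheduler, run_operator, value)


async def main():
    with ThreadPoolExecutor(max_workers=4) as scheduler:
        return await asyncio.gather(*(run_async(i, scheduler) for i in range(3)))
```

Awaiting the executor future is what lets several requests interleave on one event loop instead of blocking each other.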

Testing

The following script makes multiple calls (with different numbers of prompts) using the run_async function:

import asyncio

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline
from deepsparse.v2.utils import InferenceState


model_path = "hf:mgoin/TinyStories-1M-deepsparse"
pipeline = TextGenerationPipeline(model_path, prompt_sequence_length=3)

prompts = [["Hello there!", "The sun shined bright"], ["The dog barked"]]


async def func(index):
    print("Hello World", index)
    inference_state = InferenceState()
    inference_state.create_state({})
    pipeline_state = pipeline.pipeline_state

    input_value = TextGenerationInput(
        prompt=prompts[index], generation_kwargs={"max_length": 10}
    )
    return await pipeline.run_async(
        input_value,
        pipeline_state=pipeline_state,
        inference_state=inference_state
    )

async def main():
    print(await asyncio.gather(*[func(i) for i in range(len(prompts))]))

asyncio.run(main())

Output:

Hello World 0
Hello World 1
[TextGenerationOutput(created=datetime.datetime(2023, 11, 10, 14, 27, 43, 825126), prompts=['Hello there!', 'The sun shined bright'], generations=[GeneratedText(text='”\n\nThe little girl was so excited', score=None, finished=True, finished_reason='length'), GeneratedText(text=' and the sun was shining brightly.\n\nThe', score=None, finished=True, finished_reason='length')]), TextGenerationOutput(created=datetime.datetime(2023, 11, 10, 14, 27, 43, 825809), prompts=['The dog barked'], generations=[GeneratedText(text=' and ran away. He was so happy that he', score=None, finished=True, finished_reason='length')])]

Base automatically changed from features/v2/generation to v2 November 3, 2023 15:15
Contributor

@tdg5 tdg5 left a comment

I made some suggestions and nits, but overall looks good to me.

Also, in case you still had any doubts, I convinced myself that this code is not only non-blocking, but also runs concurrently by increasing OperatorScheduler's max_workers to 4 and adding a few more prompts which yielded some decently shuffled finishing orders such as

1
2
3
0
5
4
6
8
9
7
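
For intuition, completion order diverging from submission order is exactly what asyncio produces once tasks have independent runtimes. A minimal illustration (not the reviewer's actual test script):

```python
import asyncio
import random


async def task(i):
    # Random runtimes stand in for prompts of different lengths.
    await asyncio.sleep(random.uniform(0, 0.01))
    return i


async def finishing_order(n=10):
    order = []
    # as_completed yields futures in the order they finish, not the
    # order they were submitted.
    for fut in asyncio.as_completed([task(i) for i in range(n)]):
        order.append(await fut)
    return order
```

Running this repeatedly produces shuffled orders like the one above, though every index still completes exactly once.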

I do have some bad news though, I hit an Assertion failure in the engine a few times (~ 1 in 10 times maybe? As far as I can tell, only when I reconfigured max_workers=4) when test driving your code. I leave it to you to figure out if the problem is with this code, with the engine, or with the model maybe? I'd guess the engine, but what do I know 🙈

Assertion at src/lib/engine/units/cached_concat_gemm.cpp:593

Backtrace:
 0# wand::detail::assert_fail(char const*, char const*, int) in /home/tdg5/src/nm/deepsparse/src/deepsparse/avx2/libonnxruntime.so.1.15.1
 1# 0x00007f441a52d2ca:
    [5b5d415c415d415ec3ba51020000488d3551e927fe488d3d294619fee815b7a2]
    [01ba17000000488d3539e927fe488d3d8a451afee8fdb6a2014989c4e941ffff]
 2# 0x00007f441a2f02cd:
    [cccccc4883c718e9d7162400cccccccccccccc41544883c6184989fce882cf23]
    [004c89e0415cc3cccccccccccccccccccccccc4883c718e9c71b2500cccccccc]
 3# 0x00007f441a620460:
    [c4185b5dc3cccccccccccccccccccccc4154488b064989fc488b30488b06ff50]
    [204c89e0415cc3cccccccccccccccccc534883ec10488b1f480fbe43403cff74]
 4# 0x00007f441a6d1c2b:
    [49837f10007460488b5c24084c8d4424104c89e94c89e24c89fe4889df41ff57]
    [184883c4284889d85b5d415c415d415e415fc366904829c64c89efe8c5fcffff]
 5# 0x00007f441a6e49e3:
    [4d184c89ea4c89ff488d73084c8d4440fd49c1e0034c01c14c034530e8bcd0fe]
    [ff488b13488b4500488d04d0488b54240848c744240800000000488b38488910]
 6# 0x00007f441a6e05b8:
    [4883ec304c8b37498b7e18e8c871ffff498b5618498b7638498b7e28e8974300]
    [00498b5638488d442408488944241848895424104889542420498b1648894424]
 7# 0x00007f4419cc0d43:
    [800000000049837e28007478488d942480000000488d742460498d7e1841ff56]
    [30e927fdffff0f1f8000000000b8fffffffff00fc14508e9e2fcfffff0834008]
 8# 0x00007f4418f9f667:
    [2f0303000f855803000083400801488d8c24800000004c89f24c89fee82810d2]
    [00488bbc24880000004885ff7405e84603ecff488b5c2468488b6c24604839eb]
 9# 0x00007f4418fa97e2:
    [00834008014c8b442410488d7c2450488d542470488d8c2490000000e80d5aff]
    [ff4c8b6424684d85e47426488b1d7c8d02034885db0f8593020000418b442408]
10# 0x00007f44198092cd:
    [380f84f9020000488d4c2440488d5424384c89e7498d71204c8d44246041ff51]
    [38488d65d84c89e05b415c415d415e415f5dc34c8db424d00000004c8d6c2460]
11# 0x00007f441988ddaf:
    [020000000000000f850c0b0000498b06488dbc24280100004c89ea4c89f6ff50]
    [10488b8424280100004c8ba4241001000048c784242801000000000000488984]
12# 0x00007f4419886a4c:
    [cccccccc41544989f24c89c84889d6498b52084d89c14989c04989fce8037000]
    [00488b44241049833c24000f94004c89e0415cc341544989fc488b76104889d7]
13# 0x00007f44198b2194:
    [ff48c78528ffffff00000000488b00488b34d8488b0652488b9518ffffffff50]
    [10584c8b8528ffffff488b8530ffffff5a48c78530ffffff0000000048898528]
14# 0x00007f441988c3ca:
    [10c5f97f8dc0fbffff4d85ff0f85e4feffff4c8db5b0fbffff4c89f7e885c6ff]
    [ffe93cffffff488bbd78fbffffe8645002004c8bbd78fbffff4c89ffe8255002]
15# 0x00007f44198d2911:
    [b550feffffff7520ff7530ff7528ff7510ffb530feffff524c89d250e87e8cfb]
    [ff488b9d30ffffff4883c4504885db0f84ba040000bf28000000e82011600248]
16# 0x00007f44198d5c34:
    [78ffffff0100000048c7458000000000c745900000803fc5fa7f4598e8bbc7ff]
    [ff488b45804883c4404889c74885c0752eeb4a660f1f840000000000488d7f10]
17# 0x00007f44198d629e:
    [8b8424f800000050488b4c24504c8b442458488b542448488b742440e8a1f1ff]
    [ff4881c4c80000004c89e05b5d415c415d415e415fc34889c5e915feffffcccc]
18# 0x00007f4418e8e213:
    [ffff488bb0800600008b80c8010000ffb580faffffffb5c0faffff50e8ec7ea4]
    [00488b8508fbffff488bbdf8faffff4883c42048c78508fbffff000000004889]
19# 0x00007f4418e65899:
    [488bb538feffff6a00488dbd88feffff4c89e1524c89ea5041534152e8367802]
    [00488b8588feffff488bbd80feffff4883c43048c78588feffff000000004889]
20# deepsparse::ort_engine::ort_execute(bool, std::vector<Ort::Value, std::allocator<Ort::Value> > const&, std::vector<Ort::Value, std::allocator<Ort::Value> >&) in /home/tdg5/src/nm/deepsparse/src/deepsparse/avx2/libdeepsparse.so                                                                                                                                 
21# deepsparse::ort_engine::execute_common(bool, std::vector<Ort::Value, std::allocator<Ort::Value> > const&, std::vector<Ort::Value, std::allocator<Ort::Value> >&, std::vector<std::unique_ptr<wand::engine::bench::benchmark_info, std::default_delete<wand::engine::bench::benchmark_info> >, std::allocator<std::unique_ptr<wand::engine::bench::benchmark_info, st
d::default_delete<wand::engine::bench::benchmark_info> > > >&, std::shared_ptr<wand::kv_cache_t>, bool) in /home/tdg5/src/nm/deepsparse/src/deepsparse/avx2/libdeepsparse.so
22# deepsparse::ort_engine::execute(std::vector<Ort::Value, std::allocator<Ort::Value> > const&, std::vector<std::unique_ptr<wand::engine::bench::benchmark_info, std::default_delete<wand::engine::bench::benchmark_info> >, std::allocator<std::unique_ptr<wand::engine::bench::benchmark_info, std::default_delete<wand::engine::bench::benchmark_info> > > >&, std::
shared_ptr<wand::kv_cache_t>) in /home/tdg5/src/nm/deepsparse/src/deepsparse/avx2/libdeepsparse.so
23# 0x00007f44381b4a87:
    [0000004c8dac24d0000000488b5424204889ef48894424184d89e84889c141ff]
    [d1488b4c24704c8b642478c5f9efc0488b842480000000c5f96fa424f0000000]
24# 0x00007f44381b525b:
    [741248833d33e10c00000f855501000083400801488d7c24104889e1e8e4f6ff]
    [ff488b442410488b6c241848c744241000000000498904244885ed0f84a60000]
25# 0x00007f4438197996:
    [89e6c5f9efc04c89f14c89ea4889ee488b40384c89e7c5f97f0c24c5fa7f03ff]
    [d0488b7c24084885ff7405e8fa1a0e004883c4204c89e05b5d415c415d415ec3]
26# 0x00007f44381f14ea:
    [1e090000755083400801488b742460488d7c2408488d4c2410488d54244841ff]
    [d0488b7c24184885ff7405e8a67f08004c8b6424084d85e40f840bffffff4983]
27# 0x00007f44381bd7ab:
    [8db424600100004889742438e844bf0b004c8db424a00100004c89f741ff5424]
    [30488b7c24384889c34889442478e8b24dffff4883fb010f85cb1d000048837c]
28# 0x0000000000548e53:
    [010f845dbef3ffa8204989fc4c8b41080f8543bef3ffa802488b7f18742041ff]
    [d04889c34885db488b5558741e4885d20f853cbef3ff4889d85b5d415cc34885]
29# _PyObject_MakeTpCall in python
30# 0x000000000042381d:
    [48b8ffffffffffffff7f4989c84889fa4c21f04889ee4c89e74889c1e862671d]
    [004989c6e993540e00488d5424284531c0b9010000004889ee4c89e7e842671d]

@tdg5
Contributor

tdg5 commented Nov 13, 2023

Re: The assertion error I hit earlier... I ran 10 prompts 700 times w/ max_workers=1 and I never hit the assertion error. So the related bug has definitely got something to do with concurrency, but I can't offer any more insight into whether it is an issue with the engine or with this code when run with concurrency.
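
A stress loop like the one described can be sketched as below. `run_one` is a hypothetical stand-in for the actual pipeline.run_async call, so this only shows the shape of the repro, not the engine failure itself.

```python
import asyncio


async def run_one(index):
    # Hypothetical stand-in for pipeline.run_async(...) on one input.
    await asyncio.sleep(0)
    return index


async def stress(iterations, n_prompts=10):
    # Repeat a concurrent batch many times; any failure surfaces as an
    # exception or a mismatched result on some iteration.
    for _ in range(iterations):
        results = await asyncio.gather(*(run_one(i) for i in range(n_prompts)))
        assert results == list(range(n_prompts))
    return iterations
```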

@dsikka
Contributor Author

dsikka commented Nov 21, 2023

Since the way we're scheduling operators has changed, we need to reassess the async functionality.

@dsikka dsikka requested review from bfineran and tdg5 November 27, 2023 21:45
@dsikka
Contributor Author

dsikka commented Nov 27, 2023

This PR has been updated to use the new operator scheduling with the run_async function.

loop = asyncio.get_running_loop()

next_step = self.router.START_ROUTE
operator_output = None
Contributor

I'm not sure if there's any reason for an operator to have an output of None, but if so, you might consider using another sentinel value here and in the check below 🤷🏻
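
A module-level sentinel is the usual way to make that distinction, since `object()` creates a value no operator could ever return. A minimal sketch (names are illustrative, not from the PR):

```python
_NO_OUTPUT = object()  # unique sentinel; cannot collide with any real operator output


def latest_output(outputs):
    """Return the last produced output, treating None as a legitimate value."""
    result = _NO_OUTPUT
    for value in outputs:
        result = value
    # Identity check against the sentinel, never equality against None.
    if result is _NO_OUTPUT:
        raise ValueError("no operator has produced an output yet")
    return result
```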

Contributor

I guess you could also consider adding another local variable to avoid overloading the use of operator_output, something like:

Suggested change (adds a flag after the existing line):
operator_output = None
processed_start_route = False

operator_output = None

while next_step != self.router.END_ROUTE:
# Either a dictionary key or valid index
Contributor

I think this comment refers to next_step, maybe? If so, I'd suggest moving the comment to where next_step is first declared.

while next_step != self.router.END_ROUTE:
# Either a dictionary key or valid index

if next_step == self.router.SPLIT_ROUTE:
Contributor

This code flow is a bit funky; what do you think about making this an elif in the same block as if next_step == self.router.START_ROUTE: and adding another predicate like this one when deciding next_step?

I think that would look like this:

    async def run_async(self, *args, inference_state: InferenceState, **kwargs):
        """
        Run through the operators using the provided router and scheduler.
        The input to a given operator is the output of the previous operator.

        :param inference_state: inference_state for the pipeline.
        :param pipeline_state: pipeline_state for the pipeline. The values in the state
            are created during pipeline creation and are read-only during inference.
        """
        loop = asyncio.get_running_loop()

        next_step = self.router.START_ROUTE
        operator_output = None

        while next_step != self.router.END_ROUTE:
            # Either a dictionary key or valid index

            if next_step == self.router.START_ROUTE:
                outputs = run_func(
                    *args,
                    func=self._scheduler_group.submit,
                    operator=self.ops[next_step],
                    inference_state=inference_state,
                    pipeline_state=self.pipeline_state,
                    loop=loop,
                    **kwargs,
                )
                await outputs
                operator_output = outputs.result()
            elif next_step == self.router.SPLIT_ROUTE:
                if operator_output is None:
                    raise ValueError(
                        f"{self.router.SPLIT_ROUTE} should appear after "
                        f"{self.router.START_ROUTE}"
                    )

                operator_output = await self._apply_split(
                    operator_output, inference_state, loop=loop
                )
            else:
                outputs = self._run_next(
                    inp=operator_output,
                    next_step=next_step,
                    inference_state=inference_state,
                    loop=loop,
                )
                await outputs
                operator_output = outputs.result()
         
            if next_step == self.router.SPLIT_ROUTE:
                next_step = self.router.route[self.router.JOIN_ROUTE]
                continue

            state_update = None
            if isinstance(operator_output, tuple):
                state_update = operator_output[-1]
                operator_output = operator_output[0]
            next_step = self.router.next(next_step, self.ops, operator_output)
            if state_update:
                inference_state.update_state(state_update)
        return operator_output

Maybe a little easier to reason about, but maybe not.

Contributor

@tdg5 tdg5 left a comment

LGTM

@dsikka dsikka merged commit c0267d9 into v2 Dec 5, 2023
@dsikka dsikka deleted the features/v2/async branch December 5, 2023 19:24
bfineran added a commit that referenced this pull request Dec 6, 2023
* Pipelines Refactor - Initial Impl (#1287)

* [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix

* [v2] EngineOperator updates to make continuous batching easier (#1371)

* [v2] EngineOperator updates to make continuous batching easier

* test fixes

* [Pipeline Refactor] Update routes, text generation initial functionality (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes

* [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* move map to base class

* [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392)

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* fix name

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373)

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization

* has_key method

* thread safety

* add blocking option for pop_batch

* update docstring

* allow mutex to be shared across continuous batching objects

* revert last commit

* [Continuous Batching] Executor thread for running continuous batching (#1374)

* [Continuous Batching] Executor thread for running continuous batching

* quality

* ensure that executor stops when main thread does - clean up test hack

* [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375)

* [ContinuousBatching] ContinuousBatchingScheduler Implementation

* cleanup unnecessary stop condition

* [continuous batching] singleton pattern for scheduler (#1391)

* [continuous batching] singleton pattern for scheduler

* catch from review

* [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364)

* rebasing off my initial commit

* cleanups

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

---------

Co-authored-by: Dipika Sikka <[email protected]>

* [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* pipeline runs, but incorrectly

* Revert "pipeline runs, but incorrectly"

This reverts commit 51c4ee6.

* PR review comments

---------

Co-authored-by: Dipika Sikka <[email protected]>

* [Text Generation][V2] End-to-end tests (#1402)

* initial commit

* initial commit

* its working now

* beautification

* thank you Dipika <3

* ready to review

* [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409)

* update split/join

* use map

* update

* run end-to-end

* clean-up

* fix bug with batch size, introduce SplitRoute dataclass

* update tests to use new inputs/outputs

* use the normal scheduler for internal kv_cache

* add pipeline inpuits

* clean-up

* change engine type, update docstrings, update override function to be more generic

* move subgraph functionality to its own function; clean-up cont batching in text gen pipeline

* update linear pathway to also use subgraph execution

* rebase fix

* fix tests

* [Pipeline Refactor] Operator Registry (#1420)

* initial registry functionality

* use sparsezoo mixin

* [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution  (#1453)

* fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type

* fix warning message

* [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457)

* add pipeline create method for pipeline creation using the operator registry

* add instance check

* [Pipeline Refactor] async (#1380)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* async initial functionality

* fix capacity settting again

* add blocking

* more testing

* update to use split/join

* fix

* rebase fix

* remove index

* change event loop

* rebase fix

* update async run to use new operator scheduling properly

* rebase fixes (#1458)

* more fixes (#1459)

---------

Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
dbogunowicz added a commit that referenced this pull request Jan 2, 2024
* Pipelines Refactor - Initial Impl (#1287)

* [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix

* [v2] EngineOperator updates to make continuous batching easier (#1371)

* [v2] EngineOperator updates to make continuous batching easier

* test fixes

* [Pipeline Refactor] Update routes, text generation initial functionality (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* move map to base class

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392)

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* fix name

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373)

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization

* has_key method

* thread safety

* add blocking option for pop_batch

* update docstring

* allow mutex to be shared across continuous batching objects

* revert last commit

* [Continuous Batching] Executor thread for running continuous batching (#1374)

* [Continuous Batching] Executor thread for running continuous batching

* quality

* ensure that executor stops when main thread does - clean up test hack

* [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375)

* [ContinuousBatching] ContinuousBatchingScheduler Implementation

* cleanup unnecessary stop condition

* [continuous batching] singleton pattern for scheduler (#1391)

* [continuous batching] singleton pattern for scheduler

* catch from review

* [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364)

* rebasing off my initial commit

* cleanups

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

---------

Co-authored-by: Dipika Sikka <[email protected]>

* pipeline runs, but incorrectly

* it works for a single sequence

* cleanup. now lets figure out how to run multiple sequences

* [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* pipeline runs, but incorrectly

* Revert "pipeline runs, but incorrectly"

This reverts commit 51c4ee6.

* PR review comments

---------

Co-authored-by: Dipika Sikka <[email protected]>

* [Text Generation][V2] End-to-end tests (#1402)

* initial commit

* initial commit

* its working now

* beautification

* thank you Dipika <3

* ready to review

* integration tests pass

* [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409)

* update split/join

* use map

* update

* run end-to-end

* clean-up

* fix bug with batch size, introduce SplitRoute dataclass

* update tests to use new inputs/outputs

* use the normal scheduler for internal kv_cache

* add pipeline inpuits

* clean-up

* change engine type, update docstrings, update override function to be more generic

* move subgraph functionality to its own function; clean-up cont batching in text gen pipeline

* update linear pathway to also use subgraph execution

* rebase fix

* fix tests

* [Pipeline Refactor] Operator Registry (#1420)

* initial registry functionality

* use sparsezoo mixin

* fix tricky rebase

* one more cleanup

* got tests to work after rebase. implementing SPLIT and JOIN in linearouter now

* pipeline working, with GraphRouter. Needs some more testing

* ready for review

* cleanup

* simplify after PR review round

* [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution  (#1453)

* fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type

* fix warning message

* [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457)

* add pipeline create method for pipeline creation using the operator registry

* add instance check

* [Pipeline Refactor] async (#1380)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* async initial functionality

* fix capacity settting again

* add blocking

* more testing

* update to use split/join

* fix

* rebase fix

* remove index

* change event loop

* rebase fix

* update async run to use new operator scheduling properly

* rebase fixes (#1458)

* more fixes (#1459)

* bring back functionalities that were lost in v2 during rebasing

* Update src/deepsparse/transformers/helpers.py

* ready for review

* bring tests back"

* quality

* original readme

* addressing Dipikas comments

* Update src/deepsparse/transformers/pipelines/text_generation/pipeline_no_kv_cache.py

* addressing PR review

---------

Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>