[EAGLE-3698] - model upload handles multiple batch #227

Closed
phatvo9 wants to merge 10 commits

@phatvo9 phatvo9 commented Nov 28, 2023

Why

Currently the model predicts inputs one at a time, even when a batch of size > 1 is sent.

How

  • The get_predictions() method in inference.py now takes a list of inputs instead of a single input.
  • Update the examples to use batch input.
  • Update the docs.
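
As a rough illustration of the first point, here is a minimal sketch of a list-based get_predictions(). It is not the code from this PR: the class name InferenceModel, the batch_fn argument, the calls counter, and the exact signature are assumptions made for the example. The idea it shows is that the whole batch is handed to the model in one call and one output is returned per input, in the same order.

```python
from typing import Any, Callable, List


class InferenceModel:
    """Hypothetical stand-in for the model class in inference.py."""

    def __init__(self, batch_fn: Callable[[List[str]], List[int]]):
        # batch_fn is a placeholder for the real model's batched forward pass.
        self.batch_fn = batch_fn
        self.calls = 0  # counts forward passes, for illustration only

    def get_predictions(self, input_data: List[str], **kwargs: Any) -> List[int]:
        """Take the whole batch and return one output per input, in order."""
        if not isinstance(input_data, list):
            raise TypeError("input_data must be a list of inputs")
        self.calls += 1
        outputs = self.batch_fn(input_data)  # single call for the full batch
        if len(outputs) != len(input_data):
            raise ValueError("expected one output per input")
        return outputs


# Toy "model": the prediction is simply the length of each text.
model = InferenceModel(batch_fn=lambda texts: [len(t) for t in texts])
predictions = model.get_predictions(["a", "bcd", ""])
```

With this shape, a batch of three inputs results in one forward pass rather than three, and callers that previously passed a single input would wrap it in a one-element list.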

Other updates:

  • Insert the triton decorator function when initializing the model repository, so users cannot forget to add it.
  • Enable describing inference parameters with from_kwargs.

Note:

Models generated by an earlier version will not work with this version.

@HarmitMinhas96 HarmitMinhas96 left a comment

This PR is currently very large. Can it be broken down into more manageable chunks for review?
E.g.:

  1. Add multiple batches handling in one PR (maybe two if it can be logically split)
  2. Add new model type example text-embedder
  3. Add new model type example multimodal-embedder
  4. Add vllm example

Or the model type and vllm examples can be added first if you prefer

phatvo9 commented Nov 30, 2023

Broken down into #236 (batching code changes plus updates to the existing examples) and #237 (new text-embedder and multimodal-embedder examples).
Since the vllm example was already merged in #217, I included it with the existing examples.

phatvo9 commented Dec 4, 2023

Closing this since #236 and #237 have been merged.

@phatvo9 phatvo9 closed this Dec 4, 2023