Replies: 1 comment
-
I have verified that the model can be an LLMEngine.
-
Hello,
I believe AsyncLLMEngine was removed some time ago. I was wondering whether there is support for loading a built engine and serving asynchronous inference requests.
I have seen https://github.com/NVIDIA/TensorRT-LLM/blob/main/tests/hlapi/test_llm.py#L288, but that does not cover a built engine, does it?
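For context, even without a dedicated AsyncLLMEngine class, asynchronous request handling can be layered over a synchronous engine handle with plain asyncio. Below is a minimal generic sketch of that pattern; `FakeEngine` and its `generate` method are hypothetical stand-ins for whatever object wraps the built TensorRT engine, not the actual TensorRT-LLM API:

```python
import asyncio

class FakeEngine:
    """Hypothetical stand-in for a handle to a built engine (not the real API)."""
    def generate(self, prompt: str) -> str:
        # A real engine would run blocking inference here.
        return f"completion for: {prompt}"

async def generate_async(engine: FakeEngine, prompt: str) -> str:
    # Run the blocking generate() in a worker thread so many
    # requests can be awaited concurrently from one event loop.
    return await asyncio.to_thread(engine.generate, prompt)

async def main() -> list[str]:
    engine = FakeEngine()
    prompts = ["hello", "world"]
    # Fan out concurrent requests against the single engine handle.
    return await asyncio.gather(*(generate_async(engine, p) for p in prompts))

results = asyncio.run(main())
print(results)
```

This only shows the concurrency pattern; whether the high-level API accepts a prebuilt engine directory is exactly the open question above.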