feat: Expose BF16 precision in TensorRT #328
whoisj merged 3 commits into triton-inference-server:main
Conversation
Adding @yinggeh to review this. I'm not as familiar w/ ONNX->TRT as I should be.
Not familiar either but looks like a small change. I will take a look today.
@dwyatte, please make sure you've completed the contribution requirements: https://github.com/triton-inference-server/server?tab=readme-ov-file#contributing. Thank you.
@whoisj Block (my corporate entity) has previously completed the CLA here, but let me know if I need to personally submit something too
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
Please update the PR title and description using the template https://github.com/triton-inference-server/server/blob/main/.github/PULL_REQUEST_TEMPLATE/pull_request_template_external_contrib.md. Fill in n/a for any field that doesn't apply.
@yinggeh Done!
@whoisj Is our client eligible for contributing now?
@whoisj Can I merge?
@yinggeh as far as I know, yes.
What does the PR do?
BF16 was added to the ONNX Runtime TensorRT EP in microsoft/onnxruntime#24743. This PR should expose it to Triton's ONNX backend.
Checklist
Agreement
<commit_type>: <Title>
(pre-commit install, pre-commit run --all)
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
n/a
Where should the reviewer start?
src/onnxruntime.cc
Test plan:
n/a
Caveats:
n/a
Background
See microsoft/onnxruntime#24743 for more info
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)