triton-inference-server / server Public

Fork 1.5k
Star 8.5k

Code
Issues 608
Pull requests 60
Discussions
Actions
Security
Insights


Issues: triton-inference-server/server


608 Open 3,216 Closed

Issues list

Python backend with multiple instances cause unexpected and non-deterministic results

#7907 opened Dec 25, 2024 by NadavShmayo

MIG deployment of triton cause "CacheManager Init Failed. Error: -17"

#7906 opened Dec 25, 2024 by LSC527

Shared memory io bottleneck?

#7905 opened Dec 24, 2024 by wensimin

Support for guided decoding for vllm backend

#7897 opened Dec 20, 2024 by Inkorak

How Triton inference server always compare the current frame infer result with the previous one

#7893 opened Dec 19, 2024 by Komoro2023

async execute is not run concurrently

#7888 opened Dec 17, 2024 by ShuaiShao93

Unable to open shared memory region

#7887 opened Dec 17, 2024 by zjhong12581

Error when using ONNX with TensorRT (ORT-TRT) Optimization on Multi-GPU

#7885 opened Dec 16, 2024 by efajardo-nv

Manual warmup per model instance / specify warmup config dynamically using c api

#7884 opened Dec 16, 2024 by asaff1

Triton documentation inconsistency

#7878 opened Dec 12, 2024 by BenHaItay

Segfault/Coredump in grpc::ModelInferHandler::InferResponseComplete

#7877 opened Dec 12, 2024 by andyblackheel

Core was generated by /opt/tritonserver/backends/python/triton_python_backend_stub

#7875 opened Dec 12, 2024 by powerpistn

No content returned with OpenAI-Compatible Frontend Beta (ensemble & bls)

#7868 opened Dec 11, 2024 by njaramish

[Feature]: ORCA format reporting for KV-Cache metrics in Inference Response Header

#7865 opened Dec 10, 2024 by BenjaminBraunDev

Backend python save img and inference

#7863 opened Dec 10, 2024 by davy-blavette

Python Backend's fail-over feature was not implemented

#7862 opened Dec 9, 2024 by zhuichao001

Yolov11 Object Detection Deploy Problem

#7860 opened Dec 7, 2024 by asdemirel

There is not a good way to call trtllm backend to initialize lora weights from Python BLS

#7856 opened Dec 6, 2024 by ShuaiShao93

Torchscript Model can't have bfloat16 inputs / outputs in 24.09

#7853 opened Dec 5, 2024 by MatthieuToulemont

Question: Possible to use standalone DCGM in Triton?

#7851 opened Dec 4, 2024 by ysk24ok

Support for Assume Role to load models from S3

#7850 opened Dec 3, 2024 by siddharthdeshmukh

Compare model throughput using trtexec and perf_analyzer

#7848 opened Dec 3, 2024 by Will-Chou-5722

Mlflow Backend Storage with Triton (without disc space duplication)

#7846 opened Nov 30, 2024 by frosk1

TIS OpenAI frontend, make trust_remote_code configurable

#7845 opened Nov 30, 2024 by chorus-over-flanger

Empty response from python backend

#7844 opened Nov 28, 2024 by eklann
