random_seed seems to be ignored (or at least inconsistent) for inflight_batcher_llm #468
System Info
I've converted Llama 3 using TensorRT-LLM's convert_checkpoint script, and am serving it with the inflight_batcher_llm template. I'm trying to get diverse samples for a fixed input, but I've found that if I make several requests concurrently, several will have identical outputs.
I'm setting top_p=1, top_k=1024, temperature=1.0, beam_width=1, and generating a unique random seed for each request. The requests are being made over the gRPC API, and I'm using v0.9.0 of TensorRT-LLM and tensorrtllm_backend.

Who can help?
@byshiue
Reproduction
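To make the setup concrete, here is a minimal sketch of how the per-request payloads are built. The `build_requests` helper and the `text_input` field name are illustrative (the real requests go over the gRPC API); the sampling parameters match the ones reported above.

```python
import secrets

# Sampling parameters held fixed across all requests (as described above).
SAMPLING = {"top_p": 1.0, "top_k": 1024, "temperature": 1.0, "beam_width": 1}

def build_requests(prompt, n):
    """Build n request payloads, each with a distinct 64-bit random_seed."""
    seeds = set()
    while len(seeds) < n:  # collisions are vanishingly unlikely, but guard anyway
        seeds.add(secrets.randbits(64))
    return [{"text_input": prompt, "random_seed": seed, **SAMPLING} for seed in seeds]

requests = build_requests("Tell me a story.", 5)
assert len({r["random_seed"] for r in requests}) == 5  # every seed is unique
```

All 5 payloads are identical except for `random_seed`, so any duplicated outputs should be attributable to how the server handles the seed, not to the request contents.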
Expected behavior
I expect each request with a different seed to yield a different response.
Actual behavior
When I send 5 requests concurrently, several of the responses are consistently identical.
Additional notes
I changed the script I'm using for testing to wait for a response before sending another request, and this results in all 5 outputs being distinct, so it seems like the concurrency/inflight batching really is the problem.
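The serial-vs-concurrent comparison looks roughly like the sketch below. `infer` is a hypothetical stand-in for the actual gRPC call (echoing the seed keeps the harness runnable here); only the dispatch pattern differs between the two runs.

```python
from concurrent.futures import ThreadPoolExecutor

def infer(prompt, seed):
    # Stand-in for the real gRPC call; the actual test sends one request per
    # seed to the Triton server.
    return f"response-for-seed-{seed}"

def run_concurrent(prompt, seeds):
    # All requests in flight at once, so the server batches them together.
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        return list(pool.map(lambda s: infer(prompt, s), seeds))

def run_serial(prompt, seeds):
    # Wait for each response before sending the next: no inflight batching.
    return [infer(prompt, s) for s in seeds]

seeds = [11, 22, 33, 44, 55]
concurrent_out = run_concurrent("Tell me a story.", seeds)
serial_out = run_serial("Tell me a story.", seeds)
# Against the real server, the serial run yields 5 distinct outputs, while the
# concurrent run repeats some, even though every request has a unique seed.
```

Since the only difference between the two runs is whether requests overlap in flight, this points at the inflight batcher rather than at the seeds themselves.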