@@ -474,7 +474,32 @@ Output file (`output2.png`):
 ![output2](./output2.png)
 
 
-## Measuring performance
+## Measuring throughput
+To increase throughput in image generation scenarios, adjust the OpenVINO plugin configuration and increase `NUM_STREAMS`. Additionally, set a static shape for the models to avoid dynamic-shape overhead. This can be done with the `resolution` parameter, as shown in the graph options below.
+
+Edit `graph.pbtxt` and restart the server:
+```
+input_stream: "HTTP_REQUEST_PAYLOAD:input"
+output_stream: "HTTP_RESPONSE_PAYLOAD:output"
+
+node: {
+  name: "ImageGenExecutor"
+  calculator: "ImageGenCalculator"
+  input_stream: "HTTP_REQUEST_PAYLOAD:input"
+  input_side_packet: "IMAGE_GEN_NODE_RESOURCES:pipes"
+  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
+  node_options: {
+    [type.googleapis.com/mediapipe.ImageGenCalculatorOptions]: {
+      models_path: "./"
+      device: "CPU"
+      num_images_per_prompt: 4  # 4 images per inference request
+      resolution: "512x512"  # reshape to a static value
+      plugin_config: '{"PERFORMANCE_HINT":"THROUGHPUT","NUM_STREAMS":8}'
+    }
+  }
+}
+```
+
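+Before benchmarking, you can optionally verify the reconfigured graph with a single request. The snippet below is only a sketch: it assumes the endpoint returns an OpenAI-style JSON response with base64-encoded images under `data[].b64_json` and that `jq` is installed locally; check the Image Generation API reference linked below for the exact request and response fields.
+```
+# Send one prompt and store the first returned image.
+# With num_images_per_prompt: 4, the response is expected to contain 4 entries in "data".
+curl -s http://localhost:8000/v3/images/generations \
+  -H "Content-Type: application/json" \
+  -d '{"model": "OpenVINO/stable-diffusion-v1-5-int8-ov", "prompt": "dog", "num_inference_steps": 50}' \
+  | jq -r '.data[0].b64_json' | base64 --decode > static_check.png
+```
+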
 Prepare example request `input_data.json`:
 ```
 {
@@ -484,7 +509,7 @@ Prepare example request `input_data.json`:
   {
    "model": "OpenVINO/stable-diffusion-v1-5-int8-ov",
    "prompt": "dog",
-   "num_inference_steps": 2
+   "num_inference_steps": 50
   }
  ]
 }
@@ -503,71 +528,31 @@ docker run --rm -it --net=host -v $(pwd):/work:rw nvcr.io/nvidia/tritonserver:24
     --endpoint=v3/images/generations \
     --async \
     -u localhost:8000 \
-    --request-count 8 \
-    --concurrency-range 8
-```
-
-MCLX23
-```
-*** Measurement Settings ***
-  Service Kind: OPENAI
-  Sending 8 benchmark requests
-  Using asynchronous calls for inference
-
-Request concurrency: 8
-  Client:
-    Request count: 8
-    Throughput: 0.210501 infer/sec
-    Avg latency: 29514881 usec (standard deviation 1509943 usec)
-    p50 latency: 31140977 usec
-    p90 latency: 36002018 usec
-    p95 latency: 37274567 usec
-    p99 latency: 37274567 usec
-    Avg HTTP time: 29514870 usec (send/recv 3558 usec + response wait 29511312 usec)
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 8, throughput: 0.210501 infer/sec, latency 29514881 usec
-```
-
-SPR36
-```
-*** Measurement Settings ***
-  Service Kind: OPENAI
-  Sending 8 benchmark requests
-  Using asynchronous calls for inference
-
-Request concurrency: 8
-  Client:
-    Request count: 8
-    Throughput: 1.14268 infer/sec
-    Avg latency: 5124694 usec (standard deviation 695195 usec)
-    p50 latency: 5252478 usec
-    p90 latency: 5922719 usec
-    p95 latency: 6080321 usec
-    p99 latency: 6080321 usec
-    Avg HTTP time: 5124684 usec (send/recv 15272 usec + response wait 5109412 usec)
-Inferences/Second vs. Client Average Batch Latency
-Concurrency: 8, throughput: 1.14268 infer/sec, latency 5124694 usec
+    --request-count 16 \
+    --concurrency-range 16
 ```
 
 ```
 *** Measurement Settings ***
   Service Kind: OPENAI
   Sending 16 benchmark requests
   Using asynchronous calls for inference
 
 Request concurrency: 16
   Client:
     Request count: 16
-    Throughput: 1.33317 infer/sec
-    Avg latency: 8945421 usec (standard deviation 929729 usec)
-    p50 latency: 9395319 usec
-    p90 latency: 11657659 usec
-    p95 latency: 11657659 usec
-    p99 latency: 11659369 usec
-    Avg HTTP time: 8945411 usec (send/recv 491743 usec + response wait 8453668 usec)
+    Throughput: 0.0999919 infer/sec
+    Avg latency: 156783666 usec (standard deviation 1087845 usec)
+    p50 latency: 157110315 usec
+    p90 latency: 158720060 usec
+    p95 latency: 158720060 usec
+    p99 latency: 159494095 usec
+    Avg HTTP time: 156783654 usec (send/recv 8717 usec + response wait 156774937 usec)
 Inferences/Second vs. Client Average Batch Latency
-Concurrency: 16, throughput: 1.33317 infer/sec, latency 8945421 usec
+Concurrency: 16, throughput: 0.0999919 infer/sec, latency 156783666 usec
 ```
+
+A throughput of 0.0999919 infer/sec corresponds to about 0.4 images per second, since each request generates 4 images (`num_images_per_prompt: 4`).
+
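+For a quick sanity check of that conversion, image throughput is just the request throughput multiplied by `num_images_per_prompt` from the graph options above:
+```
+# requests per second * images per request = images per second
+awk 'BEGIN { printf "%.2f images/sec\n", 0.0999919 * 4 }'
+```
+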
 
 ## References
 - [Image Generation API](../../docs/model_server_rest_api_image_generation.md)