diff --git a/demos/image_generation/README.md b/demos/image_generation/README.md
index a8bb589331..e271f47374 100644
--- a/demos/image_generation/README.md
+++ b/demos/image_generation/README.md
@@ -474,6 +474,85 @@ Output file (`output2.png`):
 
 ![output2](./output2.png)
 
+## Measuring throughput
+To increase throughput in image generation scenarios, it is worth adjusting the plugin config to increase `NUM_STREAMS`. Additionally, set a static shape for the models to avoid dynamic shape overhead. This can be done by setting the `resolution` parameter in the calculator node options (see `graph.pbtxt` below).
+
+Edit `graph.pbtxt` and restart the server:
+```
+input_stream: "HTTP_REQUEST_PAYLOAD:input"
+output_stream: "HTTP_RESPONSE_PAYLOAD:output"
+
+node: {
+  name: "ImageGenExecutor"
+  calculator: "ImageGenCalculator"
+  input_stream: "HTTP_REQUEST_PAYLOAD:input"
+  input_side_packet: "IMAGE_GEN_NODE_RESOURCES:pipes"
+  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
+  node_options: {
+    [type.googleapis.com / mediapipe.ImageGenCalculatorOptions]: {
+      models_path: "./"
+      device: "CPU"
+      num_images_per_prompt: 4  # 4 images per inference request
+      resolution: "512x512"  # reshape the models to a static resolution
+      plugin_config: '{"PERFORMANCE_HINT":"THROUGHPUT","NUM_STREAMS":8}'
+    }
+  }
+}
+```
+
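+Before benchmarking, you can optionally confirm that the reshaped graph responds to a single request. The snippet below is a minimal sketch: it assumes the server's REST port is 8000 (the same port the benchmark targets) and simply stores the raw JSON response in `response.json` for a sanity check; the payload mirrors the benchmark input prepared below.
+```bash
+# Send one generation request to the image generation endpoint and save the raw JSON response
+curl -s http://localhost:8000/v3/images/generations \
+  -H "Content-Type: application/json" \
+  -d '{"model": "OpenVINO/stable-diffusion-v1-5-int8-ov", "prompt": "dog", "num_inference_steps": 50}' \
+  -o response.json
+```
+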
+Prepare an example request in `input_data.json`:
+```
+{
+  "data": [
+    {
+      "payload": [
+        {
+          "model": "OpenVINO/stable-diffusion-v1-5-int8-ov",
+          "prompt": "dog",
+          "num_inference_steps": 50
+        }
+      ]
+    }
+  ]
+}
+```
+
+Run the benchmark:
+```bash
+docker run --rm -it --net=host -v $(pwd):/work:rw nvcr.io/nvidia/tritonserver:24.12-py3-sdk \
+    perf_analyzer \
+    -m OpenVINO/stable-diffusion-v1-5-int8-ov \
+    --input-data=/work/input_data.json \
+    --service-kind=openai \
+    --endpoint=v3/images/generations \
+    --async \
+    -u localhost:8000 \
+    --request-count 16 \
+    --concurrency-range 16
+```
+
+```
+*** Measurement Settings ***
+  Service Kind: OPENAI
+  Sending 16 benchmark requests
+  Using asynchronous calls for inference
+
+Request concurrency: 16
+  Client:
+    Request count: 16
+    Throughput: 0.0999919 infer/sec
+    Avg latency: 156783666 usec (standard deviation 1087845 usec)
+    p50 latency: 157110315 usec
+    p90 latency: 158720060 usec
+    p95 latency: 158720060 usec
+    p99 latency: 159494095 usec
+    Avg HTTP time: 156783654 usec (send/recv 8717 usec + response wait 156774937 usec)
+Inferences/Second vs. Client Average Batch Latency
+Concurrency: 16, throughput: 0.0999919 infer/sec, latency 156783666 usec
+```
+
+A throughput of 0.0999919 infer/sec corresponds to about 0.4 images per second (0.0999919 requests/sec × 4 images per request), since each request generates 4 images.
 
 ## References
diff --git a/docs/image_generation/reference.md b/docs/image_generation/reference.md
index f8357384a9..c4c8e4a53d 100644
--- a/docs/image_generation/reference.md
+++ b/docs/image_generation/reference.md
@@ -60,6 +60,11 @@ The calculator supports the following `node_options` for tuning the pipeline con
 - `optional uint64 default_num_inference_steps` - default number of inference steps used for generation, if not specified by the request [default = 50];
 - `optional uint64 max_num_inference_steps` - maximum number of inference steps allowed for generation. Requests exceeding this value will be rejected. [default = 100];
 
+Static model resolution settings (see the example below):
+- `optional string resolution` - enforces a static resolution for all requests. When specified, the underlying models are reshaped to this resolution.
+- `optional uint64 num_images_per_prompt` - used together with `resolution` to define the batch size in the static model shape.
+- `optional float guidance_scale` - used together with `resolution` when reshaping the models to a static shape.
+
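+For illustration, a minimal sketch of these settings inside the calculator node of `graph.pbtxt` (values are examples only; `guidance_scale: 7.5` is an assumed typical value, not a documented default):
+```
+node_options: {
+  [type.googleapis.com / mediapipe.ImageGenCalculatorOptions]: {
+    models_path: "./"
+    device: "CPU"
+    resolution: "512x512"        # reshape the models to a static 512x512 resolution
+    num_images_per_prompt: 4     # static batch size: 4 images generated per prompt
+    guidance_scale: 7.5          # example guidance scale used for the static reshape
+  }
+}
+```
+
 ## Models Directory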