Inter-op and intra-op threading parameters in PyTorch #473
Thanks for the feature request @olcayc! I put up a PR in August that exposes this configuration, but didn't land it (#410). This is a little tricky to do correctly because there are a few different goals and constraints one might have with a system like this, and some of them are incompatible:

Goals:

Constraints:
The PR above exposes threadpool configuration as a per-model runtime option, with a comment explaining that it is effective per process (meaning you should use OPE, i.e. out-of-process execution, for actual per-model control). That solution works great in many cases, but isn't ideal in others (e.g. setting a generic threading config per framework). Can you explain your problem space a bit more? Do you control both the model and the system running inference? Thanks!
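For illustration only, a per-model runtime option along these lines might look like the sketch below. Everything here is an assumption, not the API from #410 (which had not landed at the time): the import path, the `runtime_options` argument, and its field names are all hypothetical.

```python
# Hypothetical sketch only: the runtime_options argument and its field
# names are assumptions, not the API actually proposed in #410.
import numpy as np
from neuropod.loader import load_neuropod  # assumed import path

model = load_neuropod(
    "/path/to/my_model",
    # Assumed per-model threading options. As noted above, without
    # out-of-process execution (OPE) these effectively apply per process.
    runtime_options={
        "intra_op_parallelism_threads": 4,
        "inter_op_parallelism_threads": 2,
    },
)
outputs = model.infer({"x": np.zeros((1, 8), dtype=np.float32)})
```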
Adding @vkuzmin-uber @tgaddair for visibility. We do control the container running neuropods and the model. I think being able to set these parameters at the per-process level would be perfectly suitable; at this stage, we are not running multiple models in the same process. As we develop different versions of the model over time, we want to be able to tune the threading configuration for each one to get the best latency and throughput.
That makes sense. Sounds like #410 would work for you then. I'll rebase and get it ready to land. (Also, just to note: we rebranded from "Neuropods" to "Neuropod" before the public release.)
Feature
Add APIs to set inter-op and intra-op threading parameters in PyTorch
Is your feature request related to a problem? Please describe.
Latency and throughput for CPU inference are affected by the number of inter-op and intra-op threads. See the PyTorch documentation on CPU Threading and TorchScript Inference for reference.
To get the best inference-time performance, these parameters need to be tuned for each model and physical host configuration. IIUC, Neuropods does not offer an API to set these parameters.
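To make the tuning concrete, here is a minimal sketch of sweeping the intra-op thread count with plain PyTorch; the model path and input shape are placeholders. (Only the intra-op count is swept, since `torch.set_num_threads` can be called repeatedly, while the inter-op pool size can only be set once per process.)

```python
import time
import torch

model = torch.jit.load("model.pt")        # placeholder TorchScript model
example = torch.randn(1, 3, 224, 224)     # placeholder input shape

for intra in (1, 2, 4, 8):
    torch.set_num_threads(intra)          # resize the intra-op pool
    with torch.no_grad():
        for _ in range(10):               # warmup
            model(example)
        start = time.perf_counter()
        for _ in range(100):
            model(example)
        elapsed = time.perf_counter() - start
    print(f"intra-op threads={intra}: {elapsed / 100 * 1000:.2f} ms/iter")
```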
Describe the solution you'd like
Python APIs equivalent to those in PyTorch, e.g.:
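(The original snippet did not survive extraction; the PyTorch threading APIs in question are presumably these, shown here for reference:)

```python
import torch

# Inter-op pool: threads used to run independent ops concurrently.
# Must be called before any inter-op parallel work starts.
torch.set_num_interop_threads(2)

# Intra-op pool: threads used within a single op (e.g. a large matmul).
torch.set_num_threads(4)

print(torch.get_num_threads(), torch.get_num_interop_threads())
```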
Describe alternatives you've considered
Additional context