Inter-op and intra-op threading parameters in PyTorch #473

Open
olcayc opened this issue Jan 19, 2021 · 3 comments

olcayc commented Jan 19, 2021

Feature

Add APIs to set inter-op and intra-op threading parameters in PyTorch

Is your feature request related to a problem? Please describe.

Latency and throughput for CPU inference are affected by the number of inter-op and intra-op threads. See CPU Threading and TorchScript Inference for reference.

To get the best inference time performance, these parameters will need to be tuned for each model and physical host configuration. IIUC, Neuropods does not offer an API to set these parameters.

Describe the solution you'd like

Python APIs equivalent to those in PyTorch, e.g.:

set_num_threads, get_num_threads
set_num_interop_threads, get_num_interop_threads
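
For context, this is a minimal sketch of the corresponding native PyTorch calls that such a Neuropod API would presumably mirror (plain PyTorch, not Neuropod; note that `set_num_interop_threads` must be called before any inter-op parallel work starts in the process):

```python
import torch

# Intra-op parallelism: threads used inside a single op (e.g. a large matmul).
# Can be changed at runtime.
torch.set_num_threads(4)
print(torch.get_num_threads())          # 4

# Inter-op parallelism: threads used to run independent ops concurrently.
# Must be set before any inter-op parallel work has started in the process.
torch.set_num_interop_threads(2)
print(torch.get_num_interop_threads())  # 2
```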

Describe alternatives you've considered

Additional context

VivekPanyam (Collaborator) commented

Thanks for the feature request @olcayc!

I put up a PR in August that exposes this configuration, but didn't land it (#410).

This is a little tricky to do correctly because there are a few different goals and constraints one might have with a system like this, and some of them are incompatible:

Goals:

  • Allow framework-specific threadpool configuration (e.g. run Torch models with one configuration and TF models with another configuration)
  • Allow model specific configuration
  • Work with OPE (out of process execution)
  • Work with in-process execution

Constraints:

  • Torch and TF set thread pool configuration at a per-process level (even though TF exposes it as part of session configuration; see the sketch below)
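
As a minimal sketch of the TF half of that constraint (1.x-style API; under TF2 the same fields live behind `tf.compat.v1`): the thread counts are passed per session, but as noted above they effectively configure process-level threadpools.

```python
import tensorflow.compat.v1 as tf  # 1.x-style API, for illustration only

# Looks like per-session configuration...
config = tf.ConfigProto(
    intra_op_parallelism_threads=4,
    inter_op_parallelism_threads=2,
)
sess = tf.Session(config=config)
# ...but the threadpools these fields size are shared at the process level,
# so later sessions in the same process generally see the same configuration.
```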

The PR above exposes threadpool configuration as a per-model runtime option with a comment explaining that it is effective per process (meaning you should use OPE for actual per-model control).

That solution works great in many cases, but isn't ideal in others (e.g. setting a generic threading config per framework).

Can you explain your problem space a bit more? Do you control both the model and the system running inference?

Thanks!


olcayc commented Jan 20, 2021

Adding @vkuzmin-uber @tgaddair for visibility

We do control the container running neuropods and the model. I think being able to set these parameters at the per-process level would be perfectly suitable. At this stage, we are not running multiple models in the same process. As we develop different versions of the model over time, we want to be able to tune the threading configuration for each one to get the best latency and throughput.
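
For illustration, a minimal sketch of the kind of per-model-version sweep being described, written against plain in-process PyTorch (the model path, input shape, and thread counts are placeholders; inter-op threads are left out of the sweep since they can only be set once per process):

```python
import time
import torch

model = torch.jit.load("model.pt")      # placeholder TorchScript model
example = torch.randn(1, 3, 224, 224)   # placeholder input

best = None
for n in (1, 2, 4, 8, 16):
    torch.set_num_threads(n)  # intra-op thread count can be changed at runtime
    with torch.no_grad():
        for _ in range(5):                # warmup
            model(example)
        start = time.perf_counter()
        for _ in range(50):
            model(example)
        latency = (time.perf_counter() - start) / 50
    print(f"{n} intra-op threads: {latency * 1000:.2f} ms")
    if best is None or latency < best[1]:
        best = (n, latency)

print(f"best: {best[0]} threads at {best[1] * 1000:.2f} ms")
```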


VivekPanyam commented Jan 20, 2021

That makes sense. Sounds like #410 would work for you then. I'll rebase and get it ready to land.

(Also just to note, we rebranded from "Neuropods" to "Neuropod" before the public release)
