Inter-op and intra-op threading parameters in PyTorch #473
Thanks for the feature request @olcayc! I put up a PR in August that exposes this configuration, but didn't land it (#410). This is a little tricky to do correctly because there are a few different goals and constraints one might have with a system like this, and some of them are incompatible:

Goals:

Constraints:
The PR above exposes threadpool configuration as a per-model runtime option, with a comment explaining that it is effective per process (meaning you should use OPE, i.e. out-of-process execution, for actual per-model control). That solution works great in many cases, but isn't ideal in others (e.g. setting a generic threading config per framework). Can you explain your problem space a bit more? Do you control both the model and the system running inference? Thanks!
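For illustration only, a per-model runtime option along these lines might look like the sketch below. Everything here is an assumption, not the API from #410 (which had not landed at the time): the import path, the `runtime_options` argument, and its field names are all hypothetical.

```python
# Hypothetical sketch only: the runtime_options argument and its field
# names are assumptions, not the API actually proposed in #410.
import numpy as np
from neuropod.loader import load_neuropod  # assumed import path

model = load_neuropod(
    "/path/to/my_model",
    # Assumed per-model threading options. As noted above, without
    # out-of-process execution (OPE) these effectively apply per process.
    runtime_options={
        "intra_op_parallelism_threads": 4,
        "inter_op_parallelism_threads": 2,
    },
)
outputs = model.infer({"x": np.zeros((1, 8), dtype=np.float32)})
```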
Adding @vkuzmin-uber @tgaddair for visibility. We do control the container running neuropods and the model. I think being able to set these parameters at the per-process level would be perfectly suitable; at this stage, we are not running multiple models in the same process. As we develop different versions of the model over time, we want to be able to tune the threading configuration for each one to get the best latency and throughput.
That makes sense. Sounds like #410 would work for you then. I'll rebase and get it ready to land. (Also, just to note: we rebranded from "Neuropods" to "Neuropod" before the public release.)
Feature
Add APIs to set inter-op and intra-op threading parameters in PyTorch
Is your feature request related to a problem? Please describe.
Latency and throughput for CPU inference are affected by the number of inter-op and intra-op threads. See the PyTorch documentation on CPU Threading and TorchScript Inference for reference.
To get the best inference-time performance, these parameters need to be tuned for each model and physical host configuration. IIUC, Neuropods does not offer an API to set these parameters.
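To make the tuning concrete, here is a minimal sketch of sweeping the intra-op thread count with plain PyTorch; the model path and input shape are placeholders. (Only the intra-op count is swept, since `torch.set_num_threads` can be called repeatedly, while the inter-op pool size can only be set once per process.)

```python
import time
import torch

model = torch.jit.load("model.pt")        # placeholder TorchScript model
example = torch.randn(1, 3, 224, 224)     # placeholder input shape

for intra in (1, 2, 4, 8):
    torch.set_num_threads(intra)          # resize the intra-op pool
    with torch.no_grad():
        for _ in range(10):               # warmup
            model(example)
        start = time.perf_counter()
        for _ in range(100):
            model(example)
        elapsed = time.perf_counter() - start
    print(f"intra-op threads={intra}: {elapsed / 100 * 1000:.2f} ms/iter")
```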
Describe the solution you'd like
Python APIs equivalent to those in PyTorch, e.g.:
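(The original snippet did not survive extraction; the PyTorch threading APIs in question are presumably these, shown here for reference:)

```python
import torch

# Inter-op pool: threads used to run independent ops concurrently.
# Must be called before any inter-op parallel work starts.
torch.set_num_interop_threads(2)

# Intra-op pool: threads used within a single op (e.g. a large matmul).
torch.set_num_threads(4)

print(torch.get_num_threads(), torch.get_num_interop_threads())
```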
Describe alternatives you've considered
Additional context