[Experimental] Add a RuntimeOption to set inter and intra op threadpool sizes #410

Open · wants to merge 2 commits into master
2 changes: 1 addition & 1 deletion build/install_python_deps.sh

@@ -5,7 +5,7 @@ set -e
NEUROPOD_PYTHON_BINARY="python${NEUROPOD_PYTHON_VERSION}"

# Install pip
-wget https://bootstrap.pypa.io/2.7/get-pip.py -O /tmp/get-pip.py
+wget https://bootstrap.pypa.io/pip/2.7/get-pip.py -O /tmp/get-pip.py
${NEUROPOD_PYTHON_BINARY} /tmp/get-pip.py

# Setup a virtualenv
22 changes: 21 additions & 1 deletion source/neuropod/backends/tensorflow/tf_backend.cc
@@ -93,7 +93,7 @@
}

// Get TF session options given Neuropod RuntimeOptions
-tensorflow::SessionOptions get_tf_opts(const RuntimeOptions & /*unused*/)
+tensorflow::SessionOptions get_tf_opts(const RuntimeOptions &runtime_opts)
Contributor:

Not directly related, just some thoughts: RuntimeOptions is now used in the C++, C, and Java APIs (and could be in Python too), so we need to keep them in sync. I have seen that TensorFlow uses a proto declaration in such cases and then generates the struct for each language.

We could use a similar approach.
Collaborator (Author):

Makes sense; definitely something to look into.

I don't like TF's approach to options in their C API though: it basically requires a buffer containing a serialized proto as input, which makes it fairly complicated to set options directly from C.

{
    tensorflow::SessionOptions opts;

@@ -103,6 +103,26 @@
    opts.config.set_allow_soft_placement(true);
    opts.config.set_log_device_placement(false);

+    // Set intra and inter op parallelism
+    // See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/protobuf/config.proto
+    if (runtime_opts.experimental_intra_op_parallelism_threads != 0)
+    {
+        opts.config.set_intra_op_parallelism_threads(
+            static_cast<int32_t>(runtime_opts.experimental_intra_op_parallelism_threads));
+    }
+
+    if (runtime_opts.experimental_inter_op_parallelism_threads == 1)
+    {
+        // Only use the caller thread
+        opts.config.set_inter_op_parallelism_threads(-1);
+    }
+    else if (runtime_opts.experimental_inter_op_parallelism_threads > 1)
+    {
+        // The number in runtime_opts includes the caller thread
+        opts.config.set_inter_op_parallelism_threads(
+            static_cast<int32_t>(runtime_opts.experimental_inter_op_parallelism_threads) - 1);
+    }

// Note: we can't use GPUOptions::visible_device_list as it is a per process setting
//
// From: https://github.com/tensorflow/tensorflow/issues/18861#issuecomment-385610497
15 changes: 15 additions & 0 deletions source/neuropod/backends/torchscript/torch_backend.cc
@@ -225,6 +225,21 @@
TorchNeuropodBackend::TorchNeuropodBackend(const std::string &neuropod_path, const RuntimeOptions &options)
    : NeuropodBackendWithDefaultAllocator<TorchNeuropodTensor>(neuropod_path, options)
{
+// inter and intra op parallelism settings only supported in Torch >= 1.2.0
+#if CAFFE2_NIGHTLY_VERSION >= 20190808
+    // Set intra and inter op parallelism
+    // See https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html#runtime-api
+    if (options.experimental_inter_op_parallelism_threads != 0)
Contributor:

For the inter-op case, Torch only allows setting this value multiple times in the TBB build. For our case, if a second in-process (IPE) model is loaded with a non-zero value, this call will fail. As I see in the Torch code, for non-TBB builds it just sets an atomic variable, and I think we could do the same here, while still keeping the ability to change it in the TBB case. I am thinking about building libtorch with TBB; we have a TorchScript model/use case where TBB's "better" concurrency can be critical.
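A minimal sketch of the guard suggested above (not part of this PR; the function name is hypothetical): only forward the first non-zero inter-op request per process, since at::set_num_interop_threads throws if called a second time in non-TBB builds.

    #include <atomic>
    #include <cstdint>
    #include <ATen/Parallel.h>

    // Returns true if this call actually applied the setting.
    bool try_set_interop_threads_once(uint32_t requested)
    {
        static std::atomic<bool> already_set{false};
        bool expected = false;
        if (requested != 0 && already_set.compare_exchange_strong(expected, true))
        {
            at::set_num_interop_threads(static_cast<int32_t>(requested));
            return true;
        }
        return false;
    }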

+    {
+        at::set_num_interop_threads(static_cast<int32_t>(options.experimental_inter_op_parallelism_threads));
Contributor:

As I see it, there is a torch::set_num_interop_threads that is supposed to be the "public" API. Minor, but I think it is still an issue.

Contributor:

Same for the other call (at::set_num_threads).

+    }
+
+    if (options.experimental_intra_op_parallelism_threads != 0)
+    {
+        at::set_num_threads(static_cast<int32_t>(options.experimental_intra_op_parallelism_threads));
Contributor:

We need to expose access to get_num_* somehow. A user sets the runtime options (or not), then starts Neuropod execution, and should be able to check which settings are actually in use. That matters for the "default" case, where the system picks the value, and also for the IPE case, where models share it.
+    }
+#endif

if (options.load_model_at_construction)
{
load_model();
17 changes: 17 additions & 0 deletions source/neuropod/options.hh
@@ -75,6 +75,23 @@ struct RuntimeOptions

// Whether or not to disable shape and type checking when running inference
bool disable_shape_and_type_checking = false;

+    // EXPERIMENTAL
+    // Set the intra and inter op parallelism for the underlying framework
+    // Within a given process, only the first usage of the below configuration is used
+    // See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/protobuf/config.proto
+    // and https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html#runtime-api
+    // for more details
+    // For true per-model control of these values, use out-of-process execution (see above)
+    // A value of 0 means system defined
+    // Note: for TorchScript, requires at least Torch 1.2.0
+    uint32_t experimental_intra_op_parallelism_threads = 0;
+
+    // EXPERIMENTAL
+    // A value of 0 means system defined
+    // Note: this count includes the caller thread
+    // Note: for TorchScript, requires at least Torch 1.2.0
+    uint32_t experimental_inter_op_parallelism_threads = 0;
};

} // namespace neuropod
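
For reference, a minimal usage sketch of the new options from application code (assuming the public neuropod::Neuropod constructor that takes a RuntimeOptions; the model path is illustrative):

    #include "neuropod/neuropod.hh"

    int main()
    {
        neuropod::RuntimeOptions opts;

        // Ask the framework for 4 intra-op threads and 2 inter-op threads.
        // Per the notes above, the inter-op count includes the caller thread,
        // and only the first usage in a process takes effect.
        opts.experimental_intra_op_parallelism_threads = 4;
        opts.experimental_inter_op_parallelism_threads = 2;

        neuropod::Neuropod model("/tmp/my_model.neuropod", opts);
        // ... run inference as usual ...
    }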