
Conversation

@codemzs (Member) commented Feb 18, 2021

Enables an external CUDA allocator (i.e., the PyTorch CUDA caching allocator) and removes all references to torch.cuda.empty_cache(): we should not be clearing the cache ourselves, since doing so can hurt throughput, and the PyTorch allocator already frees cached memory on an as-needed basis.
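For context, here is a minimal sketch of what such a CPP extension could look like. This is an illustration under assumptions, not the PR's actual source: it assumes c10::cuda::CUDACachingAllocator::raw_alloc/raw_delete are the functions whose addresses get handed to onnxruntime, and the extension name is made up. The delete-address function name mirrors the one mentioned later in this thread.

# Hedged sketch: expose the raw alloc/free addresses of PyTorch's CUDA
# caching allocator via a JIT-compiled inline extension. Illustrative only.
from torch.utils.cpp_extension import load_inline

cpp_source = """
#include <cstddef>
#include <c10/cuda/CUDACachingAllocator.h>

// Return the address of raw_alloc(size_t) as an integer.
size_t cuda_caching_allocator_raw_alloc_address() {
  return reinterpret_cast<size_t>(&c10::cuda::CUDACachingAllocator::raw_alloc);
}

// Return the address of the matching raw_delete(void*).
size_t cuda_caching_allocator_raw_delete_address() {
  return reinterpret_cast<size_t>(&c10::cuda::CUDACachingAllocator::raw_delete);
}
"""

_ext = load_inline(
    name="torch_cuda_allocator_addresses",  # hypothetical extension name
    cpp_sources=[cpp_source],
    functions=["cuda_caching_allocator_raw_alloc_address",
               "cuda_caching_allocator_raw_delete_address"],
    with_cuda=True,
)
torch_alloc = _ext.cuda_caching_allocator_raw_alloc_address()
torch_free = _ext.cuda_caching_allocator_raw_delete_address()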


# CPP extension to get torch CUDA allocator's alloc and free function addresses
self._use_external_cuda_allocator = True
if self._use_external_cuda_allocator:
Contributor commented:

Is it allowed to do something like

model1 = ORTModule(model1)
model2 = ORTModule(model2)

in the same process (Python interpreter)? Just checking that load_inline or self._torch_cuda_allocator.cuda_caching_allocator_raw_delete_address() can handle this.

@codemzs (Member Author) replied:

I haven't personally tried it, but I don't see why it wouldn't work; it will just recompile and recreate the binary file.
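A hedged sketch of the repeated-load scenario being asked about (the extension name and sources here are made up): load_inline keys its on-disk build directory on name, so a second call in the same process reuses the cached build when the sources are unchanged rather than failing.

from torch.utils.cpp_extension import load_inline

src = "int answer() { return 42; }"
# First call compiles (or loads a cached build); the second call reuses it.
ext_a = load_inline(name="demo_ext", cpp_sources=[src], functions=["answer"])
ext_b = load_inline(name="demo_ext", cpp_sources=[src], functions=["answer"])
assert ext_a.answer() == ext_b.answer() == 42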

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
provider_options = [{"device_id": str(self._device.index)}, {}]
if self._use_external_cuda_allocator:
    provider_options = [{"device_id": str(self._device.index), "cuda_external_alloc": str(self._torch_alloc), "cuda_external_free": str(self._torch_free)}, {}]
@thiagocrepaldi (Contributor) commented:

Suggested change:
- provider_options = [{"device_id": str(self._device.index), "cuda_external_alloc": str(self._torch_alloc), "cuda_external_free": str(self._torch_free)}, {}]
+ provider_options = [{"device_id": str(_utils.get_device_index(self._device)), "cuda_external_alloc": str(self._torch_alloc), "cuda_external_free": str(self._torch_free)}, {}]
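For context on why the helper matters: torch.device("cuda").index is None when no ordinal is given, so str(self._device.index) can yield the string "None". Below is a hypothetical sketch of what a helper like _utils.get_device_index might do; the actual implementation may differ.

import torch

def get_device_index(device: torch.device) -> int:
    # Hypothetical fallback: use the current CUDA device when the
    # torch.device carries no explicit index, e.g. torch.device("cuda").
    if device.index is not None:
        return device.index
    return torch.cuda.current_device() if device.type == "cuda" else 0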

@codemzs (Member Author) replied:

Thanks, @thiagocrepaldi, that seems like a good suggestion, but standard software engineering practice would be to make this change in a separate PR, since I'm not touching this particular piece of code (i.e., "device_id": str(self._device.index)). Your suggestion is a cleanup, and I don't want to mix it with enabling an external allocator.

if self._use_external_cuda_allocator:
    provider_options = [{"device_id": str(self._device.index), "cuda_external_alloc": str(self._torch_alloc), "cuda_external_free": str(self._torch_free)}, {}]
else:
    provider_options = [{"device_id": str(self._device.index)}]
@thiagocrepaldi (Contributor) commented:

Suggested change:
- provider_options = [{"device_id": str(self._device.index)}]
+ provider_options = [{"device_id": str(_utils.get_device_index(self._device))}]

@codemzs (Member Author) replied:

Same as above.
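For completeness, a hedged sketch of how a providers/provider_options pair like the ones above is typically passed when constructing an onnxruntime session (assumed usage with placeholder values; the PR's actual session setup may differ). The positions in provider_options line up one-to-one with the entries in providers:

import onnxruntime as ort

# torch_alloc / torch_free would come from the extension sketched earlier in
# this thread; plain integers stand in here so the snippet is self-contained.
torch_alloc, torch_free = 0, 0  # placeholder addresses

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
provider_options = [
    {"device_id": "0",  # placeholder device index
     "cuda_external_alloc": str(torch_alloc),
     "cuda_external_free": str(torch_free)},
    {},  # no options for the CPU provider
]
session = ort.InferenceSession("model.onnx",  # placeholder model path
                               providers=providers,
                               provider_options=provider_options)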

@codemzs (Member Author) commented Feb 19, 2021

Thanks, @SherlockNoMad and @thiagocrepaldi for the review. Thanks @baijumeswani for answering questions regarding the torch no_grad memory test.

@codemzs merged commit 1a2f1bd into thiagofc/ortmodule-api on Feb 19, 2021.
@codemzs deleted the mzs/external-cuda-allocator-ortmodule branch on February 19, 2021 at 04:01.