Skip to content

Conversation

@hariharans29
Copy link
Member

@hariharans29 hariharans29 commented Jun 15, 2021

Description:

  1. This change tries to support the use-case where-in an external user wants to plug in their CPU device allocator implementation for sharing across all sessions to work with ORT-internal code.

  2. This change also cleans up some related allocator internals in ORT - the most important refactoring being removal of the confusing IArenaAllocator interface which in theory is not needed at all. All allocators can simply implement IAllocator whilst hosting arena logic in them in their allocation code path.

Known limitations:

  1. Can't plug in a different "custom" allocators for individual sessions yet (i.e.) user can't choose to plug-in Allocator_1 to be used only for Session_1 and Allocator_2 for Session_2 and so on. (Not sure if there is a use-case for this yet). The user can still use ORT's allocator and their custom allocator in tandem across sessions though because using the custom allocator for a session is controlled by using a session option.

  2. Doesn't support non-CPU device allocators to be plugged-in yet.

Motivation and Context
#6689
#6143

@hariharans29 hariharans29 requested a review from a team as a code owner June 15, 2021 20:43
@DvirDukhan
Copy link

@hariharans29
Thanks a lot for the PR
Can you please specify if there are any new changes in the flow regarding sharing a provided 3rd party allocator between sessions (how the new allocator can be used in shared arena scenario as described at https://www.onnxruntime.ai/docs/reference/api/c-api.html section: "share allocator(s) between sessions")
Also, how can we use the new API to specify the arena settings for a new registered allocator?

Thanks

CC @alonre24

@hariharans29
Copy link
Member Author

@hariharans29
Thanks a lot for the PR
Can you please specify if there are any new changes in the flow regarding sharing a provided 3rd party allocator between sessions (how the new allocator can be used in shared arena scenario as described at https://www.onnxruntime.ai/docs/reference/api/c-api.html section: "share allocator(s) between sessions")
Also, how can we use the new API to specify the arena settings for a new registered allocator?

Thanks

CC @alonre24

I will document it after this is checked-in, but right now I am thinking, we just need to turn this on: session.use_env_allocators = 1 in order for a session to use your custom allocator.

The other configs that you see is used to define the behavior of ORT's internal arena which doesn't apply in your case as you actually control the behavior of your custom allocator through its implementation. Does that make sense ?

@mrry
Copy link
Contributor

mrry commented Jun 16, 2021

Is there some consolidation we could/should do with the custom CUDA allocator support @codemzs added in #6745?

FWIW, that PR has a completely different set of limitations: it does allow different allocators on a session-by-session basis (since it's an EP-level property) but it only works for the CUDA EP, and not any other EPs. However, I don't think we're relying on being able to set different allocators for different sessions, if that's hard to support.

@alonre24
Copy link
Contributor

Hey @hariharans29,
Having the ability to control the behavior of the custom allocator does make sense, but we were wondering if there might be a way to "enjoy both worlds". We are interested in registering an external allocator (that is, supplying Alloc and Free callbacks) that will use ORT's internal arena implementation. This can be done perhaps by sending an OrtArenaCfg to the new RegisterAllocator API, and creating an instance of BFCArena from the given "resource allocator".
Do you think that it is possible to do something like that?
@DvirDukhan

@hariharans29
Copy link
Member Author

Is there some consolidation we could/should do with the custom CUDA allocator support @codemzs added in #6745?

FWIW, that PR has a completely different set of limitations: it does allow different allocators on a session-by-session basis (since it's an EP-level property) but it only works for the CUDA EP, and not any other EPs. However, I don't think we're relying on being able to set different allocators for different sessions, if that's hard to support.

We are discussing it. Even if that is done, it may come as a separate change.

@hariharans29
Copy link
Member Author

Hey @hariharans29,
Having the ability to control the behavior of the custom allocator does make sense, but we were wondering if there might be a way to "enjoy both worlds". We are interested in registering an external allocator (that is, supplying Alloc and Free callbacks) that will use ORT's internal arena implementation. This can be done perhaps by sending an OrtArenaCfg to the new RegisterAllocator API, and creating an instance of BFCArena from the given "resource allocator".
Do you think that it is possible to do something like that?
@DvirDukhan

We do not want to expose this "mix and match" capability for an external device allocator to use the BFCArena. BFCArena is an internal component and its implementation details are bound to change without warning and we do not want to keep maintaining compatibility layers.

That being said since the source code of BFCArena is available, it should be fairly easy for you to re-use that in your allocator implementation and even "adapt it" for your use case if necessary.

@hariharans29 hariharans29 changed the title WIP: Support plugging in custom user-defined allocators for sharing between sessions Support plugging in custom user-defined allocators for sharing between sessions Jun 24, 2021
* \param shape Shape of the tensor
* \param p_data A preallocated buffer. Can be NULL if the shape is empty.
* Tensor does not own the data and will not delete it
* Tensor will own the memory and will delete it when the tensor instance is destructed.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment was wrong - fixing it

p_data = static_cast<IArenaAllocator*>(alloc.get())->Reserve(mem_size);
else
p_data = alloc->Alloc(mem_size);
p_data = alloc->Reserve(mem_size);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just call Reserve() now as it is part of the IAllocator interface (and calls Alloc() by default). BFCArena provides a specialized override for Reserve()

@hariharans29 hariharans29 merged commit 3360024 into master Jul 22, 2021
@hariharans29 hariharans29 deleted the hari/custom_alloc branch July 22, 2021 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants