
Conversation

@alonre24
Contributor

This PR extends the existing onnxruntime C API to allow using an external allocator.

Motivation
#6143
When onnxruntime is used from an application with a limited memory budget, it is essential for the app to monitor and control onnxruntime's memory usage. However, onnxruntime currently uses its own allocator, which is separate from the application's. Hence, we add the option to pass an external allocator, so that every memory allocation made while creating and running inference sessions goes through the external allocator.

Description
This PR adds 4 functions to onnxruntime_c_api.h:

  1. CreateCustomDeviceAllocator(uint32_t version, void* (*AllocFunc)(OrtAllocator*, size_t), void (*FreeFunc)(OrtAllocator*, void*), const OrtMemoryInfo* (*InfoFunc)(const OrtAllocator*), OrtAllocator** out);
  2. RegisterCustomDeviceAllocator(OrtEnv* env, OrtAllocator* CustomAllocator);
  3. CreateCustomArenaAllocator(OrtAllocator* device_allocator, void* (*AllocFunc)(size_t), void (*FreeFunc)(void*), void* (*ReserveFunc)(size_t), size_t (*UsedFunc)(void), size_t (*MaxFunc)(void), OrtAllocatorArena** out);
  4. RegisterCustomArenaAllocator(OrtEnv* env, OrtAllocatorArena* CustomArenaAllocator);

The first two functions concern a custom device allocator. Function (1) allocates a new OrtAllocator* whose internal fields are the callbacks passed as inputs. Note that InfoFunc should return an OrtMemoryInfo* whose OrtAllocatorType field is set to OrtDeviceAllocator. Function (2) creates an instance of AllocatorWrapper, which implements the IAllocator interface on top of the CustomAllocator's callbacks, and registers this allocator with the given env via the existing RegisterAllocator mechanism. This guarantees that for every session created and associated with env, if its session_options is configured to use the env allocator (and the arena is disabled), the custom allocator will be used instead of the default one.
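For illustration, here is a minimal sketch of how an application might use (1) and (2). It assumes the new functions are exposed through the OrtApi struct like the rest of the C API; MyAlloc, MyFree, MyInfo, and RegisterMyDeviceAllocator are hypothetical application-side names, and error handling is elided (each C API call returns an OrtStatus*).

#include <stdlib.h>
#include "onnxruntime_c_api.h"

static const OrtMemoryInfo* g_mem_info = NULL;

static void* MyAlloc(OrtAllocator* this_, size_t size) {
  (void)this_;
  return malloc(size);  /* route into the application's own allocator here */
}

static void MyFree(OrtAllocator* this_, void* p) {
  (void)this_;
  free(p);
}

static const OrtMemoryInfo* MyInfo(const OrtAllocator* this_) {
  (void)this_;
  return g_mem_info;  /* must report OrtAllocatorType == OrtDeviceAllocator */
}

static void RegisterMyDeviceAllocator(const OrtApi* api, OrtEnv* env) {
  OrtMemoryInfo* info = NULL;
  api->CreateMemoryInfo("Cpu", OrtDeviceAllocator, 0, OrtMemTypeDefault, &info);
  g_mem_info = info;

  OrtAllocator* custom = NULL;
  api->CreateCustomDeviceAllocator(ORT_API_VERSION, MyAlloc, MyFree, MyInfo, &custom);
  api->RegisterCustomDeviceAllocator(env, custom);
}

Any session created from env with session_options configured to use the env allocator (and the arena disabled) would then allocate through MyAlloc/MyFree.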

Functions (3) and (4) make it possible to create and register a custom allocator that implements the arena functionality defined in the IArenaAllocator interface. An allocator of type OrtArenaAllocator should have an underlying device_allocator, along with implementations of the following callbacks: Alloc, Free, Reserve, Used, Max. Therefore, we add the following struct to onnxruntime_c_api.h:

struct OrtAllocatorArena {
  OrtAllocator* device_allocator;
  void* (*Alloc)(size_t size);
  void (*Free)(void* p);
  void* (*Reserve)(size_t size);
  size_t (*Used)(void);
  size_t (*Max)(void);
};

Function (3) allocates and returns to the user an OrtAllocatorArena* whose inner fields are the given inputs (note that the underlying device_allocator's InfoFunc should return an OrtMemoryInfo* whose OrtAllocatorType field is set to OrtArenaAllocator). For (4) we use a new class called ArenaAllocatorWrapper, which implements IArenaAllocator according to a given OrtAllocatorArena* that encapsulates the required implementations. Calling (4) creates and registers the arena allocator with the given env via the existing RegisterAllocator mechanism, similar to (2), so that the custom arena allocator, instead of the default one, manages memory for inference sessions that are associated with env and configured to use its env allocator.
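A matching sketch for the arena path, continuing the file above under the same assumptions (OrtApi exposure, hypothetical application-side names). The byte counting here is illustrative only; a real arena would pool and reuse blocks and track per-allocation sizes:

static size_t g_used = 0;
static size_t g_peak = 0;

static void* MyArenaAlloc(size_t size) {
  void* p = malloc(size);
  if (p != NULL) {
    g_used += size;
    if (g_used > g_peak) g_peak = g_used;
  }
  return p;
}

static void MyArenaFree(void* p) {
  free(p);  /* sketch only: block sizes are not tracked, so g_used is not decremented */
}

static void* MyArenaReserve(size_t size) {
  return malloc(size);  /* a one-off allocation outside the arena's pooled memory */
}

static size_t MyArenaUsed(void) { return g_used; }
static size_t MyArenaMax(void) { return g_peak; }

static void RegisterMyArena(const OrtApi* api, OrtEnv* env, OrtAllocator* device_allocator) {
  /* device_allocator: e.g. an allocator built with CreateCustomDeviceAllocator,
     whose InfoFunc reports OrtAllocatorType == OrtArenaAllocator. */
  OrtAllocatorArena* arena = NULL;
  api->CreateCustomArenaAllocator(device_allocator, MyArenaAlloc, MyArenaFree,
                                  MyArenaReserve, MyArenaUsed, MyArenaMax, &arena);
  api->RegisterCustomArenaAllocator(env, arena);
}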

alonre24 requested a review from a team as a code owner on February 15, 2021.
@ghost

ghost commented Feb 15, 2021

CLA assistant check
All CLA requirements met.

@gkorland

@snnn @skottmckay can you please review this PR?

@pranavsharma
Contributor

> @snnn @skottmckay can you please review this PR?

#6143 (comment)

@snnn
Contributor

snnn commented Mar 26, 2021

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux Nuphar CI Pipeline,Linux OpenVINO CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-distributed,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule,orttraining-ortmodule-distributed

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@snnn
Contributor

snnn commented Mar 26, 2021

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux Nuphar CI Pipeline,Linux OpenVINO CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 10 pipeline(s).

@snnn
Contributor

snnn commented Mar 26, 2021

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-distributed,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule,orttraining-ortmodule-distributed

@azure-pipelines

Azure Pipelines successfully started running 8 pipeline(s).

@alonre24
Contributor Author

@pranavsharma
You mentioned in our talk that this feature is part of your plan for 1.8... so in that case, is this PR still relevant?

@pranavsharma
Contributor

> @pranavsharma
> You mentioned in our talk that this feature is part of your plan for 1.8... so in that case, is this PR still relevant?

One of my team members is working on this feature. After some internal discussions, I don't think we'll be using the PR you created.

@alonre24
Contributor Author

@pranavsharma
OK, thanks for the update!
