Improvements on Exposed ORT support #976

Open · wants to merge 3 commits into main
Conversation

kallebysantos
Contributor

Hey there 🙏,
Coming from #947, I tested the new Exposed ORT feature in the alpha-v20 and alpha-v21 releases. I noticed that to make it work we must explicitly specify 'auto' as the inference device:

Current behaviour:

Without specifying a device:

import { pipeline } from '@huggingface/transformers';

let pipe = await pipeline('sentiment-analysis');
// The exposed runtime is loaded,
// but when trying to run inference:
let out = await pipe('I love transformers!');
//Error: Unsupported device: "wasm". Should be one of: 

Explicitly using 'auto' as the device:

import { pipeline } from '@huggingface/transformers';

let pipe = await pipeline('sentiment-analysis', null, { device: 'auto' });

let out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]

To avoid this extra configuration, this PR improves environment support for Exposed ORT by implicitly selecting 'auto' as the default fallback instead of 'wasm'.

import { pipeline } from '@huggingface/transformers';

let pipe = await pipeline('sentiment-analysis');

let out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]
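
For reference, the idea boils down to something like this (a hedged sketch, not the exact diff; the variable names here are illustrative):

// Sketch: when the JS environment exposes its own ONNX runtime, default to
// 'auto' so the exposed runtime decides, instead of forcing 'wasm'.
const hasExposedOrt = globalThis[Symbol.for("onnxruntime")] !== undefined;
const defaultDevice = hasExposedOrt ? 'auto' : 'wasm'; // illustrative names only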

@xenova
Collaborator

xenova commented Nov 26, 2024

Hi again 👋 Apologies for the late reply.

For this PR, could you explain the meaning/origin of the "auto" device? Is this an execution provider defined by the custom ORT runner?

@kallebysantos
Contributor Author

kallebysantos commented Nov 26, 2024

Hi again 👋 Apologies for the late reply.

That's ok, I saw that you had a lot of work to do!
Btw, congratulations, v3 is finally launched 🎉

For this PR, could you explain the meaning/origin of the "auto" device? Is this an execution provider defined by the custom ORT runner?

Currently the runner does not look at it when executing; it decides based on the environment. So the device property should do nothing in this case.


But the problem is that transformers.js tries to explicitly use some device, like "wasm", and it is not possible to execute without specifying "auto".

For example, if we don't specify a device, or specify anything other than "auto", it will throw:

const pipe = await pipeline('feature-extraction', 'supabase/gte-small')
// Error: Unsupported device "wasm", Should be one of:    .

const pipe = await pipeline('feature-extraction', 'supabase/gte-small', { device: 'cpu' })
// Error: Unsupported device "cpu", Should be one of:    .

const pipe = await pipeline('feature-extraction', 'supabase/gte-small', { device: 'webgpu' })
// Error: Unsupported device "webgpu", Should be one of:    .

But if I do "auto" it works:

const pipe = await pipeline('feature-extraction', 'supabase/gte-small',  { device: 'auto' })

// [.... embeddings ]
The exposed Custom ORT looks like this:

console.log(globalThis[Symbol.for("onnxruntime")])
/*
{
  Tensor: [class Tensor],
  env: {},
  InferenceSession: { create: [AsyncFunction: fromBuffer] }
}
*/

So the purpose of this PR is to automatically set "auto" when running from some Custom ORT, to avoid this annoying config.
I also added some global flags to detect whether or not it is running from a Custom ORT.
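
A minimal sketch of what such a flag can check, given the shape of the exposed object above (the name follows the commit message further below; the actual PR code may differ):

// Sketch: treat the environment as an "exposed runtime" only if the custom ORT
// provides the members transformers.js needs (Tensor and InferenceSession.create).
const exposedOrt = globalThis[Symbol.for("onnxruntime")];
const IS_EXPOSED_RUNTIME_ENV = exposedOrt !== undefined
  && typeof exposedOrt.Tensor === 'function'
  && typeof exposedOrt.InferenceSession?.create === 'function';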


I invite you to try running transformers.js from the Supabase stack; the Custom ORT Rust backend is available from [email protected]^.
You can use it from the supabase CLI:

npx supabase functions new "ort-test"

npx supabase functions serve
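
For example, a minimal function body might look like this (just a sketch; the model, request shape, and response shape are illustrative, not part of this PR):

// supabase/functions/ort-test/index.ts -- illustrative sketch only
import { pipeline } from '@huggingface/transformers';

// The exposed Rust ORT backend is picked up automatically.
// 'auto' is still required here before this PR lands.
const pipe = await pipeline('feature-extraction', 'supabase/gte-small', { device: 'auto' });

Deno.serve(async (req) => {
  const { text } = await req.json();
  const output = await pipe(text, { pooling: 'mean', normalize: true });
  return new Response(JSON.stringify(Array.from(output.data)), {
    headers: { 'Content-Type': 'application/json' },
  });
});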

@xenova
Collaborator

xenova commented Nov 26, 2024

Thanks for the additional context! 👍 Just on that note, what if an exposed ORT library does allow for different devices to be specified, and the user could choose the version they would like?

For Supabase's runner, specifically, I imagine it is running on CPU, right? So "cpu" could be mapped to the default device on their side perhaps?

@kallebysantos
Contributor Author

kallebysantos commented Nov 26, 2024

Thanks for the additional context! 👍 Just on that note, what if an exposed ORT library does allow for different devices to be specified, and the user could choose the version they would like?

For Supabase's runner, specifically, I imagine it is running on CPU, right? So "cpu" could be mapped to the default device on their side perhaps?

Sure, that makes sense. But this way they would still need to manually specify a device?

I need to study more about how transformers.js handles the available devices.

At this moment, could you give me any suggestion about how we can achieve the following without it throwing an error?

const pipe = await pipeline('feature-extraction', 'supabase/gte-small')

Do I need to export some property that says "Hey transformers.js, I'm using "cpu" as default, so please map to it"?


EDIT: I've looked at both transformers.js and onnxruntime-common and I think I got your point, but I still think that custom providers should fall back to "auto" as default.

@xenova
Collaborator

xenova commented Dec 2, 2024

Note that there is no "auto" execution provider in onnxruntime-web/onnxruntime-node (and this is a layer added on top by Transformers.js).

That said, I think I see what the problem is: for the custom runtime, we don't specify supportedDevices or defaultDevices, which will be sent to the executionProviders option in createInferenceSession

supportedDevices.push('cpu');
defaultDevices = ['cpu'];

device -> execution provider mapping:

export function deviceToExecutionProviders(device = null) {
    // Use the default execution providers if the user hasn't specified anything
    if (!device) return defaultDevices;

    // Handle overloaded cases
    switch (device) {
        case "auto":
            return supportedDevices;
        case "gpu":
            return supportedDevices.filter(x =>
                ["webgpu", "cuda", "dml", "webnn-gpu"].includes(x),
            );
    }

    if (supportedDevices.includes(device)) {
        return [DEVICE_TO_EXECUTION_PROVIDER_MAPPING[device] ?? device];
    }

    throw new Error(`Unsupported device: "${device}". Should be one of: ${supportedDevices.join(', ')}.`)
}


So, I think the solution here would be to set them both to undefined by default (instead of [] as it is currently), since then the custom ORT environment will use its defaults: https://onnxruntime.ai/docs/api/js/interfaces/InferenceSession.SessionOptions.html#executionProviders
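
To spell out the effect (a hedged sketch, not the exact code in the PR):

// Sketch: leave the device lists undefined for the exposed-ORT case.
let supportedDevices; // previously: let supportedDevices = [];
let defaultDevices;   // previously: let defaultDevices = [];

// deviceToExecutionProviders(null) then returns undefined, so the session options
// carry no executionProviders entry and the exposed ORT falls back to its own
// default provider (CPU in Supabase's runner).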

@kallebysantos
Contributor Author

Hey Joshua, thanks for your suggestion!

So, I think the solution here would be to set them both to undefined by default (instead of [] as it is currently), since then the custom ORT environment will use its defaults..

Yes, it worked: 42ae1fe

Collaborator

xenova left a comment


Thanks for iterating! ✅ Looks good!

src/env.js (resolved)
src/backends/onnx.js (outdated, resolved)
- Add a global `IS_EXPOSED_RUNTIME_ENV` flag -> true if the JS environment
exposes its own custom runtime.
- Apply the 'auto' device as default for the exposed runtime environment.
- Add checks for the 'Tensor' and 'InferenceSession' members of the exposed
custom ORT.