
[Feature Request] Migrate to JSI for React Native #16031

Open
jhen0409 opened this issue May 21, 2023 · 8 comments
Labels
api:Java (issues related to the Java API) · feature request (request for unsupported feature or enhancement) · platform:mobile (issues related to ONNX Runtime mobile; typically submitted using template)

Comments

@jhen0409
Contributor

jhen0409 commented May 21, 2023

Describe the feature request

react_native binding

Migrating to JSI can help resolve the performance issues in the current implementation, which goes through the React Native bridge and base64 encode/decode.

For the tensor input and output data, we could construct ArrayBuffers from existing memory buffers, but that may only be supported by newer React Native versions (>= 0.71?). Another way is to do what blob-jsi-helper does with memcpy, but that may cause some memory issues.
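Roughly, the two approaches could look like this on the native side (a sketch only: the `jsi::MutableBuffer`-backed ArrayBuffer needs a new enough JSI/React Native, and `OrtOutputBuffer` is just a hypothetical stand-in for the tensor's memory):

```cpp
#include <jsi/jsi.h>

#include <cstring>
#include <memory>
#include <vector>

using namespace facebook;

// Hypothetical owner of an output tensor's bytes (e.g. copied out of the OrtValue).
struct OrtOutputBuffer : jsi::MutableBuffer {
  std::vector<uint8_t> bytes;
  size_t size() const override { return bytes.size(); }
  uint8_t* data() override { return bytes.data(); }
};

// Approach 1: wrap existing memory without an extra copy (newer JSI only).
jsi::ArrayBuffer wrapBuffer(jsi::Runtime& rt, std::shared_ptr<OrtOutputBuffer> buf) {
  return jsi::ArrayBuffer(rt, buf);
}

// Approach 2 (what blob-jsi-helper does): allocate a JS ArrayBuffer and memcpy into it.
jsi::ArrayBuffer copyBuffer(jsi::Runtime& rt, const uint8_t* src, size_t len) {
  jsi::Function ctor = rt.global().getPropertyAsFunction(rt, "ArrayBuffer");
  jsi::Object obj = ctor.callAsConstructor(rt, static_cast<int>(len)).getObject(rt);
  jsi::ArrayBuffer ab = obj.getArrayBuffer(rt);
  std::memcpy(ab.data(rt), src, len);
  return ab;
}
```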

I might try to send a PR later for iOS first.

For Android, since JSI is C++, it may require some additional work to access onnxruntime directly. (Calling the Java class through JNI may become complicated.)

Describe scenario use case

Transformers.js is trying to support React Native (huggingface/transformers.js#118 by @hans00), and I found that the performance of base64 encoding/decoding the buffers is not good. See also this comment.

I also tried using react-native-blob-jsi-helper to replace base64 via a patch for testing purposes; see this draft PR for more details.

jhen0409 added the feature request label on May 21, 2023
github-actions bot added the api:Java and platform:mobile labels on May 21, 2023
@skottmckay
Contributor

We have looked at adding this support but haven't had the resources available to do so. Do you have a production scenario you require this for?

Whilst the base64 encode/decode overhead isn't great, typically model execution is the largest cost for a production model.

I'm struggling to understand how base64 encode/decode could add up to multiple seconds in the comment you linked though. The buffer size can be pre-calculated, and it's a single iteration of the data to encode/decode. Reasonably trivial stuff. We don't do much when decoding either - convert from base64 using buffer and put in a typed array.

const buffer: Buffer = Buffer.from(results[key].data as string, 'base64');
const typedArray = tensorTypeToTypedArray(results[key].type as Tensor.Type);
tensorData = new typedArray(buffer.buffer, buffer.byteOffset, buffer.length / typedArray.BYTES_PER_ELEMENT);

@jhen0409
Contributor Author

In that case, the model run only takes ~500 ms for each decode session run, but every decode run in JS (decodeReturnType) takes ~3.7 s with a buffer of size ~5,250,000.

I plan to use onnxruntime in production in the second half of the year, and this may be the main challenge. It would be great if we had resources/maintenance here.

@skottmckay
Contributor

We'll see if we can get it done in the next release.

@jhen0409
Contributor Author

jhen0409 commented Jun 4, 2023

For #16094 I've been testing for a while, and there are a few things I see that aren't perfect:

  • Delay on session run with native bridge
    • Android: 20~30 ms on Hermes / more on JSC (I reported a related issue here; it was filed only for V8, but JSC has the same issue)
    • iOS: 3 ~ 4ms
  • resolveArrayBuffer
    • Android: 1 ~ 5 ms
    • iOS: 0.x ~ 1.x ms

For a task that requires multiple session runs (like Whisper), such delays can add up and may reach a few seconds. The best way is still a full JSI migration, but for now the performance of PR #16094 is acceptable for me.

@skottmckay
Contributor

Can you expand on what 'full JSI migration' would equate to?

I have very limited knowledge of JSI, so I'm not sure what the cost/benefit of that would be. My understanding was that the main reason for using JSI with ORT would be to avoid the base64 encode/decode when running the model (given that's typically the performance-sensitive operation), and your much appreciated PR seems to cover that.

@jhen0409
Contributor Author

jhen0409 commented Jun 8, 2023

In my view, a full JSI migration would need some refactoring: rewriting the load & run functions as JSI functions in C++, and handling threading (to avoid blocking the JS thread) and type conversion in the tensor helper.

  • For iOS, it may be easy because the module is already Objective-C++
  • For Android, options:
    1. Access the existing Java native module through JNI, but it's not clean and may be a little slow
    2. Recompile onnxruntime-c and use the API directly from C++; I prefer this, but it seems to require some extra work (a rough sketch is included below)

This would basically remove the native bridge delay, especially on Android.

The base64 encode/decode is my main concern; the full JSI implementation may not be urgent.
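Roughly, option 2 could look like the sketch below (not real code: `runOrtSession` is a hypothetical stand-in for calling the session via the onnxruntime C/C++ API, and a real implementation would run the session on a worker thread and resolve a Promise via a CallInvoker instead of blocking the JS thread):

```cpp
#include <jsi/jsi.h>

#include <cstring>
#include <vector>

using namespace facebook;

// Hypothetical stand-in: feed the raw input bytes to an already-created OrtSession
// through the onnxruntime C/C++ API and return the output bytes. Stubbed out here
// to keep the sketch focused on the JSI layer.
static std::vector<uint8_t> runOrtSession(const uint8_t* input, size_t len) {
  return std::vector<uint8_t>(input, input + len);  // placeholder echo
}

// Install a global `ortJsiRun(inputBuffer: ArrayBuffer): ArrayBuffer` host function,
// so tensor data crosses the JS/native boundary without the bridge or base64.
void installRun(jsi::Runtime& runtime) {
  auto run = jsi::Function::createFromHostFunction(
      runtime, jsi::PropNameID::forAscii(runtime, "ortJsiRun"), 1,
      [](jsi::Runtime& rt, const jsi::Value&, const jsi::Value* args,
         size_t) -> jsi::Value {
        // Read the input tensor bytes directly from the JS ArrayBuffer.
        jsi::ArrayBuffer input = args[0].getObject(rt).getArrayBuffer(rt);
        std::vector<uint8_t> output = runOrtSession(input.data(rt), input.size(rt));

        // Copy the result into a fresh JS ArrayBuffer and return it.
        jsi::Function ctor = rt.global().getPropertyAsFunction(rt, "ArrayBuffer");
        jsi::Object out =
            ctor.callAsConstructor(rt, static_cast<int>(output.size())).getObject(rt);
        std::memcpy(out.getArrayBuffer(rt).data(rt), output.data(), output.size());
        return std::move(out);
      });
  runtime.global().setProperty(runtime, "ortJsiRun", std::move(run));
}
```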

@skottmckay
Contributor

Thanks for the info. Is the native bridge delay on every run call or just the first?

@jhen0409
Contributor Author

jhen0409 commented Jun 9, 2023

Thanks for the info. Is the native bridge delay on every run call or just the first?

This is the result of running whisper-tiny.en on a demo audio file (using the ASR task of this example, with #16094 patched):

  • Monitor inferenceSession.run with performance.now
  • Monitor the run native method with System.currentTimeMillis
  • Take the difference

Platform: Android
Device: Pixel 6
React Native version: 0.71
JS engine: Hermes

Full infer time: 378.95 ms, native infer: 378 ms, diff: 0.95 ms
Full infer time: 371.03 ms, native infer: 364 ms, diff: 7.03 ms
Full infer time: 626.60 ms, native infer: 620 ms, diff: 6.60 ms
Full infer time: 640.24 ms, native infer: 632 ms, diff: 8.24 ms
Full infer time: 628.04 ms, native infer: 622 ms, diff: 6.04 ms
Full infer time: 618.83 ms, native infer: 611 ms, diff: 7.83 ms
Full infer time: 626.14 ms, native infer: 620 ms, diff: 6.14 ms
Full infer time: 615.17 ms, native infer: 609 ms, diff: 6.17 ms
Full infer time: 633.09 ms, native infer: 627 ms, diff: 6.09 ms
Full infer time: 630.86 ms, native infer: 625 ms, diff: 5.86 ms
Full infer time: 616.15 ms, native infer: 610 ms, diff: 6.15 ms
Full infer time: 617.30 ms, native infer: 611 ms, diff: 6.30 ms
Full infer time: 636.89 ms, native infer: 625 ms, diff: 11.89 ms
Full infer time: 630.92 ms, native infer: 619 ms, diff: 11.92 ms
Full infer time: 676.55 ms, native infer: 634 ms, diff: 42.55 ms
Full infer time: 647.41 ms, native infer: 637 ms, diff: 10.41 ms
Full infer time: 628.70 ms, native infer: 618 ms, diff: 10.70 ms
Full infer time: 695.45 ms, native infer: 633 ms, diff: 62.45 ms
Full infer time: 644.94 ms, native infer: 633 ms, diff: 11.94 ms
Full infer time: 621.50 ms, native infer: 610 ms, diff: 11.50 ms
Full infer time: 654.28 ms, native infer: 643 ms, diff: 11.28 ms
Full infer time: 650.18 ms, native infer: 639 ms, diff: 11.18 ms
Full infer time: 630.11 ms, native infer: 624 ms, diff: 6.11 ms
Full infer time: 636.94 ms, native infer: 630 ms, diff: 6.94 ms
Full infer time: 648.47 ms, native infer: 641 ms, diff: 7.47 ms
Full infer time: 651.05 ms, native infer: 645 ms, diff: 6.05 ms
Full infer time: 648.11 ms, native infer: 641 ms, diff: 7.11 ms

Not sure if it's caused by some scheduling issue, but the delay can occasionally reach ~60 ms.

The difference may be less noticeable when using MNIST in the e2e project.

skottmckay pushed a commit that referenced this issue Jun 16, 2023
### Description

- Create `OnnxruntimeJSIHelper` native module to provide two JSI functions:
  - `jsiOnnxruntimeStoreArrayBuffer`: Store a buffer in the Blob Manager and return a blob object (iOS: RCTBlobManager, Android: BlobModule)
  - `jsiOnnxruntimeResolveArrayBuffer`: Use a blob object to get the buffer back
- Part of the implementation references [react-native-blob-jsi-helper](https://github.com/mrousavy/react-native-blob-jsi-helper)
- Replace base64 encode/decode:
  - `loadModelFromBlob`: Renamed from `loadModelFromBase64EncodedBuffer`
  - `run`: Use blob objects to replace input.data & results[].data

For [this context](#16031 (comment)), it saves a lot of time and avoids blocking the JS thread when decoding the return type: roughly 3700 ms -> 5~20 ms for that case. (The resolve function only takes 0.x ms.)
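For illustration only (this is not the code in the PR): the resolve side follows the blob-jsi-helper pattern of taking a blob descriptor and returning its bytes as an ArrayBuffer. `getBlobBytes` below is a hypothetical stand-in for the platform blob store lookup (RCTBlobManager on iOS, BlobModule on Android), and the descriptor is assumed to carry `blobId`, `offset`, and `size`:

```cpp
#include <jsi/jsi.h>

#include <cstring>
#include <string>
#include <vector>

using namespace facebook;

// Hypothetical stand-in for reading bytes out of the platform blob store.
static std::vector<uint8_t> getBlobBytes(const std::string& /*blobId*/,
                                         size_t /*offset*/, size_t size) {
  return std::vector<uint8_t>(size);  // placeholder; real code queries the blob store
}

// Install a resolve function: blob descriptor in, ArrayBuffer out, no base64.
void installResolveArrayBuffer(jsi::Runtime& runtime) {
  auto resolve = jsi::Function::createFromHostFunction(
      runtime, jsi::PropNameID::forAscii(runtime, "jsiOnnxruntimeResolveArrayBuffer"), 1,
      [](jsi::Runtime& rt, const jsi::Value&, const jsi::Value* args,
         size_t) -> jsi::Value {
        jsi::Object blob = args[0].getObject(rt);
        std::string blobId = blob.getProperty(rt, "blobId").asString(rt).utf8(rt);
        auto offset = static_cast<size_t>(blob.getProperty(rt, "offset").asNumber());
        auto size = static_cast<size_t>(blob.getProperty(rt, "size").asNumber());

        std::vector<uint8_t> bytes = getBlobBytes(blobId, offset, size);

        jsi::Function ctor = rt.global().getPropertyAsFunction(rt, "ArrayBuffer");
        jsi::Object out =
            ctor.callAsConstructor(rt, static_cast<int>(bytes.size())).getObject(rt);
        std::memcpy(out.getArrayBuffer(rt).data(rt), bytes.data(), bytes.size());
        return std::move(out);
      });
  runtime.global().setProperty(runtime, "jsiOnnxruntimeResolveArrayBuffer",
                               std::move(resolve));
}
```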

### Motivation and Context

It’s related to #16031, but it is not a full migration to JSI.

It just uses JSI through the BlobManager to replace the slow part (base64 encode/decode).

Rewriting it entirely in JSI could be complicated (e.g. type conversion and threading), so this PR might be considered a minor change.

/cc @skottmckay