
imageUpload and webcam sending streams of TensorLike for onnxModel or tfjsModel #18

Open
FrankwaP opened this issue Mar 23, 2023 · 3 comments
Labels
enhancement New feature or request


FrankwaP commented Mar 23, 2023

Hello,

First: thank you for this promising library!

Is your feature request related to a problem? Please describe.

I would like to test MarcelleJS to create a webpage for object detection or semantic segmentation, using a trained model stored in ONNX format. So far I'm just trying to adapt this example: https://github.com/marcellejs/marcelle/tree/main/apps/demos/object-detection/

Since our models do not take input in ImageData format, it seems I would need to use:

const detector = onnxModel({
  inputType: 'generic',
  taskType: 'generic',
});

However, both imageUpload and webcam only send streams of images in the ImageData format, so setting up a pipeline between them and detector does not seem possible.

Describe the solution you'd like
Both imageUpload and webcam should be able to send streams of images in the ImageData AND TensorLike formats.

Describe alternatives you've considered
It might be possible to build a custom prediction model that takes images in ImageData format as input and use it for the ONNX export. However, I haven't found any relevant source for that yet.

@FrankwaP
Author

The ONNX Runtime website actually gives a function to convert from ImageData to a tensor in the ORT format: https://onnxruntime.ai/docs/tutorials/web/classify-images-nextjs-github-template.html

function imageDataToTensor(image: Jimp, dims: number[]): Tensor {
  // 1. Get buffer data from image and create R, G, and B arrays.
  var imageBufferData = image.bitmap.data;
  const [redArray, greenArray, blueArray] = new Array(new Array<number>(), new Array<number>(), new Array<number>());

  // 2. Loop through the image buffer and extract the R, G, and B channels
  for (let i = 0; i < imageBufferData.length; i += 4) {
    redArray.push(imageBufferData[i]);
    greenArray.push(imageBufferData[i + 1]);
    blueArray.push(imageBufferData[i + 2]);
    // skip data[i + 3] to filter out the alpha channel
  }

  // 3. Concatenate RGB to transpose [224, 224, 3] -> [3, 224, 224] to a number array
  const transposedData = redArray.concat(greenArray).concat(blueArray);

  // 4. convert to float32
  let i, l = transposedData.length; // length, we need this for the loop
  // create the Float32Array size 3 * 224 * 224 for these dimensions output
  const float32Data = new Float32Array(dims[1] * dims[2] * dims[3]);
  for (i = 0; i < l; i++) {
    float32Data[i] = transposedData[i] / 255.0; // convert to float
  }
  // 5. create the tensor object from onnxruntime-web.
  const inputTensor = new Tensor("float32", float32Data, dims);
  return inputTensor;
}

I guess I can create a converter component with that, then use a webcam -> converter -> detector pipeline?

@JulesFrancoise JulesFrancoise added the enhancement New feature or request label Mar 31, 2023
@JulesFrancoise
Collaborator

Thank you for your interest in Marcelle and for opening the issue!

Indeed, we chose to use ImageData by default because the format is well suited to streams and is supported out of the box by TensorFlow.js models. Unfortunately, it does not work with all libraries, so we have to write adapters.

The solution you found seems promising, and I think you could use it without creating a new component, by applying the function to the events of a stream, for instance:

const $imagesAsTensor = input.$images.map(img => imageDataToTensor(img, dims))

And you can actually get dims from imageData. Also, be careful with the image format. In your code sample from ORT, it seems to be channels-first, but I remember that this is not always consistent in their model zoo.
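To illustrate the two points above (reading dims from the image itself, and making the channel layout explicit), here is a minimal sketch. The helper name `imageDataToFloat32`, the `RgbaImage` interface, and the `layout` parameter are hypothetical and not part of Marcelle or onnxruntime-web; the sketch assumes the model expects RGB values scaled to [0, 1], which may not hold for every model.

```typescript
// Minimal structural type so the sketch also runs outside a browser;
// the DOM ImageData object satisfies it.
interface RgbaImage {
  data: Uint8ClampedArray;
  width: number;
  height: number;
}

// 'NCHW' = channels-first, 'NHWC' = channels-last.
type Layout = 'NCHW' | 'NHWC';

// Hypothetical helper: converts interleaved RGBA pixels to a Float32Array
// in the requested layout and derives dims from the image size.
function imageDataToFloat32(
  image: RgbaImage,
  layout: Layout = 'NCHW',
): { data: Float32Array; dims: number[] } {
  const { data: rgba, width, height } = image;
  const pixels = width * height;
  const out = new Float32Array(3 * pixels);

  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 3; c++) {
      // The source is interleaved RGBA; the alpha channel (offset 3) is skipped.
      const value = rgba[4 * p + c] / 255;
      // Channels-first stores each channel as a contiguous plane;
      // channels-last keeps the three channels of a pixel together.
      const index = layout === 'NCHW' ? c * pixels + p : 3 * p + c;
      out[index] = value;
    }
  }

  const dims = layout === 'NCHW' ? [1, 3, height, width] : [1, height, width, 3];
  return { data: out, dims };
}
```

The result can then be wrapped with `new Tensor('float32', data, dims)` from onnxruntime-web, as in the ORT sample. Because dims come from the image, the buffer size always matches the pixel data; resizing to the model's expected input size would still have to happen beforehand.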

Note: there may be a problem with garbage collection, depending on ORT, which I don't know well yet. We don't use TensorFlow.js tensors in streams because they cause memory leaks: memory management in TensorFlow.js is somewhat particular, since it supports several backends. TFJS has the tidy function to alleviate this issue, but we didn't find a way to track and properly manage memory in streams. Again, I don't know ORT well enough to know whether this can be an issue.

Let us know if this solution works for your problem.

On a more general note, we wanted to include in Marcelle, besides 'components', a set of stream operators for the kind of operations you are looking for (format conversion, image cropping, etc.). This might be something to include in a future version.

@FrankwaP
Author

FrankwaP commented Apr 3, 2023

Hello Jules, and thank you for your reply :-)

I've actually implemented the imageDataToTensor function in the preprocessImage function of the onnx-model.component.ts component, and made it work!
I'm now dealing with Non-Max Suppression (necessary for SSD/YOLO), but that's an ONNX thing…
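For readers unfamiliar with that step, here is a minimal sketch of greedy non-max suppression done on the JavaScript side, as an alternative to handling it inside the ONNX graph. The `Detection` shape (corner coordinates plus a score) and the function names are assumptions for illustration; the actual output format of an SSD/YOLO model will differ and would need to be decoded first.

```typescript
// Box in corner format; `score` is the detection confidence.
interface Detection {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
  score: number;
}

// Intersection over union of two axis-aligned boxes.
function iou(a: Detection, b: Detection): number {
  const w = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const h = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = w * h;
  const union =
    (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: visit boxes by decreasing score and keep a box only if it
// does not overlap an already kept box by more than the threshold.
function nonMaxSuppression(dets: Detection[], iouThreshold = 0.5): Detection[] {
  const sorted = [...dets].sort((a, b) => b.score - a.score);
  const kept: Detection[] = [];
  for (const d of sorted) {
    if (kept.every((k) => iou(k, d) <= iouThreshold)) {
      kept.push(d);
    }
  }
  return kept;
}
```

This version is class-agnostic; detection pipelines commonly run the same procedure once per class so that overlapping objects of different classes are not suppressed.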

I'm "just" a Python dev who figured out how to make pre-made JS/TS code work for his use case, so there's no material for a pull request… but here's what I did so far.

In marcelle/packages/core/src/utils/image.ts I added the imageDataToTensor function:

// …
// line 25
// straight from: https://onnxruntime.ai/docs/tutorials/web/classify-images-nextjs-github-template.html
// with modification from original code noted as "MODIF"
// MODIF import * as Jimp from 'jimp';
import { Tensor } from 'onnxruntime-web';

// MODIF export function imageDataToTensor(image: Jimp, dims: number[] ): Tensor {
export function imageDataToTensor(image: ImageData, dims: number[] = [1, 3, 224, 224] ): Tensor {
  // 1. Get buffer data from image and create R, G, and B arrays.
// MODIF  var imageBufferData = image.bitmap.data;
  var imageBufferData = image.data;
  const [redArray, greenArray, blueArray] = new Array(new Array<number>(), new Array<number>(), new Array<number>());

  // 2. Loop through the image buffer and extract the R, G, and B channels
  for (let i = 0; i < imageBufferData.length; i += 4) {
    redArray.push(imageBufferData[i]);
    greenArray.push(imageBufferData[i + 1]);
    blueArray.push(imageBufferData[i + 2]);
    // skip data[i + 3] to filter out the alpha channel
  }

  // 3. Concatenate RGB to transpose [224, 224, 3] -> [3, 224, 224] to a number array
  const transposedData = redArray.concat(greenArray).concat(blueArray);

  // 4. convert to float32
  let i, l = transposedData.length; // length, we need this for the loop
  // create the Float32Array size 3 * 224 * 224 for these dimensions output
  const float32Data = new Float32Array(dims[1] * dims[2] * dims[3]);
  for (i = 0; i < l; i++) {
    float32Data[i] = transposedData[i] / 255.0; // convert to float
  }
  // 5. create the tensor object from onnxruntime-web.
  const inputTensor = new Tensor("float32", float32Data, dims);
  return inputTensor;
}

In marcelle/packages/core/src/components/onnx-model/onnx-model.component.ts

// …
import { imageDataToTensor } from '../../utils/image';
/// …
/// line 193
  @Catch
  preprocessImage(img: InputTypes['image']): ort.Tensor {
    return imageDataToTensor(img)
  }
/// …
