
MlInference

NOTE: additional documentation here

This repository provides a wasmCloud capability provider and actors that perform inference with machine learning models in ONNX and TensorFlow format.

Usage

In order to run and deploy the ML Demo on Cosmonic, follow the Cosmonic Getting Started Guide and use the cosmo CLI to deploy the application.

cosmo up
cosmo app deploy ./wadm.yaml

The deployed application should look like the following:

[Image: Cosmonic Logic View showing that some actors are OK to not be linked]

Examples

Apart from the underlying inference engine (ONNX vs. TensorFlow), the pre-configured models differ in their input requirements. The trivial models (such as identity and plus3) accept one-dimensional data of arbitrary length, i.e. tensors of shape [1, n]. Mobilenet and Squeezenet place stricter requirements on their input tensor. To meet these, an arbitrary image can be pre-processed into a suitable input tensor before it is routed to the inference engine.

The application provides three endpoints. The first routes the input tensor to the inference engine without any pre-processing. The second pre-processes the input and then routes it to the inference engine. The third pre-processes the input before the prediction step and post-processes the output afterwards.

  1. 0.0.0.0:<port>/<model>, e.g. 0.0.0.0:7078/identity
  2. 0.0.0.0:<port>/<model>/preprocess, e.g. 0.0.0.0:7078/squeezenetv117/preprocess
  3. 0.0.0.0:<port>/<model>/matches, e.g. 0.0.0.0:7078/squeezenetv117/matches
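The three URL patterns above can be assembled programmatically. The following is a minimal Python sketch; the helper name `endpoint_url` and the `http://` scheme are assumptions made for illustration and are not part of this repository.

```python
def endpoint_url(model, stage="", host="0.0.0.0", port=7078):
    """Build the URL for one of the three endpoints (hypothetical helper).

    stage: "" (raw tensor), "preprocess", or "matches".
    """
    if stage not in ("", "preprocess", "matches"):
        raise ValueError(f"unknown stage: {stage!r}")
    suffix = f"/{stage}" if stage else ""
    return f"http://{host}:{port}/{model}{suffix}"


print(endpoint_url("identity"))
print(endpoint_url("squeezenetv117", "preprocess"))
print(endpoint_url("squeezenetv117", "matches"))
```

Adjust `port` to whatever port your deployment actually listens on.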

Identity Model

To trigger a request against the identity model, type the following:

curl -v -X POST 0.0.0.0:8078/identity -d '{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,63,0,0,0,64,0,0,64,64,0,0,128,64]}'

The response should comprise HTTP/1.1 200 OK as well as {"result":"Success","tensor":{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,63,0,0,0,64,0,0,64,64,0,0,128,64]}}

The following happens:

  1. The HTTP POST sends a request for the model named "challenger", index 0, together with some data.
  2. data is the vector [1.0f32, 2.0, 3.0, 4.0] converted to a vector of little-endian bytes, four bytes per f32 value.
  3. A response is computed and sent back.
  4. The data in the response equals the data in the request because the pre-loaded model "challenger" returns its input unchanged.
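The byte vector in the request can be produced from the float values. The sketch below is in Python and assumes little-endian IEEE 754 single precision, which is what the byte values in the request above imply (1.0 becomes [0,0,128,63]). The helper name `encode_f32_tensor` is hypothetical; the field names are copied from the request shown above.

```python
import json
import struct


def encode_f32_tensor(values):
    """Pack floats as little-endian f32 bytes and wrap them in the request shape."""
    payload = struct.pack(f"<{len(values)}f", *values)
    return {
        "dimensions": [1, len(values)],
        "valueTypes": ["ValueF32"],
        "flags": 0,
        "data": list(payload),  # one integer (0-255) per byte
    }


body = encode_f32_tensor([1.0, 2.0, 3.0, 4.0])
print(json.dumps(body, separators=(",", ":")))
```

The printed JSON can be passed to curl via `-d`.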

Plus3 model

To trigger a request against the plus3 model, type the following:

curl -v -X POST 0.0.0.0:8078/plus3 -d '{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,63,0,0,0,64,0,0,64,64,0,0,128,64]}'

The response is

{"result":"Success","tensor":{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,64,0,0,160,64,0,0,192,64,0,0,224,64]}}

Note that, in contrast to the identity model, the response from plus3 differs from the request. Converting the vector of bytes [0,0,128,64,0,0,160,64,0,0,192,64,0,0,224,64] back to a vector of f32 yields [4.0, 5.0, 6.0, 7.0]. This is expected: each element of the input is incremented by three, [1.0, 2.0, 3.0, 4.0] → [4.0, 5.0, 6.0, 7.0], hence the name of the model: plus3.
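The conversion from bytes back to f32 values can be checked with a few lines of Python. This is a sketch that assumes little-endian single-precision floats, consistent with the byte values shown in the response; `decode_f32_tensor` is a hypothetical helper name.

```python
import struct


def decode_f32_tensor(data):
    """Interpret a list of byte values as little-endian f32 numbers."""
    raw = bytes(data)
    if len(raw) % 4 != 0:
        raise ValueError("byte count must be a multiple of 4 for f32 data")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


response_data = [0, 0, 128, 64, 0, 0, 160, 64, 0, 0, 192, 64, 0, 0, 224, 64]
print(decode_f32_tensor(response_data))  # [4.0, 5.0, 6.0, 7.0]
```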

Mobilenet model

# run from the 'deploy' directory so that the relative path resolves
curl -v -X POST 0.0.0.0:8078/mobilenetv27/preprocess --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg

Note that the output tensor has shape [1,1000] and still needs post-processing: a softmax has to be evaluated over the outputs. To have the softmax evaluated as well, use the third endpoint, for example:

# run from the 'deploy' directory so that the relative path resolves
curl -v -X POST 0.0.0.0:8078/mobilenetv27/matches --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg
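For readers who call the preprocess endpoint and want to post-process the raw outputs themselves, the general softmax-and-rank technique can be sketched in Python as follows. This illustrates the technique only and is not the repository's implementation; the function names and the toy scores are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, k=5):
    """Return the k (class index, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


scores = [2.0, 1.0, 0.1]  # toy values, not real model output
probs = softmax(scores)
print(top_k(probs, k=2))
```

With a real [1,1000] output, each class index would then be mapped to its label.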

Squeezenet model

# run from the 'deploy' directory so that the relative path resolves
curl -v -X POST 0.0.0.0:8078/squeezenetv117/preprocess --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg

Note that the output tensor has shape [1,1000] and needs to be post-processed, which this endpoint does not do. To include the post-processing, use the matches endpoint as follows:

# run from the 'deploy' directory so that the relative path resolves
curl -v -X POST 0.0.0.0:8078/squeezenetv117/matches --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg

The answer should comprise

[{"label":"n02883205 bow tie, bow-tie, bowtie","probability":0.16806115},{"label":"n04350905 suit, suit of clothes","probability":0.14194612},{"label":"n03763968 military uniform","probability":0.11412828},{"label":"n02669723 academic gown, academic robe, judge's robe","probability":0.09906072},{"label":"n03787032 mortarboard","probability":0.09620707}]
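A client can parse this response and pick the most probable match. The following Python sketch uses the first two entries of the response above, abridged for brevity. Splitting the label at the first space into a synset ID and a list of names is an assumption based on the label format shown.

```python
import json

# Abridged copy of the response shown above (first two entries only).
response_text = '''[
  {"label":"n02883205 bow tie, bow-tie, bowtie","probability":0.16806115},
  {"label":"n04350905 suit, suit of clothes","probability":0.14194612}
]'''

matches = json.loads(response_text)
best = max(matches, key=lambda m: m["probability"])

# The label starts with an ID, followed by human-readable names.
synset_id, _, names = best["label"].partition(" ")
print(synset_id, "|", names, "|", best["probability"])
```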

Creation of new bindles

The capability provider assumes that a bindle comprises two parcels, each assigned to one of the following two groups:

  • model
  • metadata

The first, model, is assumed to contain the model data, e.g. an ONNX model. The second, metadata, is currently assumed to be JSON containing the model's metadata. When you create new bindles, make sure to assign these two groups.

Supported Inference Engines

The capability provider uses the amazing inference toolkit tract and currently supports the following inference engines:

  1. ONNX
  2. TensorFlow

Restrictions

Concerning ONNX, see tract's documentation for a detailed discussion of ONNX format coverage.

Concerning TensorFlow, only TensorFlow 1.x is supported, not TensorFlow 2. However, TensorFlow 2 models may be converted to TensorFlow 1.x. For a more detailed discussion, see the following resources:

  • https://www.tensorflow.org/guide/migrate/tf1_vs_tf2
  • https://stackoverflow.com/questions/59112527/primer-on-tensorflow-and-keras-the-past-tf1-the-present-tf2#:~:text=In%20terms%20of%20the%20behavior,full%20list%20of%20data%20types.

Currently, there is no support for accelerators such as GPUs or TPUs. There is a range of Coral devices, such as the Dev Board, that support TensorFlow for TPU-based inference. However, they only support the TensorFlow Lite derivative. For more information, see Coral's Edge TPU inferencing overview.
