
Conversation

@Alex-wuhu

Summary

This PR introduces the WASI-NN device (dev_wasi_nn), which adds AI inference capabilities to HyperBEAM: models are loaded from Arweave transactions and inference is run with session management for better performance. The device exposes an Erlang interface for model inference, uses a NIF backend for the actual inference logic, and caches models automatically to avoid repeated downloads.

Key features:

  • GPU-powered LLM inference
  • Automatic model download and caching from Arweave
  • Session management for context reuse across multiple requests
  • Persistent context management for improved performance
  • Robust error handling for network issues, model loading failures, and inference errors
  • Unit tests for both model download and inference flows

API Endpoints

GET /~wasi-nn@1.0/infer

  • Description: Performs AI inference using a specified model and prompt.
  • Parameters:
    • model-id (optional): Arweave transaction ID of the model file. If not provided, a default model is used.
    • prompt (required): The input text for inference.
  • Returns: {ok, #{<<"result">> := Result, <<"session-id">> := SessionId}} on success, or {error, Reason} on failure.
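
For illustration, a call to this endpoint from the Erlang shell might look roughly like the sketch below. This is a hedged sketch, not code from the PR: the ~wasi-nn@1.0 path segment and the reply keys are assumptions based on the description above, and it assumes hb_http_server:start_node/1 returns a node address that hb_http:get/3 accepts together with a path and an opts map.

    %% Hypothetical usage sketch; the path and reply keys are assumptions.
    Node = hb_http_server:start_node(#{}),
    {ok, Res} = hb_http:get(
        Node,
        <<"/~wasi-nn@1.0/infer?prompt=Hello&model-id=ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo">>,
        #{}
    ),
    Result = maps:get(<<"result">>, Res),
    SessionId = maps:get(<<"session-id">>, Res).

Passing the returned session-id back on a later request is what lets the device reuse the already-loaded context rather than reloading the model.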

Setup Environment

  • rebar3 as wasi_nn shell

File Change Description

  • dev_wasi_nn.erl : main device code for HyperBEAM
  • dev_wasi_nn_nif.erl : NIF code for calling the native implementation in C
  • rebar3.config & Makefile : add a wasi_nn profile for the device (rebar3 as wasi_nn shell)
  • native/wasi_nn_llama : native implementation with the wasi_nn & llama.cpp backend

Models

  • The currently tested, stable model on Arweave is Phi-3 Mini 4k Instruct ("ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo")
  • Older models such as Phi-2 or GPT-2 are not supported
  • You can also download newer models locally and load them through hb_store_fs, as sketched below
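
A rough sketch of what a local-model store configuration could look like follows. This is illustrative only: it reuses the <<"store-module">> / <<"name">> configuration shape from read_model_by_ID further down, and the directory name and ModelID binding are arbitrary examples.

    %% Hypothetical sketch: point an hb_store_fs store at a local directory
    %% holding the downloaded model, and read it through hb_cache as usual.
    LocalModelStore = #{
        <<"store-module">> => hb_store_fs,
        <<"name">> => <<"local-models">>
    },
    Opts = #{ store => [LocalModelStore] },
    {ok, Model} = hb_cache:read(ModelID, Opts).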

Unit tests

  • dev_wasi_nn :
    • read_model_by_ID_test : returns the model file path; if the model does not exist locally, it downloads Phi-3 Mini 4k Instruct from Arweave (remember to set a longer HTTP timeout)
    • infer_test : unit test for the API endpoint
  • dev_wasi_nn_nif : run_inference_test -> a single round of inference through the NIF

When this device is called, the logs show that CUDA is being used for inference.

@Alex-wuhu marked this pull request as ready for review July 30, 2025 12:34
* format: The format string for the message.
* ...: The variables to be printed in the format.
*/
void beamr_print(int print, const char* file, int line, const char* format, ...);

Not a blocker on this PR, but perhaps if we are using these debugging prints more widely now, they should be abstracted into utility HB_PRINT (etc) functions?

Comment on lines 158 to 178
read_model_by_ID(TxID) ->
    %% Start the HTTP server (required for gateway access)
    hb_http_server:start_node(#{}),
    %% Configure store with local caching for model files
    LocalStore = #{
        <<"store-module">> => hb_store_fs,
        <<"name">> => <<"model-cache">>
    },
    Opts = #{
        store => [
            %% Try local cache first
            LocalStore,
            #{
                <<"store-module">> => hb_store_gateway,
                %% Cache results here
                <<"local-store">> => LocalStore
            }
        ]
    },
    %% Attempt to read the model from cache or download from Arweave
    case hb_cache:read(TxID, Opts) of

Best practice is just to pass the Opts to your helper function, then you can do hb_cache:read(TXID, Opts). I guess the idea here was to add the specific FS model store to the opts? In which case, you are better off making a function like this:

opts(BaseOpts) ->
    ModelStore = ..., % Ideally checking the opts for a `model_store` param so that the user can configure it if they want.
    BaseOpts#{
        store => [ModelStore | hb_opts:get(store, [], BaseOpts)]
    }.

If the concern is just the size of the models though, it might make more sense to just work with us on the relevance filter for the hb_store interface. We want to add the ability to have filters based on size or path prefix (e.g., data/) to hb_store:write, such that bigger items go to the FS, etc.


%% Extract the data reference from the message
%% This could be either a link to existing cached data or binary data
DataLink = maps:get(<<"data">>, Message),

Can we just use hb_maps:get directly here instead? Destructuring and reorganizing links breaks the abstractions and is likely to cause lots of pain later when new stores are introduced, etc.

    _ ->
        cache_owner_loop()
after
    3600000 -> % Stay alive for a long time (1 hour), then check again

Out of interest, what is the motivation for this? It doesn't hurt, but I can't see intuitively how it helps either?

@samcamwilliams

Awesome first PR @Alex-wuhu ! Thank you for contributing!

I left a bunch of notes throughout, but most should be relatively minor. Ping me a DM on Slack if you are up for working together on the hb_store relevance filter changes (and if I don't see it, nudge Davy from our team who will remind me 😄).

One thing that is normally worth doing is writing Eunit tests that use hb_ao:resolve to run through the flow. Even better, some 'local' hb_ao:resolves, and a few sample requests over HTTP too with hb_http:get[/post/request]. This can highlight any unexpected type or Opts issues which you might not catch if you are making direct calls.
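
(For illustration, a minimal sketch of such a test follows. This is not code from the PR or the review: the wasi-nn@1.0 device name, the message shape, and the reply keys are assumptions, and hb_ao:resolve/3 is used in its base-message/request-message form.)

    %% Hypothetical EUnit-style sketch; device name and keys are assumptions.
    -include_lib("eunit/include/eunit.hrl").

    infer_resolve_test() ->
        Base = #{ <<"device">> => <<"wasi-nn@1.0">> },
        Request = #{ <<"path">> => <<"infer">>, <<"prompt">> => <<"Hello">> },
        {ok, Res} = hb_ao:resolve(Base, Request, #{}),
        ?assert(maps:is_key(<<"result">>, Res)).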

Major 🫡s. This is super cool!

…ns support and cleaning up code in dev_wasi_nn.erl and dev_wasi_nn_nif.erl
@jax-cn deleted the edge/PR branch September 5, 2025 04:02