
Conversation

@Alex-wuhu

Summary

This PR introduces the WASI-NN device (dev_wasi_nn), which adds AI inference capabilities to HyperBEAM: models are loaded from Arweave transactions and inference is run with session management for better performance. The device exposes an Erlang interface for model inference, uses a NIF backend for the actual inference logic, and caches models automatically to avoid repeated downloads.

Key features:

  • GPU-powered LLM inference
  • Automatic model download and caching from Arweave
  • Session management for context reuse across multiple requests
  • Persistent context management for improved performance
  • Robust error handling for network issues, model loading failures, and inference errors
  • Unit tests for both model download and inference flows

API Endpoints

GET /~wasi-nn@1.0/infer

  • Description: Performs AI inference using a specified model and prompt.
  • Parameters:
    • model-id (optional): Arweave transaction ID of the model file. If not provided, a default model is used.
    • prompt (required): The input text for inference.
  • Returns: {ok, #{<<"result">> := Result, <<"session-id">> := SessionId}} on success, or {error, Reason} on failure.
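
For illustration, a call to this endpoint from the Erlang shell might look roughly like the sketch below. This is a hedged sketch, not code from the PR: the ~wasi-nn@1.0 path segment and the reply keys are assumptions based on the description above, and it assumes hb_http_server:start_node/1 returns a node address that hb_http:get/3 accepts together with a path and an opts map.

    %% Hypothetical usage sketch; the path and reply keys are assumptions.
    Node = hb_http_server:start_node(#{}),
    {ok, Res} = hb_http:get(
        Node,
        <<"/~wasi-nn@1.0/infer?prompt=Hello&model-id=ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo">>,
        #{}
    ),
    Result = maps:get(<<"result">>, Res),
    SessionId = maps:get(<<"session-id">>, Res).

Passing the returned session-id back on a later request is what lets the device reuse the already-loaded context rather than reloading the model.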

Setup Environment

  • rebar3 as wasi_nn shell

File Change Description

  • dev_wasi_nn.erl : main device code for HyperBEAM
  • dev_wasi_nn_nif.erl : NIF code for calling the native implementation in C
  • rebar3.config & Makefile : add a wasi_nn profile for the device (rebar3 as wasi_nn shell)
  • native/wasi_nn_llama : native implementation with the wasi_nn & llama.cpp backend

Models

  • The currently tested, stable model on Arweave is Phi-3 Mini 4k Instruct ("ISrbGzQot05rs_HKC08O_SmkipYQnqgB1yC3mjZZeEo")
  • Older models such as Phi-2 or GPT-2 are not supported
  • You can also download newer models locally and load them through hb_store_fs, as sketched below
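
A rough sketch of what a local-model store configuration could look like follows. This is illustrative only: it reuses the <<"store-module">> / <<"name">> configuration shape from read_model_by_ID further down, and the directory name and ModelID binding are arbitrary examples.

    %% Hypothetical sketch: point an hb_store_fs store at a local directory
    %% holding the downloaded model, and read it through hb_cache as usual.
    LocalModelStore = #{
        <<"store-module">> => hb_store_fs,
        <<"name">> => <<"local-models">>
    },
    Opts = #{ store => [LocalModelStore] },
    {ok, Model} = hb_cache:read(ModelID, Opts).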

Unit tests

  • dev_wasi_nn :
    • read_model_by_ID_test : returns the model file path; if the model does not exist locally, it downloads Phi-3 Mini 4k Instruct from Arweave (remember to set a longer HTTP timeout)
    • infer_test : unit test for the API endpoint
  • dev_wasi_nn_nif : run_inference_test -> a single round of inference through the NIF

When this device is called, the logs show that CUDA is being used for inference.

@Alex-wuhu marked this pull request as ready for review July 30, 2025 12:34
* format: The format string for the message.
* ...: The variables to be printed in the format.
*/
void beamr_print(int print, const char* file, int line, const char* format, ...);

Not a blocker on this PR, but perhaps if we are using these debugging prints more widely now, they should be abstracted into utility HB_PRINT (etc) functions?

Comment on lines 158 to 178
read_model_by_ID(TxID) ->
    %% Start the HTTP server (required for gateway access)
    hb_http_server:start_node(#{}),
    %% Configure store with local caching for model files
    LocalStore = #{
        <<"store-module">> => hb_store_fs,
        <<"name">> => <<"model-cache">>
    },
    Opts = #{
        store => [
            %% Try local cache first
            LocalStore,
            #{
                <<"store-module">> => hb_store_gateway,
                %% Cache results here
                <<"local-store">> => LocalStore
            }
        ]
    },
    %% Attempt to read the model from cache or download from Arweave
    case hb_cache:read(TxID, Opts) of

Best practice is just to pass the Opts to your helper function, then you can do hb_cache:read(TXID, Opts). I guess the idea here was to add the specific FS model store to the opts? In which case, you are better off making a function like this:

opts(BaseOpts) ->
    ModelStore = ..., % Ideally checking the opts for a `model_store` param so that the user can configure it if they want.
    BaseOpts#{
        store => [ModelStore | hb_opts:get(store, [], BaseOpts)]
    }.

If the concern is just the size of the models though, it might make more sense to just work with us on the relevance filter for the hb_store interface. We want to add the ability to have filters based on size or path prefix (e.g., data/) to hb_store:write, such that bigger items go to the FS, etc.


%% Extract the data reference from the message
%% This could be either a link to existing cached data or binary data
DataLink = maps:get(<<"data">>, Message),

Can we just use hb_maps:get directly here instead? Destructuring and reorganizing links breaks the abstractions and is likely to cause lots of pain later when new stores are introduced, etc.

    _ ->
        cache_owner_loop()
after
    3600000 -> % Stay alive for a long time (1 hour), then check again

Out of interest, what is the motivation for this? It doesn't hurt, but I can't see intuitively how it helps either?

@samcamwilliams

Awesome first PR @Alex-wuhu ! Thank you for contributing!

I left a bunch of notes throughout, but most should be relatively minor. Ping me a DM on Slack if you are up for working together on the hb_store relevance filter changes (and if I don't see it, nudge Davy from our team who will remind me 😄).

One thing that is normally worth doing is writing Eunit tests that use hb_ao:resolve to run through the flow. Even better, some 'local' hb_ao:resolves, and a few sample requests over HTTP too with hb_http:get[/post/request]. This can highlight any unexpected type or Opts issues which you might not catch if you are making direct calls.
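
(For illustration, a minimal sketch of such a test follows. This is not code from the PR or the review: the wasi-nn@1.0 device name, the message shape, and the reply keys are assumptions, and hb_ao:resolve/3 is used in its base-message/request-message form.)

    %% Hypothetical EUnit-style sketch; device name and keys are assumptions.
    -include_lib("eunit/include/eunit.hrl").

    infer_resolve_test() ->
        Base = #{ <<"device">> => <<"wasi-nn@1.0">> },
        Request = #{ <<"path">> => <<"infer">>, <<"prompt">> => <<"Hello">> },
        {ok, Res} = hb_ao:resolve(Base, Request, #{}),
        ?assert(maps:is_key(<<"result">>, Res)).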

Major 🫡s. This is super cool!

…ns support and cleaning up code in dev_wasi_nn.erl and dev_wasi_nn_nif.erl
@jax-cn deleted the edge/PR branch September 5, 2025 04:02