Machine Learning in WebAssembly: Using wasi-nn in Wasmtime
The wasi-nn proposal allows WebAssembly programs to access host-provided machine learning (ML) functions. This post explains the motivation for wasi-nn, takes a brief look at the specification, and shows how to use it in Wasmtime to perform machine learning inference. You may find this post interesting if you want to execute ML inference in a standalone WebAssembly runtime (i.e. not in a browser) or if you would like to understand the process for implementing new WASI specifications. In a follow-on post, Implementing a WASI proposal in Wasmtime, I explain how I implemented the wasi-nn proposal in Wasmtime using OpenVINO™.
Motivation
First, why design and implement a WASI ML specification? There are efforts to port ML frameworks to Wasm SIMD (e.g. TensorFlow), but these are hampered by the “lowest-common denominator” approach of Wasm and Wasm SIMD: Wasm specifications must consider the limitations of all CPU architectures and therefore cannot take advantage of specialized instructions in any single architecture (in our case, x86). For peak performance, another approach was needed: the implementation described here uses OpenVINO™ to get the best performance out of x86 architectures, and extending support to other architectures is entirely possible.
Missing CPU instructions are not the only problem: many ML models take advantage of auxiliary processing units (e.g. GPUs, TPUs). It is difficult to see how Wasm could compile to such devices (at least currently), so some other way to reach them is necessary–a higher-level API like wasi-nn provides exactly that.
Finally, deploying ML inference to Wasm runtimes is difficult enough without adding the complexity of a translation to a lower-level abstraction. So wasi-nn allows programmers to deploy models directly, shifting the work of compiling the models for the appropriate device to other tools (e.g. OpenVINO™, TF)–the wasi-nn specification is agnostic to which one is used. If at some point a Wasm proposal were available that made it possible to use a machine’s full ML performance (e.g. flexible-vectors, gpu), it is conceivable that wasi-nn could be implemented “under the hood” with only Wasm primitives–until that time, ML programmers will still be able to execute inference using the approach described here and should see minimal changes if such a switch were to happen.
In summary, something like wasi-nn is needed because:
- Wasm is unlikely to expose instructions necessary for peak ML performance in the near future
- Wasm is unlikely to compile to non-CPU architectures any time soon
- A high-level API is helpful now and, if the above eventually change, wasi-nn could be re-implemented on top of the new Wasm primitives
Specification
When we began investigating how to expose ML functionality to Wasm programs, we considered WebNN. WebNN is a draft browser API with a similar goal–to provide ML functionality to users. Keeping wasi-nn’s API surface close to WebNN’s seemed a worthy goal: ideally, users could compile Wasm programs using ML features that could execute in either environment (browser or standalone Wasm runtimes).
Unfortunately, WebNN (like Android’s Neural Networks API) allows users to craft the ML computation graph node-by-node–we call this a “graph builder” API. This means that the API must expose nodes for every mathematical operation. Since the ML ecosystem is still growing, the set of operations has not yet stabilized. By some accounts, the number of operations in TF is growing at 20% per year.
To avoid this, we chose to specify wasi-nn as a “graph loader” API (at least initially, more below).
Like Microsoft’s WinML API, wasi-nn allows users to “load, bind, and evaluate” models. But whereas WinML expects the model to be packaged in ONNX format, wasi-nn is format-agnostic: it assumes models can be one or more blobs of bytes that are passed along unchanged to the implementation, which is determined by a $graph_encoding enumeration. This makes it possible to implement wasi-nn with different underlying implementations: it could pass ONNX models on to a WinML backend, TF models on to a TF backend, OpenVINO™ models to an OpenVINO™ backend, etc.
The decision to specify wasi-nn as a “graph loader” API does not preclude a “graph builder” addition in the future. But initially, while the set of operations stabilizes, the “graph loader” approach has the least churn for ML developers targeting standalone Wasm. (In fact, if WebNN were to add a “loader” API, the interoperability gap would be considerably smaller–with a polyfill, wasi-nn programs could run in the browser using WebNN).
As it stands today, the wasi-nn specification expects users to:
- load a model using one or more opaque byte arrays
- init_execution_context and bind some tensors to it using set_input
- compute the ML inference using the bound context
- retrieve the inference result tensors using get_output
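In the Rust bindings used later in this post, those steps surface roughly as follows; the signatures here are simplified (handle, tensor, and error types are elided), so treat this as a sketch of the flow rather than the crate’s exact API:

// Simplified view of the wasi-nn "loader" flow (types elided; see the
// wasi-nn bindings crate for the exact signatures).
//
// load(builders: &[&[u8]], encoding, target) -> graph        // one or more opaque byte arrays
// init_execution_context(graph) -> context                   // per-inference state
// set_input(context, index, tensor)                          // bind input tensors by index
// compute(context)                                           // run the inference
// get_output(context, index, out_buffer, out_len) -> size    // copy a result tensor out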
Use
In its first iteration, the wasi-nn specification is implemented in the Wasmtime engine using OpenVINO™ as the backing implementation. A follow-on post, Implementing a WASI proposal in Wasmtime, describes the details of the implementation. Here we focus on how to use wasi-nn in Wasmtime.
The first step is to build or retrieve the ML artifacts. The inputs to OpenVINO™ will be a model, the model weights, and one or more input tensors. For this example, I generated OpenVINO™-compatible artifacts for an AlexNet model as a test fixture here. To do this, I used OpenVINO™’s model optimizer, which has support for various model types, such as Caffe, TensorFlow, MXNet, Kaldi, and ONNX. Because wasi-nn does not pre-process the input tensors (i.e. decode and resize images), I created a tool to read images and generate the raw tensors (i.e. BGR) offline with the dimensions and precision the model expects. The artifacts include:
- alexnet.xml: the model description
- alexnet.bin: the model weights
- tensor-1x3x227x227-f32.bgr: the image tensor, adapted for the model
- build.sh: a script for regenerating the artifacts
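You can generate the tensor file with whatever tooling you prefer; as a rough illustration (not the actual tool used for this fixture), the Rust sketch below uses the image crate to decode an image, resize it to the model’s input dimensions, and dump the pixels as raw little-endian f32 values. The BGR channel order and planar (NCHW) layout are assumptions about what this AlexNet conversion expects, so verify them against your own model:

// Sketch: produce a raw f32 tensor file from an image (assumes a 227x227 input,
// BGR channel order, and planar/NCHW layout -- verify against your model).
use image::imageops::FilterType;
use std::fs::File;
use std::io::Write;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (width, height) = (227u32, 227u32);
    let img = image::open("input.jpg")?
        .resize_exact(width, height, FilterType::Triangle)
        .to_rgb8();
    let mut out = File::create("tensor-1x3x227x227-f32.bgr")?;
    // Write one full channel plane at a time, in B, G, R order.
    for channel in [2usize, 1, 0].iter() {
        for pixel in img.pixels() {
            out.write_all(&(pixel.0[*channel] as f32).to_le_bytes())?;
        }
    }
    Ok(())
}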
Next, we build Wasmtime with wasi-nn enabled; since ML inference is an optional (and highly experimental!) feature, we must explicitly include it as a build option. Enabling wasi-nn in Wasmtime turns on the wasmtime-wasi-nn crate, which imports the openvino crate. Though the openvino crate can build OpenVINO™ from source, the fastest and most stable route is to build it using existing OpenVINO™ binaries:
$ OPENVINO_INSTALL_DIR=/opt/intel/openvino cargo build -p wasmtime-cli --features wasi-nn
You can download OpenVINO™ binaries through package repositories such as apt and yum, or optionally build from source. Your platform may require a different OPENVINO_INSTALL_DIR location, though /opt/intel/openvino is the default Linux destination. If you run into issues, the openvino crate documentation may be helpful.
Once Wasmtime is built with wasi-nn enabled, we can write an example using the wasi-nn APIs. Here we use Rust to compile to WebAssembly, but any language that targets WebAssembly should work. Note how we reference the ML artifacts we retrieved previously:
use std::convert::TryInto;
use std::fs;

pub fn main() {
    // Load the graph: the model description and weights are passed as opaque byte arrays.
    let xml = fs::read_to_string("fixture/alexnet.xml").unwrap();
    let weights = fs::read("fixture/alexnet.bin").unwrap();
    let graph = unsafe {
        wasi_nn::load(&[&xml.into_bytes(), &weights], wasi_nn::GRAPH_ENCODING_OPENVINO, wasi_nn::EXECUTION_TARGET_CPU).unwrap()
    };
    // Create an execution context and bind the input tensor to it.
    let context = unsafe { wasi_nn::init_execution_context(graph).unwrap() };
    let tensor_data = fs::read("fixture/tensor-1x3x227x227-f32.bgr").unwrap();
    let tensor = wasi_nn::Tensor {
        dimensions: &[1, 3, 227, 227],
        r#type: wasi_nn::TENSOR_TYPE_F32,
        data: &tensor_data,
    };
    unsafe { wasi_nn::set_input(context, 0, tensor).unwrap() };
    // Execute the inference.
    unsafe { wasi_nn::compute(context).unwrap() };
    // Retrieve the output tensor (1000 f32 values for this AlexNet model).
    let mut output_buffer = vec![0f32; 1000];
    unsafe {
        wasi_nn::get_output(context, 0, &mut output_buffer[..] as *mut [f32] as *mut u8, (output_buffer.len() * 4).try_into().unwrap()).unwrap();
    }
}
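At this point output_buffer holds the raw inference results, which for this AlexNet fixture means one score per class; wasi-nn does not interpret the output for you. As a follow-up (appended to the end of main above, and assuming a per-class score vector), you might rank the classes:

// Sketch: rank the classes by score (assumes `output_buffer` holds one score
// per class, as with the AlexNet fixture used above).
let mut ranked: Vec<(usize, f32)> = output_buffer.iter().copied().enumerate().collect();
ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
for (class, score) in ranked.iter().take(5) {
    println!("class {} has score {}", class, score);
}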
If we compile the example using the wasm32-wasi target, we can see that it will import the necessary wasi-nn functions:
$ cargo build --release --target=wasm32-wasi
$ wasm2wat target/wasm32-wasi/release/wasi-nn-example.wasm | grep import
(import "wasi_snapshot_preview1" "proc_exit" (func $__wasi_proc_exit (type 0)))
(import "wasi_ephemeral_nn" "load" (func $_ZN7wasi_nn9generated17wasi_ephemeral_nn4load17hfba8f512ab8c63beE (type 9)))
(import "wasi_ephemeral_nn" "init_execution_context" (func $_ZN7wasi_nn9generated17wasi_ephemeral_nn22init_execution_context17he6f1beedc2598fbdE (type 2)))
(import "wasi_ephemeral_nn" "set_input" (func $_ZN7wasi_nn9generated17wasi_ephemeral_nn9set_input17h51e14f836a91281bE (type 8)))
(import "wasi_ephemeral_nn" "get_output" (func $_ZN7wasi_nn9generated17wasi_ephemeral_nn10get_output17h6a86ab75f932394dE (type 9)))
(import "wasi_ephemeral_nn" "compute" (func $_ZN7wasi_nn9generated17wasi_ephemeral_nn7compute17h49f58c91c97507d5E (type 5)))
(import "wasi_snapshot_preview1" "fd_close" (func $_ZN4wasi13lib_generated22wasi_snapshot_preview18fd_close17he8c060f039f6c828E (type 5)))
(import "wasi_snapshot_preview1" "fd_filestat_get" (func $_ZN4wasi13lib_generated22wasi_snapshot_preview115fd_filestat_get17h27caa6992e6ea3b8E (type 2)))
...
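The wasi_ephemeral_nn imports come from the wasi-nn bindings crate that the example depends on. A minimal Cargo.toml for the example might look like the following sketch (the version number and edition are assumptions; use whatever the crate currently publishes):

# Sketch of the example's manifest; `wasi-nn` provides the Rust bindings used above.
[package]
name = "wasi-nn-example"
version = "0.1.0"
edition = "2018"

[dependencies]
wasi-nn = "0.1"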
Now we can run this example using Wasmtime!
# Ensure the OpenVINO libraries are on the library path (e.g. LD_LIBRARY_PATH) since they will be dynamically linked:
$ source /opt/intel/openvino/bin/setupvars.sh
# Run our example Wasm in Wasmtime with the wasi-nn feature enabled and tell it where to look for the model artifacts (i.e. $ARTIFACTS_DIR)
$ OPENVINO_INSTALL_DIR=/opt/intel/openvino cargo run --features wasi-nn -- run --mapdir fixture::$ARTIFACTS_DIR target/wasm32-wasi/release/wasi-nn-example.wasm
The full example is available here and the process described above can be run using a Wasmtime CI script, ci/run-wasi-nn-example.sh.
To recap:
- retrieve and/or generate model artifacts
- build Wasmtime with the wasi-nn feature
- compile your code against the wasm32-wasi target (in Rust parlance)
- run the resulting Wasm file in wasi-nn-enabled Wasmtime
Conclusion
This post introduces the wasi-nn specification and demonstrates how to use it in the Wasmtime engine. These are early days for wasi-nn and further work is clearly needed: extending and refining the API, supporting more backends in Wasmtime, implementing wasi-nn in other Wasm engines, etc. If you are interested in contributing to this effort, please do!
- for discussions on the specification, open an issue in the wasi-nn repository
- to contribute to the implementation in Wasmtime, check out the wasmtime-wasi-nn crate
- for other questions, one of us is usually available on the BytecodeAlliance Zulip channel
Finally, thanks to Mingqiu Sun for thinking through the specification issues with me and Johnnie Birch for valuable review feedback. Alex Crichton reviewed most of the CI integration and Dan Gohman really guided us through the WASI specification proposal process. Thanks!