Implementing a WASI Proposal in Wasmtime: wasi-nn
In a previous post, Machine Learning in WebAssembly: Using wasi-nn in Wasmtime, we described the wasi-nn specification and a user-level view of its usage in Wasmtime. In this post, we dive into the details of implementing the proposal using wasi-nn as an example. If you are interested in designing new WASI specifications and making them work–especially in the Wasmtime engine–you may find this post useful.
Others have done related work: Aaron Turner has described how to build graphical applications with Wasmer and WASI and Radu Matei has written about adding a new WASI syscall in Wasmtime. This guide, unlike Aaron’s article, will focus on integration with Wasmtime, a Rust project, so the following sections will be Rust-focused and Wasmtime-specific. If you squint, the general approach will be similar for other Wasm runtimes (e.g. NodeJS) but the details may look substantially different. This guide will build on Radu’s article, encompassing the end-to-end process of a WASI proposal and diving into more detail. The process involves several steps:
- design the WASI API
- provide a backing implementation (in this case, by porting an ML framework to Rust)
- implement the WITX specification with the backing implementation
- expose the implementation in the runtime
- optionally, provide bindings to compile programs to the specification (in this case, from Rust to wasi-nn)
WASI API
WASI exists to expose system interfaces to WebAssembly programs. It divides its API surface into separate proposals, organized by theme: filesystem, IO, clocks, crypto, etc. Our proposal, wasi-nn, is a more exotic addition to this list. We discuss the motivation more in the previous post, but briefly, machine learning is a popular feature that benefits from a system interface and, until wasi-nn, WebAssembly had no access to the hardware's peak ML performance.
WASI proposals follow a process through several stages. The first step involves presenting an idea (half-baked even!) to the WASI subgroup. This is usually enough to reach stage 0.
Eventually the proposal "champion," in WASI terms, must define the interface. To do so, the WASI repository defines the WITX language and associated tools. For wasi-nn, we defined several new types (e.g. the tensor struct, the tensor_type enum, etc.) along with methods like set_input to pass tensors into the inference engine:
;;; Define the inputs to use for inference.
(@interface func (export "set_input")
  (param $context $graph_execution_context)
  ;;; The index of the input to change.
  (param $index u32)
  ;;; The tensor to set as the input.
  (param $tensor $tensor)
  (result $error $nn_errno)
)
The WASI repository provides tools that parse the WITX specification and generate human-readable documentation:
# Create or modify a witx file in-tree, then:
cd tools/witx
cargo run --example witx repo-docs
The output for wasi-nn is available here.
Backing Implementation
Once the API is defined in WITX, we must build (or find) an implementation of the API. If you already have an implementation for your API, you can safely jump to the next section. If not, this section describes how we exposed OpenVINO™ in Rust.
We decided to use OpenVINO™ to implement wasi-nn, since:
- it can execute models on various devices (e.g. not solely the CPU) and
- it has received considerable attention to improve inference performance on Intel CPUs (full disclosure: I work for Intel).
The wasi-nn API is flexible enough to accept different, even multiple, backing implementations, so choosing OpenVINO™ here does not prohibit similar work using TensorFlow or WinML–in fact, that work would be beneficial to prove out wasi-nn’s generality.
Since OpenVINO™ does not have Rust bindings, I implemented both openvino-sys and openvino and published the crates. Rust bindings (allowing calls from Rust to the backing implementation) are necessary for integration with Wasmtime; other Wasm engines will have different language requirements.
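To give a feel for what such a -sys crate involves, here is a minimal sketch of the typical bindgen-based build.rs (this is not the actual openvino-sys build script; the header path and library name below are placeholders, and the real script must also locate the OpenVINO™ installation):

use std::env;
use std::path::PathBuf;

fn main() {
    // Link against the native library (placeholder name).
    println!("cargo:rustc-link-lib=dylib=inference_engine_c_api");
    // Generate Rust declarations from the library's C header (placeholder path).
    let bindings = bindgen::Builder::default()
        .header("wrapper.h")
        .generate()
        .expect("unable to generate bindings");
    let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out_path.join("bindings.rs"))
        .expect("unable to write bindings");
}

The higher-level openvino crate then wraps these raw declarations in a safe Rust API.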
Note: Using OpenVINO™ means that the models accepted by this implementation must be in OpenVINO™ IR format. This format uses two files for encoding an ML graph: a graph description XML file and a binary file with the weights. The OpenVINO™ IR is not the most common encoding format out in the wild, so OpenVINO™ provides a model-optimizer tool to convert models from other formats (e.g. Caffe, TensorFlow, ONNX) to OpenVINO™ IR. If you are interested in executing your model with this implementation of wasi-nn, see the model-optimizer documentation and my test example.
Implement WITX with a Backing Implementation
Now that we have a WITX specification and a backing implementation (e.g. openvino), we can attempt to bind them together using wiggle. The guest, a Wasm module, writes code using the API exposed in the WITX file and the host, Wasmtime in our case, provides the implementation. The host code is contained in a new crate, crates/wasi-nn, to which I immediately added my version of the WASI spec (the one containing the wasi-nn proposal) as a Git submodule:
cd crates/wasi-nn
git submodule add https://github.com/abrown/
Then, in my crate's build.rs, I set up an environment variable to inform wiggle of the location of the WITX files:
let wasi_root = PathBuf::from("./spec").canonicalize().unwrap();
println!("cargo:rustc-env=WASI_ROOT={}", wasi_root.display());
If you are making changes to the WITX specification in-tree, let Cargo know that it should rebuild the crate if it observes a WITX change:
for entry in walkdir::WalkDir::new(wasi_root) {
    println!("cargo:rerun-if-changed={}", entry.unwrap().path().display());
}
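Putting the two snippets together, the crate's build.rs looks roughly like this (assuming walkdir is listed under [build-dependencies]):

use std::path::PathBuf;

fn main() {
    // Point wiggle at the WITX files in the spec submodule.
    let wasi_root = PathBuf::from("./spec").canonicalize().unwrap();
    println!("cargo:rustc-env=WASI_ROOT={}", wasi_root.display());
    // Rebuild the crate whenever anything in the spec changes.
    for entry in walkdir::WalkDir::new(wasi_root) {
        println!("cargo:rerun-if-changed={}", entry.unwrap().path().display());
    }
}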
I then used wiggle::from_witx to bind the WITX specification to the host structures it expects me to create (WasiNnCtx and WasiNnError):
wiggle::from_witx!({
    witx: ["$WASI_ROOT/phases/ephemeral/witx/wasi_ephemeral_nn.witx"],
    ctx: WasiNnCtx,
    errors: { errno => WasiNnError }
});
WasiNnCtx should contain any state necessary for implementing the proposal; e.g., we return u32 handles to the loaded graphs and execution contexts, so WasiNnCtx maintains a hash map of handles to these structures. WasiNnError is a wrapper structure for the errors that the implementation could return.
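To give a rough idea of the shape of this context state, WasiNnCtx might look something like the following sketch (the names and exact types are illustrative, not the actual crates/wasi-nn definitions):

use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical host-side wrappers; in our case these would hold the OpenVINO
// network and inference-request objects from the openvino crate.
struct BackendGraph { /* ... */ }
struct BackendExecutionContext { /* ... */ }

pub struct WasiNnCtx {
    // Guest code only ever sees u32 handles; the host resolves them here.
    graphs: RefCell<HashMap<u32, BackendGraph>>,
    executions: RefCell<HashMap<u32, BackendExecutionContext>>,
}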
wiggle::from_witx will rather abruptly alert you of any missing implementations. For example, you will see in witx.rs how we must implement the GuestErrorConversion and UserErrorConversion traits for our context, WasiNnCtx; these translate errors (from the guest and from our implementation, respectively) into error codes that users of our WASI module can interact with (all WASI errors are currently encoded as integers). I found it quite helpful to use the cargo-expand CLI tool to visualize what wiggle was doing for me:
$ cargo expand witx
Redacting the macro expansions to see the overall structure, you may notice that wiggle generates Rust types from our WITX specification:
pub mod types {
    ...
    pub enum Errno { ...
    pub type TensorDimensions<'a> = wiggle::GuestPtr<'a, [u32]>; ...
    pub enum TensorType { ...
    pub struct Graph(u32); ...
}
Then it generates wrapper functions that type-convert, log, and otherwise process guest-side calls (e.g. from Wasm) and proxy them into the WasiEphemeralNn trait, which it also generates. The WasiEphemeralNn trait is important because that is where we tie our backing implementation to the context:
pub mod wasi_ephemeral_nn {
    ...
    pub fn load(ctx: &WasiNnCtx, memory: &dyn wiggle::GuestMemory, builder_ptr: i32, builder_len: i32, encoding: i32, target: i32, graph_ptr: i32) -> i32 { ...
    pub fn init_execution_context(ctx: &WasiNnCtx, memory: &dyn wiggle::GuestMemory, graph: i32, context_ptr: i32) -> i32 { ...
    pub fn set_input(ctx: &WasiNnCtx, memory: &dyn wiggle::GuestMemory, context: i32, index: i32, tensor_ptr: i32) -> i32 { ...
    pub fn get_output(ctx: &WasiNnCtx, memory: &dyn wiggle::GuestMemory, context: i32, index: i32, out_buffer: i32, out_buffer_max_size: i32, bytes_written_ptr: i32) -> i32 { ...
    pub fn compute(ctx: &WasiNnCtx, memory: &dyn wiggle::GuestMemory, context: i32) -> i32 {
    ...
    pub trait WasiEphemeralNn {
        fn load<'a>(&self, builder: &GraphBuilderArray<'a>, encoding: GraphEncoding, target: ExecutionTarget) -> Result<(Graph), super::WasiNnError>;
        fn init_execution_context(&self, graph: Graph) -> Result<(GraphExecutionContext), super::WasiNnError>;
        fn set_input<'a>(&self, context: GraphExecutionContext, index: u32, tensor: &Tensor<'a>) -> Result<(), super::WasiNnError>;
        fn get_output<'a>(&self, context: GraphExecutionContext, index: u32, out_buffer: &wiggle::GuestPtr<'a, u8>, out_buffer_max_size: Size) -> Result<(Size), super::WasiNnError>;
        fn compute(&self, context: GraphExecutionContext) -> Result<(), super::WasiNnError>;
    }
}
Now that we can see the types and traits that wiggle has generated for us, we can actually implement wasi-nn. In impl.rs, I use the WasiNnCtx structure, which I control, along with imported functions from my backing implementation, openvino, to implement WasiEphemeralNn:
impl<'a> WasiEphemeralNn for WasiNnCtx {
    fn load<'b>(&self, builders: &GraphBuilderArray<'_>, encoding: GraphEncoding, target: ExecutionTarget) -> Result<Graph> {
        if encoding != GraphEncoding::Openvino {
            return Err(UsageError::InvalidEncoding(encoding).into());
        }
        if builders.len() != 2 {
            return Err(UsageError::InvalidNumberOfBuilders(builders.len()).into());
        }
        ...
For primitive types and enumerations (e.g. GraphEncoding above), this is quite simple, but any kind of buffer-passing (e.g. GraphBuilderArray above) is tricky across the guest-host divide. To cross this divide, wiggle exposes structures that allow the host to "see" and alter the guest code's memory (e.g. GuestPtr and GuestSlice). I highly recommend reading through the documentation for these host-to-guest structures, since their misuse could result in errors or unsafe code.
Expose the Implementation in the Runtime
At this point, we have implemented the wasi-nn specification but we must inform Wasmtime that it should actually use our implementation when it executes Wasm modules. To do this, we use a Wasmtime-specific macro to generate implementation-to-runtime binding code:
wasmtime_wiggle::wasmtime_integration!({
    target: witx,
    witx: ["$WASI_ROOT/phases/ephemeral/witx/wasi_ephemeral_nn.witx"],
    ctx: WasiNnCtx,
    modules: {
        wasi_ephemeral_nn => {
            name: WasiNn,
            docs: "An instantiated instance of the wasi-nn exports.",
            function_override: {}
        }
    },
    missing_memory: { witx::types::Errno::MissingMemory },
});
If we glance again at the output of cargo expand, we now see a new generated structure, WasiNn:
...
pub struct WasiNn {
    pub load: wasmtime::Func,
    pub init_execution_context: wasmtime::Func,
    pub set_input: wasmtime::Func,
    pub get_output: wasmtime::Func,
    pub compute: wasmtime::Func,
}
...
impl WasiNn {
    ...
    pub fn add_to_linker(&self, linker: &mut wasmtime::Linker) -> anyhow::Result<()> { ...
}
WasiNn::add_to_linker is used when we want to expose wasi-nn functionality to the user's Wasm modules–using Wasmtime's Linker, we instantiate WasiNn and add it to the functions available to import:
use wasmtime::{Linker, Module, Store};
use wasmtime_wasi_nn::{WasiNn, WasiNnCtx};
// ...
let store = Store::default();
let mut linker = Linker::new(&store);
let wasi_nn = WasiNn::new(&store, WasiNnCtx::new()?);
wasi_nn.add_to_linker(&mut linker)?;
let module = Module::from_file(store.engine(), file)?;
linker.module("", &module)?;
linker.get_default("")?.get0::<()>()?()?;
This section and its immediate predecessor are the most challenging: any new WASI proposal must first bind the WITX specification to an implementation and then the implementation to the runtime. To do so concretely in Wasmtime, we:
- use wiggle::from_witx! to generate the necessary types and traits
- fill in the implementation trait (i.e. WasiEphemeralNn) with our implementation code (i.e. a combination of WasiNnCtx and openvino)
- use wasmtime_wiggle::wasmtime_integration! to generate the linking code to the runtime (i.e. WasiNn::add_to_linker)
- configure the runtime to import our new functionality
The full code described in this section is available here.
Provide WITX Bindings: wasi-nn-bindings
In order to compile a program to a wasi-nn Wasm binary, the programmer must call the wasi-nn API functions. In certain languages (e.g. Rust), this may be un-ergonomic. Using witx-bindgen, I generated Rust bindings for wasi-nn that are available here. This enables the user to import a Rust-friendly wrapper of our API and write Rust code against it:
use wasi_nn;
unsafe {
    wasi_nn::load(&[&xml.into_bytes(), &weights], wasi_nn::GRAPH_ENCODING_OPENVINO, wasi_nn::EXECUTION_TARGET_CPU)
        .unwrap()
}
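For context, a fuller guest-side flow looks roughly like the following sketch (the tensor dimensions, buffer sizes, and some field and constant names are illustrative; check the generated bindings for the exact signatures):

// Load the OpenVINO™ IR files and set up an execution context.
let graph = unsafe {
    wasi_nn::load(&[&xml.into_bytes(), &weights], wasi_nn::GRAPH_ENCODING_OPENVINO, wasi_nn::EXECUTION_TARGET_CPU).unwrap()
};
let context = unsafe { wasi_nn::init_execution_context(graph).unwrap() };
// Describe the input tensor; `input_bytes` is a placeholder for the raw tensor
// data and the dimensions stand in for the model's expected input shape.
let tensor = wasi_nn::Tensor {
    dimensions: &[1, 3, 224, 224],
    r#type: wasi_nn::TENSOR_TYPE_F32, // field name may differ across binding versions
    data: &input_bytes,
};
let mut output_buffer = vec![0u8; 1001 * 4]; // e.g. room for 1001 f32 class scores
unsafe {
    wasi_nn::set_input(context, 0, tensor).unwrap();
    wasi_nn::compute(context).unwrap();
    wasi_nn::get_output(context, 0, output_buffer.as_mut_ptr(), output_buffer.len().try_into().unwrap()).unwrap();
}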
Unfortunately, the output of witx-bindgen is not perfect (see, e.g., this issue about pointers) but the resulting wrapper does eliminate some boilerplate. As an aside on toolchain ease-of-use, it would be quite nice if the WITX tooling automatically generated these bindings for us; bonus points if this automatic generation occurred for several languages (e.g. C, Rust).
Conclusion
This post walked through the process of implementing a new WASI proposal using wasi-nn as an example–from WITX specification to implementation in Wasmtime. Having walked through this process, I see some possible improvements:
- since working with macros is not particularly easy, offering a way to statically generate the glue code would benefit larger proposals (a la bindgen)
- the wiggle documentation needs to continue to improve–some examples would be highly appreciated, as well as "do not do this"-style warnings
- some portions of this article should be adapted for the Wasmtime book
If you would like to contribute to this effort or implement your own WASI proposal, there is help available: open an issue or pull request on the WASI or Wasmtime repositories. Also, someone related to the project is usually online on the BytecodeAlliance Zulip channel.
Special thanks to Pat Hickey for answering all my questions about WITX, wiggle, and Wasmtime! Also, thanks to Mingqiu Sun and Johnnie Birch for reviewing drafts of this article.