The past year has been a productive one for the Cranelift project! Cranelift is the Bytecode Alliance’s native code compiler that serves as the foundation for the Wasmtime and Lucet WebAssembly virtual machines, and is used in other contexts as well, e.g. as an alternative backend for the Rust compiler.
We’re very excited about the progress we’ve made. In particular, it is wonderful to see how the community has grown organically as the compiler matures: since the start of the year we have had contributions from 33 distinct people¹, both folks who do this as a day-job and also dedicated individual contributors. This is something to celebrate, and not to take for granted; we are doing our best to foster an open and inclusive community that welcomes anyone who wants to learn about compilers and help us build one!
This post is a look back at the year, chronicling the various major projects and enhancements that the compiler has undergone. We’ll first walk through all of the different results, then we’ll take a look at last year’s roadmap to see how we fared relative to our aspirations. (For those who are curious, we’re currently planning our 2022 roadmap as well!)
The first major goal we achieved in 2021 was to complete the migration from the old backend framework to the new one. This has been a long-running effort: beginning in early 2020, we discussed ways to improve the maintainability, ease of development, and performance of the codegen backends, which culminated in the new backend framework. This new framework was co-developed with our aarch64 (ARM64) backend, and soon after it came online, we began work on a new x86-64 backend, living alongside the existing one.
The long-term plan was to get the new x86-64 backend to parity, then switch the default, then remove the old x86-64 backend, then remove the old backend framework. In this way we “kept the plane flying while replacing the engines” — but to do so, we had to work through a long list of issues to reach parity, and then carefully evaluate correctness. These issues included a number of grungy compiler-infrastructure details (unwind and debuginfo, 128-bit ops, TLS and GOT support, struct arguments, Windows fastcall ABI support) and a few other things.
Once we had all tests passing and all of the long-tail issues worked out, and once our differential fuzzers churned on the new backend for a while (using a Wasm interpreter as an oracle as well as against the old backend), we wrote up an RFC, built consensus, and switched over the default. About six months later, when it was clear we no longer needed the old backend as a fallback, we passed another RFC to propose removing it. Then, with much cheering and celebration but also a slightly bittersweet “goodbye”, we did so. All said and done, it took us a little under two years to completely revamp our compiler backend design, leaving us with a stronger foundation for all of our ongoing work.
Security: CVE, VeriWasm
We switched our default to the new compiler backend in March, and all was well… until April, when we had our first CVE. The bug was a simple flaw in our ABI glue code: when implementing a register-allocator reload of a spilled value, it did a sign-extend rather than a zero-extend. This could in theory allow a WebAssembly instance to access memory outside its sandbox. Fortunately, after careful analysis, we found that the impact was masked by address-space layout randomization in default configurations of Wasmtime, and as far as we know it was not exploited in any production deployment of Cranelift. We wrote up the issue and our responses in a blog post.
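To see why this class of bug matters, here is a minimal sketch (not Cranelift’s actual code, just the arithmetic at the heart of the flaw): a Wasm linear-memory address is a 32-bit value that must be zero-extended to 64 bits, and sign-extending it instead corrupts any address with the high bit set.

```rust
// Correct reload behavior: widen a 32-bit address with zeros.
fn zero_extend(addr: u32) -> u64 {
    addr as u64
}

// The flawed reload behavior: reinterpret as signed, then widen.
// For addresses >= 2^31 this fills the upper 32 bits with ones,
// producing a pointer far outside the intended region.
fn sign_extend(addr: u32) -> u64 {
    addr as i32 as i64 as u64
}

fn main() {
    let addr: u32 = 0x8000_0000; // high bit set
    assert_eq!(zero_extend(addr), 0x0000_0000_8000_0000); // in bounds
    assert_eq!(sign_extend(addr), 0xFFFF_FFFF_8000_0000); // out of bounds
}
```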
While this was extremely stressful at the time, there were a number of silver linings for the project that came out of the experience. The first was that our processes for handling security-related issues, building and testing patches, and coordinating their release along with vulnerability disclosures, were all tested and made more concrete where necessary. We are now more confident (not that we weren’t before, but now it’s tested!) that we can respond to future security issues in a responsible and appropriate way.
This also gave us a needed push to look at ways to proactively find and mitigate security issues. The VeriWasm tool was originally developed, before this security incident, in a collaboration between UCSD and Stanford researchers and Fastly; the tool checks the machine code compiled from WebAssembly to ensure that its sandboxing properties are intact. We brought it up to date to support verification of code produced by the new Cranelift backend, and integrated it into Lucet so that a simple `--veriwasm` option on the compiler command line would verify the just-compiled artifact. Mitigations such as this provide an additional layer of defense when compiler bugs do slip through the cracks.
Fuzzing and Correctness
In addition to the VeriWasm effort, we have had a long-running interest in fuzzing and ensuring correctness in a bunch of different ways in Cranelift. We have a number of fuzz targets that run continuously on OSS-Fuzz, and we are always looking to add more.
In 2021, the stars aligned and we acquired three (!) separate new differential-execution fuzz targets that run the same code under Cranelift and under other compilers/engines, comparing the results. In addition to the original differential wasmi fuzzer, we now have differential fuzzing against V8, the official Wasm spec interpreter, and the CLIF (Cranelift IR) interpreter. There is ongoing work to determine how best to use all of these; for example, the Wasm spec interpreter is not well-suited to running large programs (it is designed to match the spec formalisms exactly, rather than for speed), so we do not run it on such programs, but we might use it to test smaller code fragments.
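The shape of such a differential target can be sketched as follows. This is illustrative only; the trait and engine names are hypothetical, not Cranelift’s actual harness. Two implementations execute the same input, and any divergence in the observable result indicates a bug in at least one of them.

```rust
// A hypothetical common interface over two execution engines.
trait Engine {
    fn execute(&self, input: u64) -> Result<u64, String>;
}

struct Reference;
struct UnderTest;

impl Engine for Reference {
    fn execute(&self, input: u64) -> Result<u64, String> {
        Ok(input.wrapping_mul(3).wrapping_add(1)) // stand-in computation
    }
}

impl Engine for UnderTest {
    fn execute(&self, input: u64) -> Result<u64, String> {
        Ok(input.wrapping_mul(3).wrapping_add(1)) // should agree
    }
}

/// Returns true when the two engines agree: same value, or both trap.
fn oracle(a: &dyn Engine, b: &dyn Engine, input: u64) -> bool {
    match (a.execute(input), b.execute(input)) {
        (Ok(ra), Ok(rb)) => ra == rb,
        (Err(_), Err(_)) => true, // both trapped: treated as agreement here
        _ => false,               // one trapped, one succeeded: divergence
    }
}

fn main() {
    assert!(oracle(&Reference, &UnderTest, 42));
}
```

A real harness must also decide policy questions this sketch glosses over, such as whether differing trap *reasons* count as divergence.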
There was also an extremely innovative project to build a “custom mutator”, wasm-mutate, that allows all of our Wasm-based fuzzing to more effectively generate and test interesting programs by making semantics-preserving changes. This project is a good example of the kind of out-of-the-box thinking that we hope to continue to apply to achieve the highest quality and correctness that we can with our resources.
Benchmarking: Sightglass
We have the Sightglass benchmark suite, intended to provide a stable basis of evaluation as we evolve our compiler. In the past year, this project received significant work: we updated and added new benchmarks (1, 2, 3, 4, 5), and did a major revamp of the runner and stats-processing tooling in order to provide sound statistical results (1, 2, 3). This infrastructure has proved useful in evaluating major compiler changes: for example, regalloc2 development was largely driven by compile-speed and runtime performance measurements using Sightglass, and ISLE evaluations used this infrastructure as well to show that the new instruction selector DSL (below!) did not harm performance.
Wasm SIMD Support
Support for SIMD (“vector instructions”) is increasingly important for any modern computing platform to provide, and Cranelift is no different: we have prioritized completion of our SIMD support on both aarch64 and x86-64. We now have a feature-complete implementation of Wasm-SIMD, thanks to tireless work (too many PRs to link individually!) by several folks, mainly our core contributors from Intel and Arm. This implementation is being fuzzed continuously against the V8 engine’s implementation, and after a flurry of initial fuzzbugs that we fixed, it seems to be stable; soon, we will consider enabling it in the default configuration.
Register Allocator (regalloc2)
As part of the push for better compiler performance, we did a deep dive this past year into the register allocator. While profiling the compiler itself, we found that a majority of time was usually spent in regalloc: even with all optimizations enabled, the register allocation process took more time than all of the other compiler stages put together (i.e., often over 50%). Moreover, the speed of the generated code is often highly dependent on how well the allocator can avoid unnecessary spills and moves between registers. This part of the compiler was thus a fruitful target for optimization.
We initially built the new backend framework around regalloc.rs, but as we considered the optimizations we wanted to make, it became clear that starting from a more battle-hardened foundation might give us a useful boost. So, in the spirit of open source, we borrowed IonMonkey’s register allocator, starting by transliterating it as best we could from C++ to Rust. This was the beginning of regalloc2.
In the process of doing this, we started to find that there were opportunities to optimize differently and improve performance — from better, more cache-efficient data structures to asymptotically more efficient algorithms (e.g., avoiding quadratic behavior) to simpler, more effective heuristics. (The commit history preserves this quite circuitous phase of experimentation and gradual improvement.) In the end, the allocator’s design is roughly half IonMonkey-derived and half novel.
There is also a complicated history of working out how to integrate regalloc2 into Cranelift: at first we considered a compatibility shim that emulates the regalloc.rs API, but it now seems more likely that we will wait until our ISLE transition (below) is complete and then use regalloc2’s API natively.
regalloc2 slightly improves performance when used via the compatibility shim, and should improve it substantially more once used natively. In any case, better performance awaits!
Instruction Selector DSL (ISLE)
Our most recent large project was a revamp of the instruction-selector code to use a custom-designed domain-specific language (DSL) to concisely specify lowering patterns. This project was a long-running effort spanning half the year: it started as a pre-RFC to propose some design principles and gather ideas and input, progressed through prototyping and an RFC with an initial language design, then through a multi-month effort to build the integration/“library code” that allows the (separate and orthogonal by design) DSL compiler’s output to be used with Cranelift, culminating in an initial PR that introduced all of this to the Cranelift codebase and migrated a number of instruction lowerings. Work since then continues to move over more lowering patterns.
This DSL, ISLE (Instruction Selection Lowering Expressions), is designed to solve a particular problem: our machine backends, while simpler than they had been in our old backend framework, were still starting to become verbose and a little difficult to manage because they used handwritten pattern-matching code. This code was repetitive and followed certain idioms that we had to be careful to get right. It was a clear candidate for automatic generation from a higher-level description. ISLE allows a concise declarative specification: a particular pattern of IR opcodes becomes a particular sequence of instructions in the output. The DSL compiler then does the tedious work of generating the code that looks for and rewrites these patterns.
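To make the motivation concrete, here is a toy sketch (all type and helper names are hypothetical, not Cranelift’s actual API) of the handwritten pattern-matching style that ISLE replaces: each IR shape is matched by hand and rewritten into machine instructions, which gets repetitive and error-prone as the number of patterns grows. An ISLE rule expresses the same “this IR pattern becomes this instruction sequence” fact declaratively, and the DSL compiler generates matching code like this automatically.

```rust
// A toy IR and a toy machine-instruction set.
#[derive(Debug, PartialEq)]
enum Ir {
    Const(i64),
    Add(Box<Ir>, Box<Ir>),
}

#[derive(Debug, PartialEq)]
enum MachInst {
    LoadConst(i64),
    AddRegReg,
    AddRegImm(i64), // fused form when one operand is a constant
}

// Handwritten lowering: every special-case pattern is matched explicitly.
fn lower(ir: &Ir, out: &mut Vec<MachInst>) {
    match ir {
        Ir::Const(c) => out.push(MachInst::LoadConst(*c)),
        Ir::Add(a, b) => {
            // Special pattern: add with a constant right-hand side uses
            // the reg-immediate form instead of materializing the constant.
            if let Ir::Const(c) = &**b {
                lower(a, out);
                out.push(MachInst::AddRegImm(*c));
            } else {
                lower(a, out);
                lower(b, out);
                out.push(MachInst::AddRegReg);
            }
        }
    }
}

fn main() {
    let ir = Ir::Add(Box::new(Ir::Const(10)), Box::new(Ir::Const(32)));
    let mut out = Vec::new();
    lower(&ir, &mut out);
    assert_eq!(out, vec![MachInst::LoadConst(10), MachInst::AddRegImm(32)]);
}
```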
While this effort has proved successful so far in allowing for more concise backend code and allowing us to reason more easily about lowering patterns, the more exciting benefits are still to come: now that we have a declarative, functional representation of our instruction lowering, we can process these patterns to formally verify them against IR and ISA semantics, we can change the DSL compiler to systematically alter backend code as needed (e.g., to adapt to regalloc2’s native API), we can systematically apply optimizations to the generated compiler code (e.g., if we think of a more efficient way of matching IR opcodes), we can include or exclude lowering-pattern rules in a modular way to scale compiler size and compilation effort up or down, and more. This project is a good example of designing a foundation to grant us more options and flexibility in the future.
New Backend: s390x
One of the benefits of existing as an open-source project is that many organizations and individuals can contribute a far wider range of features than any single organization would likely be able to undertake on its own. Support for generating code for mainframe computers is a good example: most regular contributors to Cranelift do not have access to such a machine, or do not know the architecture well enough to support it. Yet we received a major contribution in 2021: a fully-functional s390x backend that supports the IBM z/Architecture (aka `s390x`, the 64-bit ISA with a direct backwards-compatible lineage to 1960s mainframes). This backend has been maintained and runs in CI (via emulation) as a full peer of our other two, much more pedestrian, architectures: x86-64 and aarch64.
Last Year’s Roadmap, Evaluated
Now that we’ve described the major achievements of the past year, let’s examine our roadmap for 2021, posted at the end of 2020, with the benefit of hindsight to see how well we did. We set out to make progress along four major axes:
- Migrate to new backend framework
- Work to improve correctness (more fuzzing, generate our lowering code automatically, investigate verification)
- Improve compiler performance (benchmarking, codegen quality improvements, (also) generate our lowering code automatically, transition to `VCode` as a machine-independent IR)
- Investigate security mitigations (ISA extensions like pointer authentication, other compiler mitigations)
Remarkably, we actually managed to achieve a large portion of this list (to the credit of many dedicated folks!). To make it concrete:
- Migrate to new backend framework: Completely done!
- Work to improve correctness: We’ll call this mostly “done”, with the last (verification) ramping up. We have three new differential code-execution fuzzers now, and we have our DSL, ISLE, which takes the human error out of handwritten instruction-selection pattern-matching code.
- Improve compiler performance: Mostly done as well! We significantly improved the Sightglass benchmark suite, so that we could drive this with data. We wrote a new register allocator. Our new DSL, ISLE, will also allow us to optimize compilation performance further. We missed only the aspirational idea of transitioning to `VCode` as a machine-independent IR.
- Security focus and mitigations: Our CVE experience improved our processes, and spawned more work on integrating VeriWasm. We have an active RFC for pointer authentication. Let’s call this “ongoing”.
And Many, Many More
These are the major efforts that we undertook in 2021, but there are countless other contributions that we received as well — too many to name them all here. By our count, there were 690 commits to Cranelift this year so far¹, from 33 separate contributors, as we celebrated above. We hope that we can continue this success; come join us on our Zulip if you’d like to hang out, ask questions, and get involved!
¹ Counting commits with `git log --since="Jan 1 2021 00:00:00 UTC" cranelift/`, at commit `918671316301306d653345cc3486f0a15de2aa50`. Authors were counted by taking the name portion of `Name <email>` entries to deduplicate differing email addresses. Note that this count doesn’t include work that happens in sub-repositories such as regalloc2, either.