Toward Building Self-Hosted Backends for Zig: Plans and Perspectives

I'm interested in adding certain "exotic" self-hosted backends to the Zig compiler, ones you don't typically run an OS on -- and I think I've settled on a process for developing and validating them.

Zig has been developing self-hosted backends for some time. Soon, though unfortunately not in time for 0.14.0, the x86 backend will become the default for debug builds. Besides that, there are others in the works.

I'm particularly interested in using Zig for eBPF and embedded systems. eBPF is what I do for my day job. My educational background is in electrical engineering, and I've always liked mixing programming and electronics. I'm one of the founders of the Zig Embedded group, I've run a couple of embedded Zig workshops, and I coordinated the electronic badges for SYCL 2024. I'm pretty invested at this point.

LLVM Dependence

Being able to target so many architectures is exactly what I want from my language toolchain. LLVM makes that mostly possible for a lot of projects, but significant regressions do take place. For example, we weren't able to target AVR (think Arduino) for a couple of LLVM releases due to some unsupported instructions that Zig was using. Then, when LLVM 18 was released, it suddenly worked again. I'll be honest: I didn't feel like spending my time debugging LLVM to discover the root cause. I expect that Zig should be able to output LLVM IR, and that LLVM should transform that IR into instructions for the target machine. The fact that we were getting "instruction not supported" tells me that the AVR backend wasn't as robust as it could be. This matches anecdotes I've heard: the existence of an LLVM backend for a target doesn't mean it will work as correctly as you'd assume, given the project's reputation and funding.

When you can't rely on a dependency, it's time to do things yourself. The Zig compiler team understands this. It's a lot of work, but:

All our bugs are belong to us.

And while we're not perfect and we do write bad code, having the control and the ability to make things better is exciting to me.

The Process

All the architectures/MCUs I'm interested in making backends for -- eBPF, AVR, Cortex-M, MSP430, ESP, RISC-V (32-bit), MIPS, etc. -- are difficult to validate. We can't just run the compiler as a process on one of those machines and execute the generated program. The machines are too small! We have to cross-compile our program, load it onto hardware, run it, and somehow prove that it ran correctly. In addition, these target machines don't have the storage to fit all the test programs, so you'd have to break them up.

To solve these problems without adding microcontroller hardware, we're going to need emulators. They let us build up the foundation for a robust backend:

  1. Build an emulator
  2. Write behavioral tests using the emulator
  3. Write the backend
  4. Use those behavioral tests on the new backend

1. Build an emulator

The emulator's job is to execute instructions that mutate the state of the processor and memory. We only need unit tests to verify that it does this correctly. Since we'll rely on the emulator later to verify the code generated by the compiler, it needs to be thorough.
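To make that concrete, here is a minimal sketch of the fetch/decode/execute core, written in Python purely for brevity. It covers four instructions modeled on eBPF's 8-byte encoding (opcode, register byte, 16-bit offset, 32-bit immediate). The Machine class and encode helper are names I made up for illustration, and the opcode values should be double-checked against the eBPF instruction set documentation before relying on them.

```python
import struct

# Tiny subset of eBPF-style opcodes (assumed values, verify against the spec).
MOV64_IMM = 0xB7  # dst = imm
ADD64_IMM = 0x07  # dst += imm
ADD64_REG = 0x0F  # dst += src
EXIT = 0x95       # stop; the result is in r0

MASK64 = (1 << 64) - 1


def encode(opcode, dst=0, src=0, offset=0, imm=0):
    """Pack one 8-byte instruction: opcode, src/dst nibbles, offset, imm."""
    return struct.pack("<BBhi", opcode, (src << 4) | dst, offset, imm)


class Machine:
    """Hypothetical minimal emulator state: 11 registers and a program counter."""

    def __init__(self, program):
        self.program = program
        self.regs = [0] * 11
        self.pc = 0  # counted in instructions, not bytes
        self.halted = False

    def step(self):
        """Decode and execute exactly one instruction."""
        opcode, regs, _offset, imm = struct.unpack_from(
            "<BBhi", self.program, self.pc * 8
        )
        dst, src = regs & 0x0F, regs >> 4
        if opcode == MOV64_IMM:
            self.regs[dst] = imm & MASK64  # sign-extend imm to 64 bits
        elif opcode == ADD64_IMM:
            self.regs[dst] = (self.regs[dst] + imm) & MASK64
        elif opcode == ADD64_REG:
            self.regs[dst] = (self.regs[dst] + self.regs[src]) & MASK64
        elif opcode == EXIT:
            self.halted = True
            return
        else:
            raise ValueError(f"unsupported opcode {opcode:#x}")
        self.pc += 1

    def run(self):
        """Step until EXIT, then return r0."""
        while not self.halted:
            self.step()
        return self.regs[0]


def test_add_imm_wraps_to_zero():
    # Unit test for one instruction: (2**64 - 1) + 1 must wrap to 0.
    m = Machine(encode(MOV64_IMM, dst=1, imm=-1) + encode(ADD64_IMM, dst=1, imm=1))
    m.step()
    m.step()
    assert m.regs[1] == 0
```

Each instruction gets its own small test like the last function, which is what makes the emulator trustworthy enough to judge compiler output later.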

An emulator-based approach makes it significantly easier for others to contribute, because there's no bespoke hardware requirement. An emulator can also be built to improve debuggability, and can even do things such as step backwards in time.
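One simple way to get backwards stepping is to snapshot the machine state before every step and pop snapshots to rewind. The sketch below, in Python for brevity, assumes a made-up toy machine whose instructions are plain tuples rather than real machine code. ToyMachine and Recorder are hypothetical names, and a real emulator would also need to snapshot memory (or record deltas to keep the history small).

```python
class ToyMachine:
    """Hypothetical toy CPU: instructions are (op, register, value) tuples."""

    def __init__(self, program):
        self.program = program
        self.regs = [0] * 4
        self.pc = 0

    def step(self):
        op, dst, value = self.program[self.pc]
        if op == "mov":
            self.regs[dst] = value
        elif op == "add":
            self.regs[dst] += value
        else:
            raise ValueError(f"unknown op {op!r}")
        self.pc += 1


class Recorder:
    """Wraps a machine and saves a snapshot of its state before every step."""

    def __init__(self, machine):
        self.machine = machine
        self.history = []  # stack of (pc, register copy) snapshots

    def step(self):
        self.history.append((self.machine.pc, list(self.machine.regs)))
        self.machine.step()

    def step_back(self):
        """Restore the state from just before the most recent step."""
        if not self.history:
            raise IndexError("already at the initial state")
        self.machine.pc, self.machine.regs = self.history.pop()


rec = Recorder(ToyMachine([("mov", 0, 5), ("add", 0, 3)]))
rec.step()
rec.step()
rec.step_back()  # undo the add; r0 holds the value written by mov again
assert rec.machine.regs[0] == 5
```

Full snapshots are the easy version. They trade memory for simplicity, which seems like a fine trade for small test programs.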

2. Write behavioral tests using the emulator

Now that we have a machine that correctly emulates a given architecture, we can use it in tests to execute generated code and inspect the resulting state.
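The shape of such a test is: compile a small program, run the output in the emulator, and compare the final state with what the program should have produced. Here is a rough sketch in Python. Case, run_cases, toy_compile and toy_emulate are all hypothetical names. The two toy functions are stand-ins so the example runs on its own; in practice the first would invoke the Zig compiler and the second would be the real emulator.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    source: str       # program text handed to the compiler
    expected_r0: int  # value the program should leave in register 0


def run_cases(cases, compile_fn, emulate_fn):
    """Compile each case, execute it in the emulator, compare final state."""
    results = {}
    for case in cases:
        try:
            code = compile_fn(case.source)
            results[case.name] = emulate_fn(code) == case.expected_r0
        except Exception:
            # A compiler or emulator crash counts as a failed test.
            results[case.name] = False
    return results


def toy_compile(source):
    """Stand-in for the real compiler: parses 'op reg value' lines."""
    lines = source.strip().splitlines()
    return [(op, int(reg), int(val)) for op, reg, val in (l.split() for l in lines)]


def toy_emulate(code):
    """Stand-in for the real emulator: runs the toy ops and returns r0."""
    regs = [0] * 4
    for op, reg, val in code:
        if op == "mov":
            regs[reg] = val
        elif op == "add":
            regs[reg] += val
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs[0]


cases = [
    Case("add", "mov 0 40\nadd 0 2", 42),
    Case("wrong_result", "mov 0 1", 2),
]
results = run_cases(cases, toy_compile, toy_emulate)
print(f"{sum(results.values())}/{len(results)} passed")
```

Because the compiler is passed in as a function, the same cases can be pointed at a different backend without rewriting them.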

Even if we were to stop after this stage, there would be value from this work because the tests would be against code generated by LLVM. The more validation we have here, the more LLVM regressions can be caught and reported. Whether LLVM wants to act on those bug reports before a release is a different problem.

3. Write the backend

Luckily, after building the emulator, the busy work of understanding what each machine instruction means should mostly be done. With that out of the way, we should be able to focus on integrating with the Zig compiler. This part is the most uncertain for me, as I haven't done any heavy development on the compiler yet.

4. Use those behavioral tests on the new backend

This should be the quickest and easiest of the steps. Ideally it happens in conjunction with building out the backend. I'll also find it motivating to see the number of passing tests go up over time.

Next Steps

I'm starting with eBPF as the first victim of this process. Its bytecode and VM are relatively simple. Compare this with one of the embedded targets and you'll find optional sets of instructions, plus additional simulation details I'd be interested in, such as cycle counts, pipelines, and caches. Those details aren't necessary for validating the correctness of a backend, so I'm trying to resist the temptation, even though they sound cool and fun.