Porting a C++ implementation over to Zig
I have started doing more tinkering with Zig, and thought of a good first experiment: rewriting a Mandelbrot Set program. I have a lot of different projects related to Mandelbrot computation, so it's something I know nearly off the top of my head without having to look up the basic equation (it's f(z, c) = z*z + c). I understand a lot of it, and I love making fractals from it.
That being said, years ago I had written a project which I wanted to make faster by making a C++ version, which I did, named mandelpp. It's a very bare-bones implementation which plainly and simply renders a fractal, given some information describing it. It works, it uses some bare-minimum C++ features, and I had planned on integrating it into one of my larger projects, but never did.
Now I open it up today, and lament at trying to modernize it. GNU Make is simple, but doesn't cover enough bases. GNU Autoconf and Automake are crazy confusing and I couldn't get them to work plainly. CMake is something I thought about, but I'm not crazy familiar with it. Meson and Ninja are also things I considered, but I never came up with a solution in my spare time.
Instead, I thought this might be a cool exercise to do in Zig. Where C++ is limited to the double type, and the only ways to go further are rolling your own types, pulling in new libraries, or maybe using GNU Multi-Precision, Zig has a built-in floating-point type of 128 bits, twice as big as the C++ double, so this was a bit appealing to me. If Zig could create code that handled 128-bit floating points, that would be pretty cool.
Zig has no notion of classes like C++, but it does have compile-time functions that fill the role of macros. These comptime features are very powerful, allowing you to do many cool things like create new type definitions at compile time given a known type. The difference between this:
const Point = struct {
    x: f64,
    y: f64,
};
And this:
pub fn Point(comptime T: type) type {
    return struct {
        x: T,
        y: T,
    };
}
is huge. One is a fixed-type struct definition that only holds the f64 type, while the other is effectively a compile-time, macro-like function that creates a new struct given a type T; when it's invoked in code, it produces a new struct definition substituting the given type. You can produce structs of many different underlying types this way to meet your needs.
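For example, invoking the function at comptime gives you distinct concrete types (the PointF32/PointF64 names here are just ones I made up for illustration):

```zig
const std = @import("std");

fn Point(comptime T: type) type {
    return struct {
        x: T,
        y: T,
    };
}

pub fn main() void {
    // each invocation stamps out a separate concrete struct type
    const PointF32 = Point(f32);
    const PointF64 = Point(f64);

    const a = PointF32{ .x = 1.5, .y = 2.5 };
    const b = PointF64{ .x = 1.5, .y = 2.5 };

    // the two types even have different sizes in memory
    std.debug.print("{} {}\n", .{ @sizeOf(@TypeOf(a)), @sizeOf(@TypeOf(b)) });
}
```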
So to define a Complex number, with fields real and imag for the real and imaginary components, we can write a compile-time function to allow for type flexibility.
fn Complex(comptime T: type) type {
    return struct {
        real: T,
        imag: T,
    };
}
Structs can have methods attached to them, similar to Rust or Go, and we can do this inside the compile-time function as well. We cannot do any kind of operator overloading like C++, but we can create basic methods that mimic that functionality. The reason operator overloading does not exist in Zig is that it hides work behind innocent-looking operators: resource management gets tricky, unneeded complexity creeps in, and that can lead to memory errors all over the place.
So to begin writing some math functions, we attach some function signatures. I won't include the low-level math behind imaginary numbers, but we can see how this works. Also, to write macro-friendly code, we need a way to refer to our own struct type, which has no name we can spell out in plain code. We need a compiler builtin to produce the type for us, so we will use @This() to create the binding.
fn Complex(comptime T: type) type {
    return struct {
        real: T,
        imag: T,

        const Self = @This();

        fn init(r: T, i: T) Self {
            return Self{ .real = r, .imag = i };
        }

        fn add(self: Self, other: Self) Self { ... }
        fn sub(self: Self, other: Self) Self { ... }
        fn mul(self: Self, other: Self) Self { ... }
        fn div(self: Self, other: Self) Self { ... }
        fn length(self: Self) T { ... }
    };
}
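As a sketch of what some of those elided bodies might contain (this is my own filling-in, not from the original program; note that length here returns the squared magnitude, which lines up with the < 4 comparison used later):

```zig
// inside the struct returned by Complex(T):

fn add(self: Self, other: Self) Self {
    return Self{ .real = self.real + other.real, .imag = self.imag + other.imag };
}

fn mul(self: Self, other: Self) Self {
    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    return Self{
        .real = self.real * other.real - self.imag * other.imag,
        .imag = self.real * other.imag + self.imag * other.real,
    };
}

fn length(self: Self) T {
    // squared magnitude: real^2 + imag^2 (no sqrt needed)
    return self.real * self.real + self.imag * self.imag;
}
```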
So we create each binary math operation as a (T, T) -> T operation. However, this might not always be ideal. These functions return new structs, which might not be very efficient depending on your application. Then there's the question of pass-by-value versus pass-by-pointer. For simple programs like the one we are writing, pass-by-value is completely okay, because our underlying struct only contains two fields, so it's very trivial data.
However, with larger structures, and sometimes with things like file locks or allocated memory, you are best off using pass-by-pointer, because you do not wish to copy whole values around in function calls. And should we really return a brand-new struct from every operation? That also involves copying data around, which we'd rather avoid. Instead, it might be more reasonable to mutate in place, so I'll make functions that take pointers to values instead.
fn Complex(comptime T: type) type {
    return struct {
        real: T,
        imag: T,

        const Self = @This();

        fn init(r: T, i: T) Self {
            return Self{ .real = r, .imag = i };
        }

        fn add(self: *Self, other: *Self) void { ... }
        fn sub(self: *Self, other: *Self) void { ... }
        fn mul(self: *Self, other: *Self) void { ... }
        fn div(self: *Self, other: *Self) void { ... }
        fn length(self: *Self) T { ... }
    };
}
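Here is one way those in-place bodies might look (again my own sketch, not the original code; the mul case needs care because later we call Z.mul(Z), passing the same pointer twice):

```zig
// inside the struct returned by Complex(T):

fn add(self: *Self, other: *Self) void {
    self.real += other.real;
    self.imag += other.imag;
}

fn mul(self: *Self, other: *Self) void {
    // copy the operands out first: if self and other alias (as in Z.mul(Z)),
    // writing self.real before reading it would corrupt the imag calculation
    const a = self.real;
    const b = self.imag;
    const c = other.real;
    const d = other.imag;
    self.real = a * c - b * d;
    self.imag = a * d + b * c;
}
```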
Now instead of operating on raw values, we operate only through pointers, and instead of returning values, we simply return void, meaning we mutate our data in place. This avoids making so many raw copies of structs between addresses.
How a Mandelbrot Set is calculated makes it beautiful when visualized in 2D. The idea behind plotting a Mandelbrot set is no different from visualizing any other function of two variables: you set up an X and a Y axis, go through every point, and calculate the function f as f(x, y).
For complex numbers we don't usually say X or Y axes; we talk about real and imaginary components. So instead of X we have the real axis, instead of Y we have the imaginary axis, and instead of f, the function is conventionally called Z when working with complex functions.
We must now explore what the Mandelbrot Set equation is.
Zn+1 = Zn^2 + C
sqrt(|Zn+1|) < 2
There are two variables in this equation: Z and C. Also note the subscripts n and n+1: this is a recursive relationship, where each value relies on the value before it.
Z and C are two different complex numbers here, so there are a total of four values in play, each pair being their respective real and imaginary parts. Remember when I said we are doing a two-dimensional plot of a two-variable function? So why are there four values here?
One of these values, Z, will actually always start at zero. What we're changing as we plot is the value of C. So when Z is set to zero, it gets the initial C value added to it, and the recursion continues until the new Zn+1 value's magnitude exceeds 4. Why 4? The initial equation compares the square root of the magnitude of Z against the number 2, but if we apply the inverse of the square root, i.e. the x**2 function, to both sides, we get |Zn+1| < 4, removing a nasty square root from the equation.
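Written out, the step is just squaring both sides (both sides are non-negative, so the direction of the inequality is preserved):

```
sqrt(|Zn+1|) < 2
(sqrt(|Zn+1|))^2 < 2^2
|Zn+1| < 4
```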
So while we have two different variables here, only C is ever changing. So let's write our function that will recurse for us up to a maximum.
// Cbig here is an instantiation of our Complex type, e.g. `const Cbig = Complex(f64);`
pub fn get_numcycles(Z: *Cbig, C: *Cbig) u8 {
    var i: u8 = 0;
    while (i < 255 and (Z.length() < 4.0)) : (i += 1) {
        Z.mul(Z);
        Z.add(C);
    }
    return i;
}
Meet the get_numcycles function, where we count how long it takes for our initial C value to escape the |Z| < 4 boundary, or diverge, whichever you want to call it. When drawing fractals, we care about how many steps it took to escape. This number is hard-capped at 255 right now, enough to fit into a u8 eight-bit value, which is conveniently also the size of one channel in an RGB color format. Interesting, right?
So the while loop simply iterates on two conditions: whether i is under our iteration limit, and whether or not our Z value exceeds its boundary. The other part of that while line is the simple i increment, which we can put into its own little block as part of the while syntax. Then we simply do Z.mul(Z) and Z.add(C) until we are done. Pretty easy to remember; the hard part is crafting the C variable itself.
Next step: we need to actually plot. For every pixel in our target image, we need to convert it to a valid C value to render, then draw that pixel with a color. For simplicity's sake, we will not be using any kind of fancy image format; we will use the Netpbm format, which involves raw values and no compression. It's much easier to use something like ImageMagick to convert images to your desired format afterwards (which I have done for this blog post).
Let's define a basic loop implementation here.
var x: usize = 0;
var y: usize = 0;
const width: usize = 640;
const height: usize = 480;
// Complex is comptime-generic, so we instantiate it with f64 here
var Z = Complex(f64).init(0, 0);
var C = Complex(f64).init(0, 0);
var i: u8 = 0;
while (y < height) : (y += 1) {
    while (x < width) : (x += 1) {
        Z.real = 0;
        Z.imag = 0;
        C.real = @intToFloat(f64, x);
        C.imag = @intToFloat(f64, y);
        i = get_numcycles(&Z, &C);
        // draw?
    }
    x = 0;
}
This is a naive implementation with no real draw function yet, and it will render across the entire region between (0, 0) and (640, 480).
But... the Mandelbrot set is not really... visible in this range. It's really only interesting from about (-2, -2) to (2, 2). So we could render a 4x4 pixel image, but that isn't exactly useful...
We need to map our pixel coordinates into the domain where the Mandelbrot set is actually visible. A 640x480 range of complex values is too large, and a 4x4 image is too small, so we must scale large pixel coordinates down to small complex values. There are two approaches we can take here, but I am going to stick with the one I believe is better for performance, which I'll call complex incrementing.
In complex incrementing, the idea is that we calculate, before the main loop begins, the exact offset we need to apply to C at each step. That way, instead of setting C.real = x, we can do C.add(&re_increment), where re_increment is the tiny bump needed to move C just a little bit along the real axis. We do the same with im_increment for the imaginary component.
(The alternative would be linearly interpolating our (x, y) coordinates each loop cycle and setting C to the "lerped" values. This is a more demanding approach because of the added cost of two lerp calculations per pixel, which involves a lot of floating-point multiplication.)
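For comparison, that lerp approach would look something like this per pixel (a sketch only; r_min/r_max/i_min/i_max are the fractal bounds and width_f/height_f the image dimensions as floats, both defined below):

```zig
// two linear interpolations per pixel: map (x, y) into the fractal's bounds
const tx = @intToFloat(f64, x) / width_f;
const ty = @intToFloat(f64, y) / height_f;
C.real = r_min + (r_max - r_min) * tx;
C.imag = i_min + (i_max - i_min) * ty;
```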
So first, we must define some boundaries within which to render the Mandelbrot Set fractal. Let us pick (-1.67, -1.12) and (0.8, 1.12), as this is the boundary that gives me the best results.
// our domain and range of Real/Imaginary
const r_min: f64 = -1.67;
const r_max: f64 = 0.8;
const i_min: f64 = -1.12;
const i_max: f64 = 1.12;
// our starting values (declared var, since we mutate them every pixel)
var Z = Complex(f64).init(0, 0);
var C = Complex(f64).init(r_min, i_min);
Now we can create an increment to bump our variables by every loop. If we are covering the area with NxM pixels, we must divide the total span of our fractal by the number of pixels along each axis. That is to say, we compute (|min| + |max|) / num_pixels for both the real and imaginary axes. We'll need to import std.math for the fabs function. We can also wrap both increment values in a Complex type so we don't have to do the addition part ourselves. We also need to convert our width and height into friendly numbers for division, so I will cast them to floats with @intToFloat.
const width_f: f64 = @intToFloat(f64, width);
const height_f: f64 = @intToFloat(f64, height);
var r_increment = Complex(f64).init((math.fabs(r_min) + math.fabs(r_max)) / width_f, 0);
var i_increment = Complex(f64).init(0, (math.fabs(i_min) + math.fabs(i_max)) / height_f);
Now we get very tiny numbers we can bump C by every loop. Or very large numbers, depending on your domain. Who knows. We take the absolute value of both the min and max values because if we naively added them and they happened to be opposites of each other, they would cancel out to zero. Taking the absolute value of both (with min negative and max positive) gives us the correct span of the space.
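Plugging our bounds and a 640x480 image into that formula, the per-pixel bumps come out to roughly:

```
real: (|-1.67| + |0.8|) / 640 = 2.47 / 640 ≈ 0.00386
imag: (|-1.12| + |1.12|) / 480 = 2.24 / 480 ≈ 0.00467
```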
Now our iteration will look something like this (drawing is still left out right now).
// main loop for rendering
while (y < height) : (y += 1) {
    while (x < width) : (x += 1) {
        Z.real = 0;
        Z.imag = 0;
        i = get_numcycles(&Z, &C);
        C.add(&r_increment);
        // draw(x, y);
    }
    x = 0;
    // punch it back like a typewriter
    C.real = r_min;
    C.add(&i_increment);
}
File writing is... well, boring. Zig has a lot of shortcuts for handling errors, because functions can return errors in Zig, which allows for added safety and compile-time checks. If you can't open a resource, the Zig program can produce a panic, and by default provides error handling for those cases. You can put fallback strategies in place, or default to the basic panics given by Zig. This approach is pretty similar to Rust's try! macro (these days the ? operator), which tries to unwrap the result of an error-producing function.
For file writing, we technically don't have to write to a file for this type of program. It's possible to run the program and dump the standard output text to a file via shell redirection, like ./mandelbrot > my_image.ppm
. But that's an extra step required for the user, so let's just open a file and give it a shot.
File opening in Zig is tricky, but the way I have had the most success is a method called createFile. You call it on the Dir object that std.fs.cwd() gives you, relative to your environment; createFile serves as a function to both create new files and open existing files for writing. Unlike openFile, which only opens existing files, createFile handles the case where the file does not exist, removing an annoying check you would otherwise have to do.
So when writing the header of our image, we would do something like this.
// create a file and open up a writer stream
// (assumes `const fs = std.fs;` is in scope)
var fi = try fs.cwd().createFile("test.ppm", .{});
var writer = fi.writer();
defer fi.close();
// write the headers
try writer.print("P6\n", .{});
try writer.print("#Real <>, Imag <>\n", .{});
try writer.print("640 480\n", .{});
try writer.print("255\n", .{});
Using defer we can have the file be closed at the end of the scope. A defer statement helps us not forget, as Zig does not include features like RAII for automatic resource cleanup. This writer will also render our pixels during the iteration count loop.
while (y < height) : (y += 1) {
    while (x < width) : (x += 1) {
        Z.real = 0;
        Z.imag = 0;
        i = get_numcycles(&Z, &C);
        C.add(&r_increment);
        // draw the pixel
        try writer.print("{c}{c}{c}", .{ i, i, i });
    }
    x = 0;
    C.real = r_min;
    C.add(&i_increment);
}
The Netpbm format requires us to write out pixels in top-left to bottom-right order, so iterating y from 0 to height and x from 0 to width is the proper way of going about it. At each step, we print three 8-bit numbers representing an RGB tuple. In this sample program, the number of iteration cycles determines how dark or how bright our pixel will be. It can generate the following picture now.
Every iteration cycle we make a writer.print call, which slows down the hot loop by a large margin. The rendering phase has to stop each time to run a call to the file writer; only after writing does control return to the loop for the next cycle. This can be avoided if we buffer our output data into some storage, then write the buffer to the file every so often instead.
Before we begin the rendering steps, we are going to set up a fixed-size array buffer and use it during our rendering stage.
var bi: usize = 0;
const bufsize: usize = 900;
var buffer: [bufsize]u8 = undefined;
The buffer size can be anything you want, but since we store three bytes at a time for each pixel, be sure to pick a decently sized multiple of three; I picked 900.
The next step is to write what's essentially a buffer store and a buffer flush. We store to the buffer each loop, but the moment our buffer index bi reaches the buffer capacity, we flush out to the file via writer.print.
First let's store to the buffer in the inner drawing loop.
// ... inside the inner while loop {
i = get_numcycles(&Z, &C);
// store it in the buffer (three channels per pixel)
buffer[bi] = i;
buffer[bi + 1] = i;
buffer[bi + 2] = i;
bi += 3;
// check if we filled the buffer
if (bi >= bufsize) {
    try writer.print("{s}", .{buffer[0..bufsize]});
    bi = 0;
}
// } end of the inner while loop
That will store all values into a temporary buffer, write to the file once the buffer is full, and then pick up where it left off. We don't have to refill the array with zeroes or anything; resetting the buffer index to zero lets the data be overwritten on the next loops.
But uh-oh, there's an issue. If our image isn't an exact multiple of the buffer size, values get left behind. This is where we do a post-render flush, just in case there's data left in the buffer. Using another Zig slice (i.e. buffer[0..bi]) we can tell Zig the exact number of bytes we wish to write to the file.
// after the two while loops - do this
if (bi > 0) {
    try writer.print("{s}", .{buffer[0..bi]});
}
Now we have a properly buffered Mandelbrot set renderer, and this performance trick will save us time, at the cost of making the program a little larger due to the buffer (negligible, though; 900 bytes is not much).
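As an aside: the standard library can also handle this kind of buffering for you. In the Zig versions around when this style of code was common (roughly 0.9 through 0.11; the I/O APIs have shifted since), std.io.bufferedWriter wraps any writer with a fixed internal buffer, and you flush it at the end:

```zig
// sketch: let the standard library buffer the file writes for us
var fi = try fs.cwd().createFile("test.ppm", .{});
defer fi.close();

var buffered = std.io.bufferedWriter(fi.writer());
var writer = buffered.writer();

// ... print headers and pixels through `writer` exactly as before ...

// push out whatever is still sitting in the internal buffer
try buffered.flush();
```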
I will label my results with the following: Zig (f64), Zig (f128), and C++. I do not want to claim my C++ version is perfect, and I think it could be improved, but here were my results from testing.
Zig (f64) - Average Time: ~84ms
Zig (f128) - Average Time: ~3.2s
C++ - Average Time: ~210ms
All binaries built with release optimizations
At first sight, seeing anything beat my C++ version was incredible! The 64-bit floating point version is notably better, by more than a 2x factor, is written very similarly, and means I probably don't have to write C++ anytime soon.
The 128-bit floating point version is where I was scratching my head for a while, because I didn't understand why it was that much slower. I started with a 128-bit program, but then converted it to 64-bit once I saw it was that much slower than the C++ one. Upon defaulting to 64-bit, speeds improved. But why?
The only conclusion I can come to is that just because the type is available in the language does not mean processors natively support a 128-bit floating point. What most likely happens is that Zig generates software routines behind the scenes for our host architecture, exporting code that lets us use 128-bit types even though they are not actually part of the underlying hardware. There is no way a 128-bit floating point type is native to my CPU, so every multiplication most likely expands into a lot of instructions.
So by turning my program into a 64-bit-only program, my runtime is much quicker than expected. But if I want the added safety of many more bits of precision, I can turn on 128-bit, wait a little longer, and get more accurate results in the long run. This wouldn't be possible in C++, at least not with the standard types; I would have to use a compiler extension or import a library somewhere. So I consider this a cool Zig feature that we can plug in and make trade-offs with easily.
Now, what the hell is wrong with my C++ version? I suspect it's because I don't use a proper buffer. Each cycle is written to the file stream as-is, and I make zero attempts to buffer it with a fixed-size array. I don't know if I will ever add this feature, but I suspect my C++ version could get as low as, if not a little lower than, my basic Zig implementation here.
I'm pretty happy with Zig so far. I need to flesh this program out some more before I can finally use it for some practical use-cases. The lack of resources on the topic is what hurts Zig the most right now, but the language is very usable to get started with. It can interact with C libraries; it can even compile C/C++ code. Zig is pretty darn cool, and I can't wait to see what else it does in the future.
Also check out this blooper image I rendered while developing the program. It came out wrong because I used a bad domain and had also written the colors incorrectly. But it looks pretty cool for some reason, so I'm not ashamed of my mistake.
If you liked this post, you can check out my live Mandelbrot Set generator. It's a lot like this, but it does it slowly and one section at a time until a final image is reached. Thanks for reading!