giggs' blog

Going lower

Games are an area of programming where performance is critical.
Since I was keen to learn about game engines, learning how the CPU works and picking up some assembly was mandatory. Join me as we descend into the lower levels of programming.

Great timing

I’d learned bits and pieces here and there, mostly from talking with my friend polomi, but I needed something more formal.
Luckily for me, Casey Muratori had just started his Performance-Aware Programming course. (Otherwise I would have started with Computer Systems: A Programmer’s Perspective, which I still plan on reading.)

The pitch of this course is that most modern-day programmers have forgotten about performance, relying on hardware improvements instead. Casey aims to teach how the CPU works. He says we might make things slow without even realizing it, simply because we don’t know how the code we write does what it does.

And I used to be a prime example of that! I first learned how to program using Python, where these details are completely abstracted away. Back then, I thought the only thing that mattered was algorithmic complexity.

Casey posits that if we understand what our code is supposed to be doing, we can recognize when a program seems abnormally slow and attempt to fix it.
Thus, after a convincing demonstration in which he took a simple piece of Python code that sums the values in an array and made it 2000 times faster (still in Python, albeit using the Cython library), he set out to teach us how things work closer to the metal.

To ease us in, we first learned about the architecture of the Intel 8086, a processor about 12 years older than me. While it is much simpler than today’s processors, many principles still hold and the core of the assembly language hasn’t changed much.

8086

In my experience, the easiest way to understand how something works is to build one yourself.
― Casey Muratori

Every piece of code that runs is at some point translated into machine code, which assembly spells out in human-readable form. Which instruction set you get depends on your processor, but a good rule of thumb is that most desktop and laptop computers use x86-64 while mobile devices use ARM.

How does the CPU even read a program? We see nice, readable text. But the CPU only “sees” 1s and 0s.
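To make that concrete, here’s a small C sketch that shows the 1s and 0s behind two bytes of machine code. The bytes 0x89 0xD9 are, as far as I can tell from the manual, the 8086 encoding of `mov cx, bx`; the helper names `byte_to_bits` and `print_bits` are just mine, for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the 8 bits of `byte` into `out`, most
   significant bit first, and NUL-terminates it. `out` needs 9 chars. */
void byte_to_bits(unsigned char byte, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = ((byte >> (7 - i)) & 1) ? '1' : '0'; /* bit 7 lands in out[0] */
    out[8] = '\0';
}

/* Hypothetical helper: prints each byte as eight 1s and 0s, space-separated. */
void print_bits(const unsigned char *bytes, size_t count)
{
    assert(bytes != NULL || count == 0);
    char bits[9];
    for (size_t i = 0; i < count; i++) {
        byte_to_bits(bytes[i], bits);
        printf("%s%s", bits, i + 1 < count ? " " : "\n");
    }
}
```

Calling `print_bits` on those two bytes prints `10001001 11011001`, which is exactly the kind of pattern a decoder has to pick apart.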

Performance-aware programmer thinking: "They're the same pictures"

Understanding this was our first task. Armed with the 8086 Family User’s Manual, we had to decode what you see on the right side of the image above to produce the program on its left side. Want to figure it out yourself? Start at page 160, right after Table 4-6.
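For a taste of what the decoding involves, here’s a minimal C sketch that handles only the simplest case: a register-to-register `mov`, whose first byte is `100010DW` and whose second byte packs `MOD` (2 bits), `REG` (3 bits) and `R/M` (3 bits), with `MOD = 11` meaning both operands are registers. The `MovRegReg` struct and the function names are hypothetical, memory operands are deliberately not handled, and it’s nowhere near a full decoder.

```c
#include <assert.h>
#include <stdio.h>

/* Register names indexed by a 3-bit REG or R/M field.
   Row 0 is used when W = 0 (8-bit registers), row 1 when W = 1 (16-bit). */
static const char *const reg_names[2][8] = {
    {"al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"},
    {"ax", "cx", "dx", "bx", "sp", "bp", "si", "di"},
};

/* Hypothetical decoded form of a register-to-register MOV. */
typedef struct {
    int wide; /* W bit: 1 selects 16-bit registers */
    int dst;  /* destination register index, 0-7 */
    int src;  /* source register index, 0-7 */
} MovRegReg;

/* Decodes 100010DW followed by MOD=11 REG R/M. Returns the number of
   bytes consumed (2), or 0 if this is not a register-to-register MOV. */
int decode_mov_reg_reg(const unsigned char bytes[2], MovRegReg *out)
{
    assert(bytes != NULL && out != NULL);

    if ((bytes[0] >> 2) != 0x22)   /* top 6 bits must be 100010 */
        return 0;
    if ((bytes[1] >> 6) != 3)      /* MOD = 11: R/M names a register */
        return 0;                  /* memory operands are not handled */

    int d = (bytes[0] >> 1) & 1;   /* D = 1: REG is the destination */
    int reg = (bytes[1] >> 3) & 7;
    int rm = bytes[1] & 7;

    out->wide = bytes[0] & 1;
    out->dst = d ? reg : rm;
    out->src = d ? rm : reg;
    return 2;
}

/* Prints the decoded instruction in assembly syntax, e.g. "mov cx, bx". */
void print_mov(const MovRegReg *m)
{
    printf("mov %s, %s\n", reg_names[m->wide][m->dst], reg_names[m->wide][m->src]);
}
```

Feeding it `0x89 0xD9` and printing the result gives `mov cx, bx`, while `0x88 0xE5` gives `mov ch, ah`: same opcode, but the W bit switches to the 8-bit register names.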

True to his word, Casey’s approach is very practical. He got us to decode only what was necessary to get a good understanding of how it works, and left the full decoder as a challenge homework, which he explicitly recommended not doing. There are quite a few instructions, and I gladly followed our teacher’s advice.

Then, we were tasked with writing a simulator. You might wonder, what does that assembly program do?
Like all programs, it writes some values somewhere in memory. You may find this snarky and unhelpful (and I won’t blame you), but it’s an important point to understand. There’s no magic in programming. Everything you do, at the core, is this.
Putting a more positive spin on it:

The important part is just to appreciate how a few simple instructions on a CPU can really do an amazing variety of things! You don’t need that many. Most of the instructions in large CPU instruction sets are there for efficiency reasons, to try to improve performance — not because they’re required for program completeness. Just a few basic instructions is often all you need to build very complex programs!
― Casey Muratori
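To show what “writing values somewhere” means in practice, here’s a stripped-down C sketch of the idea behind a simulator. It’s not my actual simulator: it decodes by hand and only understands two 16-bit forms, `mov reg, imm16` (`10111rrr` followed by the low and high data bytes) and register-to-register `mov`. The `Cpu` struct and `simulate` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file, indexed like the REG field:
   ax cx dx bx sp bp si di. */
typedef struct {
    uint16_t regs[8];
} Cpu;

/* Executes `code` until it runs out, supporting only:
     10111rrr lo hi          mov r16, imm16
     100010d1 11 reg r/m     mov r16, r16
   Returns the number of instructions executed, or -1 on anything else. */
int simulate(Cpu *cpu, const unsigned char *code, size_t size)
{
    assert(cpu != NULL && (code != NULL || size == 0));
    size_t ip = 0;   /* offset of the next instruction, like the IP register */
    int executed = 0;

    while (ip < size) {
        unsigned char op = code[ip];
        if ((op & 0xF8) == 0xB8) {                  /* mov r16, imm16 */
            if (ip + 3 > size)
                return -1;
            /* The immediate is little-endian: low byte first. */
            cpu->regs[op & 7] = (uint16_t)(code[ip + 1] | (code[ip + 2] << 8));
            ip += 3;
        } else if ((op & 0xFD) == 0x89) {           /* mov r16, r16 (either D) */
            if (ip + 2 > size || (code[ip + 1] >> 6) != 3)
                return -1;                          /* memory operands: not handled */
            int d = (op >> 1) & 1;
            int reg = (code[ip + 1] >> 3) & 7;
            int rm = code[ip + 1] & 7;
            if (d)
                cpu->regs[reg] = cpu->regs[rm];
            else
                cpu->regs[rm] = cpu->regs[reg];
            ip += 2;
        } else {
            return -1;                              /* outside this sketch */
        }
        executed++;
    }
    return executed;
}
```

Running it on `B8 01 00 BB 34 12 89 D9` (`mov ax, 1`, `mov bx, 0x1234`, `mov cx, bx`) leaves 1 in `ax` and 0x1234 in both `bx` and `cx`. Every instruction boils down to reading some bits and writing a value into the register array.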

Seriously though, what does it do?
Well, if read correctly, the data in memory represent this image!

Notice the little coloured squares at the top left? This is what the program looks like if you open it as an image! I loved it when Casey showed that detail.
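If you want to try the same trick with your own simulator, one option is to write its memory out in a format an image viewer can open. The sketch below assumes the image is stored as consecutive 4-byte RGBA pixels (my assumption about the layout; adjust it to whatever the program actually writes) and emits a binary PPM, chosen only because its header is trivial. `dump_rgba_as_ppm` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes width x height pixels from `memory` as a
   binary PPM (P6). Each pixel is ASSUMED to be 4 bytes (R, G, B, A);
   alpha is dropped since PPM has no alpha channel.
   Returns 0 on success, -1 on a write error. */
int dump_rgba_as_ppm(FILE *f, const unsigned char *memory, int width, int height)
{
    assert(f != NULL && memory != NULL && width > 0 && height > 0);
    if (fprintf(f, "P6\n%d %d\n255\n", width, height) < 0)
        return -1;
    for (long i = 0; i < (long)width * height; i++) {
        const unsigned char *px = memory + 4 * i;
        if (fwrite(px, 1, 3, f) != 3)   /* R, G, B */
            return -1;
    }
    return 0;
}
```

The same bytes that the simulator treats as data become pixels here purely because we decided to read them that way.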

On a more philosophical note, it made me think about how hard it can be to properly understand data and the importance of context.

Anyway, my decoder, though functional, was a mess and needn’t scar your eyes. My simulator should be fine though. It’s incomplete and uses Casey’s (complete) reference decoder. As with the decoder, it was enough to grasp how the CPU does its job.
Perhaps more importantly, it introduced the idea that memory operations cost more than register operations. Back then, a significant part of that cost came from computing the effective address. Things are different now: the CPU-memory performance gap has grown dramatically, so we face different challenges, and memory management is a primary concern for performance.
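To put numbers on that: if I read the 8086 manual’s timing tables correctly, a register-to-register `mov` takes 2 clocks, while loading a register from memory takes 8 clocks plus the effective address (EA) calculation, which alone costs 5 to 12 clocks, plus 2 more with a segment override. Here’s a small C sketch of that EA table as I understand it; `EaReg` and `ea_clocks` are my own hypothetical names, so double-check the numbers against the manual.

```c
#include <assert.h>
#include <stdbool.h>

/* Registers that can appear in an 8086 effective address. */
typedef enum { EA_NONE, EA_BX, EA_BP, EA_SI, EA_DI } EaReg;

/* Clocks spent computing an effective address, per my reading of the
   8086 manual. `base` is BX, BP or EA_NONE; `index` is SI, DI or EA_NONE.
   A segment override adds 2 clocks. */
int ea_clocks(EaReg base, EaReg index, bool has_disp, bool seg_override)
{
    assert(base == EA_NONE || base == EA_BX || base == EA_BP);
    assert(index == EA_NONE || index == EA_SI || index == EA_DI);
    assert(base != EA_NONE || index != EA_NONE || has_disp);

    int clocks;
    if (base == EA_NONE && index == EA_NONE) {
        clocks = 6;                              /* [disp] */
    } else if (base == EA_NONE || index == EA_NONE) {
        clocks = has_disp ? 9 : 5;               /* [bx], [si + disp], ... */
    } else {
        /* BP+DI and BX+SI are one clock cheaper than BP+SI and BX+DI. */
        bool fast_pair = (base == EA_BP && index == EA_DI) ||
                         (base == EA_BX && index == EA_SI);
        clocks = (fast_pair ? 7 : 8) + (has_disp ? 4 : 0);
    }
    return clocks + (seg_override ? 2 : 0);
}
```

For example, `[bx]` costs 5 clocks, `[bx + si]` 7, `[bp + si]` 8 and `[bx + di + 4]` 12, so `mov ax, [bx + di + 4]` would take around 8 + 12 = 20 clocks against 2 for `mov ax, bx`.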

Assembly Bomb

A few years ago, while treading the same path, polomi did an exercise called Bomblab (listed here). Students download a binary “bomb” that can be disarmed by providing it with six strings. It’s up to us to disassemble the bomb, read the assembly code and figure out which strings won’t make it explode.

At this point, polomi figured I knew enough to give it a go. I was skeptical, but empirical evidence indicated I should trust him.

It required using Linux, and unfortunately the best setup I could find involved GDB, the standard debugger on Linux. Coming from Visual Studio (and VS Code), there were many niceties I missed.
Setup took a couple of hours and involved trying and discarding several other options.
Though GDB presented significant resistance, I eventually managed to find a setup that worked for me.

Besides reading the assembly code, I often needed to look at the actual values in memory and work out what they were about. It looks like this, and I swear I knew what it meant at the time.
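Raw memory is usually shown as rows of hex bytes. To get a feel for reading that kind of output, here’s a minimal C hex dump routine that prints each row’s offset, its bytes in hex, and the same bytes as ASCII. It only approximates what a debugger like GDB shows, and `hex_dump` is a hypothetical helper, not a GDB feature.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes `size` bytes from `data` to `out` in rows
   of 16: the offset, the bytes in hex, then the same bytes as ASCII
   ('.' for anything unprintable). */
void hex_dump(FILE *out, const unsigned char *data, size_t size)
{
    assert(out != NULL && (data != NULL || size == 0));
    for (size_t row = 0; row < size; row += 16) {
        fprintf(out, "%08zx  ", row);              /* offset of this row */
        for (size_t i = row; i < row + 16; i++) {
            if (i < size)
                fprintf(out, "%02x ", data[i]);    /* byte as two hex digits */
            else
                fputs("   ", out);                 /* pad a short last row */
        }
        fputc(' ', out);
        for (size_t i = row; i < row + 16 && i < size; i++)
            fputc(data[i] >= 0x20 && data[i] < 0x7f ? data[i] : '.', out);
        fputc('\n', out);
    }
}
```

Dumping the four bytes of `"Hi!\n"` to `stdout` prints a single row starting with `00000000  48 69 21 0a` and ending with `Hi!.`, since the newline isn’t printable.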

After about a day of work (I mean, fun), I got through all stages and proudly posted this screenshot to our Discord.

The exercise was extremely fun and the tasks were varied. I won’t spoil what they were. If you’re interested, do yourself a favour and try it out!

Wrap up

Given that I was able to defuse that bomb after writing the 8086 simulator, I’d say Casey’s approach to teaching was a success! If that sounds like something you’d be interested in, I encourage you to give it a go.

It was a great first step towards a thorough understanding of how the CPU and memory work, and how to write performant code. I’ve made more progress on that front since then, and of course there’s still a lot I need to learn, but it’s exciting!

#programming #assembly