We need to go deeper to solve tool roadblocks



Cover illustration "Yak Shaving" from Pepper&Carrot by David Revoy, licensed CC BY 4.0

From time to time, beware (maintainer of the Game Boy emulator bgb) has asked me what is blocking the development of Mindy's Hike. Since June 6, 2023, I've been stuck on missing functionality in the tools I need to write maintainable code for an 8-bit processor. Over a week in mid-August, I summoned my courage to push through and ended up uncovering a chain of dependencies that I had to attack one at a time. I built a tool to allocate local variables, and to get that working, I needed to get the assembler to accept my local variables, and to get that working, I needed to improve the assembler's regression testing in a free software environment, and to get that working, I needed to work around deficiencies in macOS and react to a dependency update. And once I got all that done, I needed a nap.

Local variables in assembly language

The CPU core of the Game Boy's system-on-chip is a Sharp SM83, which closely resembles an Intel 8080 and, to a lesser extent, a Zilog Z80. Like the 8080, the SM83 has seven main 8-bit registers: A, B, C, D, E, H, and L. Six of these can be used as three pairs, called BC, DE, and HL, to hold pointers. (These pairs correspond to CX, DX, and BX on the 8086.) While planning a function that scrolls to one area of the map, I realized that I needed more local variables than would comfortably fit in the CPU's registers, so I would have to spill some of them to memory.

The working memory connected to the CPU is 8192 bytes. This is not 8192 MB or even 8192 kB, but 8192 bytes. Vintage consoles get away with so little RAM by keeping the game program in ROM and executing it directly from there. In general, this means I cannot afford to waste RAM the way a game for a modern platform can.

Many familiar instruction sets, such as 68000, 65816, x86, and ARM, make it convenient to read and write memory at an offset from the stack pointer, which makes it easy for compilers to allocate space on the stack for a function's local variables. The SM83, by contrast, has no offset addressing at all. Instead, memory is accessed through pointers, usually in HL and sometimes in BC or DE. There is a shortcut instruction, ld hl, sp+offset, which points HL at an offset into the stack, but relying on it has three costs: using it often is slow, it requires the previous contents of HL to have already been spilled, and it clobbers the status flags (zero and carry) that programs use for branching and multi-byte addition.
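To make the flag clobbering concrete, here is a small Python model of `ld hl, sp+offset`, written from the documented SM83 behavior: the zero and subtract flags are always cleared, and the half-carry and carry flags come from adding the offset byte (treated as unsigned) to the low byte of SP. The function name and the flag dictionary are illustrative conventions, not any real emulator's API.

```python
from __future__ import annotations


def ld_hl_sp_offset(sp: int, offset: int) -> tuple[int, dict[str, int]]:
    """Model SM83 `ld hl, sp+offset`: return the new HL and the flags it leaves.

    `offset` is the signed 8-bit operand (-128..127).
    """
    if not -128 <= offset <= 127:
        raise ValueError("offset must fit in a signed byte")
    operand = offset & 0xFF  # the operand byte as encoded in the instruction
    hl = (sp + offset) & 0xFFFF  # HL = SP plus the sign-extended offset
    flags = {
        "Z": 0,  # zero flag: always cleared, whatever the result
        "N": 0,  # subtract flag: always cleared
        # half carry: carry out of bit 3 when adding the operand to SP's low byte
        "H": int((sp & 0x0F) + (operand & 0x0F) > 0x0F),
        # carry: carry out of bit 7 of that same low-byte addition
        "C": int((sp & 0xFF) + operand > 0xFF),
    }
    return hl, flags


if __name__ == "__main__":
    for sp, offset in [(0xDFFE, 2), (0xD000, -2), (0xD0FF, -1)]:
        hl, flags = ld_hl_sp_offset(sp, offset)
        print(f"SP=${sp:04X} offset={offset:+d} -> HL=${hl:04X} {flags}")
```

For example, pointing HL one byte below SP = $D0FF yields HL = $D0FE with both H and C set, so a carry that the program was about to branch on is simply gone.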

Another approach to allocating local variables for non-recursive functions is common in the homebrew scene for the NES, which uses a 6502 processor. It reserves a pool of global variables and temporarily treats them as a particular function's local variables while that function is running. The programmer then has to take care that each callee (a function that another function calls) does not overwrite its callers' local variables. If a callee ends up using more local variables than the programmer first anticipated, the programmer must update all of its callers so that their variables no longer overlap the bytes the callee now writes. This creates extra work and room for mistakes.
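The bookkeeping risk can be illustrated with a short Python sketch of a checker for such a hand-assigned pool. It assumes each function's locals occupy a `(start, size)` range in the shared pool and that the call graph is known; the function names `main`, `scroll_map`, and `decode_column` and all offsets are hypothetical. Any function whose range overlaps that of something it can call, directly or indirectly, is a bug waiting to happen.

```python
from __future__ import annotations


def reachable(calls: dict[str, list[str]], start: str) -> set[str]:
    """Return every function that `start` can call, directly or indirectly."""
    seen: set[str] = set()
    stack = list(calls.get(start, []))
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(calls.get(fn, []))
    return seen


def find_conflicts(calls: dict[str, list[str]],
                   slots: dict[str, tuple[int, int]]) -> list[tuple[str, str]]:
    """List (caller, callee) pairs whose local-variable ranges overlap.

    `slots` maps a function name to (start, size) within the shared pool.
    """
    conflicts: list[tuple[str, str]] = []
    for caller, (c_start, c_size) in slots.items():
        for callee in sorted(reachable(calls, caller)):
            if callee == caller or callee not in slots:
                continue
            e_start, e_size = slots[callee]
            if c_size == 0 or e_size == 0:
                continue  # a function with no locals cannot collide
            # Two half-open ranges overlap if each starts before the other ends
            if c_start < e_start + e_size and e_start < c_start + c_size:
                conflicts.append((caller, callee))
    return conflicts


if __name__ == "__main__":
    calls = {"main": ["scroll_map"], "scroll_map": ["decode_column"]}
    # decode_column was placed at 6 when scroll_map used only 2 bytes;
    # scroll_map has since grown to 6 bytes (offsets 4 through 9).
    slots = {"main": (0, 4), "scroll_map": (4, 6), "decode_column": (6, 3)}
    print(find_conflicts(calls, slots))
```

Here the checker reports `("scroll_map", "decode_column")`: the callee's locals now land on top of its caller's, which is exactly the kind of update the programmer has to remember to make by hand.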

On Saturday, August 12, I set out to fix this by writing a Python script to help me keep this allocation up to date. It reads the entire program's source code, constructs a call graph, counts how much space each function uses for local variables, and uses that to set the start address of each of its callers' local variables. It's analogous to the "compiled stack" feature of compilers that target Microchip's PIC microcontrollers. I first tested it on two of my smaller Game Boy projects: 144p Test Suite and my port of Martin Korth's Magic Floor. By Wednesday, August 16, I had gotten as far as printing the entire call graph from main on down.
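The allocation rule described above can be sketched in a few lines of Python. This is not the author's script: it assumes the call graph has already been extracted as a mapping from each function to the functions it calls, that there is no recursion, and that each function's local-variable size is known. Function names and sizes are hypothetical. Each function's locals start just past the highest byte used by anything it calls, so a callee can never overwrite its callers' variables.

```python
from __future__ import annotations


def allocate_locals(calls: dict[str, list[str]],
                    sizes: dict[str, int]) -> dict[str, int]:
    """Return the pool offset where each function's local variables start.

    `calls` maps a function to the functions it calls directly, and `sizes`
    maps a function to the number of bytes of locals it needs.
    """
    base: dict[str, int] = {}
    in_progress: set[str] = set()

    def visit(fn: str) -> int:
        if fn in base:
            return base[fn]
        if fn in in_progress:
            # A cycle means recursion, which a static allocation cannot serve
            raise ValueError(f"recursive call through {fn}")
        in_progress.add(fn)
        start = 0  # leaf functions start at the bottom of the pool
        for callee in calls.get(fn, []):
            # Start past the end of every callee's locals
            start = max(start, visit(callee) + sizes.get(callee, 0))
        in_progress.discard(fn)
        base[fn] = start
        return start

    for fn in sorted(set(calls) | set(sizes)):
        visit(fn)
    return base


def pool_size(base: dict[str, int], sizes: dict[str, int]) -> int:
    """Total bytes the shared pool needs: the highest end offset of any function."""
    return max((base[fn] + sizes.get(fn, 0) for fn in base), default=0)


if __name__ == "__main__":
    calls = {
        "main": ["scroll_map", "draw_hud"],
        "scroll_map": ["decode_column"],
        "draw_hud": [],
    }
    sizes = {"main": 2, "scroll_map": 6, "decode_column": 3, "draw_hud": 4}
    base = allocate_locals(calls, sizes)
    for fn in sorted(base, key=base.get):
        print(f"{fn:14} offset {base[fn]:2}  size {sizes[fn]}")
    print("pool size:", pool_size(base, sizes))
```

With this example graph, `decode_column` and `draw_hud` both start at offset 0, `scroll_map` at 3, and `main` at 9, for an 11-byte pool. `draw_hud` overlaps both `decode_column` and `scroll_map`, which is safe because none of them calls another, so they are never active at the same time; that sharing is what keeps a compiled stack small.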

RGBDS pull request challenges

With the call graph in hand, I began writing a spec for what the resulting allocation would look like in assembly language, only to hit a snag. I ran my sample output through RGBASM, the assembler in the widely used RGBDS toolchain for developing Game Boy software, and discovered that RGBASM did not allow creating local symbols in RAM for a function whose code is in ROM. Local symbols are how RGBASM distinguishes same-named labels inside two different functions. At the time, local labels could be given only to code within a function, such as loops or decision points, and I wanted to extend this to variables.
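As a hedged illustration of the kind of map such a spec describes, here is a Python sketch that assumes each function's starting offset in a shared pool has already been computed. It names every local variable `Function.variable`, the dotted form of an RGBASM local symbol, and assigns it an absolute address. The function and variable names, their sizes, and the pool address $C100 are all hypothetical.

```python
from __future__ import annotations


def local_variable_map(bases: dict[str, int],
                       variables: dict[str, list[tuple[str, int]]],
                       pool_start: int, pool_size: int) -> dict[str, int]:
    """Give each function's local variables consecutive addresses in the pool.

    `bases` holds each function's starting offset in the pool, and
    `variables` lists each function's variables as (name, size in bytes).
    """
    addresses: dict[str, int] = {}
    for fn, fields in variables.items():
        offset = bases[fn]
        for name, size in fields:
            # Dotted name: a local symbol `name` under the global symbol `fn`
            addresses[f"{fn}.{name}"] = pool_start + offset
            offset += size
        if offset > pool_size:
            raise ValueError(f"{fn} needs {offset} bytes; the pool has {pool_size}")
    return addresses


if __name__ == "__main__":
    bases = {"decode_column": 0, "scroll_map": 3}
    variables = {
        "scroll_map": [("dest_ptr", 2), ("rows_left", 1)],
        "decode_column": [("src_ptr", 2), ("run", 1)],
    }
    for symbol, addr in local_variable_map(bases, variables, 0xC100, 16).items():
        print(f"{symbol} = ${addr:04X}")
```

Because `scroll_map` starts at offset 3, its variables land at $C103 and $C105, just past `decode_column`'s three bytes at $C100 through $C102. Declaring those names as symbols in a RAM section, while the functions themselves live in ROM, is what requires local symbols to work across sections.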

This led to several days of proverbial yak shaving to get this functionality into the next version of RGBDS. First, I made and submitted a pull request to get cross-section local symbols working (RGBDS pull request #1159). This largely consisted of removing an existing check in RGBASM that allowed a local symbol to be defined only within the active scope of its parent global symbol.
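The effect of removing that check can be shown with a toy Python model of label scoping. It is not RGBASM's code: the `LabelScope` class and its `allow_cross_scope` flag are hypothetical, with the flag standing in for the presence or absence of the removed check. A label starting with a dot attaches to the most recent global label, while a fully qualified `Parent.child` name states its parent explicitly.

```python
from __future__ import annotations


class LabelScope:
    """Toy model of how an assembler expands and checks local labels."""

    def __init__(self, allow_cross_scope: bool) -> None:
        # False models the old rule; True models the rule after the check is removed
        self.allow_cross_scope = allow_cross_scope
        self.current_global: str | None = None
        self.symbols: set[str] = set()

    def define(self, name: str) -> str:
        """Define a label and return its fully qualified name."""
        if name.startswith("."):
            # `.loop` belongs to the most recently defined global label
            if self.current_global is None:
                raise ValueError(f"{name} has no parent global label")
            full = self.current_global + name
        elif "." in name:
            # `Parent.child` names its parent explicitly
            parent = name.split(".", 1)[0]
            if not self.allow_cross_scope and parent != self.current_global:
                raise ValueError(f"not currently in the scope of {parent!r}")
            full = name
        else:
            # A global label opens a new scope for later `.local` labels
            self.current_global = name
            full = name
        if full in self.symbols:
            raise ValueError(f"{full} is already defined")
        self.symbols.add(full)
        return full


if __name__ == "__main__":
    asm = LabelScope(allow_cross_scope=True)
    for label in ["ScrollMap", ".loop", "DrawHud", ".loop", "ScrollMap.counter"]:
        print(label, "->", asm.define(label))
```

Both functions get their own `.loop` without clashing. With the check in place, defining `ScrollMap.counter` while `DrawHud` is the active scope fails, which models what blocked declaring a ROM function's variables elsewhere; without the check, the same definition is accepted.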

While testing the first pull request, I had trouble running the regression tests for some parts of RGBDS on my personal machine, because they tried to download and build non-free software. I don't want to include proprietary software in a tool's build process, because requiring new contributors to download it makes it harder to bring them on board. So I submitted a second pull request to make the regression tests that use non-free codebases optional when run locally (RGBDS pull request #1161). I accepted that until this was merged, some tests would fail locally because the test script could not download the proprietary software.

At one point, my changes to the test scripts caused the macOS jobs in continuous integration (CI) to fail. The scripts did only very basic command-line parsing and did not expect more than one option per run. I started with getopt, a command-line parsing tool from util-linux that understands options longer than one letter. However, the shell scripting environment on macOS lacks quite a few quality-of-life improvements. I had to add (somewhat obnoxious) tracing to the test scripts so that more experienced contributors could help me figure out what was breaking. Once we narrowed the culprit down to the getopt that ships with macOS, which lacks long options, I ended up parsing the command line in pure Bash and pulling out the tracing.
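A hand-written parser of this kind is a loop that consumes one argument at a time, which handles any number of options and long option names without getopt. In Bash this is typically a `while` loop around a `case` statement that `shift`s each argument off; the sketch below renders the same structure in Python. The option names `--only-free` and `--os` are hypothetical.

```python
from __future__ import annotations


def parse_test_args(argv: list[str]) -> dict[str, object]:
    """Parse test-runner options by hand, one argument at a time."""
    opts: dict[str, object] = {"only_free": False, "os": None}
    args = list(argv)  # copy so the caller's list is left untouched
    while args:
        arg = args.pop(0)  # equivalent of reading "$1" and then `shift`
        if arg == "--only-free":
            opts["only_free"] = True
        elif arg == "--os":
            if not args:
                raise ValueError("--os needs a value")
            opts["os"] = args.pop(0)  # the option's value is the next argument
        elif arg.startswith("--os="):
            opts["os"] = arg.split("=", 1)[1]
        else:
            raise ValueError(f"unknown option: {arg}")
    return opts


if __name__ == "__main__":
    print(parse_test_args(["--only-free", "--os", "macos"]))
```

Because nothing here depends on a platform's getopt, the same loop behaves identically on GNU/Linux, macOS, and Windows runners.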

As a long-time user and first-time contributor to RGBDS, I was not quite prepared for the volume of stylistic nitpicks and bikeshed arguments that I encountered after submitting my first two pull requests. I had trouble understanding the rationale for several rounds of changes suggested by reviewers, especially when that rationale relied on a lot of unwritten tribal knowledge about what is and isn't an appropriate build-time behavior switch, variable name, or code comment. In one case, a comment convention appeared to have changed since the previous update to the regression test scripts.

And then I ran out of time, in a way. RGBGFX, an image conversion tool included with RGBDS, relies on the data compression library zlib to read images in PNG format. The regression test workflow downloaded zlib 1.2.13 as part of a job that builds and runs the Windows executable of RGBGFX. While I was working on getting my own pull requests in shape, all tests suddenly started failing. Another contributor traced this to zlib 1.3, released on Friday, August 18. When releasing zlib 1.3, its maintainer had removed the archive of the previous version from the zlib website, so the step of RGBDS's Windows build that downloads zlib 1.2.13 failed. The failure affected not only my own pull requests but every other pull request on RGBDS.
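One general way to make a dependency download less brittle is to try a list of locations in order and accept the first archive whose SHA-256 digest matches a pinned value, so that a file that moves fails over to another location and a file that changes is rejected. The Python sketch below shows that idea under stated assumptions: it is not the fix RGBDS adopted, the URLs are placeholders, and the download function is passed in so the logic can be exercised without a network.

```python
from __future__ import annotations

import hashlib
from typing import Callable


def fetch_first_available(urls: list[str], expected_sha256: str,
                          fetch: Callable[[str], bytes]) -> bytes:
    """Return the first download whose SHA-256 digest matches the pinned value.

    `fetch` downloads one URL and returns its bytes; for real use it might be
    `lambda url: urllib.request.urlopen(url, timeout=30).read()`.
    """
    errors: list[str] = []
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError are OSErrors
            errors.append(f"{url}: {exc}")
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest == expected_sha256:
            return data
        errors.append(f"{url}: checksum mismatch ({digest})")
    raise RuntimeError("no location worked:\n" + "\n".join(errors))


if __name__ == "__main__":
    archive = b"pretend this is zlib-1.2.13.tar.gz"
    pinned = hashlib.sha256(archive).hexdigest()

    def fake_fetch(url: str) -> bytes:
        # Simulate the primary site dropping the old release
        if url.startswith("https://primary.example/"):
            raise OSError("HTTP 404")
        return archive

    urls = ["https://primary.example/zlib-1.2.13.tar.gz",
            "https://mirror.example/zlib-1.2.13.tar.gz"]
    print(len(fetch_first_available(urls, pinned, fake_fetch)), "bytes")
```

When every location fails, the error message lists each URL and why it was rejected, which makes a CI failure like this one quicker to diagnose.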

After everything else became polished enough to publish, I filed an issue about the zlib 1.3 upgrade (RGBDS issue #1163). A maintainer gave me the go-ahead to slipstream a draft fix into my pull request pertaining to local symbols.

Going forward

Now that scoping for local variables is in RGBASM, I can add local variable allocation to the call graph analysis program and then test the resulting local variable map against one more project. Once it appears to work, I will be in a stronger position to make the first progress on Mindy's Hike since Games Made Quick. The planned tasks are:

  1. When changing scrolling direction, spread the load of moving the compressed map's decode pointer to the other side of the map across multiple game ticks.
  2. Add an autotiler, a tool to decorate the left and right sides of a platform.
  3. Add colors for terrain when viewed on a Game Boy Color system.
  4. Make Mindy move like the platformer character she is meant to be.
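Task 1 above is a common way to avoid a frame-time spike: do a bounded amount of work per tick and resume where you left off. A minimal Python sketch follows, assuming a hypothetical compressed format in which the byte length of each map column is known, so the decode pointer can step forward or backward one column at a time. The class name and the per-tick column budget are illustrative; on the real hardware the budget would more likely be set by how many CPU cycles a tick can spare.

```python
from __future__ import annotations


class PointerMover:
    """Move a decode pointer across a compressed map a few columns per tick."""

    def __init__(self, column_lengths: list[int], budget_per_tick: int) -> None:
        self.column_lengths = column_lengths  # compressed size of each column
        self.budget = budget_per_tick  # most columns to step over in one tick
        self.column = 0  # column the pointer is at
        self.offset = 0  # byte offset of that column in the compressed data
        self.target = 0

    def start_move(self, target_column: int) -> None:
        """Begin moving the pointer; call tick() once per game tick until done."""
        if not 0 <= target_column < len(self.column_lengths):
            raise ValueError("target column out of range")
        self.target = target_column

    def tick(self) -> bool:
        """Do at most one tick's worth of work; return True when the move is done."""
        steps = 0
        while self.column != self.target and steps < self.budget:
            if self.column < self.target:
                self.offset += self.column_lengths[self.column]  # skip this column
                self.column += 1
            else:
                self.column -= 1
                self.offset -= self.column_lengths[self.column]  # back up one column
            steps += 1
        return self.column == self.target


if __name__ == "__main__":
    mover = PointerMover([3, 5, 2, 4, 6, 1], budget_per_tick=2)
    mover.start_move(5)
    ticks = 1
    while not mover.tick():
        ticks += 1
    print(f"reached column {mover.column} at byte {mover.offset} after {ticks} ticks")
```

With a budget of two columns per tick, crossing five columns takes three ticks instead of one long one, and the game loop can keep running between them.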
