Hey! I should have written this post earlier, and if I had more time I'd dive into a more thorough explanation of some instructions. But I've worked a bunch on Bleep, my piano-roll music maker for the Game Boy.
This week I made several optimizations to the map viewer in Bleep so that everything can run at 60FPS on the original GB. It does a lot of updates in a single frame, and it got to the point where I had run out of obvious stuff to clean up. But I managed to do it somehow over the course of a few nights.
A lot of it was unrolling some loops at the expense of code size. Incrementing a counter and branching on every iteration adds up, and when the range of the data being iterated is known and reasonably small, it can be worth it to unroll.
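To make the trade-off concrete, here's a rough sketch in C rather than GB assembly. The names (ROW_TILES, copy_row_looped, copy_row_unrolled) are made up for illustration and are not from Bleep's source. A C compiler may well unroll the loop on its own, but on the GB this is something you'd do by hand in assembly.

```c
#include <stdint.h>

/* Hypothetical: a small, fixed element count known ahead of time. */
#define ROW_TILES 4

/* Looped version: every element also pays for a counter update,
 * a compare, and a conditional branch. */
void copy_row_looped(uint8_t *dst, const uint8_t *src) {
    for (uint8_t i = 0; i < ROW_TILES; i++) {
        dst[i] = src[i];
    }
}

/* Unrolled version: same result, no counter and no branch,
 * at the cost of one copy statement per element in code size. */
void copy_row_unrolled(uint8_t *dst, const uint8_t *src) {
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}
```

Both functions write the same bytes. The unrolled one just trades ROM space for the per-iteration overhead.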
I also switched various generalized bitshifts to more specific ones that take fewer cycles. The swap instruction on the GB, which swaps the high and low nybble of a byte, is a godsend in some spots. Often if you have to do large bit shifts, it's faster to swap for part of it and then mask off unwanted bits. This especially pays off when doing shifts over 16-bit values.
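Here's a small C model of the swap-and-mask idea. It's only a sketch of the technique, not Bleep's code: swap_nibbles stands in for the GB's swap instruction, and the function names are invented for this example.

```c
#include <stdint.h>

/* Models the GB `swap` instruction: exchange the high and low nybbles. */
uint8_t swap_nibbles(uint8_t x) {
    return (uint8_t)((x << 4) | (x >> 4));
}

/* 8-bit shift left by 4: one swap and one mask instead of
 * four single-bit shifts. */
uint8_t shl4_u8(uint8_t x) {
    return (uint8_t)(swap_nibbles(x) & 0xF0);
}

/* 16-bit shift right by 4: swap both bytes, then recombine nybbles.
 * After the swaps, the low nybble of each byte holds its old high nybble. */
uint16_t shr4_u16(uint16_t v) {
    uint8_t hi = swap_nibbles((uint8_t)(v >> 8));
    uint8_t lo = swap_nibbles((uint8_t)(v & 0xFF));
    /* New low byte: old high nybble of lo, plus old low nybble of hi on top. */
    uint8_t new_lo = (uint8_t)((lo & 0x0F) | (hi & 0xF0));
    /* New high byte: old high nybble of hi, moved down. */
    uint8_t new_hi = (uint8_t)(hi & 0x0F);
    return (uint16_t)(((uint16_t)new_hi << 8) | new_lo);
}
```

For example, shr4_u16(0xABCD) yields 0x0ABC. A general 16-bit shift done one bit at a time would need four shift-and-rotate steps, each touching both bytes.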
Aside from that, there was some code that was just accessing memory in suboptimal patterns. The Game Boy's CPU doesn't have nice instructions for indexing memory, and reading absolute locations in memory is slower than going through a register pair such as hl. Accessing data sequentially with the ld [hl+], a / ld [hl-], a / ld a, [hl+] / ld a, [hl-] instructions is useful. Other times, it can help to align data to 256-byte boundaries, so only the low byte of the address (the low register of the pair) needs to be changed. There's also the high RAM in the GB, from 0xFF80 .. 0xFFFE, which can save you a couple cycles compared to other memory areas, so it's useful for temporary variables that don't fit in registers and would be wasteful to shuffle around on the stack. I moved stuff around to save cycles while still not compromising too much on readability.
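The 256-byte alignment trick is easy to show with plain address arithmetic. This is a C sketch with made-up names and a made-up base address (0xC300 is just an example location in work RAM), not Bleep's actual memory layout.

```c
#include <stdint.h>

/* General case: the table can start anywhere, so indexing needs a full
 * 16-bit add (low byte add, then carry into the high byte). */
uint16_t entry_addr_general(uint16_t base, uint8_t index) {
    return (uint16_t)(base + index);
}

/* Aligned case: the table starts at a 256-byte boundary, so the low byte
 * of its base address is 0x00. The high byte never changes, and the index
 * simply becomes the low byte of the address. */
uint16_t entry_addr_aligned(uint8_t base_hi, uint8_t index) {
    return (uint16_t)(((uint16_t)base_hi << 8) | index);
}
```

In assembly terms, that means the high register of the pair can be loaded once and left alone, and picking an entry is a single 8-bit load into the low register.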
Those are probably the biggest ones.
As a result of all this work, editing is way smoother, and scrolling around the pianoroll on diagonals works correctly. The CPU usage for the frame when scrolling diagonally (which is one row of tiles, one column of tiles, a status bar to update, some sprites that are redrawn every frame) amounts to about 99% usage on the original GB, but it works! If this were for a game instead of music software, it could get away with far less frequent screen updates than this, and it'd have fewer things to update.
If I had more time I'd love to do a more thorough explanation of some stuff, but I'll probably save that for a follow-up post sometime that explains some of the different optimization tricks Bleep employs to keep things running quickly.
It's worth noting that all of these optimizations were only necessary for the original GB. The Game Boy Color has a double speed mode, which leaves more than enough time to process everything. Also, the original Game Boy version was still usable before, but certain tasks would take 2 frames instead of 1 to complete, and it felt a bit choppy. Not anymore!
And now that the optimizations were out of the way, I got back into implementing the improved music engine of Bleep.
The music engine has been undergoing some major changes. Each instrument can have custom tables that control the pitch and waveform of the sound during play. These are sorta similar to FamiTracker's method of specifying instruments, where you can specify a small table of values, with possible loop points, that affect different aspects of the sound. In Bleep, there are three pitch effects you can apply to your instruments: pitch slides (offsets in frequency counter units), arpeggios (offsets in semitones from a root note) and absolute notes (absolute piano key numberings of notes). There are also two secondary table effects that can be done: waveform switching and panning.
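To give a feel for how a value table with a loop point can be stepped once per tick, here's a minimal C sketch. Everything in it is an assumption made for illustration: the EffectTable struct, the NO_LOOP marker, and the table_step function are hypothetical and not Bleep's (or FamiTracker's) real data format.

```c
#include <stdint.h>

#define TABLE_MAX_LEN 16   /* tables of up to 16 values */
#define NO_LOOP 0xFF       /* hypothetical marker: hold the last value */

/* Hypothetical layout for one effect table. */
typedef struct {
    int8_t  values[TABLE_MAX_LEN]; /* e.g. arpeggio offsets in semitones */
    uint8_t length;                /* number of used entries, 1..16 */
    uint8_t loop_point;            /* index to jump back to, or NO_LOOP */
} EffectTable;

/* Return the value for the current tick, then advance the position.
 * At the end of the table, jump to the loop point if there is one,
 * otherwise stay on the last entry. */
int8_t table_step(const EffectTable *t, uint8_t *pos) {
    int8_t v = t->values[*pos];
    if (*pos + 1 < t->length) {
        (*pos)++;
    } else if (t->loop_point != NO_LOOP) {
        *pos = t->loop_point;
    }
    return v;
}
```

With an arpeggio table of {0, 4, 7}, length 3, and loop point 0, repeated calls produce 0, 4, 7, 0, 4, 7 and so on, which cycles through a major chord relative to the root note. With NO_LOOP, the table plays through once and then holds its final value.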
Here's some samples I recorded of some basic instrument effect usage:
I'm still working out some final pieces of the instrument engine and fixing some bugs, but it's getting closer to being finished! Once it's done I just need to build menus on top, and there's a fully fledged instrument system with 64 unique instruments, 64 tables of up to 16 values, and 64 custom 16-sample 4-bit waveforms.