- Speed up primary instruction dispatch by ~25%
- Speed up tile rendering (1bpp/2bpp) by ~57%
- Speed up MUL2* calls by ~40% by unrolling loops
- Speed up DIV/DIVk by unrolling loops (gains not measured)
- Fix foreground tile/OAM cycling logic: it failed to locate old sprites to hide because of a coordinate mismatch, and it relied on a very slow search
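
For readers unfamiliar with the unrolling mentioned in the MUL2*/DIV entries, here is a minimal sketch of the general technique in C. It is not the code from this change: the names `mul16_loop`, `mul16_unrolled`, and `MUL_STEP` are hypothetical, and a 16-bit shift-and-add multiply is only an assumed stand-in for whatever the real routines do.

```c
#include <stdint.h>

/* Looped 16-bit shift-and-add multiply (result truncated to 16 bits).
 * Hypothetical illustration, not the routine from this changelog. */
static uint16_t mul16_loop(uint16_t a, uint16_t b)
{
    uint16_t acc = 0;
    for (int i = 0; i < 16; i++) {
        if (b & 1u)
            acc = (uint16_t)(acc + a);
        a = (uint16_t)(a << 1);
        b >>= 1;
    }
    return acc;
}

/* One multiply step, written as a macro so it can be repeated inline. */
#define MUL_STEP(acc, a, b)                      \
    do {                                         \
        if ((b) & 1u)                            \
            (acc) = (uint16_t)((acc) + (a));     \
        (a) = (uint16_t)((a) << 1);              \
        (b) >>= 1;                               \
    } while (0)

/* Same arithmetic with the loop fully unrolled: no counter, no
 * compare-and-branch per iteration, at the cost of larger code. */
static uint16_t mul16_unrolled(uint16_t a, uint16_t b)
{
    uint16_t acc = 0;
    MUL_STEP(acc, a, b); MUL_STEP(acc, a, b);
    MUL_STEP(acc, a, b); MUL_STEP(acc, a, b);
    MUL_STEP(acc, a, b); MUL_STEP(acc, a, b);
    MUL_STEP(acc, a, b); MUL_STEP(acc, a, b);
    MUL_STEP(acc, a, b); MUL_STEP(acc, a, b);
    MUL_STEP(acc, a, b); MUL_STEP(acc, a, b);
    MUL_STEP(acc, a, b); MUL_STEP(acc, a, b);
    MUL_STEP(acc, a, b); MUL_STEP(acc, a, b);
    return acc;
}
```

Both functions return the same value for every input; the unrolled form simply trades code size for the removal of loop overhead, which tends to matter most on small in-order CPUs. Whether that trade pays off depends on the target, so gains should be measured per routine.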