Copticfelo

Ok so what changed from last time? mostly PPU changes.
first of all, i fixed the input bug that prevented most games from being playable. writes to the input register P1 are now treated as queries: the emulator saves the input state internally in the “Joypad” struct, and when a game queries it, P1 is updated from that state.
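A rough sketch of that P1 idea, assuming the usual P1 layout (bit 5 low selects the action buttons, bit 4 low selects the d-pad, and a 0 in the low nibble means "pressed"). The field names and `read_p1` are made up for illustration, this is not the emulator's actual `Joypad` struct.

```rust
/// Hypothetical joypad state; `true` means the button is held down.
#[derive(Default, Clone, Copy)]
pub struct Joypad {
    pub right: bool,
    pub left: bool,
    pub up: bool,
    pub down: bool,
    pub a: bool,
    pub b: bool,
    pub select: bool,
    pub start: bool,
}

impl Joypad {
    /// Builds the value a game reads back from P1, given the last byte it wrote there.
    pub fn read_p1(&self, written: u8) -> u8 {
        let select_bits = written & 0x30;
        let mut low = 0x0F; // all four lines high = nothing pressed
        if (written & 0x10) == 0 {
            // d-pad group selected
            low &= !Self::nibble(self.right, self.left, self.up, self.down);
        }
        if (written & 0x20) == 0 {
            // action button group selected
            low &= !Self::nibble(self.a, self.b, self.select, self.start);
        }
        // the two unused top bits read back as 1
        0xC0 | select_bits | low
    }

    /// Packs four button states into bits 0..=3 (1 = pressed).
    fn nibble(b0: bool, b1: bool, b2: bool, b3: bool) -> u8 {
        (b0 as u8) | ((b1 as u8) << 1) | ((b2 as u8) << 2) | ((b3 as u8) << 3)
    }
}
```

With only A held, `read_p1(0x10)` (buttons selected) clears bit 0, while `read_p1(0x20)` (d-pad selected) leaves the low nibble at 0xF.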
then I made 3 changes to the PPU. I started by fixing bugs here and there in the background layer by introducing a simpler drawing pipeline (instead of looping over tiles we loop over pixels, calculate which tile we are in, and do modulo 8 to get the tile column). i also introduced correct LYC and H-Blank interrupts (too long to discuss here). together these changes fixed a lot of things that were broken last time.
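Roughly what "loop over pixels and work out the tile" can look like, as a sketch and not the real code. It assumes the standard 32x32 tile background map with SCX/SCY scrolling and 2 bits per pixel split over two bit-plane bytes; `TilePos`, `locate_bg_pixel` and `pixel_color` are names invented here.

```rust
/// Where a screen pixel lands in the 32x32-tile background map.
#[derive(Debug, PartialEq)]
pub struct TilePos {
    pub map_index: usize, // index into the 32x32 tile map
    pub row: u8,          // row inside the 8x8 tile
    pub col: u8,          // column inside the 8x8 tile
}

/// Maps screen pixel (x, ly) to a tile, taking the scroll registers into account.
pub fn locate_bg_pixel(x: u8, ly: u8, scx: u8, scy: u8) -> TilePos {
    let bg_x = x.wrapping_add(scx); // the map is 256 px wide and wraps around
    let bg_y = ly.wrapping_add(scy);
    TilePos {
        map_index: (bg_y as usize / 8) * 32 + (bg_x as usize / 8),
        row: bg_y % 8,
        col: bg_x % 8, // the "modulo 8" step
    }
}

/// Decodes one 2-bit color index from the two bit-plane bytes of a tile row.
pub fn pixel_color(lo: u8, hi: u8, col: u8) -> u8 {
    let bit = 7 - col; // bit 7 is the leftmost pixel
    (((hi >> bit) & 1) << 1) | ((lo >> bit) & 1)
}
```

The per-scanline loop then just calls `locate_bg_pixel` for x in 0..160, fetches the tile row, and writes `pixel_color` into the framebuffer.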
then i turned my attention to drawing the window layer. it took a while because i kept confusing tile indices with pixel indices, but i eventually figured it out and the window layer now works perfectly (hopefully anyways).
next i noticed how blurry the image from the emulator got from scaling and remembered a Techquickie video about scaling algorithms and how nearest neighbor is the best for pixel art, so i changed the scaling algorithm that SDL uses to nearest neighbor and now the picture is very sharp and clear.
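For the curious, this is what nearest neighbor boils down to. The emulator just flips SDL's scale mode; the sketch below is a hand-rolled integer upscale over a flat pixel buffer, only to show why the result stays sharp. `scale_nearest` is a hypothetical helper, not an SDL call.

```rust
/// Upscales a `w` x `h` image by an integer `factor` using nearest neighbor:
/// every destination pixel copies the single source pixel it falls on, with no blending.
pub fn scale_nearest(src: &[u32], w: usize, h: usize, factor: usize) -> Vec<u32> {
    let (dw, dh) = (w * factor, h * factor);
    let mut dst = vec![0; dw * dh];
    for y in 0..dh {
        for x in 0..dw {
            dst[y * dw + x] = src[(y / factor) * w + (x / factor)];
        }
    }
    dst
}
```

Since no new colors are ever invented between pixels, hard pixel-art edges stay hard, unlike linear filtering which averages neighbors.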
and finally, i refactored sprites to use that simpler drawing pipeline i mentioned, which also fixed a bug (sprites at the very left edge were being clipped).
now i will go figure out how gameboy audio works, then come back to the missing PPU features later.
bye :>

Copticfelo

Hi, 2 people and 1 bot who read those devlogs
I figured out why most games from the previous devlog didn’t render correctly: most games have a key combo that resets the console (when all buttons are pressed), and with all IO registers initialised to 0, and 0 meaning that a button is pressed, most games thought all buttons were pressed and kept resetting the console. after i implemented controls (which for some reason games aren’t able to query yet because of a bug i haven’t found) most games started rendering graphics. the performance issue was actually just a bug in the halt instruction, and after fixing that and doing some timing fixes the background layer rendering was complete.
next i turned to sprite rendering, which wasn’t difficult at all. it took me like half a day, minus something called drawing priorities (more about that in the next paragraph).
I am happy to declare that the PPU is mostly done. Background rendering is fully working. Sprite rendering is mostly working apart from drawing priorities (what to do when 2 sprites overlap, what to do when you surpass the 10 sprites per scanline limit, and whether you draw over opaque background pixels or not). still missing are the window layer (easily the most useless gameboy feature imo) and all gameboy color features (which i may or may not implement due to time constraints).
but tbh sprite drawing was one of the easiest things to do in the PPU so far
If you want a detailed changelog just look at the commit history from 0958ea7 to a18cbfa (this is a devlog not a changelog)
for now here are some games rendering on the emulator. keep in mind that TLOZ works only after i patch the ERAM bank count to be at least 2, which i still haven’t committed cuz i am not sure if it’s correct to just increase ERAM size by 2 despite the header saying there are no ERAM banks at all.
bye for now :>

Attachment

Comments

Anass Zakar 7 days ago

Hi Copticfelo

Copticfelo 2 days ago

Hi Anass Zakar :>

Copticfelo

oh boy there is a lot to talk about,
I finally started actual work on the PPU (pain). at first i was just goofing around with sdl3 trying to learn the most efficient way to draw with it, and i concluded that i am just gonna keep a framebuffer that i upload to an sdl3 texture every frame. that however presented a problem: rust lifetimes.
there is a rule in rust that says a struct may not hold a reference to itself (or by extension to one of its fields), but the Texture type in sdl is dependent on its TextureCreator, which means i can’t store the Texture and its TextureCreator together in the PPU struct. storing them in the parent struct (CPUContext) is also terrible because of rust’s explicit lifetimes: if i introduce an explicit lifetime specifier, i would have to edit every single function that takes a reference to CPUContext (pretty much the whole emulator). to avoid this disaster of a refactor, i compromised a bit and tried to move the game loop outside of CPUContext.
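A small sketch of that compromise, using stand-in types. `TextureCreator` and `Texture` below are NOT the sdl3 crate's types, they only mimic the "texture borrows from its creator" relationship; `Ppu`, `present` and `run_one_frame` are likewise hypothetical.

```rust
/// Stand-in for the real creator type (assumption: only the borrow relationship matters here).
pub struct TextureCreator;

/// Stand-in texture that borrows from its creator, so it carries a lifetime.
pub struct Texture<'a> {
    _creator: &'a TextureCreator,
    pub pixels: Vec<u32>,
}

impl TextureCreator {
    pub fn create_texture(&self, len: usize) -> Texture<'_> {
        Texture { _creator: self, pixels: vec![0; len] }
    }
}

/// The PPU only owns a plain framebuffer, so it needs no lifetime parameter at all.
pub struct Ppu {
    pub framebuffer: Vec<u32>,
}

/// Uploads the framebuffer into a texture that lives outside the emulator state.
pub fn present(ppu: &Ppu, texture: &mut Texture<'_>) {
    texture.pixels.copy_from_slice(&ppu.framebuffer);
}

/// The "game loop outside the context" idea: creator and texture are locals of the loop
/// function, so the borrow never has to be stored in a struct.
pub fn run_one_frame() -> Vec<u32> {
    let creator = TextureCreator;
    let mut texture = creator.create_texture(4);
    let mut ppu = Ppu { framebuffer: vec![0; 4] };
    ppu.framebuffer[2] = 0xFFFF_FFFF; // pretend the PPU drew one white pixel
    present(&ppu, &mut texture);
    texture.pixels.clone()
}
```

The tradeoff is that the loop function owns the rendering objects, which keeps lifetimes out of every signature that takes the emulator state.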
i don’t know if that was a good idea tho, but regardless i just wanted to focus on the Gameboy PPU logic for now and we can come back for structural improvements later. and so, ONE REFACTOR LATER and after a bunch of brainstorming to figure out how to draw the background layer correctly, i eventually got….something…..to render. i mean tetris just spits garbage to the screen, Pokemon Red just does nothing and Link’s Awakening just panics, but Dr. Mario and the blargg test roms render correctly enough that i would call that a win. still tho, that’s only the background layer, and my early testing shows that the performance does suck (1 frame per second sometimes), but i suspect it’s because i still haven’t perfected the drawing logic and am inadvertently pulling a bunch of garbage data into the framebuffer as well. so yeah i still have to improve this a lot, but maybe tmrw or something, i need to go study lol :>

Copticfelo

quick update: i have managed to get all the blargg test ROMs to pass after some suffering
so i implemented the gameboy timer and div registers and added some much needed performance and timing improvements (and accidentally discovered that logs absolutely destroy performance)
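A simplified sketch of the timer and DIV, assuming the usual model: DIV is the upper byte of a 16-bit counter that ticks every T-cycle, and TAC picks how often TIMA increments (every 1024, 16, 64 or 256 T-cycles). Real hardware works off falling edges of counter bits, which this skips on purpose. The `Timer` struct here is hypothetical, not the emulator's.

```rust
/// Simplified timer model (not edge-accurate).
#[derive(Default)]
pub struct Timer {
    counter: u16, // internal counter; DIV is its upper byte
    pub tima: u8,
    pub tma: u8,
    pub tac: u8,
}

impl Timer {
    /// DIV register: increments once every 256 T-cycles.
    pub fn div(&self) -> u8 {
        (self.counter >> 8) as u8
    }

    /// Advances the timer by `t_cycles`; returns true if a timer interrupt should be requested.
    pub fn tick(&mut self, t_cycles: u32) -> bool {
        let mut irq = false;
        for _ in 0..t_cycles {
            self.counter = self.counter.wrapping_add(1);
            if (self.tac & 0b100) != 0 {
                // TAC bits 1..0 select the TIMA period in T-cycles
                let period = match self.tac & 0b11 {
                    0 => 1024,
                    1 => 16,
                    2 => 64,
                    _ => 256,
                };
                if self.counter % period == 0 {
                    let (next, overflow) = self.tima.overflowing_add(1);
                    self.tima = if overflow {
                        irq = true;
                        self.tma // reload from TMA on overflow
                    } else {
                        next
                    };
                }
            }
        }
        irq
    }
}
```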
but after all that i found that 4 out of the 11 tests would just get stuck in a loop. i spent like 4 hours trying to debug gameboy assembly until i decided to open one of the test roms in ImHex (a hex editor) and realised that it had a message at the beginning of the ROM saying “54 61 6B 65 73 20 31 30 20 73 65 63 6F 6E 64 73 2E 0A 0A” which, for the non-robots among you who don’t speak hexadecimal ascii, means “Takes 10 seconds.” :<
so yeah i wasted all that time because i didn’t wait long enough for the ROM to finish testing. in my defense all the other tests just took 2-3 seconds and my debug logs were def slowing the whole thing down by a long shot
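If you don't speak hexadecimal ascii either, a tiny helper can do the translation. `decode_hex_ascii` is just an illustrative name, and it assumes space-separated two-digit hex bytes like the dump quoted above.

```rust
/// Decodes a space-separated hex dump into text; returns None if any token is not a hex byte.
pub fn decode_hex_ascii(dump: &str) -> Option<String> {
    dump.split_whitespace()
        .map(|byte| u8::from_str_radix(byte, 16).ok().map(char::from))
        .collect()
}
```

Feeding it the bytes from the ROM gives back the "Takes 10 seconds." message followed by two newlines (the trailing `0A 0A`).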
anyways, after fixing a small bug in the daa instruction (again, it does magic wizardry called binary coded decimals which i don’t get at all), i ran the all-in-one test and it happily passed all the way to the end (after waiting for like 30 seconds lol)
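For reference, the daa wizardry in sketch form. The idea: after an add or subtract on two BCD bytes, nudge A by 0x06 and/or 0x60 so each nibble is a valid decimal digit again. This follows the commonly documented SM83 behavior; the `daa` function and its tuple return are made up for illustration.

```rust
/// Decimal-adjusts `a` after a BCD add/subtract.
/// `n`, `h`, `c` are the subtract, half-carry and carry flags from the previous operation.
/// Returns (adjusted a, zero flag, carry flag); the half-carry flag is always cleared by DAA.
pub fn daa(a: u8, n: bool, h: bool, c: bool) -> (u8, bool, bool) {
    let mut adjust = 0u8;
    let mut carry = c;
    let result = if n {
        // after a subtraction: only undo the borrows that actually happened
        if h {
            adjust |= 0x06;
        }
        if c {
            adjust |= 0x60;
        }
        a.wrapping_sub(adjust)
    } else {
        // after an addition: fix any nibble that went past 9
        if h || (a & 0x0F) > 0x09 {
            adjust |= 0x06;
        }
        if c || a > 0x99 {
            adjust |= 0x60;
            carry = true;
        }
        a.wrapping_add(adjust)
    };
    (result, result == 0, carry)
}
```

Example: 0x19 + 0x28 gives 0x41 with the half-carry set, and `daa` turns that into 0x47, which is 19 + 28 = 47 in decimal.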
NOW HOPEFULLY i can focus on the PPU (Pixel Processing unit btw) a bit

Copticfelo

idk how these hours keep passing so quickly (probably because of aggressive println debugging as my debugger is very buggy) but here we are
I eventually figured out what the bugs i had were by stepping through the tetris rom and comparing it with my emulator side by side (well workspace by workspace but whtv) and it became pretty clear that my emulator had a major problem that caused all the weird bugs, and ofc it’s a jump instruction
well not exactly it’s actually in call and reti
they were always being treated as conditional, which is stupid especially in the case of reti which is never conditional
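A sketch of how the conditional and unconditional variants can be told apart. The opcode values are the standard SM83 ones for CALL cc / RET cc; `Cond`, `condition_of` and `should_branch` are hypothetical names and not how the emulator is actually structured.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Cond {
    NZ,
    Z,
    NC,
    C,
}

/// Returns the condition attached to a CALL/RET opcode, or None for the
/// unconditional forms: CALL a16 (0xCD), RET (0xC9) and RETI (0xD9).
pub fn condition_of(opcode: u8) -> Option<Cond> {
    match opcode {
        0xC4 | 0xC0 => Some(Cond::NZ),
        0xCC | 0xC8 => Some(Cond::Z),
        0xD4 | 0xD0 => Some(Cond::NC),
        0xDC | 0xD8 => Some(Cond::C),
        _ => None,
    }
}

/// Unconditional instructions always branch; conditional ones check the flags.
pub fn should_branch(cond: Option<Cond>, zero: bool, carry: bool) -> bool {
    match cond {
        None => true,
        Some(Cond::NZ) => !zero,
        Some(Cond::Z) => zero,
        Some(Cond::NC) => !carry,
        Some(Cond::C) => carry,
    }
}
```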
anyways, it was very easy to fix and blargg’s test rom started outputting useful info
i didn’t keep that much track but i think initially only 2 out of 10 tests were passing. the helpful part is that it pointed out which opcodes were wrong, and after a lot of work applying various misc fixes and cleanups and some addition overflow safeguards (which i need to put everywhere but lazy) i got it to 8/10 tests passing, with the last 2 needing additional functions which i am still figuring out
anyways i then implemented some timer i barely understand because i really was trying to rush working on the PPU at this point, but it just made everything a mess. so i stashed the PPU changes i was trying to make and will hopefully return to them after i figure out the timer properly, implement stop and halt, and make some mem helpers to help with figuring out how to draw tiles and sprites. i estimate we still have 30-ish hours before the PPU is fully functional because i would still need to learn how to draw with SDL
anyways here’s a picture of one of the test ROMs’ output, the last line is the output in ASCII
well, bye for now :>

Attachment
Copticfelo

Hi, A lot of things have happened in the last 10 hours, none exactly revolutionary but mostly things that would make it easier on me to debug and bug fixes
The first thing i worked on (aside from the bugfixes which i am not gonna mention here) is getting interrupts working, sorta…. like interrupts as a concept exist now but there are things that should fire interrupts that def don’t rn, guess that will have to wait
i was then very eager to get started on the PPU but because of the prev clock impl i couldn’t just perform things every cycle (like drawing a scanline) so because of that i refactored the whole cpu_context struct and ONE REFACTOR LATER we now have a better codebase for cycle accuracy
i then tried to start working on the PPU but realized tetris doesn’t even get as far as loading the tiles into the VRAM because it gets stuck in a loop, so i decided to focus on other things rn and see if i did something wrong in the instructions
i implemented what i could understand from the serial port behavior and tried to run some test roms, but then i realized that even blargg’s test roms behave very weirdly (either that or i just don’t understand how to use them), so i started looking at something i have been delaying for a while now, a proper logging system (instead of println chaos)
and ONE REFACTOR LATER we have a better logging system (which also takes a lot of terminal space for some reason)
did it help? ehh not yet still need to investigate
i mean it def made things clearer on which instructions i am dealing with here
well guess i gotta go do more debugging now
bye 👋

Copticfelo

looks like i have missed the clock by 5 hours but oh well,
after writing the last devlog i came to a dire realization: at this pace i won’t be able to finish the emulator before the heat death of the universe, so i decided to forgo unit tests in favor of maybe running some test ROMs later to speed up development
and would you look at that,
All instructions have been implemented (minus STOP and HALT)
now i would love to explain which instructions exactly i implemented but that would make us exceed the 2000 char limit on this website
but to summarise what i implemented:

  • push/pop -> pushes and pops to the stack (crazy i know)
  • (misc 8bit loads too long to list) -> mostly weird edge cases
  • add/inc/dec -> 16 bit arithmetic
  • call/ret/reti/rst -> function calls and returns (like jump but sets up the stack for returns)
  • di/ei -> interrupts disable/enable
  • rr/rl/rra/rla/rrc/rlc/rrca/rlca -> byte rotations
  • srl/sla/sra -> byte shifts
  • swap -> swaps lower 4 bits with upper 4 bits
  • bit/res/set -> bit operations
  • scf/ccf -> set or complement carry flag
  • cpl -> A = not A
  • daa -> magic wizardry (binary-coded decimals or something)
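
To make a few of those bit-level entries concrete, here is a sketch of some of the rotate/shift/swap operations as plain functions. Each returns the result plus the new carry flag where one applies (the zero flag handling is left out). The function names mirror the mnemonics but the signatures are an assumption for illustration.

```rust
/// swap: exchanges the lower and upper 4 bits.
pub fn swap(v: u8) -> u8 {
    v.rotate_left(4)
}

/// rlc: rotate left; bit 7 wraps around to bit 0 and is also copied into carry.
pub fn rlc(v: u8) -> (u8, bool) {
    (v.rotate_left(1), (v & 0x80) != 0)
}

/// rl: rotate left through carry; the old carry enters at bit 0.
pub fn rl(v: u8, carry_in: bool) -> (u8, bool) {
    ((v << 1) | carry_in as u8, (v & 0x80) != 0)
}

/// sra: arithmetic shift right; bit 7 (the sign bit) is preserved.
pub fn sra(v: u8) -> (u8, bool) {
    ((v >> 1) | (v & 0x80), (v & 1) != 0)
}

/// srl: logical shift right; bit 7 becomes 0.
pub fn srl(v: u8) -> (u8, bool) {
    (v >> 1, (v & 1) != 0)
}
```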

all that remains is the STOP and HALT instructions which i am ignoring for now because the infrastructure (interrupts) for them is not written yet + idk how important are “enter low power mode” instructions in a desktop app

next thing on the list is the serial port (to run test roms) and the exciting part…the PPU, i don’t have a lot of graphics experience but hopefully it’s not that difficult :>

speaking of the PPU, I learned the loop we were getting stuck on last time is called a “V-Blank wait loop” not sure exactly what that means yet but i quickly patched it in a commit and suddenly we got out of the loop (although i need to handle memory banking to get any further in the ROM)
Well, until then, goodbye 🫡

Copticfelo

well guess the clock hit 5 hours and it’s time for a 3rd devlog
(don’t ask about the crappy logo i am not an artist lol)
firstly, something i’ve been procrastinating for a while, implementing custom error types, why? idk everyone says it’s more “idiomatic rust” or whatever + doing that instead of returning strings everywhere seems more reasonable
anyways, more opcodes…….or rather opcode
yes indeed all this time was spent on the custom error types and trying to figure out this annoying instruction :<
LD HL, SP+e8 -> Copies the value in SP (the stack pointer) PLUS the next signed (+/-) byte in the ROM file (i.e. e8) to the HL register

it might seem simple enough but the problem lies with calculating the flags (carry and half-carry) for an addition between an unsigned 16-bit (SP) and a signed 8 bit (e8)
i had no clue what type of addition was that, signed? unsigned? 8 bit? 16 bit?
it took me a while to figure out but in the end i used the mGBA emulator as a reference and based my implementation on its implementation, which was annoying since unlike C, rust can’t do addition on numbers of different types (you need to cast them)
hopefully i did it correctly :>
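For anyone stuck on the same thing, here is a sketch of the commonly documented behavior: the 16-bit result uses the sign-extended offset, but the half-carry and carry flags come from an unsigned add on the low byte only (carry out of bit 3 and bit 7). `ld_hl_sp_e8` and its tuple return are illustrative, not copied from mGBA or from this emulator.

```rust
/// Computes HL = SP + e8.
/// Returns (hl, half_carry, carry); the zero and subtract flags are always cleared.
pub fn ld_hl_sp_e8(sp: u16, e8: i8) -> (u16, bool, bool) {
    // sign-extend the offset to 16 bits, then reinterpret as unsigned for wrapping math
    let offset = e8 as i16 as u16;
    let hl = sp.wrapping_add(offset);
    // flags behave like an unsigned 8-bit add of SP's low byte and the raw offset byte
    let half_carry = (sp & 0x000F) + (offset & 0x000F) > 0x000F;
    let carry = (sp & 0x00FF) + (offset & 0x00FF) > 0x00FF;
    (hl, half_carry, carry)
}
```

The casts are the part C would do silently: `i8 -> i16` keeps the sign, `i16 -> u16` keeps the bits.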
anyways i made a small fix in the jr (jump relative) instruction and suddenly we get past that instruction we were stuck on in the tetris ROM. don’t get your hopes up tho, it just gets stuck in some sort of infinite loop and spikes up the CPU
anyways hopefully things will be much faster from now on since we got over the custom error types. the next step would probably be the PUSH and POP instructions, then more 8 bit loads, jumps, bitwise operations and interrupts, and then hopefully we can finally focus on something more fun (Graphics rendering using SDL :>)
Until then, Goodbye 🫡

Attachment
Copticfelo

Hey, I have decided to make these devlogs every 5-6 hours of work so i don’t lose my sanity, hopefully that’s ok? anyways……………right, the devlog
First, i implemented this 8 bit load instruction i thought was alr implemented
ld r8, n8 -> literally just copies the next byte in the ROM file to the register r8
pretty boring but……
Big news: We’re now getting into 16 bit instructions!!
there are not many of them in the GB but they proved to be a bit more complicated

I started by just trying to get a mental model of how 16 bit operands work in the GB. at first I was just trying to do a one to one recreation of my “revolutionary” R8 enum but that proved to be very stupid because the r16 operand is wildly different from the r8 operand
basically
there are 3 types of the r16 operand:

  • r16 -> is either BC, DE, HL or SP
  • r16mem -> is either BC, DE, HL+ or HL- (+- just means you increment or decrement afterwards)
  • r16stk -> is either BC, DE, HL or AF (i have no clue what this does as of now, but it is used in the pop and push instructions)

i tried a lot of different things to implement this behavior nicely and cleanly.

I eventually figured it out after taking a break, I made the R16 enum for handling r16 operands, hopefully the cleanest piece of code i have ever written (maybe).
I implemented 3 instructions (12 opcodes overall) using it:

  • ld r16, n16 -> copies the next 16 bits in the ROM file (little endian) to r16
  • ld [r16mem], a -> copies the value in the A register to the memory address in r16mem
  • ld a, [r16mem] -> same as above but in reverse

and ofc i wrote some tests for them.
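
A guess at what an enum like that can look like, plus the little-endian read for n16. This is a sketch and not the actual R16 code: it assumes the standard encoding where bits 5..4 of the opcode pick the register pair (0x01, 0x11, 0x21, 0x31 for ld r16, n16), and `from_opcode` / `read_n16` are invented names.

```rust
/// The plain r16 operand group.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum R16 {
    BC,
    DE,
    HL,
    SP,
}

/// The r16mem group: same slots, but HL post-increments or post-decrements.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum R16Mem {
    BC,
    DE,
    HLInc,
    HLDec,
}

impl R16 {
    /// Decodes the register pair from bits 5..4 of the opcode.
    pub fn from_opcode(opcode: u8) -> R16 {
        match (opcode >> 4) & 0b11 {
            0 => R16::BC,
            1 => R16::DE,
            2 => R16::HL,
            _ => R16::SP,
        }
    }
}

impl R16Mem {
    /// Same bit positions as R16, different meaning for slots 2 and 3.
    pub fn from_opcode(opcode: u8) -> R16Mem {
        match (opcode >> 4) & 0b11 {
            0 => R16Mem::BC,
            1 => R16Mem::DE,
            2 => R16Mem::HLInc,
            _ => R16Mem::HLDec,
        }
    }
}

/// Reads the 16-bit immediate at `pc` (low byte first); None if it runs past the ROM.
pub fn read_n16(rom: &[u8], pc: usize) -> Option<u16> {
    let lo = *rom.get(pc)?;
    let hi = *rom.get(pc + 1)?;
    Some(u16::from_le_bytes([lo, hi]))
}
```

Separate enums per operand group keep the "same bits, different meaning" cases from leaking into each other.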

Now when running a tetris ROM file only 6 instructions from that beginning section are unsupported, ofc it still crashes, most likely trying to read operands as instructions due to LD [a16], SP not being supported

Next step is probably implementing the rest of the 16 bit load instructions as well as the push and pop instructions.
Until then, Goodbye 🫡

Copticfelo

Hey,
This is a project i have been working on, on and off, for the past several months. so far i have accumulated 10 hours (actually more but hackatime is being annoying :>) of work. to sum up these 10+ hours,
I have written the infrastructure needed to make a simple Sharp SM83 (the cpu used in the original gameboy) CPU emulator. the bottleneck now is implementing all of the 500 or so instructions the GB supports. i have implemented approximately 156 of them, though most of these are really one instruction with different operands.

let’s take for example the LD instruction: it just loads the value in the second CPU register into the first, so it takes 2 operands like this: LD [destination] [source]. each gameboy instruction is one byte long (there are a ton of exceptions to this but believe me, for now the vast majority are 1 byte long).

So how do you fit 2 operands into one byte you ask? answer: binary.
each byte consists of 8 bits, each either a 0 or a 1
the LD instruction takes operands in its byte directly like this:
01 010 100
first 2 bits (01) -> constant
middle 3 bits (010) -> the destination register
last 3 bits (100) -> the source register
the GB has 7 general 8-bit registers (A, B, C, D, E, H and L) with some “compound” 16-bit registers like HL
a 3 bit operand in decimal is a number from 0 to 7 and they are mapped like this:
0 = b, 1 = c, 2 = d, 3 = e, 4 = h, 5 = l, 6 = [hl], 7 = a
[hl] means the “byte pointed to in memory by the address in the compound register HL”
for example: “LD H, A” = 01 100 111, where 100 is h and 111 is a
so while 156 instructions/opcodes may look impressive, in reality that’s only like 13 different operations out of probably more than 50 (i didn’t count them, that’s Gemini’s guesswork).
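
The bit layout described above translates to code pretty directly. A minimal sketch, assuming the 0..7 register mapping listed earlier; the `R8` enum here is a stand-in and not necessarily how the emulator's own R8 enum looks. One known wrinkle: 0x76 would decode as LD [HL], [HL] but is actually HALT.

```rust
/// The 8-bit operand slots, in encoding order 0..=7.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum R8 {
    B,
    C,
    D,
    E,
    H,
    L,
    HlMem, // [hl]: the byte in memory at the address held in HL
    A,
}

impl R8 {
    /// Decodes a register from the low 3 bits of `bits`.
    pub fn from_bits(bits: u8) -> R8 {
        match bits & 0b111 {
            0 => R8::B,
            1 => R8::C,
            2 => R8::D,
            3 => R8::E,
            4 => R8::H,
            5 => R8::L,
            6 => R8::HlMem,
            _ => R8::A,
        }
    }
}

/// Returns (destination, source) if `opcode` is an LD r8, r8; None otherwise.
pub fn decode_ld_r8_r8(opcode: u8) -> Option<(R8, R8)> {
    // top two bits must be 01, and 0x76 is HALT rather than LD [HL], [HL]
    if (opcode >> 6) != 0b01 || opcode == 0x76 {
        return None;
    }
    Some((R8::from_bits(opcode >> 3), R8::from_bits(opcode)))
}
```

So one match arm in the CPU can cover all 63 LD r8, r8 opcodes, which is why the opcode count grows so much faster than the number of distinct operations.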

for now i will just continue implementing more instructions especially after that big code refactor i was doing for the last couple of days is done.
Until then, Goodbye 🫡


Comments

Copticfelo about 2 months ago

sorry if this is a bit hard to read. apparently, this website doesn’t seem to display indentations very well