Copticfelo

Hi, so picking up from last time we said we are going to just do “some minor ppu fixes”, right? wrong.
as i was refactoring the PPU and working on the STAT interrupts i noticed that the current way of doing things is really a mess. so far the PPU worked by drawing a whole scanline in one shot, which unfortunately breaks a lot of games (unless you do some ugly mode 3 length calculation hacks, which would still leave you with some broken games).
so i made what is probably gonna be a mistake: i decided to rewrite the whole PPU to use the pixel fifo model, which is much more accurate to the hardware (a fifo is “First-in First-out”, a fancy way of saying a queue). every beginner resource on emulation recommended that i don’t bother with it and just use clock hacks to do what i want, but that seems a little too unclean to me so off i went trying to implement it.
and let me tell you this is not ez, so far i have only been able to render the background correctly, with completely wack timing that i was trying to fix before noticing that i should probably make a devlog explaining things :>
i don’t know all the specifics of how the pixel fifo works yet, so i am gonna try to explain more in the next devlog
also important to mention is that i am doing that ppu rewrite on a git branch called pixel-fifo until hopefully it gets functional enough that we can discard the old renderer and merge with the master branch
here is the tetris background layer rendering with the pixel fifo renderer, the sound is kinda messed up right now using the new renderer but hopefully i’ll be able to fix it later

Copticfelo

Ok, so after 8 hours of pain and suffering, MBC3 seems to be correctly implemented
this was one of the most logically difficult things to implement because the MBC3 has an annoying feature called RTC (real time clock), in which the cartridge has its own 32.768 kHz oscillator to keep track of the passage of time in 5 registers:

  • seconds
  • minutes
  • hours
  • lower 8 bits of the day counter
  • {carry bit}{halt bit}00000{top (9th) bit of the day counter}

(note: the day counter is 9 bits long so can store up to 511 days then it sets the carry bit and starts counting back from zero)
i first tried to count real system time seconds to do this, ignoring the actual oscillator frequency. that unsurprisingly didn’t work at all and was prone to cpu scheduler instabilities, so i rewrote it to tick every 128 system clock cycles (cuz 4194304/32768 = 128). this ofc came with its own compromises: for one you have to painstakingly increment every register one by one and check for overflows manually every tick, which wasn’t that bad tbh. the biggest problem is that now if you want to keep time when the emulator is closed (just like real cartridges, which had a battery that keeps the clock ticking), you need to add the unix timestamp delta (current_timestamp - save_timestamp) to the 5 RTC registers with all the baggage of the carry flag and whatnot, which was very annoying but i think it works now
it only took 349 loc in a single file lol
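for anyone curious what “add the timestamp delta with all the carry baggage” looks like, here is a rough sketch. it is not the code from the repo: the `Rtc` struct, its field names and the `advance` method are made up for illustration, and it assumes the registers always hold in-range values (real hardware lets games write out-of-range ones).

```rust
/// Hypothetical MBC3 RTC state (names are illustrative, not from the repo).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Rtc {
    pub seconds: u8, // 0..=59
    pub minutes: u8, // 0..=59
    pub hours: u8,   // 0..=23
    pub days: u16,   // 9-bit day counter, 0..=511
    pub halt: bool,  // when set, the clock does not advance
    pub carry: bool, // sticky flag, set when the day counter wraps past 511
}

impl Rtc {
    /// Advance the clock by `secs` whole seconds, cascading every overflow
    /// into the next register. Works for a single tick (secs = 1) and for
    /// a big "emulator was closed" delta alike.
    pub fn advance(&mut self, secs: u64) {
        if self.halt {
            return;
        }
        let total = self.seconds as u64 + secs;
        self.seconds = (total % 60) as u8;
        let total = self.minutes as u64 + total / 60;
        self.minutes = (total % 60) as u8;
        let total = self.hours as u64 + total / 60;
        self.hours = (total % 24) as u8;
        let total = self.days as u64 + total / 24;
        self.days = (total % 512) as u16;
        if total >= 512 {
            self.carry = true;
        }
    }

    /// Pack the fifth register: carry in bit 7, halt in bit 6,
    /// and the 9th bit of the day counter in bit 0.
    pub fn day_high(&self) -> u8 {
        ((self.carry as u8) << 7) | ((self.halt as u8) << 6) | ((self.days >> 8) as u8 & 1)
    }
}
```

on startup you would call something like `rtc.advance(current_timestamp - save_timestamp)` once, and during emulation `rtc.advance(1)` every 32768 oscillator ticks.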
now that that’s done DMG emulation is technically complete, but in practice there are still a lot of bugs and instabilities which i am going to start working on later today
(oh and there was also this funny performance fix that gave us like at least 2x more performance in debug mode)
here is the rtc3test rom passing and Pokemon: Silver saving time successfully (note: it crashes when you go down the stairs on the first room :<)

Copticfelo

Hey, thought i would make a quick update (yes 4 hours are quick by this project’s standards)
So I have been working on restructuring the MBC trait and structs (still not satisfied but whtv :<)
i basically made them own the ROM banks and the ERAM banks, and now when you wanna r/w from them you talk to the memory controller (i.e one of the structs that implements the MBC trait)
anyways speaking of ERAM banks, we now have save files (still needs polishing but later ig). basically whenever a game disables writing to ERAM i write all ERAM banks to a file (the file is currently just the all caps name of the game contained in the ROM file, unlike most emulators which use the name of the ROM file), and on startup the emulator looks for such a file and if found loads its contents into ERAM. putting that all together, games can save now
if you’re wondering why I didn’t implement MBC3 first, the answer is i was going to but realised half-way through that i needed to save RTC clock data to a file to make this work (more on that on the next devlog hopefully)
now excuse me imma go work on the much more complex MBC3, then hopefully i will be free to do some much needed PPU fixes before moving on to CGB support
Anyways here is The Legend of Zelda: Link’s awakening saving and loading (probably the most impressive game that runs on this emulator correctly lol minus the graphical glitches in the intro)
Tchüss :>

Copticfelo

Hi, been a while since i made a devlog or worked on anything new for that matter
So I have done 2 things, one is…you guessed it, ANOTHER major refactor, the design architecture i used for the emulator was very unconventional and very non-idiomatic, basically it’s what i like to call “✨Everything is a free function✨”, so naturally i despised this approach and wanted to leverage the power of structs in rust a lot more, so i renamed the CPUContext struct to “Bus” and got to work moving everything i didn’t like into a new SM83 struct, the details are too long to explain but basically now i can call “self.apu.pulse_1” or “bus.read(0xC001)” and not “context.apu.pulse_1” or “MemoryMap::dma_read(context, 0xC001)” like psychopaths
after that i decided to skip the noise channel for now, instead focusing on getting the emulator as functional as possible as i kinda really wanna support gameboy color games rn, but before that i want to make sure that the DMG foundation is solid enough to continue (DMG = the original gameboy codename).
so what am i doing now? Answer: Memory bank controllers (MBC)
basically because you only have access to 32 KB of ROM data on the gameboy address bus you need to “bank memory” on the ROM and the external RAM (ERAM) regions. so you have a certain number of 16 KB banks for the ROM and likewise for ERAM (but 8 KB banks instead), and you can switch between them by the use of MBC chips, of which there are multiple versions. most monochrome Gameboy games use MBC1 tho (not true for the CGB)
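to make the banking idea concrete, here is a minimal sketch of how a CPU address could be translated into an offset inside the full ROM image. `rom_offset` and its parameters are hypothetical names, and it deliberately ignores the MBC-specific quirks (like MBC1 refusing to select bank 0 in the switchable region, or its banking mode bit). it also assumes `bank_count` is never 0.

```rust
/// Size of one switchable ROM bank (16 KB).
const ROM_BANK_SIZE: usize = 16 * 1024; // 0x4000

/// Hypothetical helper: map a CPU address in 0x0000..=0x7FFF to an offset
/// in the ROM file, given the bank the game last selected.
pub fn rom_offset(addr: u16, selected_bank: usize, bank_count: usize) -> usize {
    let addr = addr as usize;
    if addr < ROM_BANK_SIZE {
        // 0x0000..=0x3FFF always shows bank 0
        addr
    } else {
        // 0x4000..=0x7FFF shows whichever bank is selected;
        // wrap the bank number so small ROMs never read out of bounds
        let bank = selected_bank % bank_count;
        bank * ROM_BANK_SIZE + (addr - ROM_BANK_SIZE)
    }
}
```

ERAM would work the same way with 8 KB banks mapped at 0xA000..=0xBFFF.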
the emulator alr had hacky support for the first gen MBC chip MBC1 but i expanded that a bit to be a lot more correct and made some foundations for me to be able to easily support the rest of the MBC chips, it didn’t get anything that didn’t alr work working, but ig it will ensure that things become less buggy later on
ok time to go do MBC2, 3 and 5, here is F-1 race running on the emulator in the meantime (note that that alr worked a long time ago, apparently it uses MBC2 which is just MBC1 with some modifications, oh and ignore my terrible gameplay and the weird tile colors)
ok now, bye :>

Copticfelo

Ok, so what did i work on next?
4 things
1- There were a few bugs related to the length timer that i needed to iron out in the pulse wave channels. the pulse wave is still missing something called the frequency sweep (it exists in code, but i am pretty sure that’s not how it’s supposed to work, and i can basically feel no change when it’s removed)
2- I added a file picker for some reason :>
3- I added the wave channel, it’s the 3rd audio channel in the GB, it was fairly trivial, didn’t even take a whole 4+ hours like everything in this project, all of its functionality afaik is working (aside from hardware quirks that i don’t care about), but because of the way my code is structured, i can only run code every 4 cycles but the wave channel is clocked once every 2 cycles so i just calculate the 2 samples and average them, not ideal but eh will fix later
4- this one was annoying, so basically my main OS is Arch (btw) linux. on that system the audio worked perfectly, but once you try running the emulator on a more sane OS (Mac or Windows) the audio noticeably stutters. i pretty quickly discovered that my chad linux install was able to generate 1500+ samples per audio request, while MacOS (and probably windows too) was able to manage only 700-1200 every audio request, and SDL always requests 1024 samples so you would sometimes hear stutters. the only thing that fixed this was tying the framerate to the generation of at least 1024 samples so the buffer wouldn’t underrun
Idk when that happened but the deadline is now April 30, so i might implement the features i wanted to make, like CGB support, a GUI using Iced, complete cycle accuracy, cleaning up the code, etc
Anyways goodbye for now :>

Copticfelo

Hello, In these 8 hours i learned something important, Audio is gonna be the most difficult thing in this project and not graphics
before i started working on the APU i knew next to nothing about audio, so as you would imagine it was hell, but even after i learned the relevant topics for this, i was still struggling, why you would ask?
Games on the gameboy can’t just store mp3 files for songs, no they dynamically generate their music by writing to a handful of registers to control 4 audio channels
2 Pulse width modulated channels (Square wave with extra steps)
1 custom wave channel (games can define their own samples and give a frequency)
1 noise wave (for percussion and sound effects, etc)
I have been exclusively focusing on the first 2, which might have been a mistake because they seem the most difficult
basically, they define a “duty cycle” which defines how much of the time the amplitude is high (equal to the volume) compared to when it is low (aka 0). they also have things like frequency sweep and volume sweep which i am still figuring out
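a tiny sketch of the duty cycle idea, assuming the usual 8-step patterns (12.5%, 25%, 50%, 75% high). the exact bit order/phase of each pattern differs a bit between references, so treat the `DUTY` table as an assumption. `pulse_sample` is a made-up name.

```rust
/// One 8-step waveform per duty setting; a 1 bit means "high".
/// (bit order/phase is an assumption, only the high/low ratio matters here)
const DUTY: [u8; 4] = [
    0b0000_0001, // 12.5%
    0b0000_0011, // 25%
    0b0000_1111, // 50%
    0b1111_1100, // 75%
];

/// Output of a pulse channel at waveform position `step` (0..=7):
/// the current volume while the pattern is high, 0 while it is low.
pub fn pulse_sample(duty: usize, step: usize, volume: u8) -> u8 {
    let bit = (DUTY[duty & 3] >> (7 - (step & 7))) & 1;
    if bit == 1 {
        volume
    } else {
        0
    }
}
```

the channel's frequency only decides how fast `step` advances, the pattern itself never changes.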
the biggest hurdle by far was the sample rate. the gameboy APU is synced with the master clock, so that means the sample rate of the gameboy’s apu is over 1 MHz, completely un-hearable for humans. to solve this you need to downsample the output to 44.1 or 48 kHz (just like most normal music). i didn’t do any complicated maths to do so, i just skipped some samples so we hit our target frequency. it sounds atrocious in debug mode, but in release mode it sounds…..dare i say “fine”? there is still a lot of fine tuning i need to do on my end ofc to get it to play perfectly, and there is also the frequency sweep (freq changes over time) and volume sweep (vol changes over time) i need to get working, because when i enable them with the current implementation they just make this sound even more atrocious
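the “just skip some samples” approach can be sketched like this. the `Decimator` type is hypothetical, and the 1_048_576 Hz source rate in the usage note assumes one APU sample per M-cycle (4194304 / 4). there is no filtering at all here, so aliasing is expected (which might be part of the atrocious sound).

```rust
/// Naive downsampler: keeps roughly `target_rate` out of every
/// `source_rate` input samples and drops the rest.
pub struct Decimator {
    source_rate: u32,
    target_rate: u32,
    acc: u32, // fractional accumulator, always < source_rate between calls
}

impl Decimator {
    pub fn new(source_rate: u32, target_rate: u32) -> Self {
        Self { source_rate, target_rate, acc: 0 }
    }

    /// Feed one input sample; returns Some(sample) if it should be
    /// forwarded to the audio device, None if it gets skipped.
    pub fn push(&mut self, sample: f32) -> Option<f32> {
        self.acc += self.target_rate;
        if self.acc >= self.source_rate {
            self.acc -= self.source_rate;
            Some(sample)
        } else {
            None
        }
    }
}
```

with `Decimator::new(1_048_576, 48_000)` one emulated second of input produces exactly 48000 output samples, without any floating point drift.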
anyways if you don’t value your ears here is how it sounds now
goodbye :>

Copticfelo

Ok so what changed from last time? mostly ppu changes
first of all, i fixed the input bug that prevented most games from being playable by treating writes to the input register P1 as queries and internally saving the state of the input until it’s queried in which case P1 is updated with the state in the “Joypad” struct.
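roughly what that looks like, as a sketch and not the repo's actual code: the `Joypad` layout and method names below are assumptions. the important bits are that a write only picks which button group the game wants, and that in the value read back 0 means “pressed”.

```rust
/// Hypothetical joypad state. In `dpad` and `buttons` a 1 bit means "held".
#[derive(Default)]
pub struct Joypad {
    pub dpad: u8,    // bit0 Right, bit1 Left, bit2 Up, bit3 Down
    pub buttons: u8, // bit0 A, bit1 B, bit2 Select, bit3 Start
    pub select: u8,  // bits 4-5 as last written by the game
}

impl Joypad {
    /// A write to P1 is a query: only the two select bits are stored.
    pub fn write_p1(&mut self, value: u8) {
        self.select = value & 0x30;
    }

    /// Build the value the game reads back. A select bit of 0 enables
    /// that group, and a low-nibble bit of 0 means the button is pressed.
    pub fn read_p1(&self) -> u8 {
        let mut low = 0x0F;
        if self.select & 0x10 == 0 {
            low &= !self.dpad;
        }
        if self.select & 0x20 == 0 {
            low &= !self.buttons;
        }
        0xC0 | self.select | (low & 0x0F)
    }
}
```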
then I made 3 changes to the PPU, I started by fixing bugs here and there in the background layer by introducing a simpler drawing pipeline (instead of looping over tiles we loop over pixels and calculate what tile we are in and do modulo 8 to get the tile column), introduced correct LYC and H-Blank interrupts (too long to discuss here) and all these changes fixed a lot of things that were broken from last time.
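the per-pixel lookup boils down to a couple of divisions and modulos. a sketch, assuming the standard 32x32 tile background map and the SCX/SCY scroll registers (`bg_tile_coords` is a made-up name):

```rust
/// For the screen pixel at column `x` on scanline `ly`, return
/// (index into the 32x32 tile map, column inside the tile, row inside the tile).
pub fn bg_tile_coords(x: u8, ly: u8, scx: u8, scy: u8) -> (usize, u8, u8) {
    // the background is 256x256 pixels and wraps around, and so does u8
    let px = x.wrapping_add(scx);
    let py = ly.wrapping_add(scy);
    let map_index = (py as usize / 8) * 32 + (px as usize / 8);
    (map_index, px % 8, py % 8)
}
```

the tile map entry at `map_index` then gives the tile number, and the two modulo results pick the pixel inside that tile's 8x8 data.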
then i turned my attention to drawing the window layer, took a while to figure out because i kept confusing tile indices with pixel indices but eventually figured it out, and the window layer now works perfectly (hopefully anyways).
anyways next i noticed how blurry the image from the emulator got from scaling and remembered a Techquickie video talking about scaling algorithms and how nearest neighbor was the best for pixel art, so i changed the scaling algo that SDL used to nearest neighbor and now the picture is very sharp and clear.
and finally, refactored sprites to use that simpler drawing pipeline i mentioned which resulted in a bug being fixed (sprites at the very left edge were being clipped).
now i will go figure out how gameboy audio works then come back to the missing PPU features later.
bye :>

Copticfelo

Hi, 2 people and 1 bot who read those devlogs
I figured out why most games from the previous devlog didn’t render correctly: most games have a button combo to reset the console (when all buttons are pressed), and with all io registers initialised to 0, and 0 meaning that the button is pressed, most games thought that all buttons were pressed and kept resetting the console. after i implemented controls (which for some reason games aren’t able to query yet because of a bug i haven’t found) most games started rendering graphics. the performance issue was actually just a bug in the halt instruction, and after fixing that and doing some timing fixes the background layer rendering was complete.
next i turned to the sprite rendering which wasn’t difficult at all took me like half a day minus something called drawing priorities (more about that on the next paragraph).
I am happy to declare that the PPU is mostly done: background rendering is fully working and sprite rendering is mostly working. what’s still missing is drawing priorities (what to do when 2 sprites overlap, what to do when you surpass the 10 sprites per scanline limit, and do you draw over opaque background pixels or not?), the window layer (easily the most useless gameboy feature imo) and all gameboy color features (which i may or may not implement due to time constraints)
but tbh sprite drawing was one of the easiest things to do in the PPU so far
If you want a detailed changelog just look at the commit history from 0958ea7 to a18cbfa (this is a devlog not a changelog)
for now here are some games rendering on the emulator, keep in mind that TLOZ works only after i patch the ERAM bank count to be at least 2 which i still haven’t committed cuz i am not sure if it’s correct to just increase ERAM size by 2 despite the header saying there is no ERAM banks at all.
bye for now :>


Comments

Anass Zakar about 1 month ago

Hi Copticfelo

Copticfelo about 1 month ago

Hi Anass Zakar :>

Copticfelo

oh boy there is a lot to talk about,
I finally started actual work on the PPU (pain), at first i was just goofing around with sdl3 trying to learn the most efficient way to draw to it and i concluded that i am just gonna keep a framebuffer that i upload to an sdl3 texture every frame, that however presented a problem, rust lifetimes,
there is a rule in rust that says that a struct may not hold a reference to itself (or by extension one of its attributes), but the Texture type in sdl is dependent on its TextureCreator, which means i can’t store the Texture and its TextureCreator in the PPU struct. storing it in the parent struct (CPUContext) is also terrible because of rust explicit lifetimes: basically if i introduce an explicit lifetime specifier, i would have to edit every single function that takes a reference to CPUContext (pretty much the whole emulator). to avoid this disaster of a refactor, i compromised a bit and tried to make the game loop live outside of CPUContext.
i don’t know if that was a good idea tho, but regardless i just wanted to focus on the Gameboy PPU logic for now and we can come back for structural improvements later. and so, ONE REFACTOR LATER and after a bunch of brainstorming to figure out how to draw the background layer correctly, i eventually got….something…..to render. i mean tetris just spits garbage to the screen, Pokemon red just does nothing, Links awakening just panics, but Dr. Mario and the blargg test roms render correctly enough that i would call that a win. still tho that’s only the background layer and my early testing shows that the performance does suck (1 frame per second sometimes), but i suspect it’s because i still haven’t perfected the drawing logic yet and am inadvertently pulling a bunch of garbage data into the framebuffer as well. so yeah i still have to improve this a lot, but maybe tmrw or something, i need to go study lol :>

Copticfelo

quick update: i have managed to get all the blargg test ROMs to pass after some suffering
so i implemented the gameboy timer and div registers and added some much needed performance and timing improvements (and accidentally discovered that logs absolutely destroy performance)
but after all that i found that 4 out of the 11 tests would just get stuck in a loop. i spent like 4 hours trying to debug gameboy assembly until i decided to open one of the test roms in ImHex (a hex editor) and realised that it had a message at the beginning of the ROM saying “54 61 6B 65 73 20 31 30 20 73 65 63 6F 6E 64 73 2E 0A 0A” which for the non-robots among you who don’t speak hexadecimal ascii means “Takes 10 seconds.” :<
so yeah i wasted all that time because i didn’t wait long enough for the ROM to finish testing. in my defense all the other tests just took 2-3 seconds and my debug logs were def slowing the whole thing down by a long shot
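if you don’t speak hexadecimal ascii either, decoding a byte dump like the one above takes a few lines (`hex_to_ascii` is just an illustrative helper, not something from the emulator):

```rust
/// Turn a whitespace-separated hex dump ("54 61 6B ...") into text.
/// Returns None if any chunk is not a valid hex byte.
pub fn hex_to_ascii(hex: &str) -> Option<String> {
    hex.split_whitespace()
        .map(|byte| u8::from_str_radix(byte, 16).ok().map(char::from))
        .collect()
}
```

feeding it the string from the test ROM gives back “Takes 10 seconds.” followed by two newlines (the trailing `0A 0A`).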
anyways, after fixing a small bug in the daa instruction (again it does magic wizardry called binary coded decimals which i don’t get at all), i ran the all-in-one test and it happily passed all the way to the end (after waiting for like 30 seconds lol)
NOW HOPEFULLY i can focus on the PPU (Pixel Processing unit btw) a bit

Copticfelo

idk how these hours keep passing so quickly (probably because of aggressive println debugging as my debugger is very buggy) but here we are
I eventually figured out what the bugs i had were by stepping through the tetris rom and comparing it with my emulator side by side (well workspace by workspace but whtv) and it became pretty clear that my emulator had a major problem that caused all the weird bugs, and ofc it’s a jump instruction
well not exactly it’s actually in call and reti
they were all always being treated as conditional, which is stupid especially in the case of reti which is never conditional
anyways, it was very easy to fix and the blargg’s test rom started outputting useful info
i didn’t keep that much track but i think initially only 2 out of 10 tests were passing. the helpful part is that it pointed out which opcodes were wrong, and after a lot of work applying various misc fixes and cleanups and some addition overflow safeguards (which i need to put everywhere but lazy) i got it to 8/10 tests passing, with the last 2 needing additional functions which i am still figuring out
anyways i then implemented some timer i barely understand because i really was trying to rush working on the PPU at this point but it just made everything a mess so i stashed the PPU changes i was trying to make and will hopefully return to it after i figure out the timer properly, implement stop and halt and make some mem helpers to help with the process of figuring out how to draw tiles and sprites, i estimate we still have 30-ish hours before the PPU is fully functional because i would still need to learn how to draw with SDL
anyways here is a picture of one of the test ROMs’ output, that last line is the output in ASCII
well, bye for now :>

Copticfelo

Hi, A lot of things have happened in the last 10 hours, none exactly revolutionary but mostly things that would make it easier on me to debug and bug fixes
The first thing i worked on (aside from the bugfixes which i am not gonna mention here) was getting interrupts working, sorta…. like interrupts as a concept exist now but there are things that should fire interrupts that def don’t rn, guess that will have to wait
i was then very eager to get started on the PPU but because of the prev clock impl i couldn’t just perform things every cycle (like drawing a scanline) so because of that i refactored the whole cpu_context struct and ONE REFACTOR LATER we now have a better codebase for cycle accuracy
i then tried to start working on the PPU but realized tetris doesn’t even get as far as loading the tiles into the VRAM because it gets stuck on a loop, so i decided to focus on other things rn and see if i made something wrong in the instructions
i implemented what i could understand from the serial port behavior and tried to run some test roms but then i realized that even the blargg’s test roms behave very weird (either that or i just don’t understand how to use them) so i started looking on something i have been delaying for a while now, a proper logging system (instead of println chaos)
and ONE REFACTOR LATER we have better logging (which also takes a lot of terminal space for some reason)
did it help? ehh not yet still need to investigate
i mean it def made things clearer on which instructions i am dealing with here
well guess i gotta go do more debugging now
bye 👋

Copticfelo

looks like i have missed the clock by 5 hours but oh well,
after writing the last devlog i came to a dire realization: at this pace i won’t be able to finish the emulator before the heat death of the universe, so i decided to forgo unit tests in favor of maybe running some test ROMs later to speed up development
and would you look at that,
All instructions have been implemented (minus STOP and HALT)
now i would love to explain which instructions exactly i implemented but that would make us exceed the 2000 char limit on this website
but to summarise what i implemented:

  • push/pop -> pushes and pops to the stack (crazy i know)
  • (misc 8bit loads too long to list) -> mostly weird edge cases
  • add/inc/dec -> 16 bit arithmetic
  • call/ret/reti/rst -> function calls and returns (like jump but sets up the stack for returns)
  • di/ei -> interrupts disable/enable
  • rr/rl/rra/rla/rrc/rlc/rrca/rlca -> byte rotations
  • srl/sla/sra -> byte shifts
  • swap -> swaps lower 4 bits with upper 4 bits
  • bit/res/set -> bit operations
  • scf/ccf -> set or complement carry flag
  • cpl -> A = not A
  • daa -> magic wizardry (binary-coded decimals or something)
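since daa is the one labeled “magic wizardry” in the list above, here is a sketch of the commonly documented behavior. it fixes up A after an add or subtract so that each nibble is a decimal digit again (0x15 + 0x27 gives 0x3C, daa turns that into 0x42). the function shape is an assumption, a real implementation would read and write the flag register directly.

```rust
/// Decimal-adjust `a` after an addition (n = false) or subtraction (n = true).
/// `h` and `c` are the half-carry and carry flags left by that operation.
/// Returns the adjusted value and the new carry flag.
pub fn daa(a: u8, n: bool, h: bool, c: bool) -> (u8, bool) {
    let mut adjust = 0u8;
    let mut carry = c;
    if !n {
        // after an add: fix any nibble that went past 9
        if h || (a & 0x0F) > 0x09 {
            adjust |= 0x06;
        }
        if c || a > 0x99 {
            adjust |= 0x60;
            carry = true;
        }
        (a.wrapping_add(adjust), carry)
    } else {
        // after a subtract: only the flags decide, and carry is unchanged
        if h {
            adjust |= 0x06;
        }
        if c {
            adjust |= 0x60;
        }
        (a.wrapping_sub(adjust), carry)
    }
}
```

the zero flag would then be set from the returned value and the half-carry flag cleared.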

all that remains is the STOP and HALT instructions which i am ignoring for now because the infrastructure (interrupts) for them is not written yet + idk how important are “enter low power mode” instructions in a desktop app

next thing on the list is the serial port (to run test roms) and the exciting part…the PPU, i don’t have a lot of graphics experience but hopefully it’s not that difficult :>

speaking of the PPU, I learned the loop we were getting stuck on last time is called a “V-Blank wait loop” not sure exactly what that means yet but i quickly patched it in a commit and suddenly we got out of the loop (although i need to handle memory banking to get any further in the ROM)
Well, until then, goodbye 🫡

Copticfelo

well guess the clock hit 5 hours and its time for a 3rd devlog
(don’t ask about the crappy logo i am not an artist lol)
firstly, something i’ve been procrastinating for a while, implementing custom error types, why? idk everyone says it’s more “idiomatic rust” or whatever + doing that instead of returning strings everywhere seems more reasonable
anyways, more opcodes…….or rather opcode
yes indeed all this time was spent on the custom error types and trying to figure out this annoying instruction :<
LD HL SP+e8 -> Copies the value in SP (the stack pointer) PLUS the next signed (+/-) byte in the ROM file (i.e e8) to the HL register

it might seem simple enough but the problem lies with calculating the flags (carry and half-carry) for an addition between an unsigned 16-bit (SP) and a signed 8 bit (e8)
i had no clue what type of addition was that, signed? unsigned? 8 bit? 16 bit?
it took me a while to figure out but in the end i used the mgba emulator as a reference, and based my implementation on its implementation which was annoying since unlike C rust can’t do addition on numbers of different types (you need to cast them)
hopefully i did it correctly :>
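for reference, the behavior most documentation describes is: sign-extend e8 for the actual 16-bit result, but compute both flags from an unsigned addition on the low byte only. a sketch under that assumption (`sp_plus_e8` is a made-up name, and this is not copied from mgba):

```rust
/// Compute SP + e8 for `LD HL, SP+e8`.
/// Returns (result, half_carry, carry). Z and N are always cleared.
pub fn sp_plus_e8(sp: u16, e8: u8) -> (u16, bool, bool) {
    // u8 -> i8 -> i16 -> u16 sign-extends the offset (0xFF becomes 0xFFFF, i.e. -1)
    let offset = e8 as i8 as i16 as u16;
    let result = sp.wrapping_add(offset);
    // flags come from adding the raw byte to the low byte of SP, unsigned
    let half_carry = (sp & 0x000F) + (e8 as u16 & 0x000F) > 0x000F;
    let carry = (sp & 0x00FF) + (e8 as u16) > 0x00FF;
    (result, half_carry, carry)
}
```

the chain of `as` casts is the “you need to cast them” part: rust makes every widening and sign change explicit.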
anyways i had made a small fix in the jr (jump relative) instruction and suddenly we get past that instruction we were stuck on in the tetris ROM, don’t get your hopes up tho it just gets stuck in some sort of infinite loop and spikes up the CPU
anyways hopefully things will be much faster from now on since we got over the custom error types. the next step would probably be the PUSH and POP instructions, then more 8 bit loads, jumps, bitwise operations, interrupts and then hopefully we can finally focus on something more fun (Graphics rendering using SDL :>)
Until then, Goodbye 🫡

Copticfelo

Hey, I have decided to make these devlogs every 5-6 hours of work so i don’t lose my sanity, hopefully that’s ok? anyways……………right, the devlog
First, i implemented this 8 bit load instruction i thought was alr implemented
ld r8, n8 -> literally just copies the next byte in the ROM file to the register r8
pretty boring but……
Big news: We’re now getting into 16 bit instructions!!
there are not many of them in the GB but they proved to be a bit more complicated

I started by just trying to get a mental model of how 16 bit operands work in the GB. at first I was just trying to do a one to one recreation of my “revolutionary” R8 enum but that proved to be very stupid because the r16 operand was wildly different from the r8 operand
basically
there are 3 types of the r16 operand:

  • r16 -> is either BC, DE, HL or SP
  • r16mem -> is either BC, DE, HL+ or HL- (+- just means you increment or decrement afterwards)
  • r16stk -> is either BC, DE, HL or AF (i have no clue what this does as of now, but is used in the pop and push instructions)

i tried a lot of different things to implement this behavior nicely and cleanly.

I eventually figured it out after taking a break, I made the R16 enum for handling r16 operands, hopefully the cleanest piece of code i have ever written (maybe).
I implemented 3 instructions (12 opcodes overall) using it:

  • ld r16 n16 -> copies the next 16 bits in the ROM file (little endian) to r16
  • ld [r16mem] a -> copies the value in the A register to the memory address in r16mem
  • ld a [r16mem] -> same as above but in reverse

and ofc wrote some tests for them.
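one possible way to model the three flavors of r16 operand described above. this is a guess at the general shape, not the actual R16 enum from the repo: `R16Kind` and `decode_r16` are invented for this sketch.

```rust
/// Every 16-bit operand the three encodings can name.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum R16 {
    BC,
    DE,
    HL,
    SP,
    AF,
    HLInc, // [HL+]: use HL, then increment it
    HLDec, // [HL-]: use HL, then decrement it
}

/// Which table the 2-bit operand field should be looked up in.
#[derive(Clone, Copy)]
pub enum R16Kind {
    Plain, // r16
    Mem,   // r16mem
    Stk,   // r16stk
}

/// Decode a 2-bit operand field; only slots 2 and 3 differ between tables.
pub fn decode_r16(bits: u8, kind: R16Kind) -> R16 {
    match (bits & 0b11, kind) {
        (0, _) => R16::BC,
        (1, _) => R16::DE,
        (2, R16Kind::Mem) => R16::HLInc,
        (2, _) => R16::HL,
        (3, R16Kind::Plain) => R16::SP,
        (3, R16Kind::Mem) => R16::HLDec,
        (3, R16Kind::Stk) => R16::AF,
        _ => unreachable!("bits is masked to 0..=3"),
    }
}
```

for these instructions the field sits in bits 4-5 of the opcode, so `decode_r16(opcode >> 4, R16Kind::Plain)` on 0x21 (ld hl, n16) gives `R16::HL`.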

Now when running a tetris ROM file only 6 instructions from that beginning section are unsupported, ofc it still crashes, most likely trying to read operands as instructions due to LD [a16], SP not being supported

Next step is probably implementing the rest of the 16 bit load instructions as well as the push and pop instructions.
Until then, Goodbye 🫡

Copticfelo

Hey,
This is a project i have been working on on and off for the past several months. so far i have accumulated 10 hours (actually more but hackatime is being annoying :>) of work. to sum up these 10+ hours,
I have written the infrastructure needed to make a simple Sharp SM83 (the cpu used on the original gameboy) CPU emulator. the bottleneck now is implementing all of the 500 or so instructions the GB supports. i have implemented approximately 156 of them, though most of these are really one instruction with different operands.

let’s take for example the LD instruction, it just loads the value in the second CPU register into the first so it takes 2 operands like this LD [destination] [source], each gameboy instruction is one byte long (there are a ton of exceptions to this but believe me for now the vast majority are 1 byte long).

So how do you fit 2 operands into one byte you ask? answer: binary.
each byte consists of 8 bits, each either a 0 or a 1
the LD instruction takes operands in its byte directly like this:
01 010 100
01 -> constant
010 (middle 3 bits) -> the destination register
100 (last 3 bits) -> the source register
the GB has 7 8-bit registers named from A to L with some “compound” 16-bit registers like HL
a 3 bit operand in decimal is a number from 0 to 7 and they are mapped like this:
0 1 2 3 4 5 6 7
b c d e h l [hl] a
[hl] means the “byte pointed to in memory by the address in the compound register HL”
for example: “LD H, A” = 01 100 111 (100 = h, 111 = a)
so while 156 instructions/opcodes may look impressive, in reality that’s only like 13 different operations out of probably more than 50 (i didn’t count them, that’s Gemini’s guesswork).
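the bit slicing described above fits in a few lines. a sketch with made-up names (`R8_NAMES`, `decode_ld`), using the same 0..7 register mapping as the table. the one gotcha is 0x76, which sits where “LD [hl], [hl]” would be and is actually HALT.

```rust
/// Register names indexed by the 3-bit operand value.
const R8_NAMES: [&str; 8] = ["b", "c", "d", "e", "h", "l", "[hl]", "a"];

/// Decode an 8-bit register-to-register LD opcode into (destination, source).
/// Returns None if the opcode is not in the 01xxxyyy block, or is HALT.
pub fn decode_ld(opcode: u8) -> Option<(&'static str, &'static str)> {
    if opcode >> 6 != 0b01 || opcode == 0x76 {
        return None;
    }
    let dst = (opcode >> 3) & 0b111; // middle 3 bits
    let src = opcode & 0b111; // last 3 bits
    Some((R8_NAMES[dst as usize], R8_NAMES[src as usize]))
}
```

so `decode_ld(0b01_100_111)` gives `Some(("h", "a"))`, and one match arm can cover 63 opcodes at once.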

for now i will just continue implementing more instructions especially after that big code refactor i was doing for the last couple of days is done.
Until then, Goodbye 🫡


Comments

Copticfelo 3 months ago

sorry if this is a bit hard to read. apparently, this website doesn’t seem to display indentations very well