DASH-OS ESP32 DMG Emulator: Optimization Sprint
Hardware: ESP32-32E (240MHz, no PSRAM), ST7796 TFT on HSPI at 80MHz, MicroSD on VSPI at 25MHz, 8BitDo controller via Bluepad32 (BLE).
Constraints: ~11KB free heap. Bluepad32 fragments the heap before setup() runs. Variables are declared at the top of functions (a project style rule; GCC does not require this for C99 or C++). TFT_eSPI byte-swap quirk active.
Current Memory Layout (~92KB total):
fbuf: 46KB (160x144x2 RGB565)
bank0Cache/bank1Cache: 16KB each (LRU)
cram: 8KB
gbp (peanut-gb): 5KB
lbuf: 1KB
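The layout above can be pinned at compile time, so a buffer that grows past the budget fails the build instead of failing at boot. This is a minimal sketch. The constant names are hypothetical, and GBP_BYTES is the 5KB estimate from the list; sizeof(struct gb_s) from peanut_gb.h would be the exact value. The listed sizes sum to 93,184 bytes, which fits a 92KB budget when KB means 1024 bytes.

```cpp
#include <cstddef>
#include <cassert>

// Hypothetical compile-time budget mirroring the memory layout list.
constexpr std::size_t FB_W = 160;
constexpr std::size_t FB_H = 144;
constexpr std::size_t FBUF_BYTES = FB_W * FB_H * 2;  // RGB565 frame: 46080
constexpr std::size_t BANK_BYTES = 16 * 1024;        // one ROM bank per cache
constexpr std::size_t CRAM_BYTES = 8 * 1024;
constexpr std::size_t GBP_BYTES = 5 * 1024;          // estimate; use sizeof(struct gb_s)
constexpr std::size_t LBUF_BYTES = 1024;

constexpr std::size_t TOTAL_BYTES =
    FBUF_BYTES + 2 * BANK_BYTES + CRAM_BYTES + GBP_BYTES + LBUF_BYTES;

// ~92KB ceiling: the build breaks if the layout outgrows it.
constexpr std::size_t BUDGET_BYTES = 92 * 1024;
static_assert(TOTAL_BYTES <= BUDGET_BYTES, "buffer layout exceeds the ~92KB budget");
```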
Proven Wins (Implemented):
Early Malloc: Allocate all large buffers on the first line of setup(), before further BLE activity can fragment the heap.
SD Boot: 400kHz init -> SD.end() -> 25MHz restart. Prevents white-screen crashes.
Block Reads: 16KB bank reads vs byte-by-byte (12.8x speedup).
Burst SPI: One 46KB pushColors call per frame instead of one call per scanline (8 FPS -> 60 FPS).
Hot Path: IRAM_ATTR on gb_rom_read and lcd_draw_line.
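Block reads and the two LRU bank caches combine into a short lookup behind gb_rom_read. This sketch rests on three assumptions. First, both caches are interchangeable LRU slots; the layout does not say whether bank0Cache pins ROM bank 0. Second, gb_rom_read receives an absolute ROM offset. Third, a hypothetical BankLoader callback performs the 16KB SD block read. BankCache, bankCacheGet() and romByte() are illustrative names.

```cpp
#include <cstdint>
#include <cassert>

static const uint32_t BANK_SIZE = 0x4000;  // 16KB ROM bank

// Hypothetical loader: one 16KB SD block read of `bank` into dst.
typedef bool (*BankLoader)(uint16_t bank, uint8_t *dst, void *ctx);

struct BankCache {
  uint8_t *slot[2];  // bank0Cache, bank1Cache
  int32_t tag[2];    // ROM bank held by each slot, -1 = empty
  uint8_t lru;       // index of the least recently used slot
  BankLoader load;
  void *ctx;
};

void bankCacheInit(BankCache *c, uint8_t *bank0Cache, uint8_t *bank1Cache,
                   BankLoader load, void *ctx) {
  if (c == nullptr) return;
  c->slot[0] = bank0Cache;
  c->slot[1] = bank1Cache;
  c->tag[0] = -1;
  c->tag[1] = -1;
  c->lru = 0;
  c->load = load;
  c->ctx = ctx;
}

// Returns the cached 16KB bank, loading it into the LRU slot on a miss.
// Returns nullptr on bad arguments or a failed SD read.
const uint8_t *bankCacheGet(BankCache *c, uint16_t bank) {
  uint8_t victim;
  if (c == nullptr || c->slot[0] == nullptr || c->slot[1] == nullptr || c->load == nullptr)
    return nullptr;
  if (c->tag[0] == bank) { c->lru = 1; return c->slot[0]; }
  if (c->tag[1] == bank) { c->lru = 0; return c->slot[1]; }
  victim = c->lru;
  c->tag[victim] = -1;  // invalid until the read succeeds
  if (!c->load(bank, c->slot[victim], c->ctx)) return nullptr;
  c->tag[victim] = bank;
  c->lru = (uint8_t)(1 - victim);
  return c->slot[victim];
}

// gb_rom_read-style accessor for an absolute ROM offset.
uint8_t romByte(BankCache *c, uint32_t addr) {
  const uint8_t *bank;
  bank = bankCacheGet(c, (uint16_t)(addr / BANK_SIZE));
  if (bank == nullptr) return 0xFF;  // fallback value on SD error
  return bank[addr % BANK_SIZE];
}
```

A hit costs two compares. Only a miss touches the SD card, and then only once per 16KB.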
Failed Attempts (Why):
3rd Bank Cache: A third 16KB cache caused OOM or SD failures. The practical ceiling is ~90% of the heap.
50MHz Init: Card handshake failed.
Shared SPI: Running the TFT and SD card on one bus was unstable; keeping them isolated on HSPI and VSPI is required.
GBC: The palette array was never populated; stock peanut-gb emulates the DMG only.
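The Early Malloc rule and the ~90% ceiling seen with the third cache can be enforced in one place. Allocate every large buffer at the top of setup(), check each pointer, and refuse any allocation that would eat the reserve that BLE and the SD driver still need. This is a host-testable sketch. allocLargeBuffers(), freeLargeBuffers(), fitsWithHeadroom() and the 8KB HEAP_RESERVE_BYTES are hypothetical and need tuning on the device. Only fbuf and the bank caches are shown; cram, gbp and lbuf would follow the same pattern.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cassert>

// Hypothetical reserve kept free for Bluepad32 and the SD driver; tune on device.
static const size_t HEAP_RESERVE_BYTES = 8 * 1024;
static const size_t FBUF_BYTES = 160u * 144u * 2u;
static const size_t BANK_BYTES = 16u * 1024u;

uint16_t *fbuf = nullptr;
uint8_t *bank0Cache = nullptr;
uint8_t *bank1Cache = nullptr;

// True if `request` fits in `largestFree` and still leaves `reserve` behind.
bool fitsWithHeadroom(size_t request, size_t largestFree, size_t reserve) {
  if (request > largestFree) return false;
  return (largestFree - request) >= reserve;
}

void freeLargeBuffers() {
  std::free(bank1Cache);
  bank1Cache = nullptr;
  std::free(bank0Cache);
  bank0Cache = nullptr;
  std::free(fbuf);
  fbuf = nullptr;
}

// All-or-nothing allocation for the first line of setup().
// largestFree is the size of the largest free heap block.
bool allocLargeBuffers(size_t largestFree) {
  size_t need;
  need = FBUF_BYTES + 2u * BANK_BYTES;
  if (fbuf != nullptr && bank0Cache != nullptr && bank1Cache != nullptr) return true;
  if (!fitsWithHeadroom(need, largestFree, HEAP_RESERVE_BYTES)) return false;
  fbuf = (uint16_t *)std::malloc(FBUF_BYTES);
  bank0Cache = (uint8_t *)std::malloc(BANK_BYTES);
  bank1Cache = (uint8_t *)std::malloc(BANK_BYTES);
  if (fbuf == nullptr || bank0Cache == nullptr || bank1Cache == nullptr) {
    freeLargeBuffers();  // never run with a partial set of buffers
    return false;
  }
  return true;
}
```

On target, largestFree could come from ESP-IDF's heap_caps_get_largest_free_block(MALLOC_CAP_8BIT). Checking the combined size against the largest single block is conservative, allocator overhead aside: if the sum fits there, each buffer fits.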
Optimization Goals (Need C++ Implementation):
DMA Overlap: Implement SPI DMA on HSPI so gb_run_frame can run while fbuf is being transferred. Can I avoid double-buffering with only ~11KB of free heap? Provide setupDMA(), startDMA_pushFrame(), isDMAComplete().
ROM Pre-fetch: Use an 8KB buffer to prefetch sequential bank reads on ROMs larger than 256KB. Suggest an async or interrupt-driven strategy for filling prefetchBuf[8192].
Matrix Easter Egg: Lightweight falling-character effect on the 480x320 screen, using under 2KB of heap at 15 FPS. Provide matrixModeUpdate().
GBC Hook: An efficient way to capture palettes in peanut-gb without large RAM overhead.
Batching: Can I batch multiple pushColors into one CS assertion?
Rules: Vars at top of functions. No non-ASCII. Use existing buffer names. Memory safety checks mandatory.
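On the DMA Overlap question: a second 46KB frame cannot fit in ~11KB of free heap, but a single fbuf can work if the DMA reads it faster than gb_run_frame rewrites it. At 80MHz the raw transfer takes 46,080 x 8 bits / 80MHz, about 4.6 ms, against a 16.7 ms frame at 60 FPS. This sketch splits the frame into bands so lcd_draw_line only waits when it catches up with the transfer.

setupDMA(), startDMA_pushFrame() and isDMAComplete() are the names requested above. DmaPort, dmaLineWritable() and the 4-band split are hypothetical. On target, the port could wrap ESP-IDF's spi_device_queue_trans() and spi_device_get_trans_result() on an SPI device you own, with the address window already set for the full frame. TFT_eSPI's own initDMA(), pushPixelsDMA() and dmaBusy() only report whole-transfer completion, which corresponds to BANDS = 1.

The sketch also assumes fbuf sits in DMA-capable RAM; heap_caps_malloc() with MALLOC_CAP_DMA guarantees this. Otherwise the driver may need a bounce buffer that will not fit.

```cpp
#include <cstdint>
#include <cstddef>
#include <cassert>

// Hypothetical non-blocking transfer port (not a library API).
struct DmaPort {
  bool (*queue)(const uint8_t *data, size_t len, void *ctx);
  int (*poll)(void *ctx);  // transfers finished since the last call
  void *ctx;
};

static const int FB_W = 160;
static const int FB_H = 144;
static const int BANDS = 4;  // hypothetical split: 4 bands of 36 lines
static const int BAND_LINES = FB_H / BANDS;
static const size_t BAND_BYTES = (size_t)FB_W * BAND_LINES * 2;

static DmaPort *dmaPort = nullptr;
static const uint16_t *dmaSrc = nullptr;
static int bandsQueued = 0;
static int bandsDone = 0;

bool setupDMA(DmaPort *port, const uint16_t *fbuf) {
  if (port == nullptr || port->queue == nullptr || port->poll == nullptr) return false;
  if (fbuf == nullptr) return false;
  dmaPort = port;
  dmaSrc = fbuf;
  bandsQueued = 0;
  bandsDone = 0;
  return true;
}

bool isDMAComplete() {
  if (dmaPort == nullptr) return true;
  if (bandsDone < bandsQueued) bandsDone += dmaPort->poll(dmaPort->ctx);
  return bandsDone >= bandsQueued;
}

// Queues the current fbuf as BANDS back-to-back transfers; no second buffer.
// Returns false if a transfer is still running or a queue call fails.
bool startDMA_pushFrame() {
  int i;
  const uint8_t *src;
  if (dmaPort == nullptr || !isDMAComplete()) return false;
  src = (const uint8_t *)dmaSrc;
  bandsQueued = 0;
  bandsDone = 0;
  for (i = 0; i < BANDS; i++) {
    if (!dmaPort->queue(src + (size_t)i * BAND_BYTES, BAND_BYTES, dmaPort->ctx)) return false;
    bandsQueued++;
  }
  return true;
}

// Guard for lcd_draw_line: true once the band holding `line` has been sent,
// so the CPU may overwrite that line without corrupting the outgoing frame.
bool dmaLineWritable(int line) {
  if (line < 0 || line >= FB_H) return false;
  if (isDMAComplete()) return true;
  return (line / BAND_LINES) < bandsDone;
}
```

Per frame: call startDMA_pushFrame(), then gb_run_frame(). Inside lcd_draw_line, spin on dmaLineWritable(line) before writing that line into fbuf.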
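For ROM Pre-fetch, the Arduino SD File API is blocking, so a cooperative scheme is simpler than an interrupt-driven one. On each bank switch the emulator predicts the next bank. The first 8KB of that bank is then read into prefetchBuf during time the CPU would otherwise spend waiting on the HSPI frame transfer. Because SD sits on VSPI, the two buses do not contend. A later miss on that bank reads only the second 8KB from SD. RomReader, prefetchHint(), prefetchStep() and loadBankWithPrefetch() are hypothetical, and the bank + 1 prediction is an assumption about access patterns.

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>
#include <cassert>

// Hypothetical SD read hook: copy `len` bytes from ROM offset `off` into dst.
typedef bool (*RomReader)(uint32_t off, uint8_t *dst, size_t len, void *ctx);

static const uint32_t BANK_SIZE = 0x4000;
static const size_t PREFETCH_SIZE = 8192;
static uint8_t prefetchBuf[PREFETCH_SIZE];
static int32_t prefetchBank = -1;  // bank whose first 8KB sits in prefetchBuf
static int32_t wantedBank = -1;    // bank predicted but not yet read

// Call on every ROM bank switch. Predicts sequential access (bank + 1) and
// only engages for ROMs above 256KB, i.e. more than 16 banks.
void prefetchHint(uint16_t currentBank, uint16_t bankCount) {
  if (bankCount <= 16) return;
  if ((uint32_t)currentBank + 1u >= bankCount) return;
  if (prefetchBank == (int32_t)currentBank + 1) return;
  wantedBank = (int32_t)currentBank + 1;
}

// Cooperative step: call where the CPU would otherwise spin, e.g. while
// waiting on the HSPI frame DMA. Does nothing if no prediction is pending.
bool prefetchStep(RomReader rd, void *ctx) {
  if (wantedBank < 0 || rd == nullptr) return false;
  prefetchBank = -1;  // invalid while being refilled
  if (!rd((uint32_t)wantedBank * BANK_SIZE, prefetchBuf, PREFETCH_SIZE, ctx)) {
    wantedBank = -1;
    return false;
  }
  prefetchBank = wantedBank;
  wantedBank = -1;
  return true;
}

// Bank loader for the cache: on a predicted hit, only the second 8KB comes from SD.
bool loadBankWithPrefetch(uint16_t bank, uint8_t *dst, RomReader rd, void *ctx) {
  uint32_t base;
  if (dst == nullptr || rd == nullptr) return false;
  base = (uint32_t)bank * BANK_SIZE;
  if (prefetchBank == (int32_t)bank) {
    std::memcpy(dst, prefetchBuf, PREFETCH_SIZE);
    return rd(base + (uint32_t)PREFETCH_SIZE, dst + PREFETCH_SIZE, BANK_SIZE - PREFETCH_SIZE, ctx);
  }
  return rd(base, dst, BANK_SIZE, ctx);
}
```

Moving prefetchStep() into a FreeRTOS task would make it truly asynchronous. That would require a mutex around prefetchBank and wantedBank, and the SD object must never be used from two tasks at once. prefetchBuf is static, but on the ESP32 static data comes out of the same DRAM as the heap, so it would leave roughly 3KB of the current ~11KB.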
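The Matrix Easter Egg needs no frame buffer. Assuming font size 2 (12x16 cells), 480x320 gives a 40 x 20 grid, and each column only needs to track its head row. This sketch keeps under 100 bytes of static state and no heap, well inside the <2KB target. Each step draws at most three cells per column: the new bright head, the previous head redrawn in green, and the erased end of the trail. CellDrawer, MatrixState, matrixModeInit() and the colors are hypothetical. On target, the drawer could call TFT_eSPI's drawChar(x, y, c, color, bg, size) with x = col * 12, y = row * 16 and size 2.

```cpp
#include <cstdint>
#include <cassert>

// Assumed layout: font size 2 (12x16 cells) on 480x320 -> 40 x 20 grid.
static const int MX_COLS = 40;
static const int MX_ROWS = 20;
static const int MX_TRAIL = 8;           // trail length in cells
static const uint16_t MX_HEAD = 0xFFFF;  // RGB565 white
static const uint16_t MX_BODY = 0x07E0;  // RGB565 green
static const uint16_t MX_BG = 0x0000;    // RGB565 black

// Hypothetical draw hook; ' ' drawn in MX_BG erases a cell.
typedef void (*CellDrawer)(int col, int row, char ch, uint16_t color, void *ctx);

struct MatrixState {
  int16_t head[MX_COLS];  // head row per column; negative = waiting above screen
  uint32_t rng;           // xorshift32 state
};

static uint32_t mxRand(uint32_t *s) {
  *s ^= *s << 13;
  *s ^= *s >> 17;
  *s ^= *s << 5;
  return *s;
}

static char mxGlyph(uint32_t *s) {
  return (char)(33 + mxRand(s) % 94);  // printable ASCII '!'..'~'
}

void matrixModeInit(MatrixState *m, uint32_t seed) {
  int c;
  if (m == nullptr) return;
  m->rng = seed ? seed : 1u;  // xorshift must not start at 0
  for (c = 0; c < MX_COLS; c++)
    m->head[c] = (int16_t)(-(int)(mxRand(&m->rng) % MX_ROWS));
}

// One animation step; call about every 66 ms for 15 FPS.
void matrixModeUpdate(MatrixState *m, CellDrawer draw, void *ctx) {
  int c, h, tail;
  if (m == nullptr || draw == nullptr) return;
  for (c = 0; c < MX_COLS; c++) {
    h = m->head[c];
    tail = h - MX_TRAIL;
    if (h - 1 >= 0 && h - 1 < MX_ROWS) draw(c, h - 1, mxGlyph(&m->rng), MX_BODY, ctx);
    if (h >= 0 && h < MX_ROWS) draw(c, h, mxGlyph(&m->rng), MX_HEAD, ctx);
    if (tail >= 0 && tail < MX_ROWS) draw(c, tail, ' ', MX_BG, ctx);
    if (h + 1 - MX_TRAIL >= MX_ROWS)
      m->head[c] = (int16_t)(-(int)(mxRand(&m->rng) % MX_ROWS));  // respawn above
    else
      m->head[c] = (int16_t)(h + 1);
  }
}
```

For 15 FPS, call matrixModeUpdate() whenever at least 66 ms have passed since the previous call, e.g. by comparing against millis().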
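For the GBC Hook, palette capture itself is cheap. GBC palette RAM holds 8 BG and 8 OBJ palettes x 4 colors x 2 bytes, 128 bytes in total. It is written through FF68/FF69 (BG) and FF6A/FF6B (OBJ), with bit 7 of the index register enabling auto-increment. Stock peanut-gb does not decode these registers, so the hypothetical gbcPalWrite() hook below would have to be called from a patched I/O write path. Keeping a pre-converted RGB565 copy (another 128 bytes) reduces the per-pixel cost in lcd_draw_line to one table lookup.

```cpp
#include <cstdint>
#include <cassert>

// GBC palette RAM: 8 palettes x 4 colors x 2 bytes = 64 bytes each for BG and OBJ.
static uint8_t gbcBgRaw[64];
static uint8_t gbcObjRaw[64];
static uint16_t gbcBg565[32];   // pre-converted copies for lcd_draw_line
static uint16_t gbcObj565[32];
static uint8_t gbcBgIdx;        // last FF68 write: index bits 0-5, auto-inc bit 7
static uint8_t gbcObjIdx;       // last FF6A write

// GBC color word: red bits 0-4, green 5-9, blue 10-14 -> RGB565.
static uint16_t gbcTo565(uint16_t c) {
  uint16_t r, g, b;
  r = c & 0x1F;
  g = (c >> 5) & 0x1F;
  b = (c >> 10) & 0x1F;
  return (uint16_t)((r << 11) | (((g << 1) | (g >> 4)) << 5) | b);
}

// Stores one data byte, refreshes the affected RGB565 entry, applies auto-increment.
static void gbcStore(uint8_t *raw, uint16_t *rgb, uint8_t *idx, uint8_t val) {
  uint8_t i;
  i = (uint8_t)(*idx & 0x3F);
  raw[i] = val;
  rgb[i >> 1] = gbcTo565((uint16_t)(raw[i & 0x3E] | (raw[i | 1] << 8)));
  if (*idx & 0x80) *idx = (uint8_t)(0x80 | ((i + 1) & 0x3F));
}

// Hypothetical hook for a patched peanut-gb I/O write path.
// Returns true if addr was one of the GBC palette registers.
bool gbcPalWrite(uint16_t addr, uint8_t val) {
  switch (addr) {
    case 0xFF68: gbcBgIdx = val; return true;
    case 0xFF69: gbcStore(gbcBgRaw, gbcBg565, &gbcBgIdx, val); return true;
    case 0xFF6A: gbcObjIdx = val; return true;
    case 0xFF6B: gbcStore(gbcObjRaw, gbcObj565, &gbcObjIdx, val); return true;
    default: return false;
  }
}

// Lookups for lcd_draw_line: palette 0-7, color 0-3.
uint16_t gbcBgColor(uint8_t pal, uint8_t col) { return gbcBg565[((pal & 7) << 2) | (col & 3)]; }
uint16_t gbcObjColor(uint8_t pal, uint8_t col) { return gbcObj565[((pal & 7) << 2) | (col & 3)]; }
```

The larger RAM cost of GBC support lies elsewhere. GBC hardware has 16KB of VRAM and 32KB of WRAM versus 8KB each on the DMG. That extra 32KB does not fit in ~11KB of free heap.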
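On Batching: in TFT_eSPI, startWrite() asserts CS and opens the SPI transaction, and endWrite() releases it. Drawing calls made in between, including pushColors(), should not toggle CS, though this is worth confirming for the installed version. This sketch uses a hypothetical helper, pushFrameBatched(), templated so it accepts the TFT_eSPI object. The band size is arbitrary, and the swap argument is passed through so it matches whatever the current working path uses.

```cpp
#include <cstdint>
#include <cstddef>
#include <cassert>

// Hypothetical helper: sends fbuf as several pushColors() calls inside one
// startWrite()/endWrite() pair, so CS is asserted once for the whole frame.
// Tft must provide TFT_eSPI-style startWrite(), endWrite(),
// setAddrWindow(x, y, w, h) and pushColors(data, len, swap).
template <typename Tft>
bool pushFrameBatched(Tft &tft, uint16_t *fbuf, int32_t x, int32_t y,
                      int32_t w, int32_t h, int32_t bandLines, bool swap) {
  int32_t line;
  int32_t n;
  if (fbuf == nullptr || w <= 0 || h <= 0 || bandLines <= 0) return false;
  tft.startWrite();               // CS low, SPI transaction opened once
  tft.setAddrWindow(x, y, w, h);  // one window for the whole frame
  for (line = 0; line < h; line += n) {
    n = (h - line < bandLines) ? (h - line) : bandLines;
    tft.pushColors(fbuf + (size_t)line * (size_t)w, (uint32_t)(n * w), swap);
  }
  tft.endWrite();                 // CS released once
  return true;
}
```

The existing single 46KB call already uses one CS assertion. Batching matters when a frame has to go out in pieces, for example for scaling or partial updates.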