
OmniLab

2 devlogs · 15h 10m 11s logged


OmniLab is an interactive, Iron Man-inspired Heads-Up Display (HUD) system that runs entirely locally to keep latency to a minimum. Developed by me (EngThi), it uses Python and MediaPipe for real-time hand gesture tracking via webcam, and the native Web Speech API for voice recognition. This data is sent via WebSockets to a local FastAPI server, which renders a 3D interface in the browser using Three.js. The next step for the project is to integrate a cloud-based Vision LLM (like Gemini), allowing the system to capture a frame from the webcam when I trigger the ‘Analyze’ voice command, analyze the environment, and provide intelligent feedback.
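To make the vision-to-HUD bridge concrete, here is a minimal sketch of the kind of JSON payload the Python vision loop could push over the WebSocket for the Three.js HUD to consume. The field names (`type`, `gesture`, `landmarks`) are my assumption for illustration, not OmniLab's actual wire format.

```python
import json

# Hypothetical message schema for the vision -> HUD WebSocket channel.
# Each landmark is a normalized (x, y, z) coordinate of the kind
# MediaPipe hand tracking produces; the real OmniLab payload may differ.
def pack_hand_frame(landmarks, gesture=None):
    """Serialize one frame of hand landmarks for the WebSocket channel."""
    return json.dumps({
        "type": "hand_frame",
        "gesture": gesture,  # e.g. "pinch", or None when no gesture is active
        "landmarks": [{"x": x, "y": y, "z": z} for (x, y, z) in landmarks],
    })

# Example: a single-landmark frame with no active gesture.
msg = pack_hand_frame([(0.42, 0.58, -0.01)])
```

Keeping the payload as plain JSON keeps both ends of the pipe decoupled: the browser side only needs `JSON.parse` on each message before updating the 3D scene.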

This project uses AI

I used the Gemini CLI and Perplexity as pair-programming assistants and mentors. They helped me set up the initial project structure, autocomplete boilerplate code, debug errors, and fix syntax mistakes in what I was writing. I also used them to bounce architectural ideas around (such as keeping real-time computer vision local to avoid latency). I strictly followed a ‘no black-box’ rule: I did not let the AIs just generate the project for me. Any code snippet provided by the AI was reviewed, and I asked for explanations of the parts I didn’t fully grasp to ensure I understood the underlying logic (like WebSocket communication and MediaPipe mechanics) and remained the actual developer of the system.

Repository


ChefThi

Turning OmniLab into a real HUD assistant: voice, vision and a more proactive AI persona 🎧🖐️

OmniLab has been my experimental lab for interfaces: 3D HUD, hand‑tracking, voice input, and AI all living in the same space. At the same time, life got busier: I started Computer Engineering, the campus is ~10 km away, and I’ve been splitting my time between classes, Blueprint hardware projects, and these software labs. That’s why commits came in bursts instead of daily drips — most of the work happened in small, tired, late‑night sessions.
Earlier this year I refactored the architecture to favor local‑first vision (removing a cloud version that was too high‑latency) and added the Web Speech API to the HUD, so I could trigger Gemini analyses via voice while the system tracked my hands in real time. That was the turning point: OmniLab stopped being “just a cool 3D scene” and started behaving like a genuine interface between my body, my voice and an AI brain.
Recently I pushed a big “SHIP‑ready” upgrade: Gemini integration is now first‑class, tests and CI/CD are in place, and the HUD feels more stable as a product, not just a demo. On top of that, I refined the AI persona: instead of only answering direct questions, OmniLab now makes proactive observations about what it sees and hears — it can comment on the scene, suggest next actions, and feel more like a lab partner than a tool.

Most of this evolution happened while juggling buses, deadlines and other projects, with Perplexity helping me reason about trade‑offs (what to keep in 3D, what to simplify, where AI actually adds value). This devlog is my way of catching the Flavortown timeline up with the reality: OmniLab grew quietly, but it grew a lot. ✨

ChefThi

OmniLab Devlog #1

I’ve officially kicked off OmniLab on my first laptop! Coming from a background of mobile development and browser-based IDEs, my first instinct was to keep everything “off-device”. I spent a good chunk of these 5 hours attempting to run the processing stack on a remote VM (Firebase Studio) and tunneling the HUD via a web page. However, the latency was unbearable for real-time tracking. I quickly realized that for a “Jarvis-like” experience, the vision loop must be 100% local.

Technical Hurdles & Git Mess

The first challenge was MediaPipe. I started with legacy code, but it wouldn’t play nice. I had to dive into the latest MediaPipe Tasks API docs to rewrite the landmark detection core. It’s much more efficient now, but the documentation shift caught me off guard.

Since I was jumping between cloud editing and local testing without properly cloning the repo first, I ended up with a mess of Git conflicts. I used the Gemini CLI as a mentor to help me untangle the branches, resolve the “already exists” errors, and get the local and remote repositories back in sync. It was a great lesson in maintaining a clean workflow on a new machine.

Current Progress

I’ve successfully implemented the “pinch” gesture logic (calculating the Euclidean distance between the thumb and index fingertips — the hypotenuse of their coordinate deltas) and set up a local FastAPI server to bridge vision data to a Three.js HUD. The HUD now runs locally on Debian 13 (XFCE), which eliminated all the lag from my previous VM tests.
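The pinch check boils down to a distance test on two MediaPipe landmarks: index 4 is the thumb tip and index 8 is the index fingertip, both in normalized [0, 1] coordinates. A minimal sketch, with an illustrative threshold rather than OmniLab's tuned value:

```python
import math

# MediaPipe hand landmark indices: 4 = thumb tip, 8 = index fingertip.
THUMB_TIP, INDEX_TIP = 4, 8
PINCH_THRESHOLD = 0.05  # illustrative; tune against your camera/setup

def is_pinching(landmarks) -> bool:
    """True when thumb and index fingertips are close enough to count as a pinch.

    `landmarks` is a sequence of (x, y) pairs in normalized image coordinates.
    """
    tx, ty = landmarks[THUMB_TIP]
    ix, iy = landmarks[INDEX_TIP]
    # math.hypot is the Euclidean distance -- the "hypotenuse" of the deltas.
    return math.hypot(tx - ix, ty - iy) < PINCH_THRESHOLD
```

Because the coordinates are normalized, the same threshold behaves consistently across camera resolutions, though hand distance from the lens still affects it.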

Timelapses


Comments

ChefThi · 20 days ago

To clarify the technical choices: I’m focusing heavily on keeping the HUD lightweight on my new machine by using Debian 13 (XFCE) and optimizing the Python vision loop. I’m also studying the ada_v2 repository to implement better modularity in the UI layer. Integrating these clean interface concepts into a zero-latency environment is the main goal for the next update.