OmniLab is an interactive, Iron Man-inspired Heads-Up Display (HUD) system that runs entirely locally to keep input-to-render latency as low as possible. Developed by me (EngThi), it uses Python and MediaPipe for real-time hand gesture tracking via webcam, and the browser's native Web Speech API for voice recognition. Tracking and voice events are sent over WebSockets to a local FastAPI server, which drives a 3D interface rendered in the browser with Three.js. The next step for the project is to integrate a cloud-based Vision LLM (such as Gemini): when I trigger the 'Analyze' voice command, the system will capture a frame from the webcam, analyze the environment, and provide intelligent feedback.
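To make the tracker-to-server hop concrete, here is a minimal sketch of how the WebSocket messages might be packed on the Python side. The message shape, field names, and the `pack_landmarks` helper are my illustrative assumptions, not OmniLab's actual schema; MediaPipe Hands reports 21 landmarks per hand in normalized image coordinates, which is what the tuples below stand in for.

```python
import json

def pack_landmarks(landmarks, gesture=None):
    """Serialize MediaPipe-style hand landmarks into a compact JSON
    message suitable for a WebSocket text frame.

    `landmarks` is a list of (x, y, z) tuples in normalized image
    coordinates, as MediaPipe Hands reports them. The "type", "gesture",
    and "points" field names are illustrative, not OmniLab's real schema.
    """
    return json.dumps({
        "type": "hand",
        "gesture": gesture,  # e.g. a recognized "pinch" or "open_palm"
        "points": [
            # round to keep frames small at real-time message rates
            {"x": round(x, 4), "y": round(y, 4), "z": round(z, 4)}
            for (x, y, z) in landmarks
        ],
    })

# On the server side, a FastAPI WebSocket endpoint would accept these
# frames and forward them to the Three.js front end, roughly:
#
#   @app.websocket("/ws")
#   async def ws(sock: WebSocket):
#       await sock.accept()
#       while True:
#           msg = await sock.receive_text()
#           await broadcast(msg)  # hypothetical fan-out to HUD clients
```

A call like `pack_landmarks([(0.5, 0.5, 0.0)], gesture="pinch")` yields one small text frame per tracked hand, which keeps the local loop cheap enough to run at webcam frame rates.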
I used the Gemini CLI and Perplexity as pair-programming assistants and mentors. They helped me set up the initial project structure, fill in boilerplate code, debug errors, and catch syntax mistakes as I wrote. I also used them to bounce around architectural ideas (such as keeping real-time computer vision local to avoid network latency). I strictly followed a 'no black-box' rule: I did not let the AIs simply generate the project for me. Every AI-provided code snippet was reviewed, and I asked for explanations of the parts I didn't fully grasp, so that I understood the underlying logic (such as WebSocket communication and MediaPipe's tracking pipeline) and remained the actual developer of the system.