OmniLab banner

OmniLab

22 devlogs
51h 26m 32s


OmniLab is an interactive, Iron Man-inspired Heads-Up Display (HUD) system that runs entirely locally to ensure zero latency. Developed by me (EngThi), it uses Python and MediaPipe for real-time hand gesture tracking via webcam, and the native Web Speech API for voice recognition. This data is sent via WebSockets to a local FastAPI server, which renders a 3D interface in the browser using Three.js.

This project uses AI

I used the Gemini CLI and Perplexity as pair-programming assistants and mentors. They helped me set up the initial project structure, autocomplete boilerplate code, debug errors, and fix syntax mistakes in what I was writing. I also used them to bounce architectural ideas around (such as keeping real-time computer vision local to avoid latency). I strictly followed a ‘no black-box’ rule: I did not let the AIs just generate the project for me. Any code snippet provided by the AI was reviewed, and I asked for explanations of the parts I didn’t fully grasp to ensure I understood the underlying logic (like WebSocket communication and MediaPipe mechanics) and remained the actual developer of the system.

Demo Repository


ChefThi

I’m writing this devlog in a rush. My laptop battery died, and since I’m editing this post on my phone (there’s no extension here to add commit changelogs), it ended up looking a bit hazy.

Summary: I spent about two weeks trying to ship this project, but Hack Club does not accept project demos on platforms that don't have a kind of "forever" free quota, which is exactly the type of platform (Render and Railway) I was using. Yesterday I learned that Hack Club could provide servers for Hack Club members, so I signed up and was approved right away. I spent the whole day testing this with MediaPipe Tasks for Web and, after a long time and with the help of the Gemini CLI, I finally reached the point of deploying the project.

Unfortunately, Playwright doesn't work there because the server's datacenter IP address gives it away: since the traffic isn't coming from a regular user, sites end up blocking the web requests.

ChefThi
  • Final cleanup of README and confirm Gemini 3 fallback router (84a9f86)
    to
  • Fix: migrate to Stealth 2026 context manager (51ef0f7)
ChefThi
  • Fix hand tracking toggle, Render headless mode, and Playwright screenshot extraction (42a9636)
    to
  • Refactor search to use direct Playwright with --no-sandbox for Render stability (d1637ed)

I refactored the browser search flow to stop relying on the MCP bridge for simple navigation and screenshots.

Replaced the old mcp_bridge.call_tool("browser_navigate") + screenshot logic with a dedicated capture_screenshot(url) helper that uses direct Playwright:

Key changes in server.py:

  • New async function with chromium.launch(headless=True, args=["--no-sandbox", ...])
  • Applied playwright_stealth + proper wait_until="networkidle" + 5s buffer for heavy JS
  • Screenshot as JPEG quality 60, returned as clean base64
  • Updated both manual search and HOMES_SEARCH_BROWSER paths
  • Improved error messages and status updates (“STARTING_PLAYWRIGHT”, “SEARCH_COMPLETE”)

The flow is now simpler, faster to debug, and much more reliable on Render/cloud deployments where sandbox and shared memory can cause issues.
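
For reference, here is a minimal sketch of what that helper can look like with plain async Playwright; the playwright_stealth patching and the status updates from the real server.py are left out, so this is an illustration rather than the exact code:

```python
import base64
from playwright.async_api import async_playwright

async def capture_screenshot(url: str) -> str:
    """Open `url` in headless Chromium and return a base64-encoded JPEG screenshot."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            # --no-sandbox / --disable-dev-shm-usage avoid sandbox and shared-memory
            # issues on Render-style containers.
            args=["--no-sandbox", "--disable-dev-shm-usage"],
        )
        page = await browser.new_page()
        try:
            # Wait for the network to go quiet, then give heavy JS an extra 5 s buffer.
            await page.goto(url, wait_until="networkidle", timeout=60_000)
            await page.wait_for_timeout(5_000)
            raw = await page.screenshot(type="jpeg", quality=60)
            return base64.b64encode(raw).decode("ascii")
        finally:
            await browser.close()
```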

Quick but focused session after classes. Seeing the screenshot arrive cleanly in the HUD without MCP complexity felt like removing unnecessary weight. The tactical interface stays responsive while the agent does real web work.

I still need to work on and improve all of this. I'll have to add better logs for the system and especially the Playwright screen (not all of them appear currently). Also, while I was preparing this devlog, I received a notification on Slack that the reviewer rejected my ship, saying that Render wasn't allowed and that I should use something like Vercel or Cloudflare Pages… I didn't understand, because this project is mostly backend, and he wanted deployment on sites that only work with static architectures.


Comments

ChefThi · 2 days ago

I didn’t want to format the rest of the post as .md
-

ChefThi
  • feat: add hand-tracking toggle and real-time HUD movement logic (921aaaf)
    to
  • feat: enable dual-source vision (cloud + local) for remote tracking (4744461)

Today I tried to make the vision pipeline way more reliable by implementing dual-source frame acquisition in vision.py.

What changed:

  • Primary source: frames from cloud/WebSocket frame_queue
  • Automatic fallback: if no cloud frame arrives within 10ms, switches to local cv2.VideoCapture(0)
  • Added source tracking (“CLOUD” or “LOCAL”) and updated overlay text to show it clearly alongside gesture name
  • Kept full MediaPipe HandLandmarker processing unchanged
  • Safe cleanup: webcam is only released if it was actually opened

The gesture loop now stays alive even if the remote browser connection drops temporarily — perfect for long stealth automation sessions or when running in mixed environments.
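
A rough sketch of the fallback logic, with hypothetical names (the real vision.py keeps more state and handles the queue differently, but the idea is the same):

```python
import queue

import cv2

def acquire_frame(frame_queue: queue.Queue, state: dict):
    """Return (frame, source): a cloud frame if one arrives within 10 ms, else the local webcam."""
    try:
        # Primary source: a frame forwarded over the cloud/WebSocket bridge.
        return frame_queue.get(timeout=0.01), "CLOUD"
    except queue.Empty:
        # Fallback: open the local webcam lazily, only the first time it is needed.
        if state.get("cap") is None:
            state["cap"] = cv2.VideoCapture(0)
        ok, frame = state["cap"].read()
        return (frame, "LOCAL") if ok else (None, "NONE")

def release(state: dict) -> None:
    """Safe cleanup: release the webcam only if it was actually opened."""
    if state.get("cap") is not None:
        state["cap"].release()
```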

Focused session after classes. Seeing the overlay switch smoothly between SRC: CLOUD and SRC: LOCAL while gestures kept working felt like removing the last weak link in the chain. No more vision dying when the stealth agent goes off-screen.

Combined with the recent persistent stealth Perplexity tools, MCP bridge, and robust demo/fallback HUD, OmniLab is becoming a true resilient local agent.

Also, as you can see, the model was in a mess when I went to prepare the devlog, so it didn't even receive the form that was sent. But this part is fully functional.

Unfortunately, the HUD is being disobedient… I couldn't get it to actually follow the hand gestures. Locally things are going well, but for changes to show up on Render I need to push them to the remote and wait for the build, which delays testing and debugging a bit. And the URL search does not work.

ChefThi
  • feat: implement browser-side camera capture for Render deployment (02abd48)
  • feat: implement full cloud-vision pipeline for Render (clean code) (c5ecb5d)

I'm making this devlog to carry over the commit changelogs (if I use another machine or my phone they aren't available, because the changelog feature comes from a browser extension for Chrome/Firefox).

Here I changed how the deployment on Render accessed the processed images and kept testing the system. I was trying to make the HUD work like it does locally (although in this case there will be an annoying delay anyway: the distance, latency, and processing between the user's browser and the Render server cause it).
Frame capture had not been implemented here yet. I think somewhere in the middle of the edits I ended up getting lost and didn't make many improvements to this part of the system.
In this part I used the Gemini CLI to explain things to me and help debug how to make the system work on Render. It helped, but I felt I could have done more… I kind of didn't get anywhere.

ChefThi
  • refactor: stabilize MCP bridge and pivot branding to HOMES (1ca3baa)
    to
  • feat: implement persistent stealth automation and Perplexity search engine (b40e296)

Today I pushed the automation layer to the next level with full persistent stealth capabilities using Playwright.

Created a suite of tools that reuse real Chrome sessions (launch_persistent_context + user_data_dir=./.playwright_data) and apply playwright_stealth to bypass detection. Focused on Perplexity AI as a powerful external brain:

New scripts added:

  • stealth_agent.py — headless/off-screen stealth navigation with anti-detection flags
  • perplexity_agent.py — persistent login flow (manual Gmail step + 180s wait)
  • find_history.py — searches and extracts OmniLab-related threads from sidebar
  • perplexity_chat.py — automates follow-up questions in existing threads
  • Helper scripts for layout inspection and screenshot validation
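
The core idea behind all of these scripts is the persistent context. A minimal sketch, leaving out the playwright_stealth patches and anti-detection flags that the real scripts apply:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Reuse a real Chrome profile on disk so cookies, sessions and logins survive restarts.
    ctx = p.chromium.launch_persistent_context(
        user_data_dir="./.playwright_data",
        headless=False,  # a visible window is needed for the one-time manual Gmail login
    )
    page = ctx.pages[0] if ctx.pages else ctx.new_page()
    page.goto("https://www.perplexity.ai", wait_until="domcontentloaded")
    # ...search, read the sidebar history, send follow-up questions here...
    ctx.close()
```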

Intense after-class session. Seeing the stealth agent open Perplexity, find old OmniLab threads, and send a clean follow-up without triggering any blocks felt like unlocking a new superpower. The HUD can now decide to query Perplexity semantically via MCP and bring rich answers back to me.

Combined with the recent MCP bridge and robust demo mode, OmniLab is evolving into a true local command center that can use the entire web intelligently and feed my HOMES pipeline with high-quality data. Next: wire Perplexity actions directly into the gesture/voice flow and add mock versions for flawless demos.

**P.S. I used AI to structure this post: I organized and went through the things I had worked on and made a briefing of them. I also used the CLI to run the scripts and some tests, fix my errors, and speed up that testing part.**

ChefThi
  • Revise README for improved clarity and structure (67f9237)
  • Update model_id to ‘gemini-3.1-flash-lite’ (ea6b70f)
  • Update AI technology version in README (41cfad9)
  • feat: implement demo mode mocks and prepare MCP architecture (5e3e9f6)
  • feat: implement McpAgentBridge for semantic browser automation (344ab15)

Today I took the biggest leap toward a real local agent: implemented the official Playwright MCP (Model Context Protocol) bridge.

Instead of fragile direct navigation or pyautogui clicks, OmniLab now talks to the browser through semantic tools. The new McpAgentBridge class starts the @playwright/mcp server via stdio, manages ClientSession, lists available tools, and executes them cleanly with call_tool().

  • Full McpAgentBridge with start/list_tools/call_tool/stop
  • Integrated into FastAPI lifespan alongside the existing browser setup
  • Updated handle_agent_action() so BROWSER_SEARCH_RECIPE now triggers real MCP tools
  • Cleaned up old direct calls and unused imports
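
For context, here is roughly what such a bridge looks like when built on the official `mcp` Python SDK. This is a simplified sketch under that assumption, not the exact class in server.py:

```python
from contextlib import AsyncExitStack

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

class McpAgentBridge:
    """Thin wrapper around a Playwright MCP server started via stdio (sketch)."""

    def __init__(self) -> None:
        self._stack = AsyncExitStack()
        self.session: ClientSession | None = None

    async def start(self) -> None:
        # Launch the @playwright/mcp server as a subprocess and speak MCP over stdio.
        params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
        read, write = await self._stack.enter_async_context(stdio_client(params))
        self.session = await self._stack.enter_async_context(ClientSession(read, write))
        await self.session.initialize()

    async def list_tools(self):
        return (await self.session.list_tools()).tools

    async def call_tool(self, name: str, arguments: dict | None = None):
        return await self.session.call_tool(name, arguments or {})

    async def stop(self) -> None:
        await self._stack.aclose()
```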

The flow is now: Gesture/Voice → Gemini decides action → MCP executes semantically in real Chromium → status update back to the tactical HUD.

It still needs more tool mappings and human-like delays, but the foundation is solid and future-proof.

Seeing "MCP Agent Connected to Playwright Tools" and the first semantic action fire without breaking the HUD felt like JARVIS finally getting hands. No more "just describe the frame" — now it can actually DO things on the web and feed my HOMES pipeline.

ChefThi
  • feat: final tactical polish for SHIP - English demo mode and agent search fix (0710e20)
  • feat: implement high-fidelity International Demo Mode with mouse tracking and English localization (11bfc83)
  • docs: translate to English and enable automated GitHub Pages deployment (d7b8db4)
  • fix: ensure Demo Mode activates on GitHub Pages by handling Mixed Content WebSocket errors (71efe67)

Wrapping up things for this week of LockIn, I decided to publish the DEMO via GitHub Pages as I had been doing, explaining that I made it as a mock for the reviewer and, in general, to let people test a bit of how it works without needing to install all the dependencies (Playwright, GEMINI_API_KEY, camera, etc.).

The AI that I was using ended up making some changes in the server.py and html part so I tweaked a few things and delegated to it to fix what was missing. Then I asked for a deploy script in Actions and that’s what I got!

ChefThi
  • feat: implement automatic HUD orbit and fallback demo mode for reviewers (bfab249)

Today I made the HUD way more robust and demo-friendly — exactly what reviewers need.

Implemented an automatic fallback system: if no real data arrives from the WebSocket for more than 2 seconds (webcam offline, backend delay, or during recording), the HUD smoothly switches to demo mode with a beautiful orbiting cursor animation.

What was added in static/index.html:

  • Data flow monitoring with lastDataTime and 1-second checks
  • startDemoMode() using sin/cos math to simulate natural cursor movement, periodic pinch_progress scans, and fixed 60 FPS
  • Seamless transition: real WebSocket messages instantly stop the demo and take over
  • Improved onopen/onmessage/onclose handlers with auto-reconnect + fallback

The tactical UI now stays alive and immersive 100% of the time — perfect for videos, quick demos, or when showing the project without perfect hardware.

Quick but focused session after classes. Seeing the cursor start orbiting smoothly when I paused the vision server felt like magic. No more awkward “wait, it froze” moments during recordings.

Combined with yesterday’s AI mocks and DEMO_MODE, OmniLab is now extremely easy to showcase. Reviewers can open the page and immediately see the full Iron Man experience without any setup pain.

ChefThi
  • feat: add demo mode and ai mocks (8ff8f22)

Demo Mode + AI Mocks: Zero-Dependency Showcase for my first ship

Today I added a full demo mode so OmniLab can run beautifully without a webcam or real Gemini API key — perfect for quick testing, recording timelapses, and showing the project to others.

What was implemented:

  • New DEMO_MODE flag in .env (true = mocks everything, false = production)
  • Cycling mock responses with realistic 0.6s simulated latency
  • Guarded Gemini client creation so it only initializes when needed
  • Added 4 hand gesture sample images in static/demo/ for visual consistency
  • /analyze endpoint now returns clean JSON with demo: true flag when in mock mode

The HUD and gesture pipeline stay exactly the same — you still see the tactical overlay, pulse effect, and “Deep Scan” flow, but everything is simulated and stable.
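
Roughly, the mock layer behaves like the sketch below; the variable names and mock texts are illustrative, and the real server guards the Gemini client in the same spirit so no API key is needed in demo mode:

```python
import asyncio
import itertools
import os

DEMO_MODE = os.getenv("DEMO_MODE", "false").lower() == "true"

# Cycling mock responses used when DEMO_MODE is on.
_MOCK_REPORTS = itertools.cycle([
    "Tactical scan: workspace detected, two objects of interest.",
    "Deep scan complete: gesture PINCH, confidence high.",
    "Environment stable. No anomalies detected.",
])

async def analyze_frame(image_b64: str) -> dict:
    """Return a mocked tactical report in demo mode; the real path would call Gemini."""
    if DEMO_MODE:
        await asyncio.sleep(0.6)  # realistic simulated latency
        return {"demo": True, "report": next(_MOCK_REPORTS)}
    raise NotImplementedError("production path: guarded Gemini client call goes here")
```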

After a long day of classes I wanted something that would let me record clean demos without fighting hardware. Turning DEMO_MODE on and seeing the mock responses flow perfectly into the HUD felt super satisfying. No more “sorry, needs webcam” excuses.

This makes OmniLab way more shareable and production-like. Combined with the recent Playwright stealth work, we’re getting closer to a full local agent that can demo real browser actions without any external dependency.

ChefThi
  • fix: stabilize vision-server bridge and synchronize Gemini 3.1 models (81ac80e)

OMNILAB // RESILIENCE & BROWSER HANDS 🛡️

Spent the last session bulletproofing the core architecture. I refactored the vision module to use a multi-threaded loop so it doesn't just die if the connection drops. Now it lowkey waits for the server to come back online automatically—no more manual restarts. I also standardized everything on Gemini 3.1 Flash Lite for that low-latency speed boost.

The big win was expanding the gesture engine. I implemented Swipe, Thumbs Up, and Fist recognition, and mapped them to actual browser actions using Playwright. Seeing the HUD trigger a stealth search or navigate tabs just by moving my hand was the ultimate vibe check. I also hunted down a sneaky MediaPipe indexing bug that was causing hard crashes during fast movements. The invisible interface is finally starting to execute real intent instead of just describing the scene.
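
The reconnect behaviour boils down to a loop like the sketch below. It's shown with asyncio and the `websockets` package for brevity, while the actual module is multi-threaded; the endpoint URI and payload helper are assumptions:

```python
import asyncio

import websockets

async def get_next_payload() -> str:
    """Hypothetical stand-in for the latest gesture/frame data from the vision loop."""
    await asyncio.sleep(1 / 30)
    return "{}"

async def vision_bridge(uri: str = "ws://localhost:8000/ws/vision") -> None:
    """Keep the vision→server link alive: reconnect automatically instead of dying."""
    while True:
        try:
            async with websockets.connect(uri) as ws:
                while True:
                    await ws.send(await get_next_payload())
        except (OSError, websockets.ConnectionClosed):
            # Server offline or restarting: back off and try again.
            await asyncio.sleep(2)
```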

ChefThi
  • fix: resolve variable scoping in vision bridge and enhance loop stability (d38cadc)
  • feat: implement tactical control panel and visual telemetry fixes (1dd8917)

Basically, I found errors on the panel. It wasn't appearing correctly before.

  • I used the Gemini CLI for a quick and simple fix in this part :)

P.S. I noticed that my recording wasn't saved. The Screenity extension got an error after I finished the video.

A visitor arrived at home just now, so I went to greet them.
-

ChefThi
  • fix: resolve playwright-stealth imports and fastapi validation errors (9d2ada5)

Playwright Stealth + FastAPI Validation Fixed: Browser Control Now Stable

Quick but important cleanup session today.

Fixed two blocking issues that were breaking the new browser automation layer:

  • Corrected playwright_stealth import and usage: switched from stealth_async to stealth so the browser launches with proper human-like fingerprints (anti-detection for Cloudflare, Google, etc.).
  • Enforced proper Pydantic validation on the /analyze endpoint: changed request: any to request: AnalyzeRequest (BaseModel with base64 image field). This prevents malformed payloads and makes the API more reliable when Gemini or voice triggers actions.
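
The validation fix is essentially this pattern (the field name is an assumption):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AnalyzeRequest(BaseModel):
    image_base64: str  # base64-encoded JPEG frame from the HUD

@app.post("/analyze")
async def analyze(request: AnalyzeRequest):
    # FastAPI now validates the payload up front instead of crashing on `request: any`.
    return {"received_bytes": len(request.image_base64)}
```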

Also added the new libs for v0.3 (Playwright ecosystem + dependencies).

The pipeline is now much more solid: MediaPipe gesture/voice → Gemini analysis → execute_system_action → Playwright with stealth can open real tabs, navigate, and interact without immediate blocks.

Short focused session after classes. Seeing the stealth apply correctly and the FastAPI endpoint stop throwing validation errors felt like removing training wheels. No more random crashes when the HUD tries to trigger a browser action.

OmniLab is evolving from “cool HUD that describes frames” into a true local agent that can actually use the browser as part of my HOMES workflow. Next target: full BROWSER_ACTION handler with human-like delays and real task execution (e.g. open recipe site → extract ingredients → trigger HOMES-Engine). The invisible interface just got way more powerful.

ChefThi
  • feat: core evolution - gesture control, Gemini 3.1 Thinking Mode, and modular HUD (64057c1)
  • add new libs for the v0.3 (634a0a4)

OmniLab Devlog // v0.3 Checkpoint

Yo, just dropping a quick update on what’s been happening with OmniLab. The last two commits were lowkey a mess—honestly, they were just checkpoints to save where I was at, so they didn’t really work out of the box.

The Struggle (aka The Errors)

So, when I tried to actually run the code from the recent pushes, the system basically threw a tantrum.

  1. Import Drama: In server.py, I tried to pull in stealth_async from playwright_stealth, but it just wasn’t having it. Total ImportError. Had to swap it for the standard stealth function to get the browser agent to even start.
  2. FastAPI Tantrum: The /analyze route was broken because I used any as a type for the request. FastAPI is super picky about that, so it crashed with a FastAPIError. I had to bring back the proper Pydantic models to make it happy again.
  3. Browser Missing: Playwright was installed but the actual Chromium browser wasn’t there. Pro-tip: playwright install sometimes fails, so using python -m playwright install chromium is the way to go.

What’s Actually New

Even though it was bumpy, we got some cool stuff in:

  • Gemini 3.1 Thinking Mode: The brain is officially upgraded. It’s faster and actually “thinks” before it gives you the tactical report.
  • Pinch-to-Scan: This is the best part. You don’t have to yell at the mic anymore. Just hold a pinch gesture for 1.5s, the HUD ring scales down and changes color, and boom—it triggers a deep scan.
  • OmniBrowser Agent: We added Playwright so the HUD can lowkey browse the web for you. It’s not fully “Jarvis” level yet, but it can navigate and pull data in the background.
  • HUD v2.1: New tactical UI with a log console at the bottom and real-time FPS/latency tracking so you know the system isn’t lagging.
ChefThi
  • feat: evolve OmniLab into an active command center for HOMES ecosystem (dc3b7ba)

OmniLab becomes Active Command Center for HOMES 🔥 Gesture → Real Action

Today I took the biggest step yet: turning OmniLab from a passive scan tool into a true command center that can execute actions inside the HOMES ecosystem.

Major refactor in server.py:

  • WebSocket connections now use sets for true O(1) operations
  • Re-used the image caching + resize pipeline (MD5 dedup + 512×512 JPEG)
  • Added execute_system_action() handler with real examples:
    • “HOMES_EXECUTE_TASK” → placeholder to trigger Termux workers / video rendering
    • “BROWSER_NAV_NEXT” → pyautogui hotkey (Ctrl+Tab) as proof-of-concept
  • Broadcast logic cleaned up so vision → HUD communication stays rock-solid

The flow is now: Pinch gesture (or voice) → MediaPipe → Gemini analysis → action decision → execute locally or fire HOMES pipeline.
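
A condensed sketch of the handler, using the action names from this devlog (the HOMES webhook itself is still a placeholder):

```python
import pyautogui

async def execute_system_action(action: str, payload: dict | None = None) -> str:
    """Map an AI-decided action string to a local effect (sketch)."""
    if action == "HOMES_EXECUTE_TASK":
        # Placeholder: will later enqueue work for the Termux workers / video rendering.
        print("Executing HOMES_EXECUTE_TASK", payload)
        return "QUEUED"
    if action == "BROWSER_NAV_NEXT":
        # Proof-of-concept: jump to the next browser tab with a system hotkey.
        pyautogui.hotkey("ctrl", "tab")
        return "DONE"
    return "UNKNOWN_ACTION"
```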

It still needs the actual webhook to HOMES-Engine, but the architecture is solid and the HUD stays responsive.

After classes I went straight into a long refactoring session. Seeing the action handler print “Executing HOMES_EXECUTE_TASK” for the first time felt like JARVIS finally waking up. No more “just describe the frame” — now it can DO something.

OmniLab + HOMES together are starting to feel like a real personal AI operating system. Next: full voice + gesture synergy and actual integration with HOMES worker queue. The invisible interface is getting dangerous. 🤖⚡

I got a bit lost during this development, but we got improvements!

ChefThi
  • perf: implement image caching, O(1) connections, and asset optimization for sidequest v0.2 (1738e81)

I just finished a heavy optimization session to kill the lag in OmniLab. I sat down with my AI assistant to tear apart the bottlenecks, and we managed to turn this from a “cool prototype” into a high-performance local AI.

What we changed (The “Brain” Upgrade):
Smart Memory: The system now remembers what it just saw. Using Image Caching, it won’t waste time or API tokens re-analyzing the same frame if nothing has moved. It’s like giving the HUD a 30-second short-term memory.

Instant Connections: I swapped how the HUD tracks connections. By moving from “lists” to “sets,” the system now handles multiple data streams instantly, no matter how many are running.

Lightweight Assets: We automated an image-shrinking process. Before sending anything to the cloud, the HUD now compresses and resizes frames. This makes the data 84% lighter without losing the “vision” quality Gemini needs.
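
In code, the caching and shrinking step looks roughly like the sketch below, using Pillow and an in-memory dict; the helper names and the exact TTL are assumptions:

```python
import hashlib
import io
import time

from PIL import Image

_cache: dict[str, tuple[float, str]] = {}
CACHE_TTL = 30.0  # seconds of "short-term memory"

def prepare_frame(jpeg_bytes: bytes) -> tuple[str, str | None, bytes | None]:
    """Return (digest, cached_report, shrunken_jpeg); reuses reports for unchanged frames."""
    digest = hashlib.md5(jpeg_bytes).hexdigest()
    hit = _cache.get(digest)
    if hit and time.time() - hit[0] < CACHE_TTL:
        return digest, hit[1], None           # same frame seen recently: skip the API call
    img = Image.open(io.BytesIO(jpeg_bytes)).convert("RGB")
    img.thumbnail((512, 512))                 # shrink before sending anything to the cloud
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=60)
    return digest, None, buf.getvalue()       # caller sends this to Gemini, then caches

def remember(digest: str, report: str) -> None:
    _cache[digest] = (time.time(), report)
```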

The Numbers (Why this matters):
Speed: Response time dropped from 820ms to 540ms. It feels way snappier.

Efficiency: We went from 60 API calls per minute down to just 3 or 8. No more wasting tokens on duplicate images.

Stability: The HUD is buttery smooth now, even during heavy “Deep Scans.”

It started as a quick after-class session and turned into a solid grind. Seeing the latency numbers drop in real-time was incredibly satisfying.

ChefThi
  • feat: add HUD demo mode, scan pulse effect, and dynamic port binding (90ab4c8)
  • feat: core HUD improvements and repo cleanup (c82deea)
  • chore: ensure all private project files are untracked (2daea10)
  • feat: implement concurrent vision processing and HUD fail-safe systems (a954cfb)

Concurrent Vision + HUD Fail-Safes: Parallel Power Unlocked

Big day — I finally tackled the last major bottleneck: sequential scan delays.

Implemented concurrent vision processing so frame capture, MediaPipe analysis, WebSocket transmission and Gemini 3 Flash calls can run in parallel without blocking the main HUD thread. Added robust fail-safe systems (graceful degradation, timeout recovery, and fallback states) so the interface never freezes even if the LLM takes longer than expected.

What landed in this session:

  • Full concurrent pipeline using Python asyncio + ThreadPoolExecutor for vision tasks
  • HUD fail-safe layer with visual indicators when processing is happening in background
  • Minor core improvements and repo cleanup (removed private files from tracking)
  • Combined with yesterday’s demo mode, scan pulse effect and dynamic port binding — the whole system now feels way more stable and production-like
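
The core of the concurrency change fits in a few lines. A sketch under the assumptions above, where `analyze_fn` stands in for the blocking MediaPipe + Gemini call:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)

async def deep_scan(frame, analyze_fn, timeout: float = 8.0) -> str:
    """Run the blocking vision/LLM work off the event loop so the HUD never freezes."""
    loop = asyncio.get_running_loop()
    try:
        return await asyncio.wait_for(
            loop.run_in_executor(_executor, analyze_fn, frame),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # Fail-safe: degrade gracefully instead of blocking the main HUD thread.
        return "SCAN_TIMEOUT"
```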

After classes I went straight into a long session. Seeing the scan pulse animate while the AI thinks in the background without any stutter… that’s the JARVIS moment I’ve been chasing.

FOR THIS DEVLOG, ONE THING HAPPENED: THE MODEL I HAD SET (gemini-3-flash-preview) WAS EXPERIENCING HIGH DEMAND, SO I SWITCHED TO THE 3.1-lite-preview.

ChefThi
  • feat: complete gesture-to-scan logic and HUD v2.1 tactical UI (8f09f3b)

Gesture-to-Scan Complete + Tactical HUD v2.1

Today I finally closed the loop on the most important interaction of OmniLab: turning a simple hand gesture into a full AI-powered scan.

The big challenge was making the flow feel instant and reliable. I refined the MediaPipe Tasks API logic so the Pinch gesture (held for 1.5s) now reliably captures the webcam frame, sends it through the local FastAPI pipeline, and triggers Gemini 3 Flash Vision without breaking the HUD.

What’s new in this push:

  • Improved state management so the system no longer queues scans sequentially — each Deep Scan now feels more independent
  • Small cleanups in api/v1, core, vision.py and utils for better maintainability

It’s still not 100% min-latency (Gemini still takes a moment to think), but the difference from last week is huge. The HUD now truly reacts to my hand like JARVIS would.

Late-night session after classes, but seeing the tactical report pop up instantly after the pinch hold made it all worth it. The invisible interface is getting closer every commit.

ChefThi
  • feat: implement tactile gesture activation and HUD v2.1 modular update (133994c)

Deep Scan & Tactical Gestures🖐️👁️

After a few days, I finally implemented the Deep Scan system. The challenge was: how to trigger an AI analysis without touching the keyboard?

I used MediaPipe to create a "Pinch" trigger. By holding the gesture for 1.5s (Tony Stark style calibrating the HUD), the system captures the frame and sends it to the Gemini 3 Flash brain. The result: I get a simple instant tactical report directly on the display, running with very low latency thanks to the new local architecture.
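
The hold-to-trigger logic is essentially a small timer state machine like this sketch (the distance threshold and timings are assumptions):

```python
import time

PINCH_THRESHOLD = 0.05   # normalized landmark distance considered "pinched"
HOLD_SECONDS = 1.5

class PinchTrigger:
    """Fire a deep scan only after the pinch has been held for 1.5 s (sketch)."""

    def __init__(self) -> None:
        self._since: float | None = None

    def update(self, pinch_distance: float) -> bool:
        now = time.monotonic()
        if pinch_distance < PINCH_THRESHOLD:
            if self._since is None:
                self._since = now            # pinch just started: start the timer
            elif now - self._since >= HOLD_SECONDS:
                self._since = None           # reset so the scan fires once per hold
                return True
        else:
            self._since = None               # hand opened: cancel the hold
        return False
```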

I liked all of this and thought these new updates were cool. The thing is, there's still a certain delay that I think makes the Scans trigger one after another, without getting the complete description of the first one.

OmniLab doesn't just show data now; it understands what I see. 🎧🔥

ChefThi
  • Refactor project details and shipping status (79c8e53)
  • feat: HUD v2 evolution with TTS, real-time diagnostics, and Gemini 3 Thinking Mode (674e279)

🚀 The HUD Just Leveled Up

The gap between thought and execution is getting smaller. I’ve just pushed a massive round of updates to the interface, bringing that “Stark Tech” vibe closer to reality.

What’s New:

Clean Decoupling (ada_v2 style): I moved the entire HUD interface to a static/ directory. By separating the Three.js frontend from the FastAPI backend, I can now tweak the UI instantly without touching the server logic.

Gemini 3 Thinking Mode: Deep reasoning is now live. When you trigger an analysis, the HUD displays DEEP SCANNING… while Gemini grinds through the image metadata to deliver a high-precision report.

J.A.R.V.I.S. Talk-Back: The HUD finally has a voice. Using the Web Speech API, the system now talks back during scans, making the whole experience feel way more immersive.

Real-Time Diagnostics: I added a telemetry overlay to monitor FPS and latency. It’s essential for keeping everything buttery smooth on my local Debian 13 setup.

Pinch-to-Lock Gestures: The “Pinch” gesture now locks the cursor and toggles system states, allowing for much tighter physical interaction with the 3D interface.

The “invisible interface” is finally starting to become real.

ChefThi

Turning OmniLab into a real HUD assistant: voice, vision and a more proactive AI persona 🎧🖐️

OmniLab has been my experimental lab for interfaces: 3D HUD, hand‑tracking, voice input, and AI all living in the same space. At the same time, life got busier: I started Computer Engineering, the campus is ~10 km away, and I’ve been splitting my time between classes, Blueprint hardware projects, and these software labs. That’s why commits came in bursts instead of daily drips — most of the work happened in small, tired, late‑night sessions.
Earlier this year I refactored the architecture to favor local‑first vision (removing a cloud version that was too high‑latency) and added the Web Speech API to the HUD, so I could trigger Gemini analyses via voice while the system tracked my hands in real time. That was the turning point: OmniLab stopped being “just a cool 3D scene” and started behaving like a genuine interface between my body, my voice and an AI brain.
Recently I pushed a big “SHIP‑ready” upgrade: Gemini integration is now first‑class, tests and CI/CD are in place, and the HUD feels more stable as a product, not just a demo. On top of that, I refined the AI persona: instead of only answering direct questions, OmniLab now makes proactive observations about what it sees and hears — it can comment on the scene, suggest next actions, and feel more like a lab partner than a tool.

Most of this evolution happened while juggling buses, deadlines and other projects, with Perplexity helping me reason about trade‑offs (what to keep in 3D, what to simplify, where AI actually adds value). This devlog is my way of catching the Flavortown timeline up with the reality: OmniLab grew quietly, but it grew a lot. ✨

ChefThi

OmniLab Devlog #1

I’ve officially kicked off OmniLab on my first laptop! Coming from a background of mobile development and browser-based IDEs, my first instinct was to keep everything “off-device”. I spent a good chunk of these 5 hours attempting to run the processing stack on a remote VM (Firebase Studio) and tunneling the HUD via a web page. However, the latency was unbearable for real-time tracking. I quickly realized that for a “Jarvis-like” experience, the vision loop must be 100% local.

Technical Hurdles & Git Mess

The first challenge was MediaPipe. I started with legacy code, but it wouldn’t play nice. I had to dive into the latest MediaPipe Tasks API docs to rewrite the landmark detection core. It’s much more efficient now, but the documentation shift caught me off guard.

Since I was jumping between cloud editing and local testing without properly cloning the repo first, I ended up with a mess of Git conflicts. I used the Gemini CLI as a mentor to help me untangle the branches, resolve the “already exists” errors, and get the local and remote repositories back in sync. It was a great lesson in maintaining a clean workflow on a new machine.

Current Progress

I’ve successfully implemented the “pinch” gesture logic (calculating the hypotenuse between thumb and index) and set up a local FastAPI server to bridge vision data to a Three.js HUD. The HUD now runs locally on Debian 13 (XFCE), which eliminated all the lag from my previous VM tests.
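
For reference, that "hypotenuse" check reduces to the distance between two landmarks from the MediaPipe Tasks HandLandmarker result. A minimal sketch:

```python
import math

# MediaPipe hand landmark indices for the thumb tip and index fingertip.
THUMB_TIP, INDEX_TIP = 4, 8

def pinch_distance(landmarks) -> float:
    """Euclidean distance between thumb tip and index tip.

    `landmarks` is the normalized landmark list for one hand from a
    MediaPipe Tasks HandLandmarker result.
    """
    t, i = landmarks[THUMB_TIP], landmarks[INDEX_TIP]
    return math.hypot(t.x - i.x, t.y - i.y)
```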

Timelapses


Comments

ChefThi · about 2 months ago

To clarify the technical choices: I’m focusing heavily on keeping the HUD lightweight on my new machine by using Debian 13 (XFCE) and optimizing the Python vision loop. I’m also studying the ada_v2 repository to implement better modularity in the UI layer. Integrating these clean interface concepts into a zero-latency environment is the main goal for the next update.