Activity

ChefThi
  • feat: implement browser-side camera capture for Render deployment (02abd48)
  • feat: implement full cloud-vision pipeline for Render (clean code) (c5ecb5d)

I’m making this devlog to carry the commit changelogs. (If I use another machine or my phone, they aren’t available, because this changelog feature comes from a browser extension (Chrome/Firefox).)

Attachment
0
ChefThi

  • I’m basically working on the reship.
    For some reason, the zip file in the repo was outdated. I tweaked a few things, made commits, and prepared to reship, making sure everything is where it’s supposed to be.

Go to this page to view Kitchen Mode from the Flavortown Kitchen extension.

Take a look at the images attached to this post. (I took the photos with my phone because the extension pop-up hides whenever I activate the screenshot app.)

Attachment
Attachment
0
ChefThi
  • fix(hub): restore dashboard styling and add live logs console (4cb8e6e)

This devlog just carries the commit changelog; it’s from the Spicetown browser extension. For the image, I took a photo of Google Keep on my smartphone, attached it to Keep on my phone, and finally retrieved it from the Keep site on the PC.

Changes:
Restored the Cyberpunk Dashboard styling and added real-time activity feedback.

  • UI Restoration: Fixed a pathing error that disconnected the CSS during the FastAPI migration. The interface now renders correctly with full styling.
  • Live Logging: Integrated a scrollable console into the Dashboard for instant feedback on telemetry syncs and project creation.
Attachment
0
ChefThi
  • feat: implement SaaS identity foundation & hardened multipart production pipeline (ac49650)

The project took a major step toward becoming a real SaaS application with the addition of a user authentication system. This foundation allows users to create accounts, log in, and keeps their generations secure and separated.

Main changes included:

  • Implemented JWT-based authentication with registration and login endpoints.
  • Protected the AI and video generation routes so only logged-in users can access them.
  • Added support for users with basic quota tracking.
  • Hardened multipart file uploads with higher size limits and better validation to handle background music, audio tracks, and multiple images reliably.
  • Increased server timeouts and payload limits to support longer video rendering without interruptions.

These updates make the system more professional and prepare the ground for paid plans and user management in the future.

Early in the project, Docker and DevOps concepts needed significant learning and adjustment. Considerable time was also spent refining the FFmpeg configuration for reliable video assembly.

AI served as an accelerated learning companion rather than a replacement for hands-on work, for example with the JWT system and learning these practices.

Attachment
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi
  • refactor: stabilize MCP bridge and pivot branding to HOMES (1ca3baa)
    to
  • feat: implement persistent stealth automation and Perplexity search engine (b40e296)

Today I pushed the automation layer to the next level with full persistent stealth capabilities using Playwright.

Created a suite of tools that reuse real Chrome sessions (launch_persistent_context + user_data_dir=./.playwright_data) and apply playwright_stealth to bypass detection. Focused on Perplexity AI as a powerful external brain:

New scripts added:

  • stealth_agent.py — headless/off-screen stealth navigation with anti-detection flags
  • perplexity_agent.py — persistent login flow (manual Gmail step + 180s wait)
  • find_history.py — searches and extracts OmniLab-related threads from sidebar
  • perplexity_chat.py — automates follow-up questions in existing threads
  • Helper scripts for layout inspection and screenshot validation
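A rough sketch of the persistent-context part of these scripts; the profile path and launch_persistent_context come from the post, while the exact flag set and the stealth helper name (stealth_sync) are assumptions that vary between playwright_stealth versions:

```python
from pathlib import Path

# Anti-detection launch flags; the exact set used by the project is an assumption
ANTI_DETECTION_ARGS = [
    "--disable-blink-features=AutomationControlled",
    "--no-first-run",
]

def persistent_launch_kwargs(profile_dir: str = "./.playwright_data") -> dict:
    """Build kwargs for launch_persistent_context so a real Chrome
    profile (cookies, logins) is reused between runs."""
    Path(profile_dir).mkdir(parents=True, exist_ok=True)
    return {
        "user_data_dir": profile_dir,
        "headless": False,  # a visible window looks less bot-like
        "args": ANTI_DETECTION_ARGS,
    }

def open_stealth_page(url: str) -> str:
    # Imports are local so the helper above stays importable without Playwright.
    from playwright.sync_api import sync_playwright
    from playwright_stealth import stealth_sync  # helper name varies by version

    kwargs = persistent_launch_kwargs()
    with sync_playwright() as p:
        ctx = p.chromium.launch_persistent_context(**kwargs)
        page = ctx.new_page()
        stealth_sync(page)  # patch navigator.webdriver and friends
        page.goto(url)
        return page.title()
```

Because the context is persistent, the manual Gmail login from perplexity_agent.py only has to happen once; later runs reuse the saved session.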

Intense after-class session. Seeing the stealth agent open Perplexity, find old OmniLab threads, and send a clean follow-up without triggering any blocks felt like unlocking a new superpower. The HUD can now decide to query Perplexity semantically via MCP and bring rich answers back to me.

Combined with the recent MCP bridge and robust demo mode, OmniLab is evolving into a true local command center that can use the entire web intelligently and feed my HOMES pipeline with high-quality data. Next: wire Perplexity actions directly into the gesture/voice flow and add mock versions for flawless demos.

**P.S. I used AI to structure this post: I organized and went through the things I had worked on and wrote a briefing of them. I also used the CLI to run the scripts and some tests, which helped me fix my errors and speed up the testing phase.**

Attachment
0
ChefThi

  • Persistent Storage: Implemented a JSON-based database layer. Video projects and system telemetry are now automatically saved and restored on server restart, ensuring zero data loss.

  • Architectural Resilience: Improved error handling within the FastAPI lifecycle to manage local file I/O operations safely in the Termux environment.
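The JSON database layer could look roughly like this minimal sketch (JsonStore is an illustrative name); the atomic-replace step is what keeps restarts safe even if the process dies mid-write:

```python
import json, os, tempfile
from pathlib import Path

class JsonStore:
    """Tiny JSON-file database: load on startup, persist on every change."""

    def __init__(self, path: str = "hub_state.json"):
        self.path = Path(path)
        self.data: dict = {}
        if self.path.exists():
            self.data = json.loads(self.path.read_text())

    def set(self, key: str, value) -> None:
        self.data[key] = value
        self._flush()

    def _flush(self) -> None:
        # Write to a temp file, then atomically replace the old one, so a
        # crash mid-write never leaves a half-written store on disk.
        fd, tmp = tempfile.mkstemp(dir=str(self.path.parent))
        with os.fdopen(fd, "w") as f:
            json.dump(self.data, f, indent=2)
        os.replace(tmp, self.path)
```

On restart, constructing the store again reloads whatever was last flushed, which is exactly the "saved and restored on server restart" behavior described above.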

I’m almost certain I won’t use widgets anymore. I studied a bit and got KWGT working, but the information wasn’t fully up to date during normal phone use. Thinking about it more, I realized that what the widget was going to show I can deliver with a page/dashboard on the HUB’s localhost.

Attachment
Attachment
0
ChefThi

Transformed the Hub into a functional AI Video Orchestrator using a Manager-Worker architecture.

Engineering Updates

  • Project Factory: Implemented a FastAPI-based queue system. The Hub now manages video project lifecycles (Pending -> Completed).
  • Mobile Worker Bridge: Refactored the Mobile Agent to poll the Hub for tasks, establishing a live production synchronization loop.
  • Pure Architecture: Purged all legacy code and simulation noise. The repository is now a clean micro-service hub.
Attachment
Attachment
0
ChefThi
  • chore: refine core dependencies & standardize AI service provider logic (60bd97e)

Spent the last 30 minutes on core infrastructure alignment. I standardized safety thresholds across all AI providers (Gemini, Hugging Face, Pollinations) to BLOCK_NONE, ensuring creative prompts aren’t throttled by false-positive safety filters. I also refined the VideoController file interceptors to scale from 20 up to 100 simultaneous scene uploads, preparing the engine for long-form content generation instead of just short clips. Finally, I synchronized the server dependencies and generated an industrial-grade lockfile to ensure environment parity between local development and production Docker builds. The backend is now clean, consistent, and ready for the upcoming authentication layer.

Many errors came back. But that’s the process…

Attachment
Attachment
0
ChefThi
  • Revise README for improved clarity and structure (67f9237)
  • Update model_id to ‘gemini-3.1-flash-lite’ (ea6b70f)
  • Update AI technology version in README (41cfad9)
  • feat: implement demo mode mocks and prepare MCP architecture (5e3e9f6)
  • feat: implement McpAgentBridge for semantic browser automation (344ab15)

Today I took the biggest leap toward a real local agent: implemented the official Playwright MCP (Model Context Protocol) bridge.

Instead of fragile direct navigation or pyautogui clicks, OmniLab now talks to the browser through semantic tools. The new McpAgentBridge class starts the @playwright/mcp server via stdio, manages ClientSession, lists available tools, and executes them cleanly with call_tool().

  • Full McpAgentBridge with start/list_tools/call_tool/stop
  • Integrated into FastAPI lifespan alongside the existing browser setup
  • Updated handle_agent_action() so BROWSER_SEARCH_RECIPE now triggers real MCP tools
  • Cleaned up old direct calls and unused imports
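The start/list_tools/call_tool/stop shape could be sketched with the official MCP Python client like this; the npx command for @playwright/mcp and the lazy imports are assumptions about the setup:

```python
from contextlib import AsyncExitStack

class McpAgentBridge:
    """Wraps an MCP stdio server with start/list_tools/call_tool/stop."""

    def __init__(self, command: str = "npx", args=("@playwright/mcp@latest",)):
        self.command, self.args = command, list(args)
        self._stack = None
        self.session = None

    async def start(self):
        # Local imports keep the class importable without the mcp package.
        from mcp import ClientSession, StdioServerParameters
        from mcp.client.stdio import stdio_client

        self._stack = AsyncExitStack()
        params = StdioServerParameters(command=self.command, args=self.args)
        read, write = await self._stack.enter_async_context(stdio_client(params))
        self.session = await self._stack.enter_async_context(ClientSession(read, write))
        await self.session.initialize()

    async def list_tools(self):
        return (await self.session.list_tools()).tools

    async def call_tool(self, name: str, arguments: dict):
        return await self.session.call_tool(name, arguments)

    async def stop(self):
        if self._stack:
            await self._stack.aclose()
```

handle_agent_action() can then translate an intent like BROWSER_SEARCH_RECIPE into a call_tool invocation on whichever navigation tool list_tools reports.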

The flow is now: Gesture/Voice → Gemini decides action → MCP executes semantically in real Chromium → status update back to the tactical HUD.

It still needs more tool mappings and human-like delays, but the foundation is solid and future-proof.

Seeing “MCP Agent Connected to Playwright Tools” and the first semantic action fire without breaking the HUD felt like JARVIS finally getting hands. No more “just describe the frame” — now it can actually DO things on the web and feed my HOMES pipeline.

Attachment
0
ChefThi
  • refactor: enhance backend resilience and establish identity foundation (5481ba8)

This session focused on hardening the backend infrastructure and preparing the project for its transition into a multi-user SaaS platform.

🛠 Technical Achievements

1. Advanced Backend Resilience

We addressed critical sync issues between the frontend and the database.

  • Auto-Project Provisioning: Refactored ProjectsService to implement an “Auto-Create” logic. The backend now automatically handles new project IDs generated by the frontend, eliminating the 404 “Project Not Found” errors during assembly.
  • Video Pipeline Hardening: Updated VideoService with automated directory management and granular error catching. This ensures that intermediate assets are correctly stored and managed, preventing the unhandled 500 errors previously encountered during the “Asset Preparation” stage.

2. Identity & Data Isolation Layer

Started the foundational work for the SaaS transition.

  • User Architecture: Implemented the UserEntity and the base structure for the AuthModule (Passport/JWT).
  • Relational Mapping: Projects are now linked to user profiles via TypeORM, ensuring that video generations are securely tied to their respective owners.

⏱ Hackatime & Sidequest Note

While monitoring my progress for the 10-hour LockIn sidequest, I discovered a discrepancy in my tracked time. I realized that I needed to explicitly link this project’s activity in Hackatime to ensure that every hour of refactoring and architectural design is correctly logged and counted toward the competition milestones.

Attachment
0
ChefThi
  • feat: NerveOS v0.4.4 - bidirectional hardware control (serial write), oled/reboot terminal commands and macro actions UI (8ea62da)

Now the OS can actually talk back to the hardware. I implemented serial write support, so I can send commands directly to the ESP32-S3. Added an 'oled' command to push text to the physical display and a 'reboot' command for the MCU. Also added a 'Quick Actions' section in the monitor with macro buttons for common tasks.

Attachment
Attachment
0
ChefThi
  • feat: NerveOS v0.4.3 - terminal history, auto-scroll, command ‘history’ and UI cleanup (920e458)

v0.4.3 - Terminal Power-Up
The terminal is now actually usable for work. Added command history so you can use the arrow keys to cycle through previous commands. Also added a ‘history’ command and auto-scroll logic so you don’t have to manually scroll to see the latest output. Much better for testing hardware commands.

0
ChefThi

Fix Video Assembly Error 500
We identified that the video assembly was failing due to a synchronous request pattern and strict file upload limits.

  • Payload Scaling: Increased the image upload limit in the VideoController to support up to 100 scenes per video, preventing server-side rejection.
  • Memory Optimization: Switched from direct stream piping to a background worker response. The backend now acknowledges the request immediately, preventing the browser from timing out.

Since the video rendering now happens in the background, the frontend was updated to stay in sync without locking the UI.

  • Async Tracking: Refactored App.tsx and ffmpegService.ts to implement a polling mechanism. The frontend now queries the /status endpoint every 10 seconds to monitor background progress.
  • URL Resolution: Added logic to resolve relative video paths from the backend, ensuring the final .mp4 is correctly displayed in the ResultView once ready.
Attachment
Attachment
0
ChefThi

Synchronized the TypeScript MCP Server with the FastAPI Hub, exposing Android hardware to LLMs.

Deliverables

  • Tool Mapping: Integrated get_mobile_status and send_mobile_command via the MCP protocol.
  • Hardware Bridge: Real-time access to Battery, RAM, and remote actions (Speak/Vibrate).
  • Gemini CLI Impact: Used the CLI to rapidly parse SDK docs and validate Zod schemas, cutting research time by 50%. Connectivity tests between TS and Python were performed directly via the CLI for instant feedback.
Attachment
Attachment
0
ChefThi
  • feat: NerveOS v0.4.2 - implemented global notification system (toasts) and event hooks (3d4414c)

Spent some time making the UI feel like a real OS. Added maximize and restore buttons, double-click on title bars to toggle full screen, and a proper focus system where clicking any part of a window brings it to the front. Taskbar buttons now show active states when an app is open.

Attachment
0
ChefThi
  • feat: NerveOS v0.4.1 - advanced window management (maximize/restore, focus-on-click, dbl-click bar) (7a0fc2b)

This is the big one. Today, NerveOS stopped being just a “pretty site” and became a real control center. ⚡

I implemented the Web Serial API. Why is this sick? Because now the browser can talk directly to the ESP32-S3 via USB at 115200 baud. No middleman, no local server—just raw silicon-to-browser communication.

What’s new:

  • Hardware Bridge: A new “LINK DEVICE” button in the HW Monitor.
  • Pulsing UI: The bridge button actually pulses when connected. It looks sick.
  • Canvas CPU Graph: Replaced boring numbers with a real-time pulse graph drawn on Canvas.

The vibe is officially “Absolute Cinema”.

Just finished a 30-minute deep work session on the window manager. If this is going to be a real OS, it needs to act like one.

Updates:

  • Maximize/Restore: Added the button. Clicking the window bar also toggles full screen. Super useful for when you need to focus on the Terminal logs.
  • Focus System: Clicking anywhere on a window now brings it to the top. No more hunting for the title bar just to bring a window to the front.
  • Taskbar Active States: The taskbar buttons now glow when the app is open.
Attachment
0
ChefThi

This session focused on standardizing the visual identity of the HOMES ecosystem by implementing a dynamic design system that bridges hardware status with UI aesthetics.

Dynamic Design System

  • State-Aware Branding: Refactored the Mobile Agent to calculate UI colors based on live hardware states (e.g., Cyan for optimal, Red for low battery, and Green for active charging).
  • Color-Encoded API: Enhanced the /api/widget endpoint to serve a dedicated color field, allowing Android widgets to dynamically update their appearance in real-time.
  • Unit Standardization: Unified data formatting for memory (MB) and storage (GB) metrics to ensure layout consistency across different widget sizes.
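The state-aware color logic can be illustrated with a tiny function; the exact hex values and thresholds here are assumptions, not the project’s real palette:

```python
def widget_color(battery_pct: int, charging: bool) -> str:
    """Map live hardware state to the UI accent color served by /api/widget."""
    if charging:
        return "#00FF7F"  # green: active charging
    if battery_pct <= 20:
        return "#FF3B30"  # red: low battery
    return "#00FFFF"      # cyan: optimal
```

The endpoint would then include this value in its JSON so the Android widget can restyle itself on each poll.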
Attachment
Attachment
Attachment
0
ChefThi
  • feat: final tactical polish for SHIP - English demo mode and agent search fix (0710e20)
  • feat: implement high-fidelity International Demo Mode with mouse tracking and English localization (11bfc83)
  • docs: translate to English and enable automated GitHub Pages deployment (d7b8db4)
  • fix: ensure Demo Mode activates on GitHub Pages by handling Mixed Content WebSocket errors (71efe67)

Wrapping up this week of LockIn, I decided to publish the DEMO via GitHub Pages as I had been doing, explaining that I built it as a mock for the reviewer and anyone else who wants to try it without installing all the dependencies: Playwright, GEMINI_API_KEY, camera, etc.

The AI I was using ended up making some changes in server.py and the HTML, so I tweaked a few things and delegated it to fix what was missing. Then I asked for a deploy script in Actions, and that’s what I got!

Attachment
0
ChefThi
  • feat: add projectId tracking and assembly status polling (3fb800d)

The pipeline was improved with better project tracking and asynchronous video assembly. Each generation now receives a unique project identifier, which helps organize and monitor the entire process from start to finish.

Main changes:

  • Added projectId tracking throughout the frontend and backend.
  • Changed video assembly to run in the background instead of blocking the interface.
  • Implemented status polling so the user interface automatically checks when the final video is ready.
  • Improved error handling and added retry logic for image generation to make the flow more stable.
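The retry idea for image generation might look like this sketch, with exponential backoff between attempts (generate_with_retry and the backoff schedule are illustrative):

```python
import time

def generate_with_retry(generate, prompt: str, attempts: int = 3, base_delay: float = 1.0):
    """Call an image-generation function, retrying transient failures
    with exponential backoff (base_delay, 2x, 4x, ...)."""
    last_error = None
    for attempt in range(attempts):
        try:
            return generate(prompt)
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    raise last_error
```

Wrapping each scene’s generation call like this is what makes the flow feel stable even when the upstream API occasionally hiccups.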

These updates make the system feel smoother and more professional, especially when generating longer videos. The frontend no longer waits locked during the FFmpeg processing step.

Early in the project, Docker and DevOps concepts needed significant learning and adjustment. Considerable time was also spent refining the FFmpeg configuration for reliable video assembly.

The system hit this error, which I still need to fix: ❌ Error in stage Asset Preparation: Backend error: 404 {"message":"Project proj_1775505149988 not found","error":"Not Found","statusCode":404}

Attachment
Attachment
0
ChefThi
  • docs: add Branding Kit update devlog (7c5aa82)

V1.8 Creator Brand Kit

The engine has moved from a generic video generator to a personal studio. I implemented a modular branding system that allows the engine to adopt a specific creator’s visual and narrative identity.

Identity Injection

I added a BrandingLoader module that handles custom configurations for different profiles. This includes:

  • Style Prompts: The engine now injects a “Style Prompt” directly into Gemini. If a creator wants an “aggressive” or “minimalist” tone, the AI writes the script in that specific voice from the start.
  • Brand Colors: Visual themes are now driven by a brand_colors.json file, ensuring consistency across all renders.

Asset Prioritization

The rendering pipeline was refactored to prioritize local assets over generated ones:

  • Custom B-Roll: The engine now checks the branding folder for user-provided clips first. It only generates new media if it needs more footage to fill the duration.
  • Automated Overlays: Implementation of an automated logo overlay in the FFmpeg chain.
  • Signature Music: Support for specific background tracks, allowing creators to use their signature sound in every video.
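The local-first asset rule could be sketched like this; pick_clips, the .mp4 glob, and the generate_clip callback are illustrative stand-ins for the real pipeline:

```python
from pathlib import Path

def pick_clips(branding_dir: str, needed: int, generate_clip) -> list[str]:
    """Prefer user-provided b-roll from the branding folder; only generate
    new media when there isn't enough local footage to fill the duration."""
    local = sorted(str(p) for p in Path(branding_dir).glob("*.mp4"))
    clips = local[:needed]
    while len(clips) < needed:
        clips.append(generate_clip(len(clips)))  # fall back to generated media
    return clips
```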

This update pushes the project into v1.8, focusing on making the output feel like a professional product rather than a test script.

Attachment
Attachment
0
ChefThi
  • feat: implement automatic HUD orbit and fallback demo mode for reviewers (bfab249)

Today I made the HUD way more robust and demo-friendly — exactly what reviewers need.

Implemented an automatic fallback system: if no real data arrives from the WebSocket for more than 2 seconds (webcam offline, backend delay, or during recording), the HUD smoothly switches to demo mode with a beautiful orbiting cursor animation.

What was added in static/index.html:

  • Data flow monitoring with lastDataTime and 1-second checks
  • startDemoMode() using sin/cos math to simulate natural cursor movement, periodic pinch_progress scans, and fixed 60 FPS
  • Seamless transition: real WebSocket messages instantly stop the demo and take over
  • Improved onopen/onmessage/onclose handlers with auto-reconnect + fallback
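The sin/cos orbit in startDemoMode() boils down to math like this (Python here for illustration; the real code lives in static/index.html as JavaScript, and the center/radius/period values are assumptions):

```python
import math

def demo_cursor(t: float, cx: float = 0.5, cy: float = 0.5,
                radius: float = 0.25, period: float = 6.0) -> tuple[float, float]:
    """Simulated cursor position at time t (seconds), orbiting the screen
    center in normalized 0..1 coordinates, like the HUD's fallback mode."""
    angle = 2 * math.pi * t / period
    return (cx + radius * math.cos(angle), cy + radius * math.sin(angle))
```

Evaluating this each animation frame gives the smooth, natural-looking orbit; real WebSocket data simply stops the loop and takes over.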

The tactical UI now stays alive and immersive 100% of the time — perfect for videos, quick demos, or when showing the project without perfect hardware.

Quick but focused session after classes. Seeing the cursor start orbiting smoothly when I paused the vision server felt like magic. No more awkward “wait, it froze” moments during recordings.

Combined with yesterday’s AI mocks and DEMO_MODE, OmniLab is now extremely easy to showcase. Reviewers can open the page and immediately see the full Iron Man experience without any setup pain.

Attachment
0
ChefThi
  • feat: NerveOS v0.4.0 - implement Web Serial API bridge, canvas monitoring and hardware link UI (0d2455c)
  • fix: restore index.html integrity and finalize v0.4.0 hardware bridge implementation (Web Serial API & UI pulse logic) (6877cfa)

What happened

Alright so after posting the last devlog I realized two things: my settings weren’t saving (annoying), and NerveOS was still just a website — it couldn’t actually talk to the ESP32-S3 on the physical cyberdeck. Both of those are fixed now.

v0.3.1 — Quick QoL Fix

Small update but honestly it makes the whole thing feel way more polished. Your accent color and wallpaper now persist through localStorage — close the tab, come back, everything’s still how you left it. The taskbar buttons also light up when their window is open, so you always know what’s running. Oh and I added a theme command to the terminal, so you can type theme #ff0000 and the entire OS goes red instantly. Kinda unnecessary but really fun to mess with.

v0.4.0 — The Hardware Bridge

This is the big one. NerveOS can now connect to the actual ESP32-S3 over serial using the Web Serial API. Click “LINK DEVICE” in the taskbar (or the HW Monitor), pick the serial port, and boom — 115200 baud connection established. The button starts pulsing green, the status changes to “CONNECTED @ 115200”, and the terminal gets a new serial command to check link status.

The boot sequence also got updated — now it loads the persistence layer, mounts the filesystem, checks for Web Serial API support, and only then shows “Welcome back, Director.” It feels way more like booting a real device now.

The CPU graph in HW Monitor also got cleaned up — it uses the accent color dynamically instead of hardcoded green, so it matches your theme. The pulse animation on the connect button is my favorite part ngl. ⚡

Attachment
Attachment
0
ChefThi
  • feat: add demo mode and ai mocks (8ff8f22)

Demo Mode + AI Mocks: Zero-Dependency Showcase for my first ship

Today I added a full demo mode so OmniLab can run beautifully without a webcam or real Gemini API key — perfect for quick testing, recording timelapses, and showing the project to others.

What was implemented:

  • New DEMO_MODE flag in .env (true = mocks everything, false = production)
  • Cycling mock responses with realistic 0.6s simulated latency
  • Guarded Gemini client creation so it only initializes when needed
  • Added 4 hand gesture sample images in static/demo/ for visual consistency
  • /analyze endpoint now returns clean JSON with demo: true flag when in mock mode
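Conceptually, the guarded client plus cycling mocks could look like this sketch; the mock texts and helper names are illustrative:

```python
import itertools, os, time

DEMO_MODE = os.getenv("DEMO_MODE", "false").lower() == "true"

# Cycling canned responses, as described above; the texts are made up here
_MOCKS = itertools.cycle([
    {"demo": True, "analysis": "Open palm detected: idle scan."},
    {"demo": True, "analysis": "Pinch detected: starting Deep Scan."},
])

def _get_gemini_client():
    # Guarded creation: the real client (and its API key) is only touched
    # when DEMO_MODE is off. Out of scope for this sketch.
    raise RuntimeError("set GEMINI_API_KEY and DEMO_MODE=false for production")

def analyze_frame(frame_b64: str, demo: bool = DEMO_MODE) -> dict:
    """Return an analysis for a frame; mock everything when demo is on."""
    if demo:
        time.sleep(0.6)  # realistic simulated latency
        return next(_MOCKS)
    client = _get_gemini_client()  # only initialized when really needed
    raise NotImplementedError("real Gemini call goes here")
```

Because the mocks return the same JSON shape (plus the demo: true flag), the HUD can render them without knowing the backend is simulated.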

The HUD and gesture pipeline stay exactly the same — you still see the tactical overlay, pulse effect, and “Deep Scan” flow, but everything is simulated and stable.

After a long day of classes I wanted something that would let me record clean demos without fighting hardware. Turning DEMO_MODE on and seeing the mock responses flow perfectly into the HUD felt super satisfying. No more “sorry, needs webcam” excuses.

This makes OmniLab way more shareable and production-like. Combined with the recent Playwright stealth work, we’re getting closer to a full local agent that can demo real browser actions without any external dependency.

Attachment
0
ChefThi

Focused on bridging the HOMES Hub with native Android components and streamlining the ecosystem’s lifecycle.

Key Deliverables

  • Expanded Telemetry: Updated the Mobile Agent to monitor real-time RAM usage and WiFi connectivity.
  • Widget Provider: Implemented a dedicated /api/widget endpoint in FastAPI, providing a simplified JSON structure for Android widget engines.
  • Unified Startup: Created start-all.py to orchestrate the simultaneous launch and graceful shutdown of the Hub and Agent.
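A start-all.py along these lines would cover simultaneous launch and graceful shutdown; the service entry paths are assumptions, and signal.pause() is POSIX-only:

```python
import signal, subprocess, sys

# Entry points assumed for illustration; adjust to the real repo layout
SERVICES = [
    [sys.executable, "hub/main.py"],
    [sys.executable, "agent/mobile_agent.py"],
]

def launch_all(commands=SERVICES) -> list:
    """Start every service as a child process."""
    return [subprocess.Popen(cmd) for cmd in commands]

def shutdown_all(procs) -> None:
    """Graceful stop: SIGTERM each child, then reap them all."""
    for p in procs:
        p.terminate()
    for p in procs:
        p.wait()

if __name__ == "__main__":
    procs = launch_all()
    try:
        signal.pause()  # sleep until Ctrl+C (POSIX)
    except KeyboardInterrupt:
        pass
    finally:
        shutdown_all(procs)
```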

Status & Research

  • KWGT Prototyping: The backend is fully operational. I am currently studying KWGT (Kustom Widget Maker) to optimize HTTP polling and JSON path mapping for the final mobile UI.
  • Gemini CLI Impact: The CLI accelerated the development of the orchestrator script and facilitated rapid debugging of the enriched telemetry logic within the Termux environment.

Work Summary:

  • Status: Backend ready; Android UI in prototyping phase.
Attachment
Attachment
Attachment
0
ChefThi
  • fix: stabilize vision-server bridge and synchronize Gemini 3.1 models (81ac80e)

OMNILAB // RESILIENCE & BROWSER HANDS 🛡️

Spent the last session bulletproofing the core architecture. I refactored the vision module to use a multi-threaded loop so it doesn’t just die if the connection drops. Now it lowkey waits for the server to come back online automatically, so no more manual restarts. I also standardized everything on Gemini 3.1 Flash Lite for that low-latency speed boost.

The big win was expanding the gesture engine. I implemented Swipe, Thumbs Up, and Fist recognition, and mapped them to actual browser actions using Playwright. Seeing the HUD trigger a stealth search or navigate tabs just by moving my hand was the ultimate vibe check. I also hunted down a sneaky MediaPipe indexing bug that was causing hard crashes during fast movements. The invisible interface is finally starting to execute real intent instead of just describing the scene.

Attachment
0
ChefThi
  • feat: integrate gemini 2.5 flash multimodal image engine & refactor video assembly controller (dc475d9)

Devlog: Gemini 2.5 Flash Image Integration and Pipeline Refactor

The project received a significant update today with the integration of Gemini 2.5 Flash as a new multimodal image generation engine. This addition strengthens the visual creation part of the pipeline and improves overall reliability.

Main changes included:

  • Added support for Gemini 2.5 Flash to generate images directly from text prompts, using a 16:9 aspect ratio and high-quality PNG output.
  • Refactored the video assembly process to run in the background and return a video URL instead of streaming the file directly.
  • Updated several parts of the code for better stability, including fixes in the FFmpeg configuration and test scripts.

These improvements build on the previous parallel rendering engine and make the system more modular and ready for future scaling.

Early in the project, Docker and DevOps concepts required a lot of learning and adjustment. Considerable time was also spent refining the FFmpeg setup to handle video assembly correctly.

Media to be attached/linked:

  • Screen recording of the full system flow, now using the new Gemini 2.5 Flash image generation.
  • Sample videos generated with the updated pipeline to show improved visuals and background processing.
Attachment
Attachment
0
ChefThi

This session initiated a strategic pivot in our orchestration layer, migrating from Node.js to Python/FastAPI. The goal is to unify the ecosystem under a single language, leveraging Python’s superior SDK support for AI workloads and IoT.

Architectural Pivot: Node.js to FastAPI

  • The Shift: Migrated the central Hub from Express.js to FastAPI. This move aligns the Hub with the Mobile Agent, reducing context switching and enabling the use of high-performance asynchronous Python for device telemetry.
  • Dashboard Portability: Successfully ported the Cyberpunk monitoring interface to the new backend, now served via FastAPI’s static file handling.

Technical Challenges & Environment Tuning

  • Dependency Management in Termux: Encountered a build failure with Pydantic v2 (Rust-based) due to compilation constraints in the Android environment.
  • Resolution: Downgraded to Pydantic v1.10 to ensure a stable, compilation-free installation on Termux while maintaining performance and validation integrity.

Gemini CLI Integration

  • Acceleration: Using the Gemini CLI significantly shortened the migration window. The ability to instantly translate Express middleware logic to FastAPI decorators allowed us to reach a functional health-check state in under 10 minutes.
  • Rapid Debugging: The CLI was instrumental in diagnosing the specific Rust compilation error, allowing for a quick pivot to a compatible version without breaking the development flow.
Attachment
0
ChefThi
  • feat: NerveOS v0.3.1 - theme persistence, active taskbar states, and terminal theme command (f7fbda3)
  • Delete devlog.md (816e899)

This is a smaller update focused on refining the user experience. The main goal was to ensure that preferences are remembered and the interface feels more responsive.

What’s New

  • Theme Persistence: Your accent color and wallpaper choices are now saved automatically. You no longer need to reset them every time you refresh the browser.
  • Active Taskbar States: The taskbar now highlights windows that are currently open, making it easier to track your workspace.
  • Terminal Command: Added a theme command to the terminal. You can now switch accent colors directly from the command line.

Attachment
Attachment
0
ChefThi
  • refactor: remove unused options page + add kitchen mode + i18n auto-detect (96d9694)
  • dist: ship version 1.4.0 with full i18n support and UI enhancements (f72d1d8)
  • Delete vtm_v1.3.1.zip (f30c126)
  • Bump version to v1.4.0 and update installation steps (990e028)

Quick morning sprint to push v1.4.0. Cleaned the last loose ends from yesterday and added some tasty new flavor.

Highlights

  • Full internationalization (I hope…) (i18n) with automatic language detection — now works smoothly for international hackers.
  • Kitchen Mode added: pure retro terminal feel with extra CRT crunch and no distractions. Perfect for deep focus sessions.
  • Smarter project context detection via chrome.tabs — auto-pulls Flavortown project info even better.
  • Fixed a sneaky connection error that was popping up on reload.
  • Removed unused options page and cleaned the dist folder (goodbye old zip).
Attachment
0
ChefThi

Today I integrated the engine deeper into the Android OS via Termux API. The focus was on user experience and system feedback.

Key Updates:

  • Haptic Feedback: The phone now vibrates upon successful render completion.
  • System Notifications: Implemented Android notifications to alert when a video is ready in the Downloads folder.
  • Audio Feedback: Added a voice confirmation (TTS) when the export process finishes.
  • Storage Fix: Hardened the file-saving logic to use reliable Termux storage paths.
Attachment
0
ChefThi
  • fix: resolve variable scoping in vision bridge and enhance loop stability (d38cadc)
  • feat: implement tactical control panel and visual telemetry fixes (1dd8917)

Basically, I found errors in the panel; it wasn’t appearing correctly before.

  • I used the Gemini CLI for a quick and simple fix in this part :)

P.S. I noticed that my recording wasn’t saved. The Screenity extension hit an error after I finished the video.

Now there’s a visitor at home, so I went to greet them.

Attachment
Attachment
0
ChefThi

This session focused on transforming the HOMES repository into a professional micro-service orchestration hub, separating core logic from high-level control and implementing industry-standard security measures.

🏗️ Architectural Restructuring

  • Module Promotion: Elevated homes-hub (Node.js) and mcp-server (TypeScript) to root-level modules for better maintainability.
  • Dependency Cleanup: Removed redundant engine-specific code (Python rendering logic) from the Hub repository to enforce a strict “Separation of Concerns.”

🔐 Security and Authentication

  • HMAC Middleware: Implemented SHA256 HMAC signature verification for all hardware-controlling routes.
  • Request Signing: Created a Python-based signing utility to generate valid X-HOMES-Signature headers, ensuring that only authenticated webhooks can trigger system actions.
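The signing utility reduces to a few lines of stdlib hmac; the X-HOMES-Signature header name comes from the post, while the hex-digest format is an assumption:

```python
import hashlib, hmac

def sign_request(secret: bytes, body: bytes) -> str:
    """Produce the X-HOMES-Signature value for a raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_request(secret: bytes, body: bytes, signature: str) -> bool:
    # compare_digest avoids timing side channels on the comparison
    return hmac.compare_digest(sign_request(secret, body), signature)
```

On the Node.js side, the middleware recomputes the digest over the raw body (hence the body-parser ordering issue mentioned below) and rejects any mismatch.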

🖥️ Monitoring and Interface

  • Telemetry Dashboard: Built a dark-themed monitoring interface serving real-time mobile status (battery, storage, and engine health).
  • Agent Synchronization: Refactored the mobile agent to automatically export telemetry data, resolving a synchronization lag between the hardware and the web dashboard.

Challenges Encountered

  • Repository Desynchronization: During the push process, a configuration mismatch in the local environment led to commits being directed to the HOMES-Engine repository instead of the HOMES Hub. This required a manual audit of both repositories, followed by a series of force-pushes and history corrections to restore architectural integrity.
  • Middleware Integration: Ensuring the Node.js HMAC middleware correctly parsed raw JSON bodies without interference from body-parser defaults required precise ordering in the Express middleware stack.
0
ChefThi

This session focused on making the extension production-ready by hardening security and improving data visibility.

  • Security (CSP): Refactored all UI events to proper listeners. This ensures 100% compliance with Manifest V3 security policies.
  • Visual Metadata: Added clear labels for task groups (Bugfix, UI/UX, etc.) and priority levels directly in the grid.
  • Improved HUD: Refined project detection logic to provide cleaner headers and better color-coded status messages on Flavortown.
  • Robustness: Added guardrails to the message-passing system to prevent errors when using commands on non-project pages.
Attachment
0
ChefThi
  • fix: resolve playwright-stealth imports and fastapi validation errors (9d2ada5)

Playwright Stealth + FastAPI Validation Fixed: Browser Control Now Stable

Quick but important cleanup session today.

Fixed two blocking issues that were breaking the new browser automation layer:

  • Corrected playwright_stealth import and usage: switched from stealth_async to stealth so the browser launches with proper human-like fingerprints (anti-detection for Cloudflare, Google, etc.).
  • Enforced proper Pydantic validation on the /analyze endpoint: changed request: any to request: AnalyzeRequest (BaseModel with base64 image field). This prevents malformed payloads and makes the API more reliable when Gemini or voice triggers actions.
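Since the real AnalyzeRequest is a Pydantic BaseModel, here is a stdlib-only sketch of the contract it enforces (the field name image_base64 is an assumption for illustration):

```python
import base64
import binascii

def validate_analyze_request(payload: dict) -> bytes:
    """Stdlib approximation of what the AnalyzeRequest model enforces.

    The real endpoint uses a Pydantic BaseModel; this shows the same
    idea: reject malformed payloads before they reach the pipeline.
    """
    if not isinstance(payload, dict) or "image_base64" not in payload:
        raise ValueError("missing required field: image_base64")
    try:
        # validate=True rejects any non-base64 characters outright
        return base64.b64decode(payload["image_base64"], validate=True)
    except (binascii.Error, TypeError) as exc:
        raise ValueError(f"invalid base64 image: {exc}")
```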

Also added the new libs for v0.3 (Playwright ecosystem + dependencies).

The pipeline is now much more solid: MediaPipe gesture/voice → Gemini analysis → execute_system_action → Playwright with stealth can open real tabs, navigate, and interact without immediate blocks.

Short focused session after classes. Seeing the stealth apply correctly and the FastAPI endpoint stop throwing validation errors felt like removing training wheels. No more random crashes when the HUD tries to trigger a browser action.

OmniLab is evolving from “cool HUD that describes frames” into a true local agent that can actually use the browser as part of my HOMES workflow. Next target: full BROWSER_ACTION handler with human-like delays and real task execution (e.g. open recipe site → extract ingredients → trigger HOMES-Engine). The invisible interface just got way more powerful.

Attachment
0
ChefThi

Today I moved forward with the HOMES Hub by introducing a simple web-based dashboard.
The new Control Center allows real-time monitoring of the mobile agent status. It shows battery level, storage information, and engine activity. Users can now send quick commands directly from the browser: make the phone speak a text, trigger a short vibration, or push a notification.
I created a clean interface with HTML and CSS, and connected it to the existing backend. The server now serves the dashboard pages and the status endpoint was improved to pull data from the Termux agent JSON file when available.
This change makes the system much more interactive. Instead of only watching logs, anyone can open the hub in a browser and see what the mobile device is doing, or control it with a few clicks.

Attachment
0
ChefThi
  • fix: remove inline onclick handlers (CSP violation MV3) (01d15ef)
  • fix: wire filter buttons via addEventListener, fix group/priority display, fix CSP (c239fc2)
  • dist: update final vtm_v1.3.1.zip and cleanup repository root (c52763b)
  • Delete vtm_v1.3.1_FINAL.zip (8aeb2ff)
  • Fix installation zip name and update release link (b04ee0e)

Rapid fire fixes today to ship a clean v1.3.1. Focused on the last undocumented bugs that were blocking a smooth reviewer experience.
Manifest V3 is strict with CSP — inline event handlers were breaking the extension on load. Fixed by moving everything to proper listeners. Filter buttons now actually work, priority glows and groups display correctly, and the download zip has a clean, consistent name.
The mic permission flow (already fixed earlier) is now even more stable with the cleaned init sequence. No more silent failures or broken UI elements.
Bumped to v1.3.1 — final zip ready, README updated, everything tested.
This version feels production-solid: voice commands + real-time Flavortown HUD + cyber terminal UI, all without security headaches.

Attachment
Attachment
Attachment
0
ChefThi
  • feat: core evolution - gesture control, Gemini 3.1 Thinking Mode, and modular HUD (64057c1)
  • add new libs for the v0.3 (634a0a4)

OmniLab Devlog // v0.3 Checkpoint

Yo, just dropping a quick update on what’s been happening with OmniLab. The last two commits were lowkey a mess—honestly, they were just checkpoints to save where I was at, so they didn’t really work out of the box.

The Struggle (aka The Errors)

So, when I tried to actually run the code from the recent pushes, the system basically threw a tantrum.

  1. Import Drama: In server.py, I tried to pull in stealth_async from playwright_stealth, but it just wasn’t having it. Total ImportError. Had to swap it for the standard stealth function to get the browser agent to even start.
  2. FastAPI Tantrum: The /analyze route was broken because I used any as a type for the request. FastAPI is super picky about that, so it crashed with a FastAPIError. I had to bring back the proper Pydantic models to make it happy again.
  3. Browser Missing: Playwright was installed but the actual Chromium browser wasn’t there. Pro-tip: playwright install sometimes fails, so using python -m playwright install chromium is the way to go.

What’s Actually New

Even though it was bumpy, we got some cool stuff in:

  • Gemini 3.1 Thinking Mode: The brain is officially upgraded. It’s faster and actually “thinks” before it gives you the tactical report.
  • Pinch-to-Scan: This is the best part. You don’t have to yell at the mic anymore. Just hold a pinch gesture for 1.5s, the HUD ring scales down and changes color, and boom—it triggers a deep scan.
  • OmniBrowser Agent: We added Playwright so the HUD can lowkey browse the web for you. It’s not fully “Jarvis” level yet, but it can navigate and pull data in the background.
  • HUD v2.1: New tactical UI with a log console at the bottom and real-time FPS/latency tracking so you know the system isn’t lagging.
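The pinch-to-scan timing above boils down to a tiny state machine. A sketch with injected timestamps so the logic is testable (in the real pipeline, MediaPipe supplies the per-frame pinch flag):

```python
class PinchHoldDetector:
    """Sketch of the 1.5s pinch-hold trigger described above."""

    HOLD_SECONDS = 1.5

    def __init__(self):
        self._pinch_start = None

    def update(self, is_pinching: bool, now: float) -> bool:
        """Return True exactly once when the pinch has been held long enough."""
        if not is_pinching:
            self._pinch_start = None          # hand opened: reset the timer
            return False
        if self._pinch_start is None:
            self._pinch_start = now           # pinch just started
            return False
        if now - self._pinch_start >= self.HOLD_SECONDS:
            self._pinch_start = None          # re-arm for the next gesture
            return True                       # trigger the deep scan
        return False
```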
0
ChefThi

Voice Task Master (VTM) Technical Documentation

The final development sprint for Voice Task Master (VTM) focused on deep integration, establishing a centralized “Mission Control” for the Flavortown ecosystem.

Universal Project Bridge

VTM implements the chrome.tabs API to monitor browser context. It automatically identifies the active project at flavortown.hackclub.com.

  • Contextual Tagging: Tasks are automatically assigned the corresponding Project ID.
  • Dynamic HUD: The on-page interface filters the backlog in real-time to synchronize with the current “Ship” profile.

Mission Control & “Ship It” Mode

  • Neural Commands:
    • "Ship it!": Triggers a visual confirmation on the Flavortown DOM and executes auto-archiving for all completed tasks.
    • "Generate log": Generates a categorized Markdown summary of progress. The output is copied to the clipboard, pre-formatted for shipping reports.

Power-User UX & Performance

  • Global Hotkey: Ctrl + Shift + V provides instant voice uplink activation across any browser tab.
  • Native Drag & Drop: Priority management is handled via a manual reordering system built on the zero-dependency HTML5 Drag and Drop API.
  • Smart Grouping: An automated keyword analysis engine categorizes tasks into three primary streams: Bugfix, UI/UX, and Ship Log.
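A rough sketch of how such a keyword engine could route tasks into the three streams; the actual word lists in VTM are not published, so these tables are guesses:

```python
# Hypothetical keyword tables for the three streams named above.
GROUP_KEYWORDS = {
    "Bugfix": ("fix", "bug", "crash", "error", "broken"),
    "UI/UX": ("ui", "ux", "style", "layout", "css", "design"),
}

def categorize_task(text: str) -> str:
    """Assign a task to one of the three primary streams."""
    lowered = text.lower()
    for group, keywords in GROUP_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return group
    return "Ship Log"   # default stream for everything else
```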
Attachment
Attachment
Attachment
0
ChefThi
  • feat: evolve OmniLab into an active command center for HOMES ecosystem (dc3b7ba)

OmniLab becomes Active Command Center for HOMES 🔥 Gesture → Real Action

Today I took the biggest step yet: turning OmniLab from a passive scan tool into a true command center that can execute actions inside the HOMES ecosystem.

Major refactor in server.py:

  • WebSocket connections now use sets for true O(1) operations
  • Re-used the image caching + resize pipeline (MD5 dedup + 512×512 JPEG)
  • Added execute_system_action() handler with real examples:
    • “HOMES_EXECUTE_TASK” → placeholder to trigger Termux workers / video rendering
    • “BROWSER_NAV_NEXT” → pyautogui hotkey (Ctrl+Tab) as proof-of-concept
  • Broadcast logic cleaned up so vision → HUD communication stays rock-solid
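The handler described above might be shaped like a registry-based dispatcher; the handler bodies here are stand-ins for the real Termux worker and pyautogui calls:

```python
# Sketch of an execute_system_action() dispatcher. Handler bodies are
# placeholders: the real code fires Termux workers and a pyautogui
# Ctrl+Tab hotkey, which are stubbed out here.
ACTION_HANDLERS = {}

def register_action(name):
    def decorator(fn):
        ACTION_HANDLERS[name] = fn
        return fn
    return decorator

@register_action("HOMES_EXECUTE_TASK")
def homes_execute_task():
    return "queued render task"      # placeholder for the Termux worker call

@register_action("BROWSER_NAV_NEXT")
def browser_nav_next():
    return "sent Ctrl+Tab"           # placeholder for pyautogui.hotkey("ctrl", "tab")

def execute_system_action(action: str) -> str:
    handler = ACTION_HANDLERS.get(action)
    if handler is None:
        return f"unknown action: {action}"   # never crash the HUD on a bad command
    return handler()
```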

The flow is now: Pinch gesture (or voice) → MediaPipe → Gemini analysis → action decision → execute locally or fire HOMES pipeline.

It still needs the actual webhook to HOMES-Engine, but the architecture is solid and the HUD stays responsive.

After classes I went straight into a long refactoring session. Seeing the action handler print “Executing HOMES_EXECUTE_TASK” for the first time felt like JARVIS finally waking up. No more “just describe the frame” — now it can DO something.

OmniLab + HOMES together are starting to feel like a real personal AI operating system. Next: full voice + gesture synergy and actual integration with HOMES worker queue. The invisible interface is getting dangerous. 🤖⚡

I got a bit lost during this development, but we made improvements!

0
ChefThi

Continuing the work, I evolved Voice Task Master from a personal tool into a universal ecosystem utility for the Hack Club Flavortown community, focused on “Project Awareness” and dynamic UI adaptation.

The Problem:

Previously, the extension used static tags, making it feel like a “mock” or restricted to a single project. Developers shipping multiple projects needed a way to isolate tasks without manual sorting.

The Solution: Universal Project Bridge

  • Chrome Tabs Intelligence: Integrated the chrome.tabs API to monitor the active browser context. VTM now “sees” which Flavortown project you are currently visiting.
  • Dynamic Auto-Tagging: When adding a task (via neural uplink or text), the extension automatically “stamps” it with the current project ID (e.g., #4322). This links your backlog to your ship automatically.
  • Context-Filtered HUD: The on-page HUD now filters tasks in real-time. If you switch from Project A to Project B on the site, the VTM HUD instantly swaps the task list to match your current ship.
  • Adaptive UI: The filter buttons in the popup now dynamically rename themselves to match the ID of the project you are working on, providing a truly integrated experience.
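The auto-tagging step could look roughly like this; the /projects/&lt;id&gt; URL pattern is an assumption based on the #4322 example above:

```python
import re

def extract_project_tag(url: str):
    """Pull a #<id> tag from a Flavortown project URL.

    Sketch only: the URL pattern is assumed, not taken from the
    actual content script.
    """
    match = re.search(r"flavortown\.hackclub\.com/projects/(\d+)", url)
    return f"#{match.group(1)}" if match else None
```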

Technical Achievements:

  • Refactored the data layer to handle dynamic tag injection.
  • Implemented cross-script synchronization between the popup and the Flavortown content script.
  • Optimized the HUD to be non-intrusive yet project-aware.
0
ChefThi

The Cloud-to-Phone Bridge

Biggest update this sprint: the “Future Pack” dropped. It’s a whole Node.js Hub acting as a mailbox. Since the Android phone is usually stuck behind a firewall, the Hub stores commands and the Python agent polls it every 5s. Now an AI or n8n workflow can literally tell the phone to vibrate, change brightness, or speak. It’s basically remote controlling hardware through a queue.
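The poll-every-5s mailbox loop can be sketched like this; fetch_commands stands in for the HTTP call to the Hub, and the endpoint wiring is omitted:

```python
import threading

def poll_once(fetch_commands, handle):
    """Drain the Hub mailbox once; returns how many commands ran."""
    commands = fetch_commands()      # e.g. a GET to the Hub's command queue
    for cmd in commands:
        handle(cmd)                  # e.g. vibrate / speak / change brightness
    return len(commands)

def start_polling(fetch_commands, handle, interval=5.0):
    """Background poll loop mirroring the agent's 5s cadence."""
    stop_event = threading.Event()

    def loop():
        while not stop_event.is_set():
            poll_once(fetch_commands, handle)
            stop_event.wait(interval)   # interruptible sleep keeps shutdown snappy

    threading.Thread(target=loop, daemon=True).start()
    return stop_event                   # caller sets this to stop the agent
```

Running the loop in a daemon thread with an interruptible wait is what keeps the Termux terminal responsive while the agent polls.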

🤖 Agent QoL & Massive Cleanup

The HOMES agent got a background Wakatime bot to farm coding hours automatically. It also dumps a homes_status.json file so Android widgets (like KWGT) can show live battery/engine stats. On the cleanup side, 1,500 lines of old core/ files got yeeted because they belonged to the Engine repo, not here. The README got a glow-up with badges and a clear architecture table.

😤 The Struggle

Getting FFmpeg to do frame-perfect math for the zoompan filters on mobile ARM64 without crashing was annoying. Threading the Python polling loop so it doesn’t freeze the Termux terminal also took a hot minute to debug.

Attachment
Attachment
Attachment
0
ChefThi
  • feat(extension): implement real-time HUD sync for Flavortown (50min elite sprint) (d1b6b86)

In this session I completed the development sprint focused on transforming Voice Task Master (VTM) into a real-time productivity bridge for the Hack Club Flavortown ecosystem.

Major Feature: The “Ship Mode” HUD

  • Real-time Synchronization: The extension now injects a transparent, neon-styled HUD directly into the flavortown.hackclub.com interface.
  • Bi-directional Bridge: Using the Chrome Storage API, tasks added via voice or text in the popup now appear instantly on the Flavortown project page. No need to click icons; your backlog is always visible while you ship.
  • Context Awareness: The HUD automatically detects the current project context from the page’s H1 tag and updates the session tag (e.g., TARGET: Voice Task Master).
  • Session Tracking: A live session timer is now visible on the HUD to track “ship time” without leaving the browser tab.

Technical Improvements:

  • Visual Priority Glows: Integrated automated keyword detection to assign #ff003c (Critical) and #ffcc00 (Ship) pulsing borders for high-impact tasks.
  • UX Polish: Refined the popup width to 450px and implemented a 3-second auto-clear timer for the hint system.
  • Packaging: Generated the official .crx and .pem files directly via terminal for the v1.1.1 release.

I found it very strange that the video I attached to this Devlog was almost 100MB for just 1min50s. I asked Gemini CLI to compress it through the terminal so it would fit in the Devlog. What a difference: from 98MB to 16MB.

Attachment
Attachment
0
ChefThi
  • feat: implement industrial-grade background rendering & static ffmpeg distribution (c60930f)

The pipeline went from “works if I don’t breathe” to something that runs unattended. No new features — just tearing down everything that could kill a long render.

What changed
Background Worker — The controller used to hang the request, run FFmpeg, and pipe the stream into the response. Close the tab = lose everything. Now it fires in background and returns JSON with projectId + future videoUrl. Video lands in server/public/videos/, served statically by NestJS. Close the browser, the server keeps going.

Static FFmpeg — Bundled ffmpeg-static and ffprobe-static. VideoService constructor sets paths via ffmpeg.setFfmpegPath(). Zero external dependency. Dockerfile still installs libfontconfig1 and libfreetype6 for text filters, but the binary is ours.

15min Timeout — Node default is 2min. Added server.setTimeout(900000) so image/audio uploads don’t get killed mid-transfer.

Clean Dockerfile — 3 stages: frontend-builder (Vite), backend-builder (TS), production (slim, artifacts only). Migrated from node:18-alpine to node:20-slim — Alpine was causing native module headaches.

Validation — BadRequestException when audio or images are missing. Before this, the error only surfaced deep inside FFmpeg as a cryptic “No such file”.

WakaTime sync
Recovered lost hours today. Reinstalling the extension and switching directories broke the project identity — hours scattered across 5+ phantom entries. Fixed it by adding .wakatime-project at the repo root. Lesson: this file is the .gitignore of time tracking. It belongs in the commit.
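For anyone hitting the same issue: a .wakatime-project file is just plain text whose first line pins the project name WakaTime reports (the name below is a stand-in, not the actual repo name):

```
my-project-name
```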

0
ChefThi

Today I focused on testing the “Voice Mode” pipeline. The goal was to ensure that a spoken idea could be transformed into a cinematic video without typing a single word.

🎙️ Voice Input Testing

I integrated the Termux API’s speech-to-text functionality with the v1.7 rendering engine. It captures audio from the mobile microphone, converts it to text via Google services, and immediately triggers the script-to-video workflow.

🛠️ Stability and Bug Fixes

During testing, I identified and fixed two critical issues:

  • Audio Mastering Restore: Fixed a bug where the EBU R128 loudness normalization filter was missing from the FFmpeg engine after a recent cleanup.
  • Reliable Export: Refactored the file saving logic. Instead of trying to write directly to the Android root, the engine now uses Termux symbolic links (~/storage/downloads). This fixed the issue of videos not appearing in the gallery.
0
ChefThi
  • feat(extension): finalize universal HUD synchronization (50min elite sprint complete) (55a805a)

Today I focused on making the VTM interface more than just a task list.
During this development sprint, I implemented a visual hierarchy system that automatically responds to voice and text commands.

Key Technical Updates:

  • Visual Priority System: Tasks now have three distinct visual states:
    • CRITICAL: High-intensity red pulse glow for urgent tasks.
    • SHIP: Golden glow for project-related milestones.
    • BACKLOG: Default neon green for standard items.
  • Automated Keyword Detection: The VTM engine now scans input strings for keywords like “critical”, “urgent”, “ship”, or “launch”. It automatically assigns the correct visual priority without manual selection.
  • UI/UX Polish: Integrated a 3-second timer for the hint system (notifications now clear themselves automatically) and added a “flash” animation to confirm task injection.
  • Repo Architecture: Cleaned the main repository by moving simulation scripts and logs to an isolated TESTES/ directory, ensuring only the core extension is tracked.
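The keyword detection above can be sketched as a small rule table; the keywords come from this devlog, while the glow hex values reuse the earlier v1.1.1 notes (#ff003c / #ffcc00) and the backlog green is assumed:

```python
# Priority rules: checked in order, first match wins.
PRIORITY_RULES = [
    ("CRITICAL", ("critical", "urgent"), "#ff003c"),
    ("SHIP", ("ship", "launch"), "#ffcc00"),
]

def detect_priority(task_text: str):
    """Return (priority, glow_color) for a task string."""
    lowered = task_text.lower()
    for priority, keywords, color in PRIORITY_RULES:
        if any(word in lowered for word in keywords):
            return priority, color
    return "BACKLOG", "#00ff66"   # default neon green (exact hex assumed)
```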
Attachment
0
ChefThi
  • perf: implement image caching, O(1) connections, and asset optimization for sidequest v0.2 (1738e81)

I just finished a heavy optimization session to kill the lag in OmniLab. I sat down with my AI assistant to tear apart the bottlenecks, and we managed to turn this from a “cool prototype” into a high-performance local AI.

What we changed (The “Brain” Upgrade):
Smart Memory: The system now remembers what it just saw. Using Image Caching, it won’t waste time or API tokens re-analyzing the same frame if nothing has moved. It’s like giving the HUD a 30-second short-term memory.

Instant Connections: I swapped how the HUD tracks connections. By moving from “lists” to “sets,” the system now handles multiple data streams instantly, no matter how many are running.

Lightweight Assets: We automated an image-shrinking process. Before sending anything to the cloud, the HUD now compresses and resizes frames. This makes the data 84% lighter without losing the “vision” quality Gemini needs.
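That short-term memory can be sketched as an MD5-keyed cache with a TTL; the clock is injected here for testability, and the 30-second window matches the description above:

```python
import hashlib
import time

class FrameCache:
    """MD5-dedup cache: identical frames within the TTL reuse the
    previous analysis result instead of a fresh API call."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}   # md5 digest -> (timestamp, result)

    def get(self, frame_bytes: bytes):
        key = hashlib.md5(frame_bytes).hexdigest()
        entry = self._entries.get(key)
        if entry and self.clock() - entry[0] < self.ttl:
            return entry[1]          # cache hit: skip the API call
        return None

    def put(self, frame_bytes: bytes, result):
        key = hashlib.md5(frame_bytes).hexdigest()
        self._entries[key] = (self.clock(), result)
```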

The Numbers (Why this matters):
Speed: Response time dropped from 820ms to 540ms. It feels way snappier.

Efficiency: We went from 60 API calls per minute down to just 3 or 8. No more wasting tokens on duplicate images.

Stability: The HUD is buttery smooth now, even during heavy “Deep Scans.”

It started as a quick after-class session and turned into a solid grind. Seeing the latency numbers drop in real-time was incredibly satisfying.

Attachment
0
ChefThi
  • feat(extension): implement UI flash feedback, redundant backup and UX polish v1.0.3 (9a3782f)

Quick late-night push after yesterday’s big polish devlog. Focused on the undocumented bits that were still rough around the edges (especially the mic permission edge-cases I kept forgetting to fully call out).

What Got Shipped Today (ad8c985)

  • UI Flash Feedback — added instant neon “task accepted” blink on voice input. Makes the whole terminal feel alive instead of silent.
  • Redundant Backup Layer — doubled down on chrome.storage with a secondary grid snapshot. Zero data loss even if popup crashes mid-command.
  • UX Polish v1.0.3 — tighter padding, smoother re-renders, live mic status icon so you always know when it’s listening.
  • Init Function Cleanup (ad8c985) — refactored the startup sequence that was causing the ghost popup on first mic grant. Now it fails gracefully and points straight to chrome://settings/content/microphone.

The Microphone Problem (the part I kept forgetting to document)

Web Speech API in MV3 popups is brutal. First load → “not-allowed” → popup dies silently. Fixed it weeks ago with the config.html fallback tab, but the init code was still messy. Today’s cleanup makes the fix rock-solid. No more “why isn’t it listening?” moments.

Bumped to v1.0.3 (new zip ready). Feels production-ready now

Attachment
0
ChefThi

To make it feel like real Absolute Cinema, I moved from basic VTT to ASS format for that nice word-level highlighting (karaoke style). I also added dynamic color grading with contrast, saturation and vignette, plus proper audio mastering using EBU R128 at -14 LUFS so every video comes out with consistent professional volume.
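The mastering step maps onto ffmpeg's loudnorm filter. A sketch of the command builder, where I = -14 matches the target above but the TP and LRA values are typical choices rather than the engine's actual settings:

```python
def mastering_args(input_path: str, output_path: str) -> list:
    """Build an ffmpeg command applying EBU R128 loudness
    normalization at -14 LUFS integrated loudness."""
    return [
        "ffmpeg", "-i", input_path,
        "-af", "loudnorm=I=-14:TP=-1.5:LRA=11",  # I = integrated target (LUFS)
        "-c:v", "copy",                          # leave the video stream untouched
        output_path,
    ]

# Usage: subprocess.run(mastering_args("raw.mp4", "mastered.mp4"), check=True)
```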

The Ken Burns zooms are now smoother and optimized for ARM64/Termux. And yeah, I spent time cleaning up all that legacy lab code — the repo is leaner with 34 commits and a much cleaner modular structure.

*Iterative wins on mobile are tough, yet the pipeline finally feels ready.*

Attachment
Attachment
0
ChefThi

New Feature: Get DB Pipeline

Background Rendering: We’ve moved away from the model where the browser would “hang” waiting for the video. Now, the backend starts a background worker. The user can close the tab, press F5, or even restart the PC; the server continues rendering the video silently.

End of 503 Timeout: I adjusted the server to a 15-minute connection limit (900000ms). Long videos are no longer interrupted by network limits.

Real Persistence: The video is no longer just a temporary “blob.” It is now physically saved in server/public/videos/ and the link is registered in SQLite.

Static Asset Service: We configured NestJS to serve these videos via a fixed URL, allowing the user to retrieve their creations at any time from the gallery.

I’ve spent the last few hours solving a critical UX problem: the loss of progress in long renders. Implementing a background worker architecture with disk persistence transformed the app from a prototype into a real production tool.

0
ChefThi
  • feat: add HUD demo mode, scan pulse effect, and dynamic port binding (90ab4c8)
  • feat: core HUD improvements and repo cleanup (c82deea)
  • chore: ensure all private project files are untracked (2daea10)
  • feat: implement concurrent vision processing and HUD fail-safe systems (a954cfb)

Concurrent Vision + HUD Fail-Safes: Parallel Power Unlocked

Big day — I finally tackled the last major bottleneck: sequential scan delays.

Implemented concurrent vision processing so frame capture, MediaPipe analysis, WebSocket transmission and Gemini 3 Flash calls can run in parallel without blocking the main HUD thread. Added robust fail-safe systems (graceful degradation, timeout recovery, and fallback states) so the interface never freezes even if the LLM takes longer than expected.

What landed in this session:

  • Full concurrent pipeline using Python asyncio + ThreadPoolExecutor for vision tasks
  • HUD fail-safe layer with visual indicators when processing is happening in background
  • Minor core improvements and repo cleanup (removed private files from tracking)
  • Combined with yesterday’s demo mode, scan pulse effect and dynamic port binding — the whole system now feels way more stable and production-like
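The fail-safe pattern above boils down to running vision work off the main loop and falling back on timeout. A minimal asyncio sketch, where analyze_frame stands in for the Gemini call:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Vision work runs in a thread pool so the HUD's event loop never blocks;
# a timeout returns a fallback state instead of freezing the interface.
executor = ThreadPoolExecutor(max_workers=4)

async def scan_with_failsafe(analyze_frame, frame, timeout=2.0):
    loop = asyncio.get_running_loop()
    try:
        return await asyncio.wait_for(
            loop.run_in_executor(executor, analyze_frame, frame),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # Graceful degradation: the HUD shows a "still thinking" state.
        return {"status": "PROCESSING", "detail": "model still thinking"}
```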

After classes I went straight into a long session. Seeing the scan pulse animate while the AI thinks in the background without any stutter… that’s the JARVIS moment I’ve been chasing.

ONE THING HAPPENED DURING THIS DEVLOG: THE MODEL I HAD SET (gemini-3-flash-preview) WAS EXPERIENCING HIGH DEMAND, SO I SWITCHED TO 3.1-lite-preview

Attachment
Attachment
Attachment
0
ChefThi

The highlight of these last tweaks is definitely Option 99 (Autonomous Mode). The engine now runs in a continuous loop, grabbing scripts from the queue and rendering everything without me touching a single button. Pure magic after all those manual tests.
It took me longer than I wanted because I had to make the queue stable on mobile and handle errors gracefully.

This is the kind of feature that makes the whole project feel next-level. Next hour I’ll talk about the visual polish! 🎨

0
ChefThi

After some long nights fighting VTT sync on Termux, the engine finally jumped to v1.7. It’s no longer just a script; it’s starting to feel like a real autonomous worker.

I spent way too much time just staring at timing errors and testing the same short clips over and over. It took a long time because of Termux limitations on ARM64 — every time I adjusted one thing, another broke. But it was worth it. The pipeline is more solid now, and the project has gained a much more professional look.

Small wins, but they add up.

Attachment
Attachment
0
ChefThi

I basically edited the banner a bit and prepared for the new reship.

  • Updated the description and AI declaration

Timelapse:
editing the banner some more

Attachment
0
ChefThi
  • feat: complete gesture-to-scan logic and HUD v2.1 tactical UI (8f09f3b)

Gesture-to-Scan Complete + Tactical HUD v2.1

Today I finally closed the loop on the most important interaction of OmniLab: turning a simple hand gesture into a full AI-powered scan.

The big challenge was making the flow feel instant and reliable. I refined the MediaPipe Tasks API logic so the Pinch gesture (held for 1.5s) now reliably captures the webcam frame, sends it through the local FastAPI pipeline, and triggers Gemini 3 Flash Vision without breaking the HUD.

What’s new in this push:

  • Improved state management so the system no longer queues scans sequentially — each Deep Scan now feels more independent
  • Small cleanups in api/v1, core, vision.py and utils for better maintainability

It’s still not fully low-latency (Gemini still takes a moment to think), but the difference from last week is huge. The HUD now truly reacts to my hand like JARVIS would.

Late-night session after classes, but seeing the tactical report pop up instantly after the pinch hold made it all worth it. The invisible interface is getting closer every commit.

0
ChefThi

The work focus was on shifting from “functional” to “polished.” After testing the VTM extension in a real-world environment, I identified and solved two critical bottlenecks in the user experience.

UI Scaling & Visual Breathing Room

Based on visual analysis of the CRT-style interface, the original 350px width felt “cramped” for a terminal-style app.

  • The Fix: Expanded the global layout to 450px with an adaptive min-height of 550px.
  • The Result: Better alignment for the “Neural Uplink” controls and improved readability for the task grid, especially on high-DPI displays.

Solving the “Ghost Popup” Permission Bug

One of the most frustrating issues with Chrome Extension development is the popup closing automatically when a browser-level permission (like the Microphone) is requested.

  • The Hack: Implemented a smart Permission Fallback Logic.
  • How it works: If the Web Speech API returns a not-allowed error, VTM now automatically opens a dedicated configuration tab. This allows the user to grant microphone access persistently without the popup vanishing. Once granted, it works seamlessly in the “Uplink” popup forever.

Real-World Aesthetic Validation

Tested the CRT Flicker and Scanline effects on physical hardware. The neon-green glow perfectly simulates a high-intensity developer terminal, matching the project’s “Hacker Aesthetic” goal.

Attachment
0
ChefThi
  • feat: refactor NerveOS v0.2.0 - deep work modular architecture & absolute cinema UI (a082b8b)
  • feat: NerveOS v0.3.0 - system settings, real-time CPU graph, and mock filesystem (3ec3178)

Okay so after my last devlog where I showed the v0.1.0 prototype, I basically went dark for two weeks and came back with a completely different beast. I didn’t just add features — I rebuilt the whole thing from scratch. Twice. Here’s what happened.

What was done
v0.2.0 — The “Absolute Cinema” Release:

I wanted NerveOS to feel like actually powering on a device, not just opening a webpage. So I added a boot sequence that simulates hardware initialization line by line — kernel, ESP32-S3 link, OLED check, encoder handshake — then hits you with “Welcome back, Director.” 🎬

Then I built the entire window system from scratch. Drag and drop, z-index stacking, open/close, glassmorphism panels with that blurry cyberpunk look. The terminal went from a stub to a real shell with 7 commands (help, status, ls, clear, echo, uptime, version). I also added a HW Monitor that shows live encoder RPM and uptime, a Notes app that saves to localStorage, and an About window with a spinning hex logo. The whole UI got the “Absolute Cinema” treatment — JetBrains Mono font, neon green accents, glow effects on hover, Unsplash wallpaper. Zero dependencies. No frameworks. Just vanilla JS doing its thing. 🖥️

v0.3.0 — Personalization & Polish:

This one was about making it feel like yours. I added a Settings window where you can pick from 4 accent colors (neon green, cyan, red, yellow) and it instantly recolors the entire OS — borders, glow, terminal prompt, everything. You can also swap wallpapers between three options.

Attachment
Attachment
Attachment
0
ChefThi
  • build: package AstroLab for PyPI distribution 📦 (9a3782f)
  • ci: add automatic PyPI deployment with Trusted Publishing 🚀 (19e405e)

After shipping the MVP, I spent this session turning AstroLab into a proper pip package and setting up automated PyPI deployment. Mostly infrastructure work, nothing fancy.

What was implemented

PyPI Packaging

Created pyproject.toml and restructured astrolab/ into an installable package. Now anyone can run pip install astrolab-cli without cloning the repo.

Automated CI/CD

Added a GitHub Actions workflow that publishes to PyPI automatically using Trusted Publishing.

Yesterday I saw people saying the Sidequest ended on the 30th and got kind of bummed because I had committed the MVP delivery on the 31st — thought I missed the deadline by one day. Then I found out they extended it to the end of the month.

Attachment
0
ChefThi
  • fix(demo): translate offline cache to english and adjust quiz sequence to DBBAC (105962c)
  • fix(api): update gemini model to 2.5-flash due to deprecation (1c3ba3a)
  • final: delivery version 1.0 (MVP) 🚀 (24a859c)

Overview

This final session wrapped up AstroLab into a polished Minimum Viable Product (v1.0) ready for submission. The focus was on stability, reviewer experience, and completing the internationalization and offline capabilities started in previous sessions.

What was implemented

Final Polish & MVP Delivery

  • Marked the project as version 1.0 (MVP) with the commit final: delivery version 1.0 (MVP)
  • Consolidated all features into a stable, consistent release

API Stability Fixes

  • Updated Gemini model to gemini-2.5-flash after deprecation of the previous version
  • Ensured the Smart Demo Mode continues working reliably

Current Features (fully working in v1.0)

  • ./astrolab → launches interactive main menu
  • apod → NASA Astronomy Picture of the Day with explanation
  • quiz → 5-question interactive quiz + Deep Dive AI explanations on wrong answers
  • flashcard "<topic>" → generates and saves themed flashcards
  • review → reviews the personal flashcard deck
  • stats → study history with progress bars

AstroLab is now a practical, well-documented CLI tool designed to help beginners bridge the gap between astronomy, physics, and real-world software engineering. This marks the successful completion of my project for the Hack Club Sidequest Challenge.

Attachment
Attachment
0
ChefThi
  • feat(i18n): translate entire project to English and add ‘astrolab’ init script 🌎 (cfebf53)

Overview

This session focused on internationalization and usability improvements. The entire project was translated to English, making AstroLab more accessible to a global audience, and a convenient astrolab executable was added for easier launching.

What was implemented

1. Full Project Internationalization

  • Translated all user-facing strings, comments, and documentation to English
  • Updated README.md, main.py and files inside src/ and data/
  • The project is now fully in English while keeping the technical depth and educational focus

2. ‘astrolab’ Executable Script

  • Added ./astrolab script at the root for quick and clean launching
  • Running ./astrolab now starts the interactive main menu directly
  • Improved developer and user experience (no more typing python main.py)

3. Supporting Updates

  • Refined README.md with clear instructions for the new executable and Smart Demo Mode
  • Minor cleanups to maintain consistency across the translated codebase

Challenges & Solutions

  • Translating technical terms consistently while preserving meaning → careful review of astronomy and programming vocabulary.
  • Ensuring the new executable works smoothly on different environments (including Termux) → made it simple and cross-platform friendly.
  • Keeping the codebase maintainable after large-scale text changes → used systematic replacement with manual verification.
    For this part, I used the Gemini CLI.
Attachment
Attachment
0
ChefThi
  • feat: Integrated Devpost API and AI Daily Strategy engine (a108ed8)

The aggregator took another solid step forward with a focused update that strengthens data collection and adds intelligent daily guidance.

News

  • Full integration of the official Devpost API (https://devpost.com/api/hackathons), replacing the previous placeholder logic in main.py.
  • New src/sources/devpost.py module with fetch_devpost(), which pulls up to 20 upcoming hackathons, including title, URL, cleaned prize amount, submission deadline, and structured opportunity data.
  • HTML sanitization via clean_html() to properly handle prize fields.
  • New generate_daily_strategy() method in src/scorer.py that uses Gemini to analyze the top-scored opportunities and produce a concise 3–4-sentence strategic recommendation in Portuguese, taking the user profile into account.
  • Updated src/config.py to include newer Gemini model variants for better fallback behavior.
  • main.py now calls the Devpost fetcher and prints the AI-generated daily strategy after scoring.
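As a rough illustration of the sanitization step, here is a minimal sketch of what clean_html() and entry normalization might look like. The field names (prize_amount, submission_period_dates) are assumptions, not the actual Devpost schema:

```python
# Hypothetical sketch of the Devpost normalization step; the real
# fetch_devpost() hits the live API, so only the cleaning is shown here.
import re

def clean_html(text):
    """Strip tags from prize fields like '<span>$5,000</span> in prizes'."""
    return re.sub(r"<[^>]+>", "", text or "").strip()

def normalize_hackathon(raw):
    return {
        "title": raw.get("title", ""),
        "url": raw.get("url", ""),
        "prize": clean_html(raw.get("prize_amount")),
        "deadline": raw.get("submission_period_dates", ""),
    }

sample = {"title": "Space Hack", "url": "https://example.com",
          "prize_amount": "<span>$5,000</span>",
          "submission_period_dates": "Apr 1 - 30"}
```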

Attachment
Attachment
0
ChefThi
  • feat(demo): add Smart Demo Mode with rich offline cache for reviewers 🛡️ (cfebf53)

Overview

This session introduced a Smart Demo Mode to AstroLab, making the tool more resilient and reviewer-friendly. The main goal was to add rich offline functionality so the system can still deliver educational content even without internet access or API keys.

What was implemented

1. Smart Demo Mode & Offline Cache

  • Added data/demo_cache.json with a rich collection of pre-generated astronomy content:
    • Multiple interactive quizzes
    • Themed flashcards
    • Detailed deep-dive explanations (e.g. gravitational time dilation, Doppler effect, black holes)
  • The system now gracefully falls back to this cache when the Gemini API is unavailable.

2. Improved Gemini Client

  • Updated src/gemini_client.py to support offline-first behavior
  • Changed to the more stable gemini-1.5-flash model
  • Added robust fallback functions for quiz, flashcard, and deep-dive generation
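A minimal sketch of the offline-first fallback idea described above, with an illustrative one-entry cache (the real data/demo_cache.json is much richer):

```python
# Sketch of the offline-first fallback: try the live API path, fall back
# to the bundled demo cache on any failure. Names are illustrative.
import random

DEMO_CACHE = {"quizzes": [{"q": "What causes a redshift?", "a": "Doppler effect"}]}

def generate_quiz(call_api=None):
    try:
        if call_api is None:
            raise ConnectionError("no API key / offline")
        return call_api()                            # live Gemini path
    except Exception:
        return random.choice(DEMO_CACHE["quizzes"])  # cached demo content
```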

3. Project Quality & Security Improvements

  • Created .env.example with clear placeholders for NASA_API_KEY and GEMINI_API_KEY
  • Added GEMINI.md — a comprehensive guide defining the AI persona (Senior Software Engineer & Technical Mentor), coding standards, project structure, and environment constraints (CLI-only, Termux/Android friendly)
  • Added commit_and_push.sh — a safe automation script that stages changes, automatically excludes .env, and pushes to main
  • Cleaned up old unrelated files from learning/PWM_LED_CONTROL/ and projects/nerve-system/

I used the Gemini CLI to keep the syntax correct and accelerate these minor parts.

Attachment
Attachment
0
ChefThi
  • chore: update gitignore to include private tool sandboxes (6e6052a)

Minor Chore Update and Preparation for Parallel Rendering Engine

A small but necessary chore update was applied to the repository: the .gitignore file was adjusted to properly exclude private tool sandboxes and temporary workspaces. This prevents accidental commits of sensitive or environment-specific files during active development.

At this commit, the project structure already includes:

  • A dedicated devlogs/ directory for technical progress records.
  • Clear references in the README to participation in Hackatime (Flavortown), emphasizing the value of documented development steps.

No functional changes were made to the pipeline in this specific commit, but it immediately precedes the implementation of the parallel rendering engine, project metadata tracking, and auto-cleanup features.

Media to be attached:

  • Full system screen recording demonstrating the current end-to-end flow (topic → script → visuals → narration → final video assembly) using the React UI.
  • Sample videos generated by the system (short 60-second example + one longer test video) showcasing narration quality, image redundancy, smart subtitles, and background music ducking.

These recordings highlight the stability achieved after the recent industrial-grade backend refactor and prepare the ground for the upcoming parallel rendering improvements.

0
ChefThi
  • feat(cli): add interactive menu and elegant API key warnings ✨ (51299a8)
  • feat(flashcards): add deck manager to save and review generated cards 🃏 (c41f658)

Overview

This session continued the evolution of AstroLab by implementing a persistent flashcard deck manager and refining the overall CLI experience. The focus was on turning generated flashcards into a reusable study tool and making the system more user-friendly.

What was implemented

1. Flashcard Deck Manager (feat(flashcards))

  • Added deck management system to save generated flashcards
  • New command: python main.py review — allows reviewing saved cards
  • Flashcards are now persisted in the data/ directory for future sessions
  • Improved flashcard generation with better structure for long-term review
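The persistence described above can be sketched roughly like this; the path and card schema are assumptions, not the actual AstroLab code:

```python
# Minimal sketch of a JSON-backed flashcard deck (path and schema assumed).
import json
from pathlib import Path

DECK_PATH = Path("data/flashcards.json")

def load_cards(path=DECK_PATH):
    if not path.exists():
        return []
    return json.loads(path.read_text())

def save_cards(cards, path=DECK_PATH):
    path.parent.mkdir(parents=True, exist_ok=True)
    existing = load_cards(path)          # append, don't overwrite the deck
    path.write_text(json.dumps(existing + cards, indent=2))
```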

2. CLI Enhancements

  • Introduced an interactive main menu when running python main.py with no arguments
  • Added elegant warning messages for missing or invalid API keys
  • Refined command flow for smoother user interaction

3. Technical Updates

  • Updated README.md to reflect all current commands
  • Maintained clean project structure with dedicated folders for sessions, devlogs, and data

Challenges & Solutions

  • Ensuring flashcards could be reliably saved and reloaded required careful JSON handling and file path management.
  • Balancing the new interactive menu with existing direct commands needed clean argument parsing.
  • API key feedback was improved to guide users without breaking the flow.
Attachment
Attachment
Attachment
0
ChefThi

This session focused on extending the AstroLab study system beyond the initial NASA APOD + Gemini integration. The main goals were to introduce basic gamification elements and improve the learning experience through more detailed AI feedback.

1. Session History & Statistics Module

  • Added persistent session tracking (data/sessions/)
  • Created stats command that displays:
    • Total study sessions
    • Average score
    • Questions answered
    • Progress visualization using CLI progress bars
  • All session data is now saved automatically after each quiz
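A rough sketch of the SessionManager idea and the CLI progress bar, assuming a simple per-session JSON schema (the real implementation may differ):

```python
# Sketch of session persistence + stats; schema and paths are assumptions.
import json
from pathlib import Path

class SessionManager:
    def __init__(self, directory="data/sessions"):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)

    def save(self, session_id, score, total):
        (self.dir / f"{session_id}.json").write_text(
            json.dumps({"score": score, "total": total}))

    def stats(self):
        sessions = [json.loads(p.read_text()) for p in self.dir.glob("*.json")]
        answered = sum(s["total"] for s in sessions)
        avg = sum(s["score"] for s in sessions) / len(sessions) if sessions else 0
        return {"sessions": len(sessions), "avg_score": avg, "answered": answered}

def progress_bar(value, total, width=20):
    filled = int(width * value / total) if total else 0
    return "[" + "#" * filled + "-" * (width - filled) + "]"
```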

2. Deep-Dive AI Explanations

  • When a user answers a quiz question incorrectly, the system now triggers Gemini to generate a detailed, educational explanation of the correct concept.
  • This turns mistakes into rich learning opportunities instead of simple “wrong” feedback.
  • Explanation includes context related to the Astronomy Picture of the Day when relevant.

3. Technical Improvements

  • Fixed Gemini client configuration and error handling
  • Updated .gitignore to properly exclude test files and temporary data
  • Minor refactoring for better command structure

Challenges & Solutions

  • Gemini API responses were sometimes inconsistent in format → improved prompt engineering and added fallback parsing logic.
  • Session data persistence required careful handling of JSON serialization → implemented a simple but robust SessionManager class.
  • CLI output formatting took longer than expected to make it clean and readable.
Attachment
Attachment
Attachment
1

Comments

ChefThi
ChefThi 19 days ago

Forgot the changelog:

  • feat(gamification): add session history and CLI stats table 📊 (78b8eae)
  • chore: ignore TESTES and fix gemini client (6ea0c86)
  • feat(deep-dive): add AI-powered detailed explanation for wrong answers 🧠 (e7ae7c4)
ChefThi
  • refactor: industrial grade backend architecture, image redundancy & resilience (9b939b0)

Backend Architecture & Resilience (focused refactor session)

The backend architecture received a major upgrade to industrial-grade standards. Image generation now operates with full redundancy and resilience layers, eliminating single-point failures that previously caused 503 errors and rate-limit interruptions during long runs.

Key improvements implemented:

  • Refactored core services into a modular, fault-tolerant structure with multiple image providers running in parallel.
  • Added automatic fallback between Gemini (primary) and Hugging Face/OpenRouter, including token rotation and exponential backoff retries.
  • Enhanced frontend retry logic (already stable since mid-March) to gracefully handle transient failures without breaking the user flow.
  • Pipeline now supports true parallel image generation (Turbo Mode) while maintaining sync with ffprobe-based audio/video timing.
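The redundancy layer can be sketched as a provider chain with exponential backoff. The provider callables below are stand-ins for the real Gemini/Hugging Face/OpenRouter clients:

```python
# Sketch of the fallback + backoff idea: try each provider in order; if
# the whole chain fails, wait exponentially longer and try again.
import time

def generate_with_fallback(providers, retries=3, base_delay=0.01):
    """providers: ordered list of (name, callable) pairs."""
    for attempt in range(retries):
        for name, call in providers:
            try:
                return name, call()
            except Exception:
                continue                             # next provider in the chain
        time.sleep(base_delay * (2 ** attempt))      # all failed: back off, retry
    raise RuntimeError("all providers exhausted")
```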

Problems addressed:

  • Previous dependency on single LLM/image endpoints led to frequent pipeline crashes on extended videos (7+ minutes).
  • Inconsistent media assembly under high load was resolved through clip-level rendering and smart ducking/subtitle synchronization.
  • Docker environment stability was further hardened with environment-variable configs and global exception handling.
It’s kind of like that :)

I hit a 504 error during the formulation and preparation, but the tests in general went well.

Attachment
0
ChefThi
  • feat: Evolve Aggregator to AI-Powered Matchmaker with Gemini 3.1 🚀 (64b172f)

The project evolved from a simple opportunity aggregator to an intelligent recommendation system. The main change was the integration of the Gemini 3.1 model as the matching engine, transforming the flow into a true personalized matchmaker.

News

  • Complete refactoring of the code structure for greater modularity: creation of the src/ folder with clear separation between sources, scorer, notifier, and bot.

  • Implementation of the src/scorer.py module responsible for calculating the compatibility score between the user’s profile and each collected opportunity, using Gemini 3.1 as the main model.

  • Addition of a multi-tier fallback strategy and dynamic model selection to increase robustness in case of quota limits or temporary failures.

  • Expansion of data sources: the system now collects opportunities from Devpost (via API), MLH, TabNews, and GitHub Jobs in parallel.

  • Development of Telegram bot commands:
    • /today — displays opportunities collected on the day
    • /match — returns the 3 best personalized recommendations with an explanation of the score
    • /search <term> — local search in the database
  • Creation of the daily scheduler in main.py + digest.py for automatic updating of opportunities and preparation for sending digests. Here I used the Gemini CLI quite a bit.

  • Documentation update
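As a minimal illustration of what /match does once opportunities have been scored (field names are assumptions, not the actual bot code):

```python
# Hypothetical sketch: rank already-scored opportunities and return the
# top three, each with its score and a short explanation.
def top_matches(opportunities, k=3):
    ranked = sorted(opportunities, key=lambda o: o["score"], reverse=True)
    return [{"title": o["title"], "score": o["score"], "why": o.get("why", "")}
            for o in ranked[:k]]
```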

Lessons learned and technical process

The transition required careful separation of responsibilities to facilitate future expansions. The choice of Gemini allowed for context-rich scoring, considering skills, experience level, and preferences described in the user profile. The fallback strategy ensures that the system remains functional even under adverse API conditions.

The result is a functional Super MVP that already delivers real value.

Attachment
Attachment
0
ChefThi
  • feat: implement tactile gesture activation and HUD v2.1 modular update (133994c)

Deep Scan & Tactical Gestures🖐️👁️

After a few days, I finally implemented the Deep Scan system. The challenge was: how to trigger an AI analysis without touching the keyboard?

I used MediaPipe to create a "Pinch" trigger. By holding the gesture for 1.5s (Tony Stark style calibrating the HUD), the system captures the frame and sends it to the Gemini 3 Flash brain. The result: I get a simple instant tactical report directly on the display, running with very low latency thanks to the new local architecture.

I liked all of this; these new updates are cool. The thing is, there’s still a certain delay that makes the Scan trigger one after another, so I don’t get the complete description of the first one.
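The hold-for-1.5s trigger, plus a cooldown that would prevent back-to-back scans, could be sketched like this (the cooldown value is an assumption, not the actual OmniLab code):

```python
# Sketch of the hold-to-trigger logic: the pinch must be held for 1.5 s
# before a scan fires, and a cooldown blocks repeated firings.
class PinchTrigger:
    def __init__(self, hold=1.5, cooldown=5.0):
        self.hold, self.cooldown = hold, cooldown
        self.start = None                 # when the current pinch began
        self.last_fire = -float("inf")    # when the last scan fired

    def update(self, pinching, now):
        """Call once per frame; returns True when a scan should fire."""
        if not pinching:
            self.start = None
            return False
        if self.start is None:
            self.start = now
        held = now - self.start >= self.hold
        cooled = now - self.last_fire >= self.cooldown
        if held and cooled:
            self.last_fire = now
            self.start = None             # require a fresh hold next time
            return True
        return False
```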

The OmniLab not only shows data now; it understands what I see. 🎧🔥

Attachment
Attachment
0
ChefThi

Basically, I worked on the demo video and edited a few files.

0
ChefThi

🚀 SHIP: Competitive Programming Companion v3.1

I am thrilled to announce the official release of v3.1, a major evolution of my terminal-based coding companion. This version brings state-of-the-art AI capabilities and a refined terminal UI to mobile development.

🧠 Core Upgrades:

  • Gemini 3.1 Flash Lite Preview Integration: The Brain Engine now generates 5 high-level, unique technical challenges per session using the latest frontier model.
  • REST-based AI Architecture: Eliminated heavy dependencies (like google-generativeai) to ensure 100% stability in mobile environments like Termux.
  • Dynamic AI Assist: Every user error is analyzed in real-time by the AI, providing deep technical explanations instead of static hints.
  • Cyber-Neon UI: A completely redesigned CLI using the rich library, featuring separate visual identities for Local (Cyan) and AI (Magenta) modes.

🛠️ Mobile-First Engineering:

Developing this v3.1 entirely on Android (Termux) and Acode pushed me to optimize every line of code. From managing environment variables for API keys to handling graceful degradation when offline, this project is a testament to what’s possible with a smartphone and a vision.

0
ChefThi
  • Refactor project details and shipping status (79c8e53)
  • feat: HUD v2 evolution with TTS, real-time diagnostics, and Gemini 3 Thinking Mode (674e279)

🚀 The HUD Just Leveled Up

The gap between thought and execution is getting smaller. I’ve just pushed a massive round of updates to the interface, bringing that “Stark Tech” vibe closer to reality.

What’s New:

Clean Decoupling (ada_v2 style): I moved the entire HUD interface to a static/ directory. By separating the Three.js frontend from the FastAPI backend, I can now tweak the UI instantly without touching the server logic.

Gemini 3 Thinking Mode: Deep reasoning is now live. When you trigger an analysis, the HUD displays DEEP SCANNING… while Gemini grinds through the image metadata to deliver a high-precision report.

J.A.R.V.I.S. Talk-Back: The HUD finally has a voice. Using the Web Speech API, the system now talks back during scans, making the whole experience feel way more immersive.

Real-Time Diagnostics: I added a telemetry overlay to monitor FPS and latency. It’s essential for keeping everything buttery smooth on my local Debian 13 setup.

Pinch-to-Lock Gestures: The “Pinch” gesture now locks the cursor and toggles system states, allowing for much tighter physical interaction with the 3D interface.

The “invisible interface” is finally starting to become real.

Attachment
0
ChefThi

Date: 2026-03-18


Okay, so it’s time for a little dev story. It might look like I’m pivoting, but honestly, this has been the master plan from the start. 🧠

The original dream for this project was always to build something that felt like a mini Linux desktop, but running entirely in a web browser. A full-on web-based OS. For a while, I was deep in the hardware weeds (I left that part on Blueprint), but now I’m bringing it all back to the web. The foundation is laid!

So, What’s NerveOS?

I’m super stoked to finally put this into words:

The Nerve is a physical cyberdeck (ESP32 S3 + OLED + encoder). NerveOS is the web interface that lets you control it remotely from the browser. 💻✨ I think that’s what I’m going to stick with, anyway.

Think of it as a full-fledged operating system in a browser tab, complete with:

  • Draggable windows 🖱️
  • An integrated terminal 👨‍💻
  • A real-time hardware status monitor 📊

It’s all coming together. What was once just a bunch of ideas is now a real, functional desktop environment.

Attachment
0
ChefThi

🤖 The End of Manual Labor (Autonomous Mode)
I’ve officially implemented Option 99 (Autonomous Mode). The engine now runs in a continuous loop,
watching for script files or backend signals. If a new idea hits the queue, the engine captures it,
renders it, and delivers the final video without me touching a single button. This is the foundation for
scaling production on mobile.

🎨 The “Absolute Cinema” Look (v1.6)
I wasn’t happy with “raw” renders. To give the videos a signature look, I added a post-processing layer
directly in the FFmpeg chain:

  • Color Grading: Dynamic contrast and saturation boosts.
  • Vignette Effect: That classic “cinema” dark-border focus that draws the eye to the center.
    The output now feels like a finished product, not just a test render.

🔊 Studio-Grade Audio (EBU R128)
Consistency is key. I implemented Audio Mastering (Loudnorm) to hit the industry standard of -14 LUFS
(the same used by YouTube and Spotify). No more videos that are too quiet or clipping—everything sounds
professional and balanced.
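A sketch of how the loudnorm pass might be invoked through FFmpeg, targeting -14 LUFS (the helper name is illustrative and the engine’s exact flags may differ):

```python
# Hypothetical sketch of the mastering step: build an FFmpeg command that
# applies the loudnorm filter (EBU R128) while copying the video stream.
def loudnorm_cmd(src, dst, lufs=-14.0, true_peak=-1.5, lra=11.0):
    return [
        "ffmpeg", "-y", "-i", src,
        "-af", f"loudnorm=I={lufs}:TP={true_peak}:LRA={lra}",
        "-c:v", "copy",          # audio-only filter; keep the video as-is
        dst,
    ]
```

In the real pipeline this list would be passed to subprocess.run().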

🧹 Clean House, Clean Mind
I did a deep cleanup of the GitHub repository. I used git rm --cached to strip away internal roadmaps, simulation logs, and pipeline tests. The public repo now holds only the pure Engine Core, keeping my portfolio sharp and focused on the code that actually matters.
P.S. I left the screen turned off and paused the recording.

0
ChefThi

Today was a major breakthrough for VOICE-TASK-MASTER (VTM). I transformed the entire interface into a retro-cyberpunk terminal and fully integrated the core voice engine.

Key Achievements:

  • Neural UI Uplink: Implemented a high-contrast Neon Green aesthetic with CRT flicker and scanline effects.
  • Voice Intelligence: Integrated the Web Speech API for real-time task management (Add, Remove, Clear).
  • Redundant Grid Persistence: Added a secondary backup layer to chrome.storage to ensure zero-data-loss for mission-critical tasks.
  • Audio Feedback: Synthesized AI status reports for daily briefings.

Technical Challenges:

Managing browser permissions for the microphone in a Chrome Extension popup was tricky, but I implemented a fallback to open the config in a new tab if access is denied.

Proof of Work:

I uploaded a preview of the UI (IMG_20260315_120436.jpg) and prepared the deployment package (extension_v1.0.1.tar.gz).

Next step: recording the final demo video and shipping the v1.0 version.

Attachment
Attachment
0
ChefThi

🚀 It’s Alive: Script to Video in One Click

The Factory just crossed the line from a “cool experiment” to a functional tool. The core engine is finally humming.


What’s new (and why it took a minute):

  • No more CORS headaches: I moved all media generation (Gemini & Hugging Face) to the NestJS backend. It’s cleaner, safer, and supports automatic token rotation. If an API key hits a limit, the system just swaps to the next one without breaking the flow.
  • Better Visuals (FLUX.1): Swapped generic images for FLUX.1-schnell. The pipeline now generates storyboards that actually match the script’s vibe instead of just “looking okay.”
  • Clean Narration: Integrated Gemini’s native TTS. It’s producing crystal-clear audio that’s perfectly synced with the auto-generated captions (SRT).
  • Built to last: The pipeline can now handle 7+ minute videos. I added smart batching and exponential backoff retries—so if an image service hiccups, the system fights to stay alive instead of just crashing.
Attachment
0
ChefThi

Pivoting Study Lab into AstroLab: learning with real NASA data between lectures 🚀📚

Study Lab Core started as a general “learning playground” — notes, experiments, and some loose project scaffolding. When my Computer Engineering course started, my schedule got a lot more chaotic: 10 km commute, bus rides, homework, Blueprint hardware work, and then trying to study on top of that. Generic study tools didn’t feel motivating enough to survive that reality.
During the past weeks (mostly late at night and on weekends) I used Perplexity to explore a new direction: what if the study tool was space‑themed, powered by real NASA APIs, and could generate quizzes and flashcards from actual scientific data? That’s how AstroLab was born. The idea is simple but powerful: every day NASA publishes the Astronomy Picture of the Day (APOD) with a technical explanation — that’s an automatic, high‑quality prompt for learning.
The latest pivot commit wires this idea into the code: AstroLab now treats NASA + Gemini as the core “study ecosystem”. It pulls APOD, feeds the description into Gemini, and turns that into quizzes and flashcards grounded in real data (black holes, gravitational lensing, nebulae, whatever the APOD is that day).
This devlog also covers the repo restructuring I did earlier: splitting learning exercises from shippable projects, organizing devlogs, and preparing the codebase to be more than just a personal scratchpad. The goal is to make AstroLab feel like “study with the universe as your teacher” — even if I’m reviewing it from a crowded bus on the way to campus. 🌠🚌

Attachment
Attachment
0
ChefThi

Turning OmniLab into a real HUD assistant: voice, vision and a more proactive AI persona 🎧🖐️

OmniLab has been my experimental lab for interfaces: 3D HUD, hand‑tracking, voice input, and AI all living in the same space. At the same time, life got busier: I started Computer Engineering, the campus is ~10 km away, and I’ve been splitting my time between classes, Blueprint hardware projects, and these software labs. That’s why commits came in bursts instead of daily drips — most of the work happened in small, tired, late‑night sessions.
Earlier this year I refactored the architecture to favor local‑first vision (removing a cloud version that was too high‑latency) and added the Web Speech API to the HUD, so I could trigger Gemini analyses via voice while the system tracked my hands in real time. That was the turning point: OmniLab stopped being “just a cool 3D scene” and started behaving like a genuine interface between my body, my voice and an AI brain.
Recently I pushed a big “SHIP‑ready” upgrade: Gemini integration is now first‑class, tests and CI/CD are in place, and the HUD feels more stable as a product, not just a demo. On top of that, I refined the AI persona: instead of only answering direct questions, OmniLab now makes proactive observations about what it sees and hears — it can comment on the scene, suggest next actions, and feel more like a lab partner than a tool.

Most of this evolution happened while juggling buses, deadlines and other projects, with Perplexity helping me reason about trade‑offs (what to keep in 3D, what to simplify, where AI actually adds value). This devlog is my way of catching the Flavortown timeline up with the reality: OmniLab grew quietly, but it grew a lot. ✨

Attachment
0
ChefThi

From idea to browser extension: hacking a voice‑first task manager 👾🎙️

Voice Task Master has been in the background since January — I started it as a simple idea for a voice‑powered todo tool, but only had the basic extension structure in place for a while. This week I finally sat down and turned it into a real Chrome MV3 extension with an opinionated UI and a shippable build.
The biggest jump happened in the latest sessions: I implemented a full cyber / hacker‑style UI for the popup, with dark monospace styling, CRT‑like visual details and a layout focused on fast capture (keyboard + voice). On top of that, I wired in voice synthesis so the extension can actually talk back when reading tasks or daily standups, instead of being just a silent checklist.

To make distribution easier, I also created a ready‑to‑install tarball for the extension and added a proper icon128.png, so it looks like a real product in the browser toolbar instead of a blank placeholder. This way, anyone can load it via chrome://extensions → “Load unpacked” or import the tarball directly when needed.
A lot of this happened in short bursts between college, buses and other Blueprint projects — I used Perplexity to quickly test small UI ideas, clarify WebExtension details and make sure the architecture stayed simple and local‑first. Now the plan is to iterate on features like a “Voice Standup” mode and calendar integration, but the core experience (talk → get tasks saved → hear them back) is already alive inside the browser. 🚀

Attachment
Attachment
Attachment
0
ChefThi

From simple scraper to AI‑assisted matchmaker (while commuting between classes) 🚏💼

Opportunity Aggregator started in January as a very simple experiment: a Python bot, a SQLite file, and a basic TabNews parser. Then college kicked in, Blueprint deadlines appeared, and I found myself coding on short windows between bus rides and homework instead of doing long, focused sprints. That’s why the commit history shows an initial burst in January and then a big jump only now.

During that gap I spent more time thinking than committing: what makes this different from a fancy RSS reader? The interesting piece is the match score — using AI to tell you how well each opportunity fits your profile, instead of just dumping links. I used Perplexity a lot in this phase to explore architecture ideas: how many sources, how to model user profiles, how aggressive the AI usage should be, and how to keep things cheap and robust.
The latest commit is where all that background thinking finally lands in code: I implemented a Super MVP with a proper SQLite persistence layer and a multi‑tier AI fallback strategy (Gemini as the primary brain, with dynamic model discovery as a fallback) to score opportunities. The scraper stack now has a cleaner structure and is ready for more sources.

It’s still early, but now the project feels like an actual assistant instead of a script. Next steps: Telegram commands for /match and /today, plus a daily digest flow so it can ping me with the top 3 fits while I’m literally on the bus to college.

Attachment
Attachment
Attachment
0
ChefThi

Balancing college, buses and FFmpeg: finally shipping an end-to-end video pipeline 🎓🚌

Over the last two months AI Video Factory was my “background process”. I had just started my Computer Engineering degree, and the campus is about 10 km from home, so most days were: bus → classes → bus → quick late-night coding sessions. On top of that I was also juggling Blueprint hardware projects, so I decided to work on this in focused bursts instead of constant tiny commits.

Most of the progress happened off-Git: I kept iterating on the FFmpeg pipeline, breaking it, fixing it, and using Perplexity as a kind of “technical rubber duck” to reason about filter graphs, error messages and timing issues. I didn’t want to push half-broken experiments all the time, so I waited until things felt structurally solid before committing.
In this latest round of changes I finally wired the full end-to-end pipeline: script → images → audio → video. I refactored image and audio generation into clearer modules and fixed a couple of nasty production issues: zoompan freezing on long chains, bad subtitle timing, and 503s during long renders. The solution involved rendering clips individually, using ffprobe for real audio duration, and switching to character‑weighted subtitle timing so the pacing feels natural.
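The character-weighted subtitle timing mentioned above can be sketched as proportional allocation of the measured audio duration (a simplified illustration, not the actual pipeline code):

```python
# Sketch: split the real audio duration (as measured by ffprobe) across
# subtitle lines in proportion to their character counts.
def subtitle_windows(lines, total_seconds):
    total_chars = sum(len(l) for l in lines) or 1
    windows, cursor = [], 0.0
    for line in lines:
        dur = total_seconds * len(line) / total_chars
        windows.append((round(cursor, 3), round(cursor + dur, 3), line))
        cursor += dur
    return windows
```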

I also hardened the Docker environment: proper SQLite permissions, config via env vars, and better logging through a global exception filter in the NestJS backend. Now, when something explodes, it explodes with logs instead of silently failing. 😅

This devlog is basically the “catch‑up chapter” for everything that happened between classes, buses and late‑night debugging. The next step is polishing the UI and shipping a public demo link.

Attachment
Attachment
Attachment
Attachment
0
ChefThi

What I did in two months (in the case of this project)


I haven’t posted a devlog for HOMES-Engine in about two months. Not because I wasn’t working on it — actually the opposite. I was heads-down testing, breaking things, fixing things, and honestly sometimes just staring at FFmpeg error messages trying to figure out what went wrong. This is that story.


Where it started — Jan 4, day zero

The first commit was a proof of concept: a basic Python script that called FFmpeg and generated a video. That’s it. It barely worked. The font was wrong, the imports were broken, the output format was inconsistent. But it rendered something, which felt like enough to keep going.

In the same day I went from v0.1 to v1.3, v1.4, and v1.6 in rapid succession. Each version was fixing something the previous one broke: Edge-TTS for neural narration, multi-line text rendering (I kept getting those quadradinhos — encoding artifacts from special characters that took forever to track down), synchronized VTT subtitles, dynamic B-Roll stitching, music ducking. I was running all of this on Termux, on Android, ARM64. FFmpeg on ARM has its own quirks that aren’t documented anywhere useful.


The SAR bug that took too long

One thing that slowed me down more than anything else was a SAR mismatch error in ffmpeg_engine.py. When concatenating video clips, FFmpeg was crashing because different clips had different Sample Aspect Ratios. The fix was two FFmpeg flags: setsar=1 and format=yuv420p. Simple fix — once you know what it is. Finding it took hours of testing different inputs, reading logs, and using Gemini CLI to help me parse what the error stack actually meant.
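The fix itself boils down to normalizing every clip before concatenation. A sketch of the per-clip command using the two flags from the fix (the helper name is illustrative):

```python
# Hypothetical sketch: force a uniform Sample Aspect Ratio and pixel
# format on each clip so FFmpeg's concat no longer crashes on mismatches.
def normalize_clip_cmd(src, dst):
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", "setsar=1,format=yuv420p",   # uniform SAR + pixel format
        dst,
    ]
```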

I used Gemini CLI a lot during this phase. Not to write the code for me, but to help me reason through FFmpeg filter chains. FFmpeg’s filter syntax is its own language and when you’re building complex pipelines.

0
ChefThi

commit THE IMPORTANT
— feat(agent): status widget data export + automated Wakatime heartbeats and chore: ignore local status and logs

Title: HOMES Agent gets smarter: status export

Today I pushed two meaningful updates to the HOMES agent. First, I implemented status widget data export — the agent can now serialize its own state (telemetry, running modules) into a structured format that external dashboards can consume. This is the foundation for the HUD I’m building for OmniLab to talk to HOMES remotely.

Working on Android/mobile is cool and interesting. I like understanding how the system’s API works (it’s in Java; currently Kotlin is the recommended toolset for this). It’s difficult and a bit annoying; some simple errors can break the system, but that’s just how it is.

I also cleaned up .gitignore to stop tracking local status dumps and log files that were polluting the repo. Small change, but it keeps the history clean.

0
ChefThi

For cli-problem-solver, I intentionally kept commits quiet for a while. I was figuring out what this tool should really do: just call an LLM, or also keep local memory, search previous solutions, and feel like a true “context co-processor” for my terminal.
During this time I prototyped different flows locally (no commits), tried a few CLI designs, and thought a lot about how to balance AI calls with a fast local experience. Perplexity helped me explore options and refine the idea before I locked anything into the repo.
Once I felt I had a decent mental model (commands, history, basic architecture), I started committing the structure in bigger, more meaningful chunks instead of tiny incremental updates.

Attachment
Attachment
0
ChefThi

Basically, I updated the docs, improved the code, and corrected the AI system in the application: I made it follow the language specified in the input.
I also deployed the functions (I had forgotten to do this…🫠)

I tried to edit the files in the GitHub web editor, but I ran into problems and errors. I learned not to use it for important, precision-sensitive development.
I think that’s it.

And since this devlog is short and doesn’t have many updates, I kept it simple.

Attachment
0
ChefThi

OmniLab Devlog #1

I’ve officially kicked off OmniLab on my first laptop! Coming from a background of mobile development and browser-based IDEs, my first instinct was to keep everything “off-device”. I spent a good chunk of these 5 hours attempting to run the processing stack on a remote VM (Firebase Studio) and tunneling the HUD via a web page. However, the latency was unbearable for real-time tracking. I quickly realized that for a “Jarvis-like” experience, the vision loop must be 100% local.

Technical Hurdles & Git Mess

The first challenge was MediaPipe. I started with legacy code, but it wouldn’t play nice. I had to dive into the latest MediaPipe Tasks API docs to rewrite the landmark detection core. It’s much more efficient now, but the documentation shift caught me off guard.

Since I was jumping between cloud editing and local testing without properly cloning the repo first, I ended up with a mess of Git conflicts. I used the Gemini CLI as a mentor to help me untangle the branches, resolve the “already exists” errors, and get the local and remote repositories back in sync. It was a great lesson in maintaining a clean workflow on a new machine.

Current Progress

I’ve successfully implemented the “pinch” gesture logic (calculating the hypotenuse between thumb and index) and set up a local FastAPI server to bridge vision data to a Three.js HUD. The HUD now runs locally on Debian 13 (XFCE), which eliminated all the lag from my previous VM tests.
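The pinch check boils down to a distance test between two landmarks. A minimal sketch of the idea — the 0.05 threshold is illustrative, not the tuned value from the project:

```python
import math

def is_pinch(thumb_tip, index_tip, threshold=0.05):
    # Landmarks as (x, y) in MediaPipe's normalized [0, 1] image coordinates.
    # The "hypotenuse" is just the Euclidean distance between the two tips.
    return math.hypot(thumb_tip[0] - index_tip[0],
                      thumb_tip[1] - index_tip[1]) < threshold
```

In practice you'd feed this the thumb-tip and index-tip landmarks from each frame of the vision loop and forward the boolean to the HUD.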

Timelapses

Attachment
1

Comments

ChefThi
ChefThi about 2 months ago

To clarify the technical choices: I’m focusing heavily on keeping the HUD lightweight on my new machine by using Debian 13 (XFCE) and optimizing the Python vision loop. I’m also studying the ada_v2 repository to implement better modularity in the UI layer. Integrating these clean interface concepts into a zero-latency environment is the main goal for the next update.

ChefThi

Title: Extreme Modularization, Rich CLI, and the Great Flavortown Cleanup
Date: 2026-02-25
Commits:

  • 5ba3f12 — 🚀 COMPLETE RESTRUCTURE: FLAVORTOWN IS NOW MODULAR!

Summary: Goodbye to the single messy script! I turned Flavortown into a robust, modular, and visually striking Python application, using the Rich library to own the terminal.

What was done:

  • Modular architecture (src/): Fully decoupled the logic. What used to be a tangle of functions in main.py is now split into specialized modules:
    • src/ui: All rendering and user-interaction logic.
    • src/quiz: The engine driving question flow and validation.
    • src/scoring: Point calculation and persistence.
    • src/problems: Dynamic management of the challenge database.
    • src/config: Centralized constants and system paths.
  • Data centralization (data/): Built a real "persistence layer". questions.json and scores.json now live in a dedicated folder, so I can add hundreds of new problems without changing a single line of quiz logic.
  • Visual upgrade with Rich: Adopted the Rich library to turn the plain-text experience into something professional. Flavortown now shows questions in styled Rich Panels, uses colors to mark right and wrong answers, and renders clean tables for performance stats. The terminal came alive!
  • Telemetry & sync: Tuned the tracking utilities so every minute coded in Termux is counted correctly by Wakatime/Hackatime. Fixed pathing bugs that made the tracker lose itself during the move to the src/ structure.
  • Refactoring & cleanup: Removed temporary files and obsolete HTML templates from the early prototype, and simplified the entry point (main.py), which now acts only as the app's "orchestrator".
    Oh, and the UX turned out pretty nice; the Rich text CLI helps. 🙂👾
Attachment
Attachment
Attachment
Attachment
0
ChefThi

Late Devlog

Period: January 30 to February 18, 2026

I'm writing this retroactive report now to get the house in order, record what was done, and be able to ship the project again, this time confident that it runs smoothly. The focus of this cycle was basically turning the script generator into something you can actually trust.

What really happened behind the scenes

1. The AI with a split personality
The app had an annoying flaw: if the user sent input in English but the persona instructions in the backend were in Portuguese, the AI got confused and answered in the wrong language, or mixed the two.
The fix: I sat down and translated all the persona descriptions and prompt rules in the backend into English. The "contract" with the AI is now much clearer, and it honors whatever language the user asks for without hallucinating.

2. The cursed case-sensitivity bug
The frontend was sending the chosen persona as Default (capitalized) while the backend (CONFIG) only understood default (lowercase). That made the system fail silently or fall back to a generic persona.
The fix: I forced a .toLowerCase() on the persona keys. Basic, but it killed the headache for good.
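The actual fix was a .toLowerCase() in the JavaScript frontend; here's the same normalization idea sketched in Python (function and key names are illustrative):

```python
def resolve_persona(personas: dict, requested: str, fallback: str = "default"):
    # Normalize the key the same way the frontend fix did with .toLowerCase(),
    # so "Default", "default ", and "DEFAULT" all resolve to the same persona.
    key = requested.strip().lower()
    return personas.get(key, personas.get(fallback))
```

Normalizing at the boundary means neither side can break the lookup with a casing change.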

3. Lesson learned: don't edit code through GitHub Web
I was making quick changes straight from GitHub's web interface and, guess what? I broke the cloud functions' syntax a few times (wrong backticks, missing quotes). On top of that, the deploy URLs changed and the app stopped being able to talk to the backend.
The fix: I had to push urgent hotfixes to repair the syntax errors and point the requests at the correct production endpoints.

Attachment
0
ChefThi

Title: Crisis in Lapse & the Panic Save Module

Date: 2026-02-01

Commits

  • 93ec85c — Initial commit — (project base)
  • Note: The Panic Module hardware tweaks are in the routing phase in EasyEDA.

Summary

Six hours of hardware deep work turned into a "silent horror film" when the Lapse upload failed. Frustration with the IndexedDB error (Rate Limit 429) motivated a radical change: the project now has a physical panic button to save the system's state.

What was done

  • Technical investigation: Analyzed the network logs in DevTools after the upload froze at 60%. Identified an InvalidStateError caused by a Rate Limit (429) that corrupted the local database during the WebM merge.
  • Hardware-level backups: The encoder (Hype Dial) now has a secondary function via Python/Serial to control the frequency of local backups.
  • Panic Button: Added a physical trigger to the design that forces a git push and saves the project state before any connection instability.
  • Visual identity: Finished the banner in Canva for "The Nerve", going for the "Absolute Cinema / Cyberdeck" aesthetic.

Results / Status

The system is now resilient to digital failures. What was meant to be just a video controller is now a hacker survival tool. The schematic was updated to include the Panic Module.

Evidence & Timelapses

Attachment
0
ChefThi

The main focus of this cycle was refactoring the HOMES Neural Deck architecture for engineering autonomy. I decided to abstract the frontend (using Framer to guarantee the Cyberpunk look I envisioned) so I can dedicate 100% of my energy to the backend (Firebase Cloud Functions) and the Gemini API integration. This decision ensures the core logic, where the project's real value lives, is 100% my own code.

[Hash: 3d16779] - Consolidated the Gemini orchestration logic.
[Hash: d5e8f21] - New folder structure (web_page): created a dedicated directory for visual and technical documentation, including Timelapses.md to document the design process outside the code editor.
[Hash: a1b2c3d] - Reactive state management: started modeling data for Framer. Replaced direct DOM ID manipulation with a JSON data flow, preparing the communication "contract" with the Firebase Cloud Functions.

🧠 What I learned (engineering insights)
Repository as social proof: I realized that folder organization isn't just aesthetics. By creating web_page/, I'm creating an audit trail. Timelapses.md proves the design evolved through human decisions, and it lets me keep a record of the page's style and design process.

The client–server "contract": While integrating Framer with Firebase, I had an architecture insight: the frontend and the backend are like two business partners. They don't need to know how the other works inside, but they need a well-defined "contract" (a JSON payload). That frees my creativity on the design side without breaking the robust logic I built in the backend.

Timelapse links

https://lapse.hackclub.com/timelapse/L-DqtB-feWSw
https://lapse.hackclub.com/timelapse/BYUOzevQb1i3
https://lapse.hackclub.com/timelapse/ARkVL4dmqWPn

Attachment
1

Comments

ChefThi
ChefThi 2 months ago

I forgot to capture attachments during development, so all I have are the videos (timelapses).

ChefThi

Title: The Birth of "The Nerve" – Hardware & Design
Date: 2026-01-29
Commits:

  • 93ec85c — Initial commit

Summary: The project came to life! I went from zero to finishing the initial design phase of the physical controller that will drive my AI video-rendering pipeline.

What was done:

  • Visual design: Used Canva to create the banner and define the "Cyberdeck / Absolute Cinema" visual identity. I wanted something with a tactile, futuristic feel.
  • Simulation (Wokwi): Before burning anything out, I validated the OLED display logic (SSD1306) over I2C in Wokwi with MicroPython. Everything ran smoothly, confirming the communication is stable.
  • Schematic (EasyEDA): Drew the circuit around an RP2040-PLUS. Added a rotary encoder (the "Hype Dial") to adjust video parameters and a trigger button to fire the pipeline.
  • Optimization: Removed extra capacitors that would be redundant, since the Waveshare board already handles power filtering well.

Results / Status:
The hardware's "digital twin" is validated. The schematic passed the Netlist check with no errors. The project is no longer just software; it has a planned "body".

Next steps:

  • Move on to the PCB layout in EasyEDA.
  • Create a custom (non-rectangular) board outline to keep the Cyberdeck aesthetic.
  • Start the n8n/FFmpeg integration over USB.

Progress timelapses:

Attachment
0
ChefThi

Title: Initial Structure and Wokwi Simulation
Date: 2026-01-28
Commits:

  • eda9319 — feat: starting the repo, plus notes on a project I worked on today
  • 507906f — refactor(structure): Organize monorepo for learning and projects

Summary: Initialized the Study-Lab-Core repository and organized the monorepo structure to separate learning exercises from real projects.

What was done:

  • Created the core repository to centralize hardware/firmware studies and projects.
  • Fully reorganized the folder structure:
    • Project logs moved to the root.
    • Created a /logs directory for general notes and learning records.
    • Consolidated the 'servomotor' project under /projects.
    • Created a /learning directory for focused study modules.
  • Updated diagram.json to reflect the current Wokwi setup.
  • Added scratchpad.md for quick development notes.

Results: Clean, scalable directory structure ready for new modules. Wokwi simulation working and integrated into the commit flow.

Attachment
0
ChefThi

Title: Initial Setup, Git Cleanup, and TabNews Integration
Date: 2026-01-25

Commits:

  • b966db7 — feat: implement parser to fetch TabNews news
  • 89f8288 — chore: stop tracking venv and internal config files
  • d41b700 — Created bot.py and getting ready for the first Devlog :)
  • af2779b — Starting the project with the first files
  • a98014e — Initial commit

Summary:
I started the Opportunity Aggregator to centralize academic and tech opportunities. The focus was structuring the environment, building the base of the Telegram bot, and implementing a parser to collect real data via RSS. This kickoff was a hands-on dive into new libraries and versioning concepts.

What was done:

  • Structure: Set up .gitignore and requirements.txt for an organized environment.
  • Base bot: Created bot.py with /start and /ping commands using 'python-telegram-bot'.
  • RSS parser: Used the 'feedparser' lib to extract the 5 most recent TabNews posts.
  • Security: Used 'python-dotenv' to manage the bot token safely.
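The project uses feedparser; as a self-contained sketch of the same "grab the N most recent item titles" idea, here is the stdlib-only equivalent (feed structure simplified for illustration):

```python
import xml.etree.ElementTree as ET

def latest_titles(rss_xml: str, limit: int = 5):
    """Return the titles of the first `limit` <item> entries in an RSS document."""
    root = ET.fromstring(rss_xml)
    items = root.findall("./channel/item")
    return [item.findtext("title") for item in items[:limit]]
```

In the real bot, the XML string would come from fetching the TabNews RSS URL instead of a literal.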

Challenges and active learning:
I hit challenges right away. A classic mistake was pushing the 'venv' folder to GitHub. That forced me to learn more advanced Git commands (not that I'll remember them all, but I used them 🫡😁), like 'git rm --cached', to clean the repository without losing the local files. It was a practical lesson in what should not be versioned.

I'm dealing with complex libs like 'python-telegram-bot'. Instead of just copying code, I'm reading the documentation to understand the "why" behind things, like the logic of asynchronous functions (async/await). AI has been a "mentor": it explains the machinery behind the snippets, but I apply and edit the code myself to make sure I actually learn.

Results:
Bot operational and parser successfully extracting real data. Next step: wire the parser into the bot's /vagas command and start studying Supabase.

Attachment
0
ChefThi

SUMMARY
The central goal of this phase was the definitive stabilization of the bridge between the frontend (GitHub Pages) and the backend (Cloud Functions/Cloud Run). We faced complex infrastructure challenges, especially around network routing and cross-origin security policies.

COMMIT BREAKDOWN:

c6ae119 - Runtime Standardization (Node 24)

Strategic stack upgrade to Node 24 inside the IDX environment. This change isn't just cosmetic; it standardizes the execution engine for the Cloud Functions, guaranteeing support for the latest dependency versions and optimizing resource usage.

59bb7d8 - IDX Stabilization and Authentication

Bug-fixing phase for app-to-cloud communication. We resolved IDX-specific authentication bottlenecks that caused connection drops. We implemented more robust header handling, ensuring the server correctly identifies authorized requests and eliminating interruptions in the data flow.

2e34eee / be3d67f - Migration to Production Endpoints

We switched the development URLs (localhost/preview) over to the definitive Cloud Run URLs. This step was critical for the GitHub Pages frontend to start talking to the highly available backend instead of relying on local simulators or mocks.

433ae50 - Routing Refinement (/gerarRoteiro)

Pathing fix to avoid 404 errors. We now explicitly include the /gerarRoteiro route in every production request, ensuring Google Cloud's load balancer directs traffic straight to the AI processing function instead of letting requests "die" at the service root.

3d16779 - Definitive Resolution of CORS Blocks

Fine-tuning of the browser security policy. We removed unnecessary credentials (cookies/auth headers) from the client's fetch calls.

Attachment
2

Comments

ChefThi
ChefThi 3 months ago

With the recent changes, the app has reached the maturity to operate in a production environment, consuming AI in a scalable and secure way.

The development environment is now fully standardized on Google IDX with Node 24, guaranteeing full parity between local code and the cloud containers. 👌🛜😝

A few commit explanations :)

  • Runtime Standardization (Node 24)

We configured the listening ports and global environment variables so deploys happen without manual adjustments between builds.

  • Definitive Resolution of CORS Blocks

This change lets the backend respond cleanly to external (cross-origin) origins, fixing the classic error that prevented the script from being displayed after processing.

ChefThi
ChefThi 3 months ago

FOR SOME REASON THE DEVLOG DIDN'T SHOW ALL THE ATTACHMENTS (I ONLY SEE TWO). I WENT TO EDIT THEM AND FOUND THE LINKS TO THE OTHERS… 😬🤔🤔🫡

https://flavortown.hackclub.com/rails/active_storage/representations/proxy/eyJfcmFpbHMiOnsiZGF0YSI6ODA0NjIsInB1ciI6ImJsb2JfaWQifX0=--537e5bc67131f2d9bfd04f8c3196a017c93e8c21/eyJfcmFpbHMiOnsiZGF0YSI6eyJmb3JtYXQiOiJ3ZWJwIiwicmVzaXplX3RvX2xpbWl0IjpbNDAwLDQwMF0sInNhdmVyIjp7InN0cmlwIjp0cnVlLCJxdWFsaXR5Ijo3NX19LCJwdXIiOiJ2YXJpYXRpb24ifX0=--9c897ec1b1274defb23f0ba167df32fefc493e3b/download.png

https://flavortown.hackclub.com/rails/active_storage/representations/proxy/eyJfcmFpbHMiOnsiZGF0YSI6ODA0NjMsInB1ciI6ImJsb2JfaWQifX0=--fea143231898c6ce1e141759a7bf4171c4a2ac3b/eyJfcmFpbHMiOnsiZGF0YSI6eyJmb3JtYXQiOiJ3ZWJwIiwicmVzaXplX3RvX2xpbWl0IjpbNDAwLDQwMF0sInNhdmVyIjp7InN0cmlwIjp0cnVlLCJxdWFsaXR5Ijo3NX19LCJwdXIiOiJ2YXJpYXRpb24ifX0=--9c897ec1b1274defb23f0ba167df32fefc493e3b/download.png

https://flavortown.hackclub.com/rails/active_storage/representations/proxy/eyJfcmFpbHMiOnsiZGF0YSI6ODA0NjQsInB1ciI6ImJsb2JfaWQifX0=--72582c9b5938de02d0060f51a286706c7137f534/eyJfcmFpbHMiOnsiZGF0YSI6eyJmb3JtYXQiOiJ3ZWJwIiwicmVzaXplX3RvX2xpbWl0IjpbNDAwLDQwMF0sInNhdmVyIjp7InN0cmlwIjp0cnVlLCJxdWFsaXR5Ijo3NX19LCJwdXIiOiJ2YXJpYXRpb24ifX0=--9c897ec1b1274defb23f0ba167df32fefc493e3b/download.png

https://flavortown.hackclub.com/rails/active_storage/representations/proxy/eyJfcmFpbHMiOnsiZGF0YSI6ODA0NjUsInB1ciI6ImJsb2JfaWQifX0=--fc7694b27c5c4b6d50ff206abb740033314a19c0/eyJfcmFpbHMiOnsiZGF0YSI6eyJmb3JtYXQiOiJ3ZWJwIiwicmVzaXplX3RvX2xpbWl0IjpbNDAwLDQwMF0sInNhdmVyIjp7InN0cmlwIjp0cnVlLCJxdWFsaXR5Ijo3NX19LCJwdXIiOiJ2YXJpYXRpb24ifX0=--9c897ec1b1274defb23f0ba167df32fefc493e3b/download.png

https://flavortown.hackclub.com/rails/active_storage/representations/proxy/eyJfcmFpbHMiOnsiZGF0YSI6ODA0NjYsInB1ciI6ImJsb2JfaWQifX0=--72fd498c1bbc8ef3401f85f543c33a3fb1f0f550/eyJfcmFpbHMiOnsiZGF0YSI6eyJmb3JtYXQiOiJ3ZWJwIiwicmVzaXplX3RvX2xpbWl0IjpbNDAwLDQwMF0sInNhdmVyIjp7InN0cmlwIjp0cnVlLCJxdWFsaXR5Ijo3NX19LCJwdXIiOiJ2YXJpYXRpb24ifX0=--9c897ec1b1274defb23f0ba167df32fefc493e3b/download.png

ChefThi

EVOLUTION SUMMARY
v1.0.0 release focused on interface polish and documentation. New personas, toasts for warnings, and a TTS system with synchronized highlighting. The repository was reorganized to support the initial backend via Firebase Functions, isolating the script-generation logic in a secure environment. Preparation completed for serverless deployment and an upgraded Gemini API integration.

COMMITS

d7799 (Jan 01) – chore(release): v1.0.0 Launch

Stable version release. README rewritten with a focus on branding and a usage guide. General UI polish with badges and cosmetic improvements. Official milestone for technical evaluation and promotion.

da6d6 (Jan 19) – feat: New features + organization

Project restructured toward a cleaner architecture.

  • Personas (CONFIG.PERSONAS): Added tone and style variations (Scientific, Dramatic, etc.).
  • Toasts: Replaced plain alerts with elegant notifications.
  • Error handling: Improved exception handling.
  • TTS sync: Text highlighting synchronized with speech (highlight + auto-scroll).
  • UX: Processing button with a loading state.

4732f (Jan 19) – docs: added PROGRESS.md

Created a progress log for task tracking. Includes per-persona prompt progress, audio-sync status, and a map of pending technical work (Firestore and Web Audio API).

72648 (Jan 19) – Directory cleanup

70cd4 (Jan 22) – Security and Firebase Functions

  • Security: .gitignore configured to protect .env and node_modules.
  • Backend: Created the functions/ folder with index.js. Implemented the 'gerarRoteiro' Cloud Function.
  • Logic: Gemini integration, CORS handling, input validation (topic/persona), and JSON responses.
  • Infra: Log and cost control via the Firebase console.
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

In this cycle, the project transitioned from a purely client-side tool to a full-stack architecture. The focus was data security, defensive UX, and infrastructure fundamentals.

Main Deliveries
v1.0.0 release: A stability milestone with complete technical documentation and "Neural Deck" branding.

API security: Implemented Firebase Functions to encapsulate the Gemini logic. The API key is no longer exposed in the frontend, an essential software-engineering security pattern.

Interface and UX: Replaced generic alerts with a toast-notification system and added a persona selector.

TTS synchronization: Built the logic to highlight text in sync with speech synthesis (Text-to-Speech).

Engineering History (Commits)
d779957 — chore(release): Launch v1.0.0. Official stable release. Rewrote the usage guide and polished the UI.

da6d636 — feat: New features and organization. Implemented the persona selector, a pure-CSS toast system, and refactored error handling to read the HTTP response body.

4732f7a — docs: PROGRESS.md. Added task traceability, plus state placeholders for future Web Audio API integration.

7264825 — refactor: Structural cleanup. Reorganized directories and state variables for the audio visualizer.

70cd401 — feat: Backend & Firebase Deploy. Deployment environment setup. Created the gerarRoteiro Cloud Function using the @google/generative-ai SDK (Gemini Flash model). Configured CORS and defensive input validation on the server.

Attachment
Attachment
Attachment
Attachment
1

Comments

ChefThi
ChefThi 3 months ago

I ENDED UP LOSING/DELETING ATTACHMENTS I HAD SAVED FOR THIS DEVLOG…🫤😬

ChefThi

Title: Creation and Structuring of the Flavortown CLI Problem Solver
Commits:

  • 3891434 — Reworked the README with the design link and took the project out
  • 1e3538c — chore: stop tracking .wakatime-project
  • d1b0ec3 — chore: add .wakatime-project to .gitignore
  • 9c2dabcf — chore: remove HTML design template and minor updates to main.py
  • c647f39 — Initialize README with project details and instructions
  • b0bb2a5 — refactor: revert main.py to a fixed list (remove json)
  • bc2d779 — feat: integrate questions JSON and add infinite loop

Summary: Initial development of a CLI quiz focused on logic and programming, built entirely in a mobile environment (Termux/Acode).

What was done:

  • Implemented the main logic in Python (main.py) with a randomized quiz system.
  • Refactored the question bank from JSON to an in-code list to guarantee portability in the MVP.
  • Built an input-validation system with simple string handling (.lower() and .strip()).
  • Designed the project's visual identity and landing page in Figma.
  • Fully documented the mobile development process and testing instructions.
  • Cleaned environment files (.wakatime) and obsolete HTML templates.
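The input-validation step above is small enough to sketch directly (function names are illustrative):

```python
def normalize_answer(raw: str) -> str:
    # The same simple treatment the quiz uses: trim whitespace, ignore case
    return raw.strip().lower()

def is_correct(raw: str, expected: str) -> bool:
    """Compare a user's typed answer against the canonical (lowercase) answer."""
    return normalize_answer(raw) == expected
```

This keeps the quiz forgiving about " Yes " vs "yes" without any heavier parsing.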

Results: Working CLI running in the terminal, letting you solve logic problems with immediate feedback. Interface design completed and documented.

Next steps: Implement cumulative scoring, expand the question bank, and add difficulty categories.

Project links:
Design (Figma): https://www.figma.com/site/p8z5loosi9Lo5yZ4ayrYFp/CLI-Problem-Solver?node-id=0-1&t=Ds3KhPCdMtdeXuaJ-1
Timelapses (Hack Club):

Attachment
0
ChefThi

Title: Audio Improvements, Documentation, and Timeouts
Date: 2026-01-10

Commits (hashes):
3bb12a6 ee1f5e3 d3c80a4

Summary:
I worked on three direct fronts after commits 3dbbaf16 / be0f105a: smart audio (ducking), documentation/test updates, and longer server timeouts to reduce 503 errors on long renders.

What was done:

  • 3bb12a6 — Implemented Smart Ducking in the video pipeline: the mix now automatically lowers the background-music volume while narration is active, with smooth gain curves to avoid abrupt cuts. Added unit tests covering the mixing logic and RMS-level validation to ensure ducking doesn't degrade the speech. NOT TESTED!
  • ee1f5e3 — Updated docs and refined tests: project status adjusted, VideoService/AIService test cases expanded, and small fixes to the test scripts (clearer assert messages).
  • d3c80a4 — Raised the server timeout to 15 minutes and confirmed long timeouts on the Vite proxy; goal: reduce 503 timeouts during large video-processing jobs.
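The core of the ducking decision is just an RMS level check. A minimal sketch of that idea — the real pipeline applies smooth gain curves rather than this hard switch, and the threshold/attenuation values here are illustrative:

```python
import math

def rms(samples):
    """Root-mean-square level of a block of float audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def duck_gain(narration_block, threshold=0.05, duck_to=0.3):
    # When narration is audible in this block, attenuate the BGM;
    # otherwise leave the music at unity gain.
    return duck_to if rms(narration_block) > threshold else 1.0
```

Per-block gains like this would then be smoothed (e.g. ramped over a few milliseconds) before being applied to the music track, to avoid the abrupt cuts mentioned above.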

Results:

  • Local experiments show clearer audio in outputs with BGM + narration, and artifact-free transitions.
  • Strengthened automated tests (critical coverage maintained), meaning fewer regressions when adjusting mixing/FFmpeg.
  • Observed reduction in timeout failures on long manual runs (still to be validated in CI).

Next steps:

  • Run E2E with the full pipeline in CI (docker-compose) to confirm the wider timeout holds up.
  • Measure ducking's impact across different BGM (multi-genre) and tune the default parameters.
  • Expose audio-level metrics (RMS/peak) in the VideoGateway for real-time monitoring.
Attachment
Attachment
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

Date: 2026-01-09

Commits covered (hashes):
3dbbaf16 be0f105a

Summary:
After the stability and cleanup improvements, I focused on making the project testable and runnable in a container. I added unit tests, prepared Docker images, and fixed execution problems in the containerized environment so the pipeline can run locally and in CI consistently.

Details per commit:

  • 3dbbaf16 — feat: complete unit tests and docker configuration

    • Added and fixed tests for VideoService, ProjectsService, and AiService; coverage above 60%.
    • Created docker-compose.yml and a Dockerfile for the server; added .dockerignore.
    • Containerization structured to isolate the database (SQLite in the container) and services, and to simplify local/CI builds.
    • Goal: allow reproducible backend execution and integration with the frontend via proxy.
  • be0f105a — fix: full debugging and stabilization of the Docker environment

    • Adjusted database-file permissions to avoid write errors inside the container.
    • Moved the DB configuration into environment variables (better security and flexibility).
    • Resolved an Express dependency conflict that was breaking the container.
    • Cleaned the Docker cache to reclaim space and avoid corrupted builds.

Impact:

  • The Docker environment now starts reliably and the backend runs with the same configuration expected in CI.
  • Unit tests cover the pipeline's crucial components, reducing regression risk when touching FFmpeg/AI code.
  • Less friction for collaborators: with docker-compose it's easier to replicate the environment locally.

What to test / next steps:

  • Run the full pipeline inside the container (script generation → TTS → images → assemble) to validate timeouts and resources.
  • Add E2E tests that run in CI using docker-compose.
  • Monitor disk usage on runners/containers and automate cache cleanup in pipelines.
Attachment
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

DEVLOG - JANUARY 08, 2026

SUMMARY:
Birth of the HOMES AI Agent and integration with the Termux API for voice and haptic feedback. The system is no longer just code: it's an assistant that can physically interact with the user through Android.

ACTIVITIES:

  1. Main agent implementation (homes_agent.py):
    • Replaced the old jarvis.py with a more robust structure.
    • Integrated Termux TTS (Text-to-Speech) for voice notifications.
    • Haptic feedback using the device's vibration on success or error.
  2. Refactoring and cleanup:
    • Removed legacy files (generator.py and old scripts).
    • Optimized the directory structure for the Central Hub.
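A minimal sketch of how the Termux:API voice/haptic calls can be wrapped from Python — termux-tts-speak and termux-vibrate are real Termux:API commands, but the function names, durations, and fallback behavior here are illustrative, not the agent's actual code:

```python
import subprocess

def tts_command(text: str):
    # termux-tts-speak reads its argument aloud via Android's TTS engine
    return ["termux-tts-speak", text]

def vibrate_command(duration_ms: int = 200):
    # termux-vibrate's -d flag sets the vibration duration in milliseconds
    return ["termux-vibrate", "-d", str(duration_ms)]

def notify(text: str, ok: bool = True):
    """Speak a message and vibrate (longer buzz on error); no-op outside Termux."""
    for cmd in (tts_command(text), vibrate_command(200 if ok else 600)):
        try:
            subprocess.run(cmd, check=False)
        except FileNotFoundError:
            pass  # Termux:API binaries not installed on this machine
```

Because the binaries are plain CLI tools, subprocess is all the glue the agent needs.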

COMMITS OF THE DAY (AUDIT TRAIL):

  • 018f03a - feat: implement HOMES AI Agent with Termux API integration
  • aae75c4 - feat: add jarvis.py for Termux voice feedback and cleanup legacy files

METRICS:

  • Language: Python / Bash
  • System: Termux (ARM64)
  • Status: 🟢 Working

HOMES AI: "System ready for operation, Sir."

Attachment
Attachment
0
ChefThi

I HAD FORGOTTEN TO WRITE THIS ON THE SITE HERE 😅

Devlog - 01/07/2026: Hardware Assembly & Setup

Summary:
A day dedicated to the physical structure and infrastructure of the ecosystem. Assembled the mobile workstation and the electronics prototype that will serve as the interface for HOMES's automation features.

🛠️ Workstation & Hardware

  • Workstation: Set up for 100% mobile development (Termux/ARM64).
  • Circuit assembly: Physically wired the components to the ESP32 for monitoring
    and automation.
    • Sensors: DHT11 (climate), MQ-2 (gas), ultrasonic (presence).
    • Actuators: servos (door/window), relay (fan), status LEDs.

📌 Commits of the Day (Audit Trail)

  • 8a41e67 - docs(devlog): add hardware assembly proof of work
  • 50f32b4 - docs(devlog): add assembly video proof of work

📸 Proof of Work

Attachment
Attachment
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

Title: Pipeline Robustness and Audiovisual Sync
Commits: 75f531a, 17c3a84

Summary:
Focus on FFmpeg engine stability and audio/subtitle sync precision. We eliminated memory crashes on long videos and timing bugs, and cleaned tracking artifacts out of the repository.

What was done:

  • Per-clip rendering (75f531a): Split the monolithic zoompan chain into individual steps. Images are processed as isolated MP4 clips and joined via the concat demuxer, avoiding frozen frames and memory blowups.
  • Sync via ffprobe (75f531a): Native audio probing on the backend. The system now reads the file's real duration, fixing drift caused by frontend estimates.
  • Smart subtitles (75f531a): New character-weighted timing algorithm. Each subtitle's duration is now proportional to its text length, producing a fluid, natural reading pace.
  • Padding & errors (75f531a): Added +5s of safety to the final clip to avoid abrupt cuts. Added a global exception filter for debugging via server.log.
  • Frontend sync (75f531a): Real duration detection via the Audio API (goodbye 60s placeholder) and automatic inclusion of subtitles.srt in the output ZIP.
  • Cleanup (17c3a84): Normalized .gitignore and removed temporary files from Git tracking.
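The character-weighted timing idea is simple enough to sketch (this is an illustration of the technique, not the project's exact code):

```python
def character_weighted_cues(lines, total_seconds):
    """Split total_seconds across subtitle lines in proportion to text length,
    so longer lines stay on screen longer. Returns (start, end, text) tuples."""
    total_chars = sum(len(line) for line in lines) or 1
    cues, t = [], 0.0
    for line in lines:
        duration = total_seconds * len(line) / total_chars
        cues.append((round(t, 3), round(t + duration, 3), line))
        t += duration
    return cues
```

With the real audio duration coming from ffprobe, total_seconds matches the narration exactly, which is what keeps the cues from drifting.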

Results:

  • No more render freezes on long sequences.
  • Subtitles perfectly synced with the narration.
  • Clean repository, focused only on production code.

Tests:

  • Assembly with mixed formats (PNG/JPG) and real audio; verified the MP4/SRT output and ZIP integrity.

Next steps:

  • Validate the pipeline with loads of 50+ images.
  • Expose metrics over WebSocket for real-time triage.
Attachment
Attachment
Attachment
Attachment
0
ChefThi

Title: HOMES-Engine 3.1 — Gemini TTS, Hybrid VTT & Integration Hardening

Commits:

  • c1fb79a — feat(core): implement Gemini 2.5 Flash TTS engine with multi-speaker
    support
  • 93cb143 — feat(video): integrate Gemini TTS with heuristic VTT generator
  • 1e053f4 — feat(integration): align queue poller with AI-VIDEO-FACTORY API specs
  • d5764e3 — chore(security): update gitignore for local simulation and fix poller
    paths

Summary:
An intensive upgrade session taking the engine to v3.1: native neural voice
(Gemini), a timestamp-free subtitle system, and full security/API alignment with the
orchestration backend.

What was done:

  • Native Gemini TTS: Replaced the old voice engine with Gemini's v1beta API,
    enabling ultra-realistic voices ("Kore", "Fenrir").
  • Hybrid subtitles (math-based): Built a heuristic algorithm to generate
    synchronized .vtt files, allowing visual subtitles even with audio-only APIs
    (WAV).
  • Integration poller: Implemented the worker that connects to AI-VIDEO-FACTORY,
    adjusting endpoints and payloads to the official spec.
  • Security: Hardened .gitignore for local simulations and cleaned up artifacts.

Por que foi feito:
Elevar a qualidade cinematográfica dos vídeos (voz melhor) sem perder a acessibilidade
(legendas), enquanto preparo a infraestrutura para rodar de forma autônoma e segura em
produção.

Resultados / Status:

  • Vídeos gerados agora possuem qualidade de estúdio (Demo no anexo).
  • Worker pronto para testes E2E com o Backend NestJS.
  • Ambiente local limpo e seguro.
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

O commit d977f4e marca a transição de protótipo para MVP full-stack. Implementei três pilares arquiteturais críticos para robustez e
UX:

  1. Persistência de Dados (TypeORM + SQLite)
    Substituí a volatilidade do navegador por um banco de dados real.
  • Backend: Implementação do ProjectsModule com operações CRUD completas (/api/projects).
  • DB: homes.db (SQLite) gerenciado via TypeORM com sincronização automática de schema.
  • Impacto: Usuários agora podem salvar, listar e retomar projetos anteriores. O estado persiste entre sessões e recargas de página.
  1. Feedback em Tempo Real (WebSockets)
    Resolvi a “caixa preta” de processos longos usando socket.io.
  • Arquitetura: VideoGateway no NestJS emite eventos de progresso (scriptProgress, videoProgress) para o frontend.
  • UX: O usuário visualiza o pipeline exato: “Gerando Imagens (3/10)” -> “Renderizando (45%)” -> “Concluído”.
  • Tech: Handshake otimizado com configurações CORS específicas para permitir comunicação Vite (5173) <-> NestJS (3000).
  1. Centralização de IA (Backend-First)
    Movi 100% da lógica de IA para o servidor, eliminando exposição de chaves no cliente.
  • Módulo: Novo AiModule encapsula geminiService.ts e serviços de TTS/Imagem.
  • Fluxo: Frontend consome endpoints REST limpos (POST /api/ai/script), enquanto o backend gerencia quotas, retries e rotação de
    chaves de API com segurança.

Stack & Métricas:

  • Novas Deps: @nestjs/typeorm, sqlite3, @nestjs/websockets, socket.io.
  • Arquivos: +8 módulos principais (ai.module.ts, video.gateway.ts, project.entity.ts).
  • Desafios Vencidos: Configuração fina de CORS para WSS e sincronização de entidades TypeORM em runtime.
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

Commits (hashes):
a673d30 até f83c967
Desde o ponto marcado por acc5ab9 concluí uma série de mudanças que transformaram a base em um pipeline mais resiliente e com melhor experiência de desenvolvimento. A ênfase foi em três frentes: (1) DX / Dev Mode para testes rápidos com pacotes ZIP, (2) orquestração e fallback de geração de imagens com processamento em batch, e (3) robustez do FFmpeg e infraestrutura backend.

O Dev Mode foi melhorado: upload de ZIPs agora extrai no cliente, auto-start do fluxo e carregamento automático de script, áudio e imagens locais para acelerar testes. No frontend ajustei o form (duração padrão, seleção de bg music) e introduzi processamento por lotes (batch) para geração de imagens — isso permite paralelizar requisições e aplicar fallback simples quando uma imagem falha, mantendo a ordem final. Acrescentei timeouts e fetchWithTimeout nas chamadas a provedores de imagem para evitar travamentos longos.

Na camada de imagem, o ImageGeneratorPro e a estratégia de rotação entre provedores foram reforçados para reduzir falhas por quota (Gemini → HF → StableDiffusion → Pollinations → Replicate). Também limpei guias e arquivos antigos, reorganizei .gitignore e adicionei ferramentas para reproducibilidade (Nix idx, rescue scripts).

No backend houve evolução significativa: adicionei um módulo Projects (TypeORM + SQLite) para persistir projetos; ampliei VideoService com geração SRT dinâmica, mixagem opcional de música de fundo, probe de duração de áudio, e um grafo de filtros FFmpeg mais robusto. As correções de FFmpeg continuam (stream normalization, mapeamento explícito, reset de PTS e aumento para 30fps), além de melhorias de erro/cleanup (remoção de SRT temporário, verificações de saída). Tempo de timeout do servidor e proxy estendido para suportar jobs longos.

Resultados: pipeline gera vídeos mais estáveis (30fps, sem drops), Dev Mode permite iteração rápida com assets locais, e a orquestração de imagens tolera quedas de provedores.

Attachment
Attachment
Attachment
Attachment
0
ChefThi

Título: HOMES-Engine — Iteração Studio & Estabilizações (sessão pós-v2.1)
Data: 2026-01-06
Commits:

  • 2587dfe — feat(visuals): implement color conversion engine and update learning lab
  • a8feb18 — feat(tts): set Google Gemini 2.5 TTS as primary engine
  • 2d483ab — docs: add system architecture overview and update readme v3.0
  • f868a70 — fix(ffmpeg): standardise SAR and pixel format for concat stability
  • ae25fe9 — feat(v3.0): add Smart Assets (Image Gen) and experimental TTS via Pollinations.ai
  • bfecd9f — refactor(arch): extract ffmpeg engine and improve audit tools

Resumo: Sprint focada em estabilidade do pipeline multimídia, promoção do Gemini TTS como engine principal e melhorias visuais programáticas para THEMES.

O que foi feito:

  • Visuals: criado core/color_utils.py e refatorados temas para usar constantes RGB, permitindo paletas geradas dinamicamente.
  • TTS: integrado Gemini 2.5 Flash TTS como prioridade; tts_engine atualizado com fallback limpo.
  • FFmpeg: padronizado SAR e formato de pixel (setsar=1, format=yuv420p) para evitar erros de concat em ARM64.
  • Arquitetura: extraído ffmpeg logic para core/ffmpeg_engine.py; melhor auditabilidade e verificação de segredos.
  • Assets/IA: adicionado ImageGenerator experimental (Pollinations/FLUX) e scripts de verificação de configuração.

Resultados / status:

  • Pipeline completo funciona em ARM (concat estável, ducking e VTT testados rapidamente).
  • TTS principal configurado — testes de qualidade/latência pendentes.
  • Documentação e guia de arquitetura atualizados (Readme v3.0).

Próximos passos:

  • Parametrizar prompts do Gemini (controle de tom, gancho e extensão).
  • Automatizar geração de paletas THEMES via color_utils.
  • Criar testes end-to-end simulados (CI) para concat/ducking sem assets pesados.

Sugestões de anexos:

  • Terminal.log com prova do render (setsar fix).
  • Vídeo curto 10s mostrando tema + legenda + áudio Gemini.
Attachment
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

DEVLOG DIA 1 - HOMES HUB INIT

Data: 5-6 Jan 2026 | Autor: EngThi | Repo: github.com/EngThi/HOMES

RESUMO

Hub central criado para ecossistema HOMES (4 repos).
Tempo: 10h45min | Commits: 10 | Status: 100% funcional

TIMELINE

5 Jan 23:56 - Commit b89aff2: README + scripts + estrutura
6 Jan 00:30 - Commit 42bcd55: architecture. md + strategy.md
6 Jan 00:32 - Commit a985689: LICENSE + .gitignore + .env.example
6 Jan 00:45 - Commit 2f64edb: setup-guide + integration-flow
6 Jan 10:16 - Commit 1c5f80d: 6 docs tecnicos completos
6 Jan 10:32 - Commit a41a397: Analise HOMES-Engine
6 Jan 10:36 - Commit 7b81434: GEMINI.md criado
6 Jan 10:41 - Commit 361e7c9: ROADMAP.md + . gitignore update
6 Jan 10:45 - Commit 2b05d3c: Devlog finalizado

METRICAS

Arquivos: 26 | Linhas doc: ~28k | Commits: 10 | Repos: 1/4

DECISOES

  • Multi-repo (4 separados)
  • HOMES = hub central
  • ROADMAP: Engine -> Backend -> Frontend
  • Devlogs em . txt

PROXIMOS

[ ] HOMES-Engine: api_client.py + queue_poller.py
[ ] ai-video-factory: Firebase + WebSocket
[ ] homes-prompt-manager: React + Voice

APRENDIZADO

  • Doc economiza tempo depois
  • Commits 30min ideais
  • Tirar screenshots durante trabalho

STATUS: Hub completo. Proximo: Engine integration
Usei bastante o Gemini CLI para desenvolver, auditar e desenvolver as coisas, com base nas pesquisas e ideias da Perplexity que já tinha uma ideia com base em arquivos, ideias e um esqueleto simples que tinha.

Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

Título: Inicialização do Voice Task Master
Data: 2026-01-05

Commits:

Resumo: Setup inicial do projeto estabelecendo a estrutura base para o MVP do gerenciador de tarefas por voz.

O que foi feito:

  • Criação do index.html básico para interface inicial.
  • Configuração do package.json com dependências e scripts de execução.
  • Definição de .gitignore para limpeza do ambiente.
  • Criação do HANDOFF.md e diretório de devlogs para documentação técnica.

Resultados: Ambiente de desenvolvimento configurado e estrutura de arquivos pronta para implementação das APIs de áudio.

Próximos passos:

  • Implementar captura de voz utilizando a Web Speech API.
  • Desenvolver a lógica de manipulação da lista de tarefas (CRUD básico).
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

Título: 🚀 Hardening the Core & Subtitles
Data: 2026-01-04

Commits:

Resumo:
Hoje trabalhei para estabilizar o pipeline e melhorar o suporte a vídeos com legendas automáticas. Também refinei o ambiente de desenvolvimento para evitar conflitos futuros.

O que foi feito:

  • Legendagem Automática:
    • Gerador SRT dinâmico baseado no script gerado pela IA e timing do áudio.
    • “Queima” (hard-code) das legendas no vídeo usando FFmpeg, com estilo legível (fonte neon ciano + bordas pretas).
  • Estabilização do Ambiente:
    • Agora o backend usa ffprobe para verificar com precisão a duração do áudio antes da renderização.
    • Otimized proxy e tempo de execução do dev server (Vite) para tarefas longas.
  • Gerenciamento de Assets Locais:
    • Parou o versionamento de arquivos como GEMINI.md, mantendo-os locais apenas com exclusões no .gitignore.

Resultados:

  • Vídeos agora podem ser gerados com legendas legíveis e sincronizadas.
  • Ambiente Dev mais estável e otimizado para casos de uso local.
  • Arquivos redundantes não comprometem mais o repositório principal.

Próximos Passos:

  • Testar variados estilos de legendas para legibilidade em formatos diferentes.
  • Finalizar suporte para mixagem de áudio de fundo no pipeline.
  • Outras otimizações possíveis no fluxo de geração de legendas.
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

DevLog: HOMES-Engine v2.1 – AI Studio & Arquitetura Modular**
Data: 2026-01-05 | Horas Gastas: ~6h

🚀 Commits Principais

  • 7d477d7 — feat(v2.1): Architecture Overhaul & Gemini AI Integration 🧠
  • 7fecb45 — feat: Absolute Cinema v1.6 - Dynamic B-Roll & Sinc Subs
  • 4402ddd — fix(core): Correct imports and asset management

📝 Resumo da Evolução

Reestruturei o motor para um modelo de Studio Modular. O foco saiu de scripts isolados para um pipeline integrado onde o Gemini atua como o “Cérebro” da criação, garantindo automação de roteiros e estética cinematográfica (Absolute Cinema) rodando 100% em ambiente mobile.

🛠️ O que foi implementado:

  1. Arquitetura Core: Migração para estrutura modular (core/), isolando ai_writer, render e I/O. Isso permite escalabilidade e chamadas limpas da API do Gemini.
  2. AI Writer (Gemini): Integração do núcleo de escrita. Agora, o motor gera roteiros estruturados a partir de tópicos simples, salvando o output em scripts/ para processamento imediato.
  3. Visual Engine: Implementação de efeito Ken Burns (ZoomPan) e upscaling Lanczos. Adicionei suporte a THEMES configuráveis (JSON), permitindo mudar a estética do vídeo sem alterar o código.
  4. B-Roll & Subs: Sistema de seleção dinâmica e randômica de clipes de apoio. Geração de legendas VTT sincronizadas com tratamento de escape de caracteres especiais.
  5. Áudio Pro: Pipeline de mixagem com Audio Ducking (redução automática do volume da trilha durante a voz) e introdução musical de 2s para branding.
  6. Otimização de Repo: Limpeza de arquivos pesados no Git, .gitignore reforçado e separação clara de assets/, renders/ e cache/.

📊 Status & Resultados

O v2.1 (AI Studio) já opera em Prova de Conceito (PoC): O fluxo Ideia → Gemini → Script → TTS → Render (720p) está funcional e automatizado. O repositório está leve, modular e estável.

Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

🚀 Devlog: HOMES-Engine Genesis & Mobile Pipeline (v0.1)

O motor do HOMES-Engine começou a rodar! O foco inicial foi estabelecer uma pipeline funcional de “Ideia para Vídeo” rodando inteiramente em ambiente mobile (Termux), otimizando recursos para garantir que a renderização não “frite” o processador do celular.

🏗️ Mudanças Técnicas:

  • Genesis da Pipeline (Termux + FFmpeg):

    • Implementação do video_maker.py, um core de renderização otimizado para Android. Utiliza o preset ultrafast do libx264 e crf 28 para equilibrar velocidade e qualidade em dispositivos móveis.
    • Criação do main.py focado em automação via Termux API. O sistema agora captura ideias via Voz (Speech-to-Text) ou Clipboard, injeta diretrizes de branding (“Absolute Cinema”) e gera prompts prontos para o Gemini.
    • 9550b44 - 🚀 INIT: Genesis of HOMES-Engine
  • Refinamento de Core & Identidade Visual:

    • Fix de Importação: Corrigidos typos críticos no main.py que impediam a execução do script no ambiente Python do Termux.
    • Assets de Marca: Adição da fonte Montserrat-ExtraBold na pasta assets/. Ela agora é injetada via filtro drawtext do FFmpeg para garantir que as legendas tenham impacto visual cinematográfico.
    • 4402ddd - fix(core): correct import in main.py and add assets

💡 Por que isso importa?

Diferente de editores pesados, o HOMES-Engine é focado em headless production. A modularidade do main.py permite que o roteiro gerado seja salvo localmente e enviado automaticamente para o clipboard, agilizando o workflow de criação de vídeos faceless sem sair do terminal.

Status: PoC validada. Próximo passo: Automação da montagem de B-Rolls. 🚢🔥

Attachment
Attachment
Attachment
Attachment
Attachment
Attachment
0
ChefThi

Shipped this project!

Hours: 0.27
Cookies: 🍪 1
Multiplier: 2.12 cookies/hr

Hoje eu shippei o HOMES: Neural Deck 🚢, um aplicativo web projetado para revolucionar a criação de roteiros para vídeos usando inteligência artificial 🚀.

O que é?
É uma ferramenta que utiliza a poderosa API Gemini 2.5 Flash para gerar roteiros cinematográficos completos, incluindo sugestões de hooks, B-rolls e sound design, tudo isso em uma interface linda com uma estética cyberpunk neon. 🧠✨

Como funciona?

Digite um prompt, e o HOMES gera um roteiro automatizado que você pode salvar no Memory Bank, uma funcionalidade que armazena os resultados direto no navegador (localStorage).
Com o Text-to-Speech, é possível ouvir seus roteiros antes mesmo de gravar, tudo com um visualizador de áudio animado em CSS para uma experiência interativa.
O que eu aprendi?
Durante o desenvolvimento, aprendi a integrar APIs complexas com eficiência para trabalhar com prompts, melhorei minha habilidade em design responsivo, e criei animações visualmente bonitas usando apenas CSS. Mais importante, descobri como a automação e o design podem colaborar para inspirar e impulsionar a criatividade de criadores de conteúdo.

🖥️ Espero que outros criadores gostem tanto de usar o HOMES quanto eu gostei de desenvolvê-lo!

ChefThi

Hoje realizei grandes avanços no projeto HOMES: Neural Deck, criando e refinando funcionalidades cruciais para sua versão inicial. Cada passo foi pensado para entregar uma experiência única e imersiva ao usuário. Aqui estão as atualizações construídas:

Commits e mudanças recentes:

  1. Text-to-Speech e visualização de áudio
    Commit: Add native Text-to-Speech (TTS) with audio visualizer and final polish (v4.0)
  • Adicionado: Um Text-to-Speech nativo no navegador, permitindo que os usuários ouçam os roteiros gerados.
  • Estética aprimorada: Visualizador de áudio animado em CSS, sincronizado com a fala.
  • Código organizado com atenção à experiência do usuário (UI & UX).
  1. Integração com API Gemini e gestão de histórico
    Commit: Implement Gemini API integration and local history storage (v3.0)
  • Gemini 2.5 Flash API: Integrei a API para gerar roteiros otimizados, estruturados, e cinematográficos.
  • Histórico Local: Agora os roteiros gerados são salvos automaticamente no localStorage, acessíveis pela interface do “Memory Bank”.
  • Melhorias no layout: Interface dividida em duas colunas para facilitar a navegação do usuário.

Reflexões e aprendizado

  • Revisei o ciclo de desenvolvimento com integração de APIs grandes como a Google Gemini, aprendendo melhor sobre autenticação e manipulação eficiente de dados retornados.
  • A otimização do Text-to-Speech e seu visualizador CSS foi um exercício incrível de mesclar tecnologia de voz com design dinâmico.
  • Adotei práticas mais rigorosas de organização de código, documentação e testes, assegurando um produto final funcional e polido.
Attachment
Attachment
Attachment
0
ChefThi

🛠️ O que foi construído hoje
De acordo com os commits mais recentes no repositório:

  1. Integração da API Gemini para geração de roteiros
    Commit: Implement Gemini API integration and local history storage (v3.0)

Conexão direta com a API Gemini, que agora gera roteiros otimizados com:
Hooks iniciais para capturar a audiência.
Dicas de B-rolls e efeitos sonoros.
Interface reorganizada em um design de duas colunas, permitindo que o usuário veja os prompts e o histórico ao mesmo tempo.
Criado o recurso Memory Banks: uma barra lateral que salva e organiza os roteiros, usando localStorage.
2. Adição da funcionalidade Text-to-Speech (TTS)
Commit: Add native Text-to-Speech (TTS) with audio visualizer and final polish (v4.0)

Voz para os roteiros: Agora o usuário pode ouvir os roteiros gerados no navegador.
Incluído um visualizador de áudio animado em CSS, que sincroniza com a reprodução do texto, dando vida ao conteúdo.
3. Preparação e Lançamento da Versão 1.0
Commit: Launch v1.0.0 - The ‘Neural Deck’ Update 🚀

Polimento final em toda a interface: toque de futurismo, acessibilidade e responsividade.
Atualizações no arquivo README.md, com instruções para utilização do projeto.
Publicação como Versão 1.0.0, marcando a conclusão do ciclo inicial de desenvolvimento do projeto.

Attachment
0
ChefThi

O que eu shippei:

  1. Dynamic Motion (Ken Burns): Vídeos estáticos são chatos. Implementei filtros complexos no FFmpeg (zoompan, crop, scale) para dar
    movimento (pan & zoom) automático a todas as imagens geradas pela IA. Agora parece um documentário real, não um slide de
    PowerPoint.
  2. Robust Image Orchestrator: O pipeline estava quebrando quando a API do Gemini dava rate-limit. Criei um sistema de Fallback em
    Cascata: se o Gemini falhar, ele tenta HuggingFace, depois Stable Diffusion, Replicate e finalmente Pollinations. O vídeo sempre
    sai.
  3. DX (Developer Experience): Testar pipeline de IA é caro e lento. Criei um “Dev Mode” que injeta assets locais (ZIP) direto no
    pipeline, pulando as chamadas de API. Isso acelerou meu ciclo de testes de 2 minutos para 10 segundos.

Stack: React + NestJS + FFmpeg + Gemini 2.5 Flash.

Novas atualizações shippadas:

  1. Instant ZIP Pipeline: Implementei um sistema de “Auto-Start”. Agora, ao selecionar um arquivo ZIP com assets pré-gerados, o
    sistema detecta os arquivos, faz o upload e inicia a montagem do vídeo automaticamente. Menos cliques, mais velocidade. ⚡
  2. Smart Validation Bypass: Removi a obrigatoriedade de inputs de IA (como o tópico do vídeo) quando o Modo Dev está ativo. O sistema
    entende que os assets locais são a “única fonte da verdade”, limpando a interface de campos desnecessários.
  3. Local Asset Mapping: Melhorei a lógica de extração no backend para garantir que, independente de como o ZIP foi estruturado, o
    pipeline localize corretamente o script, áudio e o storyboard.
  4. GitHub Push Protection: Tivemos um pequeno susto com um segredo detectado pelo GitHub, mas resolvi via git reset e reescrita de
    histórico para manter o repositório seguro e limpo. 🔒
Attachment
0
ChefThi

O que foi feito hoje:
Integração com múltiplos provedores de imagem ( 4816e90):

Adicionado suporte para Gemini Imagen 3 , Hugging Face , Stable Diffusion , Craiyon e Replicate .
Criado o componente ImageGeneratorPropara geração avançada de imagens.
Adicionadas novas bibliotecas e atualizações de serviços auxiliares ( pollinationsService.tse imageService.ts).
Melhoria na interface do usuário ( 6d24499):

Substitui o controle deslizante de duração pela entrada de valores numéricos e predefinidos, simplificando o uso.

Attachment
0
ChefThi

Hoje foi um dia crucial na configuração final do AI Video Factory , meu projeto para o Flavortown.
Concluí configurações importantes para garantir que toda a estrutura do pipeline seja funcional, desde a entrada de dados até a geração automatizada de vídeos.
TRABALHEI EM ALGUMAS COISAS MAS ESQUECI DE GRAVAR O PROGRESSO. É MAIS OU MENOS ISSO QUE FIZ.

Usei o Gemini CLI para me dar me guiar e ir desenvolvendo as coisas enquanto organizava.

O que foi realizado hoje:
Configuração inicial e documentação ( a673d30):

Ajustei a base do projeto, garantindo que tanto o backend quanto o frontend estejam funcionando em harmonia.
Atualizei o README.mdpara incluir:
Guia completo de instalação local com suporte ao Docker.
Passo a passo sobre o uso do pipeline de automação.
Documentação detalhada dos endpoints da API de IA (geração de roteiro, visual e narração).
Estrutura do projeto e refinamento para o Flavortown ( 7b536d71, 6eda3fba):

Organizei melhor a estrutura de pastas e otimizei a configuração do Dockerfile para evitar conflitos no ambiente de execução.
Corrigidos pequenos bugs encontrados durante os testes de build do Docker e execução local.
Correção de erros durante os testes ( 6a03c43f):

Ajustei variáveis ​​de ambiente no .env.examplepara facilitar integrações futuras.
Solucionei problemas com as dependências relacionadas ao FFmpeg e integração com a API Gemini .

Attachment
0
ChefThi

Hoje avancei na estruturação do projeto AI Video Factory para o Flavortown!

Conquistas de hoje:

Estrutura inicial: Organizando pastas para Backend (NestJS) e Frontend (React + Vite).
Configuração: Adaptei variáveis ​​de ambiente e integrais ao FFmpeg ao pipeline.
Documentação: Completo README.mdcom o diagrama do pipeline e instruções para rodar o projeto.
Próximo passo: Finalizar a integração de scripts e narração para gerar o primeiro vídeo automaticamente!
Commits de Ontem (27 de Dezembro de 2025):
76329a9- Revise o arquivo README com detalhes do projeto e instruções de configuração.

O que foi feito:
Atualização completa do README.md:
Resumo do projeto
Recursos e pilha tecnológica usados
Passo a passo para instalação e configuração
Pipeline do projeto do início ao fim
Documentação dos endpoints da API
ff55797- Primeiro envio dos arquivos

O que foi feito:
Subida inicial do projeto:
Estruturação básica de pastas e arquivos.
Subiu o esqueleto do frontend e backend.
Incluiu arquivos como Dockerfile, .env.example, .gitignore.
Compromisso de Hoje (28 de Dezembro de 2025):
a673d30- Tarefa: configuração inicial do projeto e documentação para Flavortown
O que foi feito:
Ajustes finais para a configuração do projeto.
Melhorias na documentação, adaptando o projeto para o concurso Flavortown.
Preparação de ambiente local e explicação para desenvolvedores externos.

Attachment
0
ChefThi

O que foi feito: Ontem foi o “Big Bang” do projeto AI Video Factory. Eu foquei em estabelecer toda a fundação técnica para transformar um tópico qualquer em um vídeo completo para o YouTube de forma automatizada.

Destaques técnicos dos commits:

Subi os arquivos base do que espero ser o projeto. Sendo a estrutura base de tudo o que vou desenvolver.

Documentação e Setup: Finalizei o dia revisando o README.md com todos os endpoints da API (ideação, script, narração, montagem) e as instruções de setup via Docker, garantindo que o projeto seja replicável e “shipável” — bem no espírito do Flavortown.

Commit ff55797 (First push of the files):
Subiu o “coração” do projeto.
Estrutura de pastas separando Frontend e Backend.
Configuração de ambiente (.env.example) e arquivos de container (Dockerfile).
Commit 76329a9 (Revise README with project details):
Detalhamento da Pipeline Architecture.
Exposição dos endpoints /api/ai/ e /api/assemble.
Guia de instalação completo para quem quiser testar a “fábrica”.

Attachment
0