Booktures: Multimodal AI Book-to-Visual Pipeline

4 devlogs
8h 15m 18s

I’m tired of books not having enough pictures, so I’m building a “visualizer engine”: a FastAPI app that reads your PDFs and uses generative AI to “paint” the story as you read. My goal is to maintain character consistency across 100+ pages using smart prompting and context-chaining, and to ship a slick Next.js frontend that shows the book and the art side by side. (Note: if you visit the site at https://booktures-snowy.vercel.app, wait for the image on one page to load before going to another page. If it doesn’t load, move back and forth between pages.)
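The devlogs don't show how the context-chaining works. One common way to do it is to keep a running "character sheet" that records each character's look the first time they appear and then prepend those descriptions to every later image prompt. The sketch below assumes that approach. `CharacterSheet`, `updateSheet`, `chainPrompt` and the prompt wording are hypothetical, not the app's actual code.

```typescript
// A minimal sketch of context-chaining for character consistency.
// Hypothetical: the app's real prompt format and data shapes are not shown in the devlogs.

// Maps a character's name to the visual description first recorded for them.
type CharacterSheet = Map<string, string>;

// Merge newly detected characters into the sheet without overwriting existing entries,
// so a character's look is locked in the first time they appear.
function updateSheet(sheet: CharacterSheet, found: Record<string, string>): CharacterSheet {
  const next: CharacterSheet = new Map(sheet);
  for (const name of Object.keys(found)) {
    if (!next.has(name)) {
      next.set(name, found[name]);
    }
  }
  return next;
}

// Prepend the stored descriptions of the characters present on this page to the scene prompt.
function chainPrompt(sheet: CharacterSheet, sceneText: string, present: string[]): string {
  const known = present
    .filter((name) => sheet.has(name))
    .map((name) => `${name}: ${sheet.get(name)}`);
  return known.length > 0
    ? `Characters (keep consistent): ${known.join("; ")}. Scene: ${sceneText}`
    : `Scene: ${sceneText}`;
}
```

For example, `chainPrompt(updateSheet(new Map(), { Jax: "tall, silver hair, red coat" }), "Jax enters the archive", ["Jax"])` produces `Characters (keep consistent): Jax: tall, silver hair, red coat. Scene: Jax enters the archive`. Because the sheet is chained from page to page, page 100 is prompted with the same description as page 1.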

This project uses AI

I integrated Gemini into my workflow to handle the heavy lifting of boilerplate code. By using AI to scaffold the initial structures, API routes, and repetitive components, I was able to bypass the “blank page” phase and focus my energy on the complex logic—specifically the client-side PDF parsing and the custom fallback systems. This allowed me to move from concept to a fully deployed Vercel application much faster without compromising on code quality.

proffessors2807

Shipped this project!

I’ve officially reached the finish line with Booktures! This project started as a simple idea for an AI-powered reader, but it quickly evolved into a complex challenge in engineering resilience and speed. My goal was to create a digital terminal that doesn’t just display text, but actively “sees” the story alongside the reader, translating paragraphs into high-fidelity visuals without the usual lag associated with AI generation.

The most transformative part of this build was mastering API interaction. I hit a wall early on with rate limits and slow response times from single providers. To solve this, I designed a “Parallel Racing Engine.” I learned how to handle multiple asynchronous streams simultaneously, firing requests to Groq, Gemini, and Hugging Face all at once. Building the logic to detect the “winner” and handle failovers taught me more about production-grade backend logic than any tutorial ever could. It’s no longer just a wrapper; it’s a resilient system that refuses to break.
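The "winner" detection described above can be sketched as a first-success race. Every provider starts at once, the first successful response wins, and the call fails only when every provider has failed. The `Provider` shape and `raceProviders` are hypothetical. The real Groq, Gemini and Hugging Face calls would sit behind each `generate` function.

```typescript
// A provider is any async function that returns an image URL (hypothetical shape).
type Provider = {
  name: string;
  generate: (prompt: string) => Promise<string>;
};

type RaceResult = { winner: string; imageUrl: string };

// Fire every provider at once and resolve with the first one that succeeds.
// Reject only when every provider has failed, so one outage never blanks the page.
function raceProviders(providers: Provider[], prompt: string): Promise<RaceResult> {
  return new Promise((resolve, reject) => {
    if (providers.length === 0) {
      reject(new Error("no providers configured"));
      return;
    }
    let failures = 0;
    const errors: string[] = [];
    for (const p of providers) {
      p.generate(prompt).then(
        // The first resolve wins; later calls are ignored because the promise is already settled.
        (imageUrl) => resolve({ winner: p.name, imageUrl }),
        (err: unknown) => {
          errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
          failures += 1;
          if (failures === providers.length) {
            reject(new Error(`all providers failed (${errors.join("; ")})`));
          }
        },
      );
    }
  });
}
```

In ES2021+ runtimes, `Promise.any` has the same semantics: it resolves with the first fulfilled promise and rejects with an `AggregateError` when all of them reject. The hand-rolled version only makes the failover counting explicit.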

On the design front, I took a massive leap forward by learning Tailwind CSS from the ground up. I moved away from messy, hard-to-maintain stylesheets and embraced utility-first design to craft a “Tech-Noir” aesthetic. I spent hours fine-tuning the split-screen viewport, custom scrollbars, and glass-morphism effects to ensure the UI felt like a futuristic piece of hardware. Seeing the progress from a broken layout to a sleek, responsive dashboard was incredibly satisfying. This project pushed my limits in both logic and design, and I’m hyped to finally share the working engine!

proffessors2807

I’m excited to announce that Booktures is officially deployed and accessible to the public! You can check it out here: 👉 booktures-snowy.vercel.app

This update marks a major shift in the app’s architecture and stability. Here’s what went down in this sprint:

🌍 Deployment
The app is now hosted on Vercel. I chose Vercel for its seamless integration with the frontend stack, lightning-fast edge network, and incredibly easy deployment pipeline. Every push to main now automatically updates the live site.

🏗️ Architecture Shift: Client-Side PDF Parsing
Previously, PDF processing was handled on the backend. While functional, it introduced latency and increased server load.

The Change: I’ve moved the PDF parsing logic entirely to the client side.

The Benefit: By leveraging the user’s local hardware, we get near-instant results, reduced bandwidth usage, and a more private experience since the document doesn’t necessarily have to leave the browser for initial processing.

🛡️ Enhanced Reliability & Fallbacks
Moving to the client meant I had to account for different browser environments and potential processing failures.

Robust Fallbacks: I’ve implemented a multi-tiered fallback system. If the primary parsing method fails (due to a complex PDF structure or unsupported browser feature), the app gracefully switches to secondary extraction methods.

Error Handling: Users now get clear feedback if a file is corrupted or protected, rather than the app simply hanging.
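A tiered fallback like this can be sketched as an ordered list of extraction strategies. The code tries each one in turn and reports every failure reason at once if none succeeds. This assumes each parsing method is wrapped as an async function over the file's bytes. `Extractor` and `extractWithFallbacks` are hypothetical names, and the devlog doesn't say which methods the real tiers use (a full PDF.js text extraction followed by a simpler parser would be one possibility).

```typescript
// One extraction strategy: takes the raw PDF bytes and returns the text of each page.
type Extractor = {
  name: string;
  extract: (data: Uint8Array) => Promise<string[]>;
};

// Try each tier in order and return the first one that yields readable text.
// If every tier fails, throw one error that lists why each tier gave up,
// so the UI can show a clear message instead of hanging.
async function extractWithFallbacks(
  data: Uint8Array,
  tiers: Extractor[],
): Promise<{ tier: string; pages: string[] }> {
  const reasons: string[] = [];
  for (const tier of tiers) {
    try {
      const pages = await tier.extract(data);
      // An empty result (e.g. a scanned PDF with no text layer) counts as a failure too.
      if (pages.some((page) => page.trim().length > 0)) {
        return { tier: tier.name, pages };
      }
      reasons.push(`${tier.name}: no text found`);
    } catch (err) {
      reasons.push(`${tier.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`Could not read this PDF (${reasons.join("; ")})`);
}
```

In the browser, the bytes would come from the selected file, e.g. `new Uint8Array(await file.arrayBuffer())`. They never have to be uploaded for this step.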

Comments

Zach Wilkinson-Rowe

Does that mean you are ready to ship?

proffessors2807

Shipped!!

proffessors2807

I’ve been working on this engine like crazy, and it’s come a long way from just being a basic PDF reader. In the beginning, I had it hardcoded to just draw one guy named Jax, but I realized that was way too limited. Now, I use Groq as a “Visual Director” to actually read the text and figure out what’s going on—like if it’s a person, a history lesson, or just a bunch of data.
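Here is a minimal sketch of what could happen after the "Visual Director" call. It assumes Groq is asked to reply with a small JSON object that names the scene type (person, history lesson, or data) and the subject. That reply then picks a style for the image prompt. The JSON shape, the `SceneKind` labels, the style strings and the function names are all hypothetical, and the Groq request itself is left out.

```typescript
type SceneKind = "character" | "history" | "data";

interface DirectorVerdict {
  kind: SceneKind;
  subject: string;
}

const SCENE_KINDS: readonly string[] = ["character", "history", "data"];

// Illustrative style suffixes per scene type (not the app's actual prompts).
const STYLE: Record<SceneKind, string> = {
  character: "cinematic portrait, consistent character design",
  history: "period-accurate historical illustration",
  data: "clean infographic with labeled diagrams",
};

function isSceneKind(value: unknown): value is SceneKind {
  return typeof value === "string" && SCENE_KINDS.indexOf(value) !== -1;
}

// Parse the director's reply, assumed to be JSON like {"kind": "...", "subject": "..."}.
// Anything malformed falls back to a generic scene instead of throwing.
function parseVerdict(reply: string): DirectorVerdict {
  try {
    const data = JSON.parse(reply) as { kind?: unknown; subject?: unknown };
    if (isSceneKind(data.kind) && typeof data.subject === "string") {
      return { kind: data.kind, subject: data.subject };
    }
  } catch {
    // Not valid JSON (or not an object): use the default below.
  }
  return { kind: "character", subject: "the current scene" };
}

// Turn the verdict into the prompt sent to the image model.
function buildImagePrompt(reply: string): string {
  const verdict = parseVerdict(reply);
  return `${verdict.subject}, ${STYLE[verdict.kind]}`;
}
```

Validating the reply before using it matters because language models sometimes wrap JSON in chatty text. A parse failure then degrades to a generic image instead of an error.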

The biggest headache was definitely when the Hugging Face servers started blocking me because I hit the free-tier limit too fast. To fix that, I built a “racing” system: the app asks both Hugging Face and Pollinations.ai for an image at the same time. I usually wait for the higher-quality image from Hugging Face, but if that fails (which it does a lot lately), I grab the Pollinations one so the screen never stays blank. I even added a 20-second timeout so the app doesn’t hang forever if the internet is acting up.
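The "prefer Hugging Face, fall back to Pollinations" race with a 20-second cap can be sketched as follows. Both requests start together, the primary's image is used if it arrives in time, and otherwise the backup's already-running request is used. The provider functions are injected as hypothetical wrappers, and `fetchPageImage`, `withTimeout` and `settle` are illustrative names rather than the app's code.

```typescript
type Outcome = { ok: true; url: string } | { ok: false; error: string };

// Reject if `p` has not settled within `ms`; the timer is always cleared afterwards.
function withTimeout(p: Promise<string>, ms: number, label: string): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

// Convert a promise into an Outcome right away, so a rejection is never left unhandled
// while we are still waiting on the other provider.
function settle(p: Promise<string>): Promise<Outcome> {
  return p.then(
    (url): Outcome => ({ ok: true, url }),
    (err: unknown): Outcome => ({ ok: false, error: err instanceof Error ? err.message : String(err) }),
  );
}

// Start both providers at once, prefer the primary's image, and fall back to the backup.
async function fetchPageImage(
  prompt: string,
  primary: (prompt: string) => Promise<string>, // e.g. a Hugging Face call (hypothetical wrapper)
  backup: (prompt: string) => Promise<string>, // e.g. a Pollinations.ai call (hypothetical wrapper)
  timeoutMs = 20_000,
): Promise<{ source: "primary" | "backup"; url: string }> {
  const primaryOutcome = settle(withTimeout(primary(prompt), timeoutMs, "primary"));
  const backupOutcome = settle(withTimeout(backup(prompt), timeoutMs, "backup"));

  const first = await primaryOutcome;
  if (first.ok) return { source: "primary", url: first.url };

  const second = await backupOutcome;
  if (second.ok) return { source: "backup", url: second.url };

  throw new Error(`No image available (${first.error}; ${second.error})`);
}
```

Unlike a plain first-wins race, this waits for the higher-quality provider first. Because the backup request was started at the same moment, falling back usually costs no extra waiting. The `finally` in `withTimeout` clears each timer whether the request wins, fails, or times out.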

I did run into a few bugs—like once I forgot to clear the timer and the app crashed, but I fixed that! Now, if the engine “overheats” or your Wi-Fi dies, you get a clear error message in a cool red box with a button to try again. It feels much more solid now, like a real tool instead of a buggy prototype.
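The red error box with a retry button can be modeled as a small state union that the UI renders. The sketch below assumes "overheating" means a rate-limit (HTTP 429) response and that network problems surface as timeout or fetch errors. `ImageState`, `describeFailure`, `loadImage`, the matching patterns and the message wording are all hypothetical.

```typescript
type ImageState =
  | { status: "loading" }
  | { status: "ready"; url: string }
  | { status: "error"; message: string };

// Map a low-level failure to the short message shown in the error box.
// The patterns and wording are illustrative assumptions.
function describeFailure(err: unknown): string {
  const text = err instanceof Error ? err.message : String(err);
  if (/429|rate limit/i.test(text)) {
    return "The engine is overheating (rate limited). Wait a moment, then try again.";
  }
  if (/timed out|network|failed to fetch/i.test(text)) {
    return "Connection problem: the image request did not finish. Try again.";
  }
  return "Something went wrong while painting this page. Try again.";
}

// Run one load attempt and report each state change to the UI.
// A "Try again" button can simply call loadImage again with the same loader.
async function loadImage(load: () => Promise<string>, onState: (state: ImageState) => void): Promise<void> {
  onState({ status: "loading" });
  try {
    const url = await load();
    onState({ status: "ready", url });
  } catch (err) {
    onState({ status: "error", message: describeFailure(err) });
  }
}
```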

proffessors2807

In today’s sprint, I successfully overhauled the Booktures core interface, transitioning from a basic layout to a high-fidelity, “tech-noir” upload experience. I implemented a custom Tailwind v4 theme that leverages color-mix for dynamic surface depths and a neon primary-gradient for the brand’s “Ignite Engine” aesthetic. The centerpiece is a sophisticated, interactive upload zone where I designed a “half-in, half-out” floating button anchored via absolute positioning and a group-hover logic that triggers a synchronized purple bloom effect. By hiding the clunky browser-default file inputs and utilizing CSS variables for consistent border glows, I’ve created a tactile, premium UI that feels responsive and ready for the intensive PDF processing to follow.

proffessors2807

Today was all about setting the mood for Booktures. We moved past the blank-slate phase and gave the app its soul, landing on a deep, cinematic dark mode that makes the colors pop like a neon sign in the rain. The upload screen isn’t just a utility anymore—it’s an invitation. We crafted a high-end interface where users can drop their manuscripts into a “digital fire,” trading boring buttons for a bold, interactive experience that feels more like starting an engine than filling out a form. The stage is set, the atmosphere is heavy, and the “kitchen” is officially prepped for the first chapter to start cooking.
