Research AI With Voice
Got help from Copilot for debugging, especially tricky audio errors, but I started the base myself.
Mainly just patches and README
The theme has had a complete makeover, and I have replaced all of the Chainlit defaults with Rawv's own. The AI chat avatar and favicon are now custom: a smiling microscope. Like the icons I've made for past projects, I built it in Emoji Kitchen by combining the smile and microscope emojis. There's also a new baby blue and silver theme. When I first set it up, the chat text you typed blended into the chat box color - also white - so I had to change that manually to prevent the clash.
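For anyone curious about the white-on-white fix, it boils down to giving the input text a color that contrasts with the chat box. The keys below are placeholders - Chainlit's real theme.json keys differ by version - so this only shows the shape of the change, not exact config:

```json
{
  "light": {
    "background": "#eaf4fb",
    "chatBox": "#ffffff",
    "chatInputText": "#1a1a1a"
  }
}
```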
Before, because of the research settings, the thread would start early for no reason: once the app loaded, the chat window would pop up within a second. Now it's been stabilized so that it doesn't send signals immediately, and there are controls on the page itself: a start button and a button to enable or disable research mode. And of course the microscope mascot brings character to the page.
Has the full tech stack, research flow, models used, AI transparency, and future upgrades. For reference, the core brain model is Llama 8B, the TTS voice is en-US-AriaNeural, and the STT model is whisper-large-v3-turbo. The main Python libraries in the tech stack are LangChain, which wires those models up and handles the main backend logic, and Chainlit, a frontend framework designed for chatbots like Rawv. And of course the README walks through transcribe -> search -> browse -> synthesize -> quality check -> speak.
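That README flow can be sketched as a chain of stubbed steps. Every function here is a stand-in I made up for illustration - the real app wires Whisper, DuckDuckGo, Chrome, and the TTS voice into these slots:

```python
# Hedged sketch of transcribe -> search -> browse -> synthesize ->
# quality check -> speak. All functions are illustrative stubs.

def transcribe(audio: bytes) -> str:
    return "what is a tardigrade"  # stub: real app calls whisper-large-v3-turbo

def search(query: str) -> list[str]:
    return [f"result for {query}"]  # stub: real app queries DuckDuckGo

def browse(urls: list[str]) -> str:
    return " ".join(urls)           # stub: real app drives Chrome

def synthesize(evidence: str) -> str:
    return f"Summary: {evidence}"   # stub: real app calls the Llama 8B brain

def quality_check(summary: str) -> str:
    # Last gate before speaking: never send an empty answer to TTS.
    return summary if summary.strip() else "I could not find anything."

def speak(text: str) -> bytes:
    return text.encode()            # stub: real app calls en-US-AriaNeural TTS

def research_loop(audio: bytes) -> bytes:
    query = transcribe(audio)
    evidence = browse(search(query))
    return speak(quality_check(synthesize(evidence)))
```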
I should be ready to ship soon
One of the longest I've spent on a single devlog, so I'm gonna use markdown.
I made many changes to the core loop. Research has switched mainly to DuckDuckGo, and the Chrome-control code I had before has finally made its way into the main app. I had already finished all of the heavy lifting in the other directory, so this was just plugging it in. Chainlit has prebuilt tools, so I could easily display a "researching now" step - just like audio.
The summary is what is spoken
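The summary-is-what-is-spoken idea can be sketched like this, with the search stubbed out. The real app presumably uses an actual DuckDuckGo client; `fake_search` and `spoken_summary` are names I made up for the sketch:

```python
def fake_search(query: str) -> list[dict]:
    # Stub standing in for a DuckDuckGo text search; a real client
    # would return dicts with at least "title" and "body" keys.
    return [
        {"title": "Tardigrade", "body": "Tardigrades are micro-animals."},
        {"title": "Water bear", "body": "They survive extreme conditions."},
    ]

def spoken_summary(query: str, max_snippets: int = 2) -> str:
    # Only this condensed summary reaches TTS - never the raw results.
    snippets = [r["body"] for r in fake_search(query)[:max_snippets]]
    return " ".join(snippets)
```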
I have other projects linked, like "frosty-study" and other seemingly unrelated projects. But they aren't actually unrelated: the main ideas in Rawv come from them. I built a language audio app named Frosty, which helped me master the back-and-forth audio loop, and the other projects were attempts at finding a model provider. I tried GitHub Models by using a GitHub PAT and sending requests to their endpoint with my token, but it was too slow, the rate limits were too low, and it was too complex to set up. So I stuck with Groq - but if Groq fails, I have GitHub Models as a fallback.
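The Groq-first, GitHub-Models-second arrangement is a simple fallback chain. Here's a sketch under the assumption that each provider is just a callable; the names (`ask_with_fallback`, the simulated providers) are mine, not the real code:

```python
def ask_with_fallback(prompt: str, providers: list) -> str:
    # Try each (name, callable) in order; move on if one raises
    # (rate limit, timeout, ...). Mirrors "Groq first, GitHub Models if Groq fails".
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def flaky_groq(prompt: str) -> str:
    raise TimeoutError("rate limited")   # simulate the primary failing

def github_models(prompt: str) -> str:
    return f"answer to: {prompt}"        # simulate the fallback succeeding
```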
Very great, from the chain of thought to the little icons. But I mostly need to thank Chainlit's pre-built parts for this, because I did not go and hunt for these myself; the icons and spacing were already there. And for each research step I used emojis, like 🔬🔍, the way I always like to, for a polished feel.
I needed this for the extra work I did, and I'm proud of this devlog.
I actually got working on the research part. I recovered the files from the Running Chrome project and got the Chrome navigation working again. I condensed the full project down to a couple of files that form the main project. I used to use Firecrawl for search, but my free credits ran out, so I switched to DuckDuckGo, which is free. Most of the core pieces were connected: uid capture, JS structuring, and safe parsing.
I just added slight improvements - no big changes. I updated the theme and the way Chainlit works so that it replaces most of the default Chainlit assets. I also added a custom avatar, which required changing where Chainlit looks for assets and adding a new place for it to look, by updating config.toml and adding a theme.json. Other than that, I changed the LLM and experimented with different models.
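For the asset piece, Chainlit serves custom files out of the app's `public/` folder, if memory serves. I'm not certain of the exact filenames across versions, so take this layout as a sketch rather than the real tree:

```text
rawv/
├── config.toml      # UI settings, e.g. the app name
├── public/
│   ├── theme.json   # custom colors
│   ├── favicon.png  # custom favicon (smiling microscope)
│   └── avatars/
│       └── rawv.png # custom chat avatar
└── app.py
```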
I got the UI working with the voice. I am using Chainlit for the frontend. In my last project I used Gradio, which imposed strict style restrictions, so I think the switch to Chainlit is a good choice. The biggest problem I faced here was getting speech data from the frontend and back: it started returning mime types, raw bytes, and just incompatible formats, because the frontend and backend dealt with audio differently. The fix was converting again and again, and guessing the format if nothing else worked - and that process worked.
Just continuing startup and basics. I finally organized the Web Project, so now only Version 3 is actually usable, and I have renamed some things. It was an old project I coded a year ago with vital code for web research; now that it is organized, I will only touch it once I need the web stuff again. I have just started on the text-to-speech and speech-to-text: text-to-speech is easy, but speech-to-text is a little more troublesome. Next I am going to connect them.
I just started the project and am trying to lay out the starting base code. I keep scratch work in another, unimportant VS Code project called "Project", because I wanted to avoid what I did last time: having lots of junk files ending in -testing that were just confusing. The other project is for controlling Chrome, but the problem with it is that it is disorganized, so I need to get the Chrome source code and understand that project. Right now I just need to get these messy folders working.