
My own Jarvis

12 devlogs
27h 38m 28s

I want to create a fully local AI assistant that will be a copy of J.A.R.V.I.S. from the Iron Man movies. He will help me with everyday tasks and just won’t let me get bored.

This project uses AI

I used Copilot to discuss libs that I could use and also to debug the part of the project related to sound input and output - it was hell.
The last usage was preparing the pipx configuration (yeah yeah, I decided to return to the pipx path).


lusparkl

Now available on PyPI!

lusparkl

I made the final changes to the code base, so now Jarvis should work on any computer. And I made a full trailer for v0.1!! I’ve never created stuff like that before, so this 2 minute video took me 3 hours. But I think it looks pretty nice and can touch the heart. This was a really great time, but after the ship I will probably take a break from the Jarvis project for about a week to create some stuff that I wanted to create previously. I already have huge plans for what I’ll do with v0.2, but I won’t tell yeh hehe. And now shiiip!

lusparkl

I’ve spent like 5 unlogged hours trying to set everything up to use pipx and understood - this project is just not made for this. There is too much trouble in setting it up as a pipx package, because this project needs lots of customization and settings. So I fell back to downloading from the GitHub repo. Now I’m setting up scripts and optimizations so it will be really easy to set up on your PC.

lusparkl

I’ve added preferences functionality. Now Jarvis remembers some basic facts and preferences about you, which helps a lot when responding to questions. It uses the internal db that I created previously. I also reworked the config file, so now it has a clean structure and comments for each variable. Now I’m starting to work on publishing via pipx and easy setup. There are probably 1-2 devlogs left before I finally finish v0.1!!

lusparkl

I’ve added 2 new tools: 1 - interaction with the user’s clipboard, which makes communication with Jarvis much easier. 2 - todo: now Jarvis can create, read and delete your todos, so you won’t forget anything. For this I created an internal db - we can use it later for anything we need just by creating a new table. I also rewrote the readme, so now it looks almost solid and there are already some instructions on how you can use Jarvis. Now I’m going to work on easy setup, so you’ll be able to set up everything in less than 10 minutes!!
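
To show what "just creating a new table" can look like, here is a minimal sketch of a todo tool on top of a local database. It assumes SQLite through Python's built-in `sqlite3` module; the devlog does not say which engine the internal db uses, and the names `open_db`, `add_todo`, `list_todos`, `delete_todo` and the `todos` table are hypothetical, not taken from the project.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and make sure the (hypothetical) todos table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS todos ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "text TEXT NOT NULL)"
    )
    return conn


def add_todo(conn, text):
    """Insert a todo and return its id."""
    cur = conn.execute("INSERT INTO todos (text) VALUES (?)", (text,))
    conn.commit()
    return cur.lastrowid


def list_todos(conn):
    """Return all todos as (id, text) tuples, oldest first."""
    return conn.execute("SELECT id, text FROM todos ORDER BY id").fetchall()


def delete_todo(conn, todo_id):
    """Delete a todo by id; return True if a row was actually removed."""
    cur = conn.execute("DELETE FROM todos WHERE id = ?", (todo_id,))
    conn.commit()
    return cur.rowcount > 0
```

Another feature (preferences, for example) could reuse the same connection with its own `CREATE TABLE IF NOT EXISTS` statement, which is the kind of reuse described above.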

lusparkl

I’ve added some sounds so Jarvis now feels more alive. It plays default sounds when the wake word detector hears “Hey Jarvis” and when we’re saving the chat, so we don’t need to look at the terminal to understand what’s happening. Also, when it gets a user message it plays one of the ready-to-use phrases like “Calculating” or “Working on it”, so our long latency doesn’t feel so long now. I also gave him tools to fetch the weather, so now we can ask him about it. Sometimes he picks the wrong location name, like Kiev instead of Kyiv, but I’ll work on it!!

lusparkl

I cleaned up the project structure and added some logging (still haven’t finished with that). Then I started to test how Jarvis works and found really big issues with the transcription and memory tools: 1. It transcribed some of the things that Jarvis himself said, and then looped the chat, speaking “with itself”. 2. When we retrieved memory there was too much context, so he started to think that it wasn’t just a previous chat but the chat we’re in, and continued conversations that had already ended.
I fixed both of these and now it works better than it did, but I understood that there is still lots of work until v0.1 is shipped😢

lusparkl

That was really lots of work, unlogged work. I’ve set up TTS (yeah, it sounds kinda weird and it’s also slow, but if it must be local it’s the best setup I’ve created so far), changed the GPT model because the previous one wasn’t working with tools and thinking, and set up memory for Jarvis using Chroma DB - the best tool for this. I created lots of helper functions for all the tasks that you can think of. I created an audio player module that works with TTS on multiple threads (that was hard stuff for me to understand, but it saves us lots of response time). And I tried to clean up the project structure, but there are still lots of things that I can do.
Talking about the technical side - v0.1 is ready, so now I’m starting to clean up the code, maybe fix some bugs and prepare it for shipment. I’ll try to make devlogs more frequently cuz this one is too big.
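
As an illustration of why running TTS and playback on separate threads saves response time, here is a minimal producer/consumer sketch using only the standard library. It is not the project's actual audio player module: `run_pipeline`, `synthesize` and `play` are hypothetical names, and the real TTS engine and audio backend are replaced by plain callables passed in by the caller.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the player thread to finish


def run_pipeline(sentences, synthesize, play, max_buffered=4):
    """Synthesize sentences on one thread while another thread plays them.

    `synthesize(sentence)` and `play(audio)` are stand-ins (assumptions)
    for a real TTS call and a real audio output call.
    """
    audio_queue = queue.Queue(maxsize=max_buffered)

    def producer():
        try:
            for sentence in sentences:
                # Blocks when the buffer is full, so TTS never runs far ahead.
                audio_queue.put(synthesize(sentence))
        finally:
            audio_queue.put(_STOP)

    def consumer():
        while True:
            audio = audio_queue.get()
            if audio is _STOP:
                break
            play(audio)

    threads = [
        threading.Thread(target=producer),
        threading.Thread(target=consumer),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

With this shape, the first sentence can start playing while the second one is still being synthesized, instead of waiting for the whole answer. A production version would also need error handling for a failing `play` call, which this sketch leaves out.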

lusparkl

This was a hella amount of work. I finished stabilizing speech transcription and connected a local AI model that will respond to me; that was the easy part. But now I’ve literally spent 12 hours trying to find the best TTS model/API to do the work for me. I tried to fine-tune a model, but that was a very bad idea with the electricity problems in my country, then tried QWEN TTS - too slow. I’m writing this while downloading the XTTS model files, and I really hope that it’ll work, cuz it’s already 11pm


Comments

lusparkl 28 days ago

He finally has a voice!!! Tomorrow I’ll show what’s new, hope I’ll be able to build the first working pipeline.

lusparkl

Finally! I added wake word functionality, so now it only responds when I say “hey Jarvis” (just “Jarvis” works too). It works pretty well, and even when I have background noise or music it captures the wake word.
Now I’m starting to work on the real AI model that will run on my PC and give me responses. I also want to give it the voice of the real Jarvis from the movies, so I’ll probably use some AI model to replicate the voice.
See you soon!!

lusparkl

Now Jarvis can hear me in real time!! That was kinda hard to design, but easy to implement. I think I should play with models and decide which one is better for me. In the future it’ll be hard to combine lots of models on my single PC, but I’ll come up with some idea, I’m sure. Also my English is a problem for transcription, cuz the model can’t understand what I’m saying, but I’ll fix that too!!

lusparkl

I worked on trying different transcription models, settled on faster-whisper and then spent lots of time installing drivers and everything else for it. The next step is to create an algorithm so it’ll transcribe me in real time and start writing the prompt after hearing the word “Jarvis”.
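
The "start writing the prompt after hearing Jarvis" step can be sketched on plain text, independent of the transcription model. This is only an illustration under assumptions: it works on an already transcribed string, and `extract_prompt` and `WAKE_WORD` are hypothetical names, not the project's code.

```python
import re

# Matches the wake word as a whole word, plus any punctuation or spaces after it.
WAKE_WORD = re.compile(r"\bjarvis\b[\s,.!?:;-]*", re.IGNORECASE)


def extract_prompt(transcript):
    """Return the text spoken after the first "Jarvis", or None if it is absent."""
    match = WAKE_WORD.search(transcript)
    if match is None:
        return None  # no wake word heard, so ignore this chunk
    return transcript[match.end():].strip()
```

For example, `extract_prompt("Hey Jarvis, what's the weather?")` gives `"what's the weather?"`, while a chunk of background speech without the wake word gives `None`.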
