a tool that would help me do other stuff while im in online lessons
(takes a screenshot, gets the answer from hcai, uses a local model to clone my voice and say the answer in my voice in a zoom meeting)
- f5 cloning debugging, voice model setup.
a tool that would help me do other stuff while im in online lessons
(takes a screenshot, gets the answer from hcai, uses a local model to clone my voice and say the answer in my voice in a zoom meeting)
built a tool that would help me do other stuff (or just be lazy) while im on online lessons, its a simple python cli that lets you take a screenshot of a question, sends the image to gemini on hc ai, gets the answer, and then uses f5_tts_mlx which is a local voice model, used to clone your voice, and use apple script to activate zoom, unmute your mic, and then say the answer in your voice! I am pretty happy with it, cuz it was a fun project to build on!
made a executable using pyinstaller, and the project would only work on macos, as it uses specific macos only utils like screencapture, and applescript! i hope the readme is well descripted, and makes it easy to run, fixed bugs wihth the release, it now works! a venv is needed with f5_tts_mlx installed, the readme better mentions it all.
Log in to leave a comment
made a simple tui interface using rich, i did have to change the two files a tiny bit, to make imports from the cli work properly, i dont plan on really making this project more advanced, would probably ship after packaging this, also rich is fire.
Log in to leave a comment
so, basically this is a tool which would make life very easy, I just take a screenshot of a question, then the question is passed on to hcapi, and it gets the answer, clones my voice, says the answer in my voice.
so far, i have the basic loop ready, where i take a screenshot (native macos screenshot
), and the image gets passed on to the hc api gemini model, and it gets the answer. and then the model says the answer. the model is a bit finnicky, but it works.
Log in to leave a comment