
A Simple STT (Speech To Text)

2 devlogs
1h 26m 6s


I like speaking my mind, but typing is slow, and sometimes it breaks my flow or I just forget what I was typing in the first place. The tech is finally at the level where we can just speak and have it transcribed for us. I know solutions like Whisper Flow exist, but they're still too heavy for my computer. So I thought: can there be just a simple website where I can go and it just transcribes for me? That's what I'm going to make.

This project uses AI

I am probably going to use artificial intelligence, like the Antigravity IDE's AI agent, to write the code. But it is still going to need my oversight, or human oversight, and human review.

Azeem

So, I asked the AI to write the code for an API integration. I gave clear instructions: use Gemini's API, specifically the 3.1 Lite model, because it's fast. I also asked it to create a settings page with an option to input the API key.

The AI generated the code, but it got the Gemini model name wrong. I mean, does Gemini not even know its own sibling's name?

So I had to manually go into the documentation, find the correct model name, and fix the code myself.
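The fix ended up being a one-line model-name change, so it helps to not hard-code it. Here's a minimal sketch of what the request-building side of that integration could look like, assuming the Gemini REST `generateContent` endpoint with inline audio data; the function and parameter names are my own illustration, not the actual code from this project, and the model name is passed in so a wrong guess stays a one-line fix:

```typescript
// Sketch: build a Gemini generateContent request that sends recorded audio
// for transcription. Assumes the v1beta REST endpoint; names are illustrative.
const GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/models";

interface TranscriptionRequest {
  url: string;
  body: string;
}

function buildTranscriptionRequest(
  model: string,       // exact model name copied from the docs, not guessed
  apiKey: string,      // user-supplied key from the settings page
  audioBase64: string, // recorded audio, base64-encoded
  mimeType: string     // e.g. "audio/webm"
): TranscriptionRequest {
  const body = {
    contents: [
      {
        parts: [
          { text: "Transcribe this audio verbatim. Output only the transcript." },
          { inlineData: { mimeType, data: audioBase64 } },
        ],
      },
    ],
  };
  return {
    url: `${GEMINI_BASE}/${model}:generateContent?key=${encodeURIComponent(apiKey)}`,
    body: JSON.stringify(body),
  };
}
```

You'd then POST `req.body` to `req.url` with `fetch` and a `Content-Type: application/json` header.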

Then came the next issue. When I tried to get transcription working, it kept failing. The problem is that I'm using Gemini's large language model, not a specialized speech-to-text model like Whisper. So the LLM tries to be "too smart" and ends up messing things up.

For example, when I say "I," it sometimes converts it to "you," as if it's responding instead of transcribing. It feels like it's having a conversation instead of doing its job.

Now, why not just use Whisper? Because of cost, and I don't want to run it locally since my laptop would get fried; it's cooked enough as it is.

So then came the dangerous part: prompt engineering.

I spent about half an hour refining prompts, trying to clearly explain what I wanted. Eventually, I got something decent. I added options so I could speak in any language, but the output would always be in my chosen language: for example, speaking in Hinglish but getting the output in English.
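The general shape of that prompt tightening can be sketched like this; the exact wording below is illustrative and not the prompt I shipped, but it shows the two constraints that matter: forbid the model from replying, and pin the output language while letting the input be anything:

```typescript
// Sketch: build a transcription prompt that keeps an LLM from "responding"
// to the speaker and forces output into one chosen language.
function buildTranscribePrompt(targetLanguage: string): string {
  return [
    "You are a transcription engine, not an assistant.",
    "Transcribe the attached audio word for word.",
    "Never answer, rephrase, or reply to anything the speaker says.",
    `The speaker may use any language or mix (e.g. Hinglish), but write the transcript in ${targetLanguage} only.`,
    "Output nothing except the transcript text.",
  ].join("\n");
}
```

Passing the target language in as a parameter is what makes the "speak in anything, get English out" option a simple dropdown on the page.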

That part worked.

But the main problem remained: the AI kept getting confused or trying to overthink things.

So I had to tighten the prompts a lot to control its behavior.


Comments

Azeem, about 1 month ago

What, I can only upload 2 images? Oh no. Well, the next thing I'm going to do is add a text field so it can also make raw text look good, then fix some more things and ship it, I hope.

Azeem

When I wanted to make an STT for the React application, I thought, let me be a bit lazy. You know, I'll just tell the AI to make it for me and set up the project. But the AI is so dumb that it couldn't even set up React. So, in the end, I had to go and set it up manually.

Then, once it was set up, I told the AI, "Hey, can you write the code for a basic page?" And it wrote it. Of course, I always do this: I get the base-level code from the AI and then I go in and change it, you know? Tweak it, perform a lot of tweaks, and remove or add things. That's how I build it. Once I have the base, it becomes easy to do things. It's like working with clay: you take a clay base and then mold it. Currently, I'm doing exactly that.
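One concrete bit of molding a recording page like this needs is picking an audio format the browser's `MediaRecorder` actually supports, since browsers disagree on codecs. This is a hypothetical helper in that spirit, not code from the project; the support check is injected (in the browser it would be `MediaRecorder.isTypeSupported`) so the selection logic can be tested outside one:

```typescript
// Sketch: pick the first audio MIME type the current browser can record.
// `isSupported` is MediaRecorder.isTypeSupported in a real page; injecting it
// keeps this pure and testable.
const CANDIDATE_TYPES = [
  "audio/webm;codecs=opus", // Chrome, Firefox
  "audio/webm",
  "audio/mp4",              // Safari
  "audio/ogg",
];

function pickMimeType(isSupported: (type: string) => boolean): string | null {
  for (const t of CANDIDATE_TYPES) {
    if (isSupported(t)) return t;
  }
  return null; // null = let the browser pick its default
}
```

The page would call `pickMimeType(MediaRecorder.isTypeSupported)` and pass the result as `{ mimeType }` when constructing the recorder, falling back to the browser default when it returns `null`.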


Comments

da.superman2775, about 1 month ago

wow super cool dude!

Azeem, about 1 month ago

Thanks da.superman