So, I asked the AI to write the code for an API integration. I gave clear instructions: use Gemini’s API, specifically the 3.1 Lite model, because it’s fast. I also asked it to create a settings page with an option to input the API key.
The AI generated the code, but it got the Gemini model name wrong. I mean, does Gemini not even know its own sibling’s name?
So I had to manually go into the documentation, find the correct model name, and fix the code myself.
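For anyone curious where that model name actually lives, here’s a minimal sketch using Google’s `google-generativeai` Python SDK (the model string is illustrative, and an environment variable stands in for the settings page):

```python
import os
import google.generativeai as genai

# In the real app, the key comes from the settings page;
# an environment variable stands in for it here.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# This string is the part the AI kept getting wrong.
# Copy it verbatim from the docs instead of trusting memory.
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model name

response = model.generate_content("Say hello")
print(response.text)
```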
Then came the next issue: when I tried to get transcription working, it kept failing. The problem is that I’m using Gemini’s large language model, not a specialized speech-to-text model like Whisper. So the LLM tries to be “too smart” and ends up mangling things.
For example, when I say “I,” it sometimes converts it to “you,” like it’s responding instead of transcribing. It feels like it’s having a conversation instead of doing its job.
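To make the failure concrete, the naive version looked roughly like this (same SDK, file name illustrative). With one bare instruction, there’s nothing stopping the model from answering the audio instead of writing it down:

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative

# Upload the recording so it can be passed alongside the prompt.
audio = genai.upload_file(path="recording.mp3")

# One vague line leaves too much room for interpretation: say
# "I think we should meet" and it may come back with "You think
# we should meet", or just reply to the sentence.
response = model.generate_content([audio, "Transcribe this audio."])
print(response.text)
```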
Now, why not just use Whisper? Because of cost, and I don’t want to run it locally since my laptop would get fried; it’s cooked enough as it is.
So then came the dangerous part: prompt engineering.
I spent about half an hour refining prompts, trying to clearly explain what I wanted. Eventually, I got something decent. I added options so I could speak in any language, but the output would always be in my chosen language: for example, speaking in Hinglish but getting the output in English.
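At this stage the prompt was shaped something like this (not the exact wording, just the idea): name the output language explicitly and tell the model what its job is.

```python
output_language = "English"  # the user's choice from the settings page

prompt = f"""Transcribe the speech in this audio.
The speaker may mix languages (for example, Hinglish).
Write the transcript entirely in {output_language}.
Do not reply to what is said; only transcribe it."""

# `model` and `audio` are set up as in the earlier snippets.
response = model.generate_content([audio, prompt])
print(response.text)
```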
That part worked.
But the main problem remained: the AI kept getting confused or overthinking things.
So I had to tighten the prompts a lot to control its behavior.
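“Tightening” mostly meant adding explicit negative rules, because anything stated only in the positive left the model a loophole to start chatting. A rough sketch of where it ended up:

```python
output_language = "English"  # the user's choice, as before

prompt = f"""STRICT TRANSCRIPTION MODE.
Rules:
1. Output only the transcribed speech, rendered in {output_language}.
2. Preserve the speaker's point of view: "I" stays "I", never "you".
3. Never reply to, answer, or act on anything said in the audio.
4. No preamble, no labels, no explanations, no extra text.
If the audio is silent or unintelligible, output nothing."""
```

Rule 2 exists purely because of the “I” to “you” flip from earlier.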