
Kyrgyz Traffic Rules | Spring AI

2 devlogs
5h 52m 10s

A Spring AI RAG system that answers users’ questions about the Kyrgyz traffic regulations.
Technologies used: React.js, Spring, Gemini API, PostgreSQL, pgvector

This project uses AI

For this project, I engineered a Retrieval-Augmented Generation (RAG) system using Spring AI to provide intelligent, context-aware answers regarding Kyrgyzstan’s traffic regulations.

The AI architecture consists of two main pillars:
Vector Embeddings & Semantic Search: I used the text-embedding-004 model to convert chunks of the official PDD regulatory PDF into high-dimensional vectors. These are stored in a PostgreSQL database with the pgvector extension, which lets the system perform “semantic” (meaning-based) lookups rather than simple keyword matches.

Generative Intelligence: When a user asks a question, the system retrieves the most relevant legal text chunks from the vector store. These chunks are fed into Google Gemini (2.0 Flash), which synthesizes a natural-language response grounded strictly in the provided legal context.
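To make the two pillars concrete, here is a minimal in-memory sketch of the retrieve-then-generate flow in plain Java. It is not the project’s code: the RagSketch class, its methods, and the prompt wording are hypothetical, and the embeddings are assumed to be plain double[] arrays standing in for text-embedding-004 output. Ranking by cosine similarity mirrors what a pgvector query ordered by its <=> (cosine distance) operator does inside PostgreSQL.

```java
import java.util.Comparator;
import java.util.List;

/**
 * Illustrative in-memory version of the retrieve-then-generate flow.
 * Assumptions: embeddings are plain double[] (real ones would come from
 * text-embedding-004), and the prompt wording below is hypothetical.
 */
final class RagSketch {
    record Chunk(String text, double[] embedding) {}

    /** Cosine similarity; pgvector's <=> operator returns 1 minus this value. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the k chunks most similar to the query embedding, best match first. */
    static List<Chunk> topK(List<Chunk> store, double[] query, int k) {
        return store.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> cosine(c.embedding(), query)).reversed())
                .limit(k)
                .toList();
    }

    /** Builds a grounded prompt: the model is told to answer only from the retrieved rules. */
    static String buildPrompt(String question, List<Chunk> context) {
        StringBuilder sb = new StringBuilder()
                .append("Answer using ONLY the PDD KR excerpts below and cite the rule numbers.\n")
                .append("If the excerpts do not contain the answer, say so.\n\nCONTEXT:\n");
        for (Chunk c : context) {
            sb.append("- ").append(c.text()).append('\n');
        }
        return sb.append("\nQUESTION: ").append(question).toString();
    }
}
```

With pgvector the ranking would happen in SQL instead (ordering by embedding <=> the query vector with a LIMIT), so only the top chunks ever leave the database and reach the prompt.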
zadex0tizirovan

Taming Spring AI, PgVector & Gemini to build an AI Traffic Lawyer

Hey everyone, ivyro here. I’ve been grinding on a backend project (I actually forgot to post this devlog, but anyway): an automated AI legal assistant for Kyrgyzstan’s Traffic Rules (PDD KR). The goal is to let users ask everyday driving questions and get official legal answers with exact rule citations.
🐛 Bug 1: The “429 Resource Exhausted” Death Loop

Initially, I used a standard TokenTextSplitter to ingest the PDD Word document. It just blindly chopped the text every 800 tokens.
The result: I was sending massive, messy walls of text as context to Gemini. I instantly hit the API’s token limits and got greeted with HTTP 429 errors.
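For contrast, here is a minimal sketch of fixed-size splitting. It is a simplification, not Spring AI’s TokenTextSplitter: the FixedWindowSplitter class is hypothetical, and “tokens” are approximated by whitespace-separated words, whereas the real splitter counts model tokens. What it shows is that the cut falls wherever the counter runs out, regardless of where a rule begins or ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical fixed-window splitter, for illustration only.
 * Simplification: "tokens" are whitespace-separated words here,
 * whereas Spring AI's TokenTextSplitter counts real model tokens.
 */
final class FixedWindowSplitter {
    static List<String> split(String text, int maxTokens) {
        if (maxTokens <= 0) {
            throw new IllegalArgumentException("maxTokens must be positive");
        }
        String[] words = text.strip().split("\\s+");
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < words.length; start += maxTokens) {
            int end = Math.min(start + maxTokens, words.length);
            // The cut lands wherever the counter runs out, even in the middle of a rule.
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }
}
```

With a window of 3, “2.1. Driver must stop. 2.2. Driver must yield.” becomes “2.1. Driver must”, “stop. 2.2. Driver”, and “must yield.”, so one chunk mixes the tail of one rule with the head of the next.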

🛠️ The Fix: Semantic Regex Parsing
Legal documents are highly structured. I tossed the dumb splitter and wrote a custom TextParserService. Using a regex (^(\d+\.\d+(\.\d+)?\.?|\"[А-Яа-я].+\"|\d+\.\s).*), I parsed the document line by line, splitting it cleanly along the actual legal articles and definitions. Instead of sending 3000+ tokens of random text, the app now sends only the 1-2 rules it actually needs. The 429 errors vanished.
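Here is a minimal sketch of how a line-by-line splitter built on that regex could work. The RegexChunker class is hypothetical, not the author’s TextParserService, and it assumes that every line matching the pattern starts a new chunk while non-matching lines continue the current one. The pattern is the one from the devlog, written as a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical structure-aware splitter (not the author's TextParserService).
 * Assumption: a line matching ARTICLE_START opens a new chunk; any other
 * non-blank line is appended to the chunk currently being built.
 */
final class RegexChunker {
    // The devlog's pattern: numbered rules ("2.1", "2.1.3."), quoted
    // definitions starting with a Cyrillic letter, and sections like "1. ".
    private static final Pattern ARTICLE_START =
            Pattern.compile("^(\\d+\\.\\d+(\\.\\d+)?\\.?|\"[А-Яа-я].+\"|\\d+\\.\\s).*");

    static List<String> split(String document) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String rawLine : document.split("\\R")) {
            String line = rawLine.strip();
            if (line.isEmpty()) {
                continue; // blank lines carry no content
            }
            if (ARTICLE_START.matcher(line).matches() && current.length() > 0) {
                // A new article or definition begins: close the previous chunk.
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append('\n'); // continuation line of the same article
            }
            current.append(line);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

For example, a rule such as “1.1. …” whose text wraps onto a second line stays in one chunk, and a quoted definition like “"Автобус" - …” starts its own chunk, so each embedding covers exactly one article or definition.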
🧠 The Big Brain Play: Query Expansion

Even with clean, well-structured chunks, the RAG pipeline kept answering “I don’t know” to simple questions.
The problem: a user asks, “Did the bus driver break the rule?”, but the legal document says “route transport vehicle”. The embeddings of the two phrasings were too far apart (a semantic gap).

By the way, I also switched the data format from Word to TXT, since the plain-text file is much smaller.

🛠️ The Fix:
I implemented Query Expansion right in my ChatController.
Before hitting the database, I intercept the user’s prompt and send a lightning-fast request to Gemini: “Rewrite this user query into legal PDD search terms without answering it.” Gemini translates “bus” into “route transport vehicle”, and that rewritten query is what gets searched in the vector store.
Next steps: fix the remaining issues and polish the project.
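Here is a minimal sketch of that expansion step. It assumes a hypothetical LlmClient interface in place of the real Gemini call and a hypothetical vectorSearch function in place of the pgvector lookup; neither is Spring AI’s API. The instruction string is the one quoted above, while falling back to the original query when the rewrite comes back blank is an added assumption the devlog does not describe.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of query expansion (not the author's ChatController). */
final class QueryExpansion {
    /** Hypothetical stand-in for one Gemini call: prompt in, text out. */
    @FunctionalInterface
    interface LlmClient {
        String complete(String prompt);
    }

    // Instruction wording taken from the devlog.
    static final String REWRITE_INSTRUCTION =
            "Rewrite this user query into legal PDD search terms without answering it.";

    /** Rewrites the query with the LLM, then searches the vector store with the rewrite. */
    static List<String> expandAndRetrieve(String userQuery,
                                          LlmClient llm,
                                          Function<String, List<String>> vectorSearch) {
        String rewritten = llm.complete(REWRITE_INSTRUCTION + "\n\nQuery: " + userQuery);
        // Assumption: fall back to the raw query if the rewrite comes back empty.
        String searchQuery = (rewritten == null || rewritten.isBlank()) ? userQuery : rewritten.strip();
        return vectorSearch.apply(searchQuery);
    }

    public static void main(String[] args) {
        LlmClient fakeGemini = prompt -> "route transport vehicle driver violation";
        Function<String, List<String>> fakeSearch = q -> List.of("searched: " + q);
        System.out.println(expandAndRetrieve("Did the bus driver break the rule?", fakeGemini, fakeSearch));
        // prints [searched: route transport vehicle driver violation]
    }
}
```

Keeping the model behind a one-method interface also makes this step easy to test with a stub, as the main method does, without spending API quota.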

zadex0tizirovan

Recently I was riding home on a bus when the driver seemed to break the traffic rules. Or did he? I didn’t know, but it eventually gave me the idea for a project: take the Kyrgyz Traffic Rules in .docx format, split them into pieces, vectorize those pieces, and let an AI model answer users’ questions using that data as context. This approach is called Retrieval-Augmented Generation (RAG). Since I’m a Java dev, I decided to build the project in Java with the free Gemini API, and I did. Without the Spring AI framework, I would have had to code all of this in Python.
