Activity

ray2009zhu

I finished adding RAG for clubs search! It uses BM25 and embeddings to retrieve relevant club meetings, then the scores of all the meetings of a given club are aggregated to return club rankings.
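For anyone curious what that kind of hybrid ranking looks like, here is a minimal sketch. It is not the project's actual code: the function names, the `alpha` weighting, the toy 2-d vectors, and the choice to sum meeting scores per club are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_clubs(query, query_vec, meetings, alpha=0.5):
    """meetings: list of (club_id, text, embedding) tuples (hypothetical shape)."""
    lexical = bm25_scores(query, [text for _, text, _ in meetings])
    top = max(lexical) or 1.0  # avoid dividing by zero when nothing matches
    totals = defaultdict(float)
    for (club_id, _, vec), lex in zip(meetings, lexical):
        # Blend the normalized keyword score with the embedding similarity.
        combined = alpha * (lex / top) + (1 - alpha) * cosine(query_vec, vec)
        # Assumption: a club's score is the sum of its meetings' scores.
        totals[club_id] += combined
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: real embeddings would come from an embeddings API.
MEETINGS = [
    ("ai_club", "intro to ai and neural networks", [1.0, 0.0]),
    ("ai_club", "ai project showcase", [0.9, 0.1]),
    ("chess_club", "weekly chess tournament", [0.0, 1.0]),
]
ranking = rank_clubs("ai", [1.0, 0.0], MEETINGS)
```

Summing favors clubs with many matching meetings; taking the max or the mean per club would be an equally reasonable choice.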

Originally, I went with just embeddings, but realized that they returned a lot of garbage results for exact keyword searches (e.g. the test data for an unrelated club ranked higher than an AI club when searching “ai”), so I did some research and found out about BM25.

I also realized that relying on OpenAI to store the embeddings cost a lot per query, so now I store the embeddings myself and only call their embeddings API to generate them.
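A rough sketch of the store-it-yourself approach, under some loud assumptions: `EmbeddingStore`, the `meeting_embeddings` table, and `fake_embed` are hypothetical names; `fake_embed` stands in for a real embeddings API call; and SQLite is used only so the example runs on its own (the site itself uses MySQL).

```python
import json
import math
import sqlite3


def fake_embed(text):
    """Stand-in for a real embeddings API call; returns a tiny toy vector."""
    words = text.lower().split()
    return [float(words.count("ai")), float(words.count("chess")), 1.0]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class EmbeddingStore:
    def __init__(self, embed):
        self.embed = embed
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE meeting_embeddings ("
            "meeting_id INTEGER PRIMARY KEY, vector TEXT NOT NULL)"
        )

    def add(self, meeting_id, text):
        # One embedding call per meeting, made once when it is saved.
        vector = self.embed(text)
        self.db.execute(
            "INSERT OR REPLACE INTO meeting_embeddings VALUES (?, ?)",
            (meeting_id, json.dumps(vector)),
        )

    def search(self, query, limit=5):
        # One embedding call per search: only the query text is embedded.
        query_vec = self.embed(query)
        rows = self.db.execute(
            "SELECT meeting_id, vector FROM meeting_embeddings"
        ).fetchall()
        scored = [(mid, cosine(query_vec, json.loads(v))) for mid, v in rows]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


store = EmbeddingStore(fake_embed)
store.add(1, "ai workshop")
store.add(2, "chess night")
results = store.search("ai")
```

Comparing the query against every stored vector in Python is fine for a few hundred meetings; a larger dataset would probably want a proper vector index.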

I also continued with some code cleanup, such as renaming variables and table columns (e.g. “club_id” instead of “id”) and moving SQL queries out of the routes and into methods on the Club/Meeting classes, which the routes now call.
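The refactor has roughly this shape. This is only an illustration: the `clubs` table layout, the method names, and `club_route` are hypothetical, and SQLite stands in for MySQL so the snippet runs by itself.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Club:
    club_id: int
    name: str

    @classmethod
    def get(cls, db, club_id):
        # The SQL lives here, not in the route.
        row = db.execute(
            "SELECT club_id, name FROM clubs WHERE club_id = ?", (club_id,)
        ).fetchone()
        return cls(*row) if row else None

    @classmethod
    def search_by_name(cls, db, term):
        rows = db.execute(
            "SELECT club_id, name FROM clubs WHERE name LIKE ? ORDER BY name",
            (f"%{term}%",),
        ).fetchall()
        return [cls(*row) for row in rows]


def club_route(db, club_id):
    """Stand-in for a route handler: no SQL, just a method call."""
    club = Club.get(db, club_id)
    if club is None:
        return {"status": 404}
    return {"status": 200, "name": club.name}


def make_demo_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE clubs (club_id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany(
        "INSERT INTO clubs VALUES (?, ?)", [(1, "AI Club"), (2, "Chess Club")]
    )
    return db


demo_db = make_demo_db()
response = club_route(demo_db, 1)
```

Keeping the queries in one place means a column rename like “id” to “club_id” only has to be fixed inside the class, not in every route.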

ray2009zhu

This was originally a school project that was submitted last month, but I wanted to keep working on it on my own time to clean it up and add features I couldn’t finish by the submission deadline. (I no longer have any obligation to work on the project, and it will not be submitted to anything school-related in the future.) The end goal is to deploy it at my school, and hopefully people will use it!

I migrated the site’s MySQL database from the one provided by the school to one hosted on TiDB (so that I can manage both a school version and a public version of the site), and started on a semantic search tool, which submits clubs + meetings to OpenAI vector stores and queries the API for matches. Migration was kind of a pain in the ass.
