Alright, the prototyping phase is complete. Over 250k images were embedded with CLIP and stored in a giant vector database, which lets me search the entire image collection in a fraction of a second. I also built a video index covering 100k videos. I created a custom agent implementation, although I'm considering abandoning it in favor of more traditional agentic features.

Git commits are few because LanceDB creates like a bajillion files and git crashes whenever I try to push. Resolving this soon.

Right now I'm working on RAG generation and diagram creation, and on building more and more tools for the agents. I'm also debating whether to use Ollama or to gain more control by using the transformers library.

I tested around 20 different LLMs and I think I've settled on GLM 4.7 Flash as the main model because of its strong reasoning, prompt coherence, and tool usage, and because its MoE architecture makes it fast. Note that I'm running any large models in Q4 for speed. For image analysis and processing I'm debating between Qwen3 8B and GLM 4.6V. I'm also hoping I'll be able to use Paper Banana's code pretty soon to create a proper diagramming model.
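
For anyone curious why embedding search is so fast, here is a rough sketch of the idea. It is not the project's actual code: the random vectors stand in for real CLIP embeddings, the `img_N.jpg` ids and the 8-dimensional size are made up, and `normalize`, `build_index`, and `search` are hypothetical helper names. It also does a plain linear scan, whereas a real vector database would typically use an approximate index to stay fast at 250k+ items.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(embeddings):
    """Store a unit-length copy of each embedding, keyed by item id."""
    return {item_id: normalize(vec) for item_id, vec in embeddings.items()}


def search(index, query, k=3):
    """Return the k (item_id, score) pairs most similar to the query vector."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), item_id)
        for item_id, vec in index.items()
    ]
    scored.sort(reverse=True)  # highest cosine similarity first
    return [(item_id, score) for score, item_id in scored[:k]]


# Stand-in data (assumption): random vectors instead of real CLIP embeddings.
rng = random.Random(0)
DIM = 8  # hypothetical size; real CLIP embeddings are much larger
fake_embeddings = {
    f"img_{i}.jpg": [rng.gauss(0, 1) for _ in range(DIM)] for i in range(100)
}
index = build_index(fake_embeddings)

# Query with an existing image's vector; it should come back as its own best match.
query = fake_embeddings["img_42.jpg"]
results = search(index, query, k=3)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

In a real setup the query vector would come from encoding a text prompt or an image with the same model used to build the index, so that queries and stored items live in the same embedding space.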