2 devlogs
5h 42m 26s
I am enrolled in an AI research program for high school students that teaches LLM internals, how to train LLMs, and how to publish papers on the topic. We were taught about a type of LLM architecture called Mixture of Experts (MoE), so I coded a small MoE model with null experts and trained it on a tiny dataset. The model has 29M parameters.
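To illustrate what "null experts" means, here is a minimal pure-Python sketch of a single MoE layer, assuming one common formulation rather than the exact design used in this project. The router scores both real experts and null experts. It picks the top-k, and any null expert that gets picked contributes zero output and costs no compute. This means a token can end up using fewer real experts than k. The class name `NullExpertMoE`, the helper `make_expert`, and all sizes are hypothetical. The tiny ReLU linear "experts" stand in for the usual feed-forward blocks.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def make_expert(dim, rng):
    # Hypothetical stand-in for an FFN expert: a random dim x dim linear map + ReLU.
    w = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [max(0.0, sum(w[i][j] * x[j] for j in range(dim))) for i in range(dim)]

    return expert


class NullExpertMoE:
    """Top-k MoE layer whose router can also pick parameter-free null experts (sketch)."""

    def __init__(self, dim, n_real, n_null, top_k, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.n_real = n_real
        self.top_k = top_k
        # Only real experts own weights; null experts have no parameters at all.
        self.experts = [make_expert(dim, rng) for _ in range(n_real)]
        # Router has one score vector per expert slot (real slots first, then null slots).
        self.router = [[rng.gauss(0, 0.1) for _ in range(dim)]
                       for _ in range(n_real + n_null)]

    def forward(self, x):
        logits = [sum(r[j] * x[j] for j in range(self.dim)) for r in self.router]
        probs = softmax(logits)
        # Indices of the top-k expert slots by router probability.
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        # Renormalize over all selected slots, so weight given to null slots is "dropped".
        norm = sum(probs[i] for i in top)
        out = [0.0] * self.dim
        real_used = 0
        for i in top:
            if i >= self.n_real:
                continue  # null expert: outputs zero, so skip it entirely
            real_used += 1
            weight = probs[i] / norm
            y = self.experts[i](x)
            out = [o + weight * v for o, v in zip(out, y)]
        return out, real_used


layer = NullExpertMoE(dim=4, n_real=4, n_null=2, top_k=2)
output, used = layer.forward([0.5, -1.0, 0.25, 2.0])
print(len(output), used)  # used is 0, 1, or 2 depending on routing
```

In this formulation, adding null experts does not increase the parameter count. It lets the router spend less compute on "easy" tokens. Real implementations usually add a load-balancing loss so that the router does not collapse onto a few experts.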
This project uses AI
I was running into some GPU bottlenecks, so I used Claude to help track them down. I also used it to generate the banner.