
A small MoE-based large language model pretraining

2 devlogs
5h 42m 26s

I am enrolled in an AI research program for high school students that teaches LLM internals, how to train models, and how to publish papers about them. We were taught about the Mixture of Experts (MoE) architecture, so I coded a small MoE model with null experts and trained it on a tiny dataset. The model has 29M parameters.
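Roughly, a null expert is a routing slot that returns zeros, so a token sent there skips the feed-forward computation and only the residual stream carries it forward. Below is a minimal PyTorch-style sketch of what such a layer might look like; the class name, the top-1 routing, and all hyperparameters are my own illustrative assumptions, not the project's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Top-1 routed MoE feed-forward layer with null experts (illustrative sketch).

    The router scores n_experts + n_null slots per token. Tokens routed to one
    of the last n_null slots hit a "null expert" that outputs zeros, i.e. the
    token skips the FFN entirely.
    """

    def __init__(self, d_model: int, d_ff: int, n_experts: int, n_null: int):
        super().__init__()
        self.n_real = n_experts
        self.router = nn.Linear(d_model, n_experts + n_null)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to (tokens, d_model) for routing
        tokens = x.reshape(-1, x.shape[-1])
        probs = F.softmax(self.router(tokens), dim=-1)   # (tokens, n_real + n_null)
        gate, idx = probs.max(dim=-1)                    # top-1 slot per token

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # scale each expert output by its gate probability
                out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])
        # tokens routed to slots >= n_real chose a null expert: output stays zero
        return out.reshape_as(x)
```

The appeal of null experts in this kind of setup is adaptive compute: the router can learn to spend the FFN only on tokens that need it, while easy tokens pass through on the residual connection alone.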

This project uses AI

I was running into some GPU bottlenecks; I used Claude to track them down, and also to generate the banner.

Demo Repository
