A 1GB, 22M-parameter model trained from scratch on custom data! It has limited RAG capabilities, which (theoretically) help improve its memory.
GitHub Copilot helped with code completion and a lot of explaining and debugging.
My first time making and training an LLM. Definitely fun to see it improve day by day, but I ran into performance issues due to its small size. I also wanted to give it other capabilities, but TensorFlow is somewhat limited in terms of what I can do with it. I learned a lot from this project.
I want to redo this project, but with a custom NN library and a better device when I have more time.
Probably gonna end the project here because of size restrictions. I might write a custom NN library from scratch and run it on a beefier device next time!
Changes/Updates:
Notes:
Created the basic transformer architecture and added some basic Discord commands that let me interact with and monitor the bot. Currently training the bot locally on a Pi. (It is not very good at writing yet.)
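For anyone curious what "basic transformer architecture" boils down to, the core piece is causal self-attention: each token looks only at itself and earlier tokens. Below is a minimal sketch in plain Python, not the TensorFlow code from this project. The function names are made up for illustration, and it skips the learned projection matrices, multiple heads, and everything else a real model needs.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of equal-length float vectors, one per token.
    Token i only attends to tokens 0..i, which is what lets a language
    model be trained to predict the next token.
    """
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        # Scores against the current token and everything before it.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        outputs.append([
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ])
    return outputs


# Hypothetical two-token input, used as queries, keys, and values at once.
tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = causal_self_attention(tokens, tokens, tokens)
```

The first token can only see itself, so its output is just its own value vector. The second token mixes both, leaning toward itself because its score against its own key is higher.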