Nous banner

Nous

7 devlogs
20h 26m 53s

A fully configurable, from-scratch GPT-style experimental model and platform with a GUI for training and inference.

Demo Repository

albert

Fixed some goofy issues with the new MoE implementation. The old model (actually the only one) was made before MoE was added, so it doesn't have the MoE parameter set to False, and the default is True. This led to the generate function thinking it had 196M parameters (I WISH) instead of the 77M it actually has. One thing led to another, and the next thing you know, it's printing absolute nonsense.
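Here's a minimal sketch of that kind of bug, with made-up config and field names (not Nous's actual code): a checkpoint saved before a new flag existed picks up the flag's default (True) on load, so the parameter count assumes expert weights the old model never had.

```python
from dataclasses import dataclass

# Hypothetical config illustrating the bug: old checkpoints predate
# use_moe, so loading them with the new default (True) inflates the
# parameter count with experts that don't actually exist.
@dataclass
class ModelConfig:
    n_layers: int = 12
    d_model: int = 768
    use_moe: bool = True   # new flag; defaults to True
    n_experts: int = 4

def count_params(cfg: ModelConfig) -> int:
    # Toy count: attention + FFN weights per layer; MoE multiplies the FFN.
    attn = 4 * cfg.d_model * cfg.d_model
    ffn = 8 * cfg.d_model * cfg.d_model
    if cfg.use_moe:
        ffn *= cfg.n_experts
    return cfg.n_layers * (attn + ffn)

# Fix: pin use_moe=False when loading a pre-MoE checkpoint, so the count
# matches the weights that are actually there.
old_count = count_params(ModelConfig(use_moe=False))
new_count = count_params(ModelConfig())  # inflated by the MoE default
```

The fix is just making the flag explicit (or stamping it into old checkpoints) instead of trusting the default.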

One more thing I did was run the data gathering pipeline, saving all 122GB of training data to an external drive. The issue is that to train the new model (which would be 2.3B parameters, by the way), I can't just use my M4 MBA, since it would probably blow up (although that would be cool)… So I'm using cloud GPUs (4x RTX 5090, 128GB VRAM total), but they can only store 32GB worth of files, and the training file alone is 122GB, never mind the checkpoints. To get around this, I'm gonna use the Flask framework to host the data, probably on my old iMac (24GB RAM, so large ahh chunks of data); if not, I'll use my Raspberry Pi 3B (only 1GB RAM). This will definitely make training slower, but at least it'll train… right?
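A minimal sketch of that Flask idea, with made-up route and path names: the host machine serves the big training file in fixed-size chunks, so the cloud box can stream data it can't store locally.

```python
# Hypothetical chunk-serving endpoint; DATA_PATH and CHUNK_SIZE are
# placeholders, not the real setup.
import os
from flask import Flask, Response, abort

app = Flask(__name__)
DATA_PATH = os.environ.get("NOUS_DATA", "train.bin")  # assumed location
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per request

def read_chunk(path: str, idx: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read the idx-th fixed-size chunk of a file; empty bytes past EOF."""
    with open(path, "rb") as f:
        f.seek(idx * chunk_size)
        return f.read(chunk_size)

@app.route("/chunk/<int:idx>")
def serve_chunk(idx: int):
    data = read_chunk(DATA_PATH, idx)
    if not data:
        abort(404)  # requested chunk is past the end of the file
    return Response(data, mimetype="application/octet-stream")
```

The training loop on the GPU side would then just request `/chunk/0`, `/chunk/1`, … and delete each chunk after it's consumed.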

Also, don't worry about the model's responses below; that's normal for this one, since it has probably never seen anything like what I was prompting it with in its training data.

Attachment
1

Comments

chefpenguino
chefpenguino about 2 months ago

nice UI, looks good

albert

Finished ViT frontend interpretation for training.

Used the Flask API to build the backend + frontend routing for ViT multimodal training. This also shows the total parameter count, including the parameters gained from the ViT.
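The "total parameters including the ViT" readout boils down to summing tensor sizes. A hedged sketch with made-up shapes (not Nous's real architecture); in a real PyTorch model this would just be `sum(p.numel() for p in model.parameters())`.

```python
# Toy parameter counter over named weight shapes; every shape below is a
# made-up example, only the counting logic is the point.
from math import prod

def total_params(shapes: dict) -> int:
    """Sum the element counts of every weight tensor by its shape."""
    return sum(prod(s) for s in shapes.values())

text_backbone = {
    "tok_emb": (50257, 768),
    "lm_head": (768, 50257),
}
vit_addon = {
    "patch_embed": (768, 3 * 16 * 16),   # projection of flattened patches
    "vit_block.qkv": (3 * 768, 768),
}

base = total_params(text_backbone)
with_vit = total_params({**text_backbone, **vit_addon})
```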

Attachment
0
albert

Bug fixes and documentation changes.

Basically, I tested the app to make sure everything in the MoE, ViT, and everything else works as expected. No errors found so far; anything I did find has been fixed and documented.

I have also done some documentation refactoring: I simplified the README to contain only the completely essential starter information and added an EXPLANATION document to hold the extra “beef” that once lived in the README.

Attachment
0
albert

Started (and almost finished) a new part of my model… ViT! Nous can now interpret images. I’ve added a patch embeddings class, which creates the image embeddings; the ViT encoder, which processes those patches; and cross-attention (not just self-attention), which can handle and cross-reference text embeddings with image embeddings. Together, these three let the model build connections between text and images.
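The patch-embedding step above can be sketched in a few lines of NumPy: split the image into non-overlapping patches, flatten each, and project to the embedding dim. The projection matrix here is random for illustration (the real model learns it), and position embeddings are omitted.

```python
# Minimal patchify sketch; sizes are examples, not Nous's actual config.
import numpy as np

def patchify(img: np.ndarray, patch: int) -> np.ndarray:
    """(H, W, C) image -> (num_patches, patch*patch*C) flattened patches."""
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)           # (H/p, W/p, p, p, C)
    return x.reshape(-1, patch * patch * C)  # (N, p*p*C)

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32, 3))
patches = patchify(img, 16)                        # 4 patches, 768 values each
W_embed = rng.standard_normal((16 * 16 * 3, 128))  # hypothetical embed dim
embeddings = patches @ W_embed                     # (4, 128) patch embeddings
```

These embeddings are what the ViT encoder processes, and what the text stream later attends to via cross-attention.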

I’ve also been working on some organization improvements (side note: why is Pylint so picky? I don’t wanna make sure my imports are in alphabetical order), especially in documentation and code readability. The main reason is that one of my functions has literally >20 lines of arguments, and I wanna make sure it all makes sense.

Now, the only things left to do are to complete the multimodal trainer (which will be a pain), create the pipeline and the data preparation module, and make sure I actually CAN use the ViT with text, in what order, and so on. I have a lot of work to do…

There’s still one more week of holidays, let’s grind together!

Attachment
0
albert

Just finished the GUI implementation for the Mixture of Experts! It’s fully complete and committed to GitHub, and it also looks pretty good! See for yourselves below:

Attachment
1

Comments

seifotefy75
seifotefy75 3 months ago

Perfect Work

albert

Finished the MoE implementation for Nous!! I now have to fix some dumahh shape bugs and some weird JIT compilation errors, but it’s going well! The next step is going to be giving the model some insane parameter counts (I’m aiming for >=1bn), then training it. Not sure what monster GPU I’ll need for that, but I’ll manage! (Oh, also don’t worry about my memory pressure, I’m just training a 2bn parameter model…)
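For context, the core of an MoE layer is top-k routing. A small NumPy sketch of the idea (not Nous's actual code; names, shapes, and the linear experts are all assumptions): a gate scores experts per token, only the top-k run, and their outputs are mixed by renormalized gate weights.

```python
# Toy top-k MoE routing; experts are plain linear maps for illustration.
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """x: (tokens, d); gate_w: (d, n_experts); experts: list of (d, d) mats."""
    logits = x @ gate_w                         # (tokens, n_experts) gate scores
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        w = np.exp(logits[t, sel])
        w /= w.sum()                            # softmax over selected experts only
        for wi, e in zip(w, sel):
            out[t] += wi * (x[t] @ experts[e])  # weighted mix of expert outputs
    return out

rng = np.random.default_rng(0)
d, n_exp = 8, 4
x = rng.standard_normal((5, d))
gate_w = rng.standard_normal((d, n_exp))
experts = [rng.standard_normal((d, d)) for _ in range(n_exp)]
y = moe_forward(x, gate_w, experts, k=2)
```

The shape bugs mentioned above are the classic failure mode here: the gather/scatter between per-token expert selections and the batched expert weights.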

Attachment
0
albert

I’ve started working on a different training pipeline for the model, as I want to expand it to accept image input, do text recognition, and more cool stuff with images! I have to use pretraining text data for this, since the current model’s natural-language skills are not so good. Might also add an MoE.

Attachment
0