Nous banner

Nous

7 devlogs
20h 26m 53s

A fully configurable, from-scratch GPT-style experimental model and platform with a GUI for training and inference.

Demo Repository

albert

Fixed some goofy issues with the new MoE implementation. The old model (actually the only one) was made before MoE was added, so it doesn't have the MoE parameter set to False, and the default is True. This led to the generate function thinking it had 196M parameters (I WISH) instead of the 77M it actually has. One thing led to another, and the next thing you know, it's printing absolute nonsense.
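Here's a minimal sketch of that kind of bug, with made-up config and field names (not Nous's actual code): a checkpoint saved before a new flag existed picks up the flag's default (True) on load, so the parameter count assumes expert weights the old model never had.

```python
from dataclasses import dataclass

# Hypothetical config illustrating the bug: old checkpoints predate
# use_moe, so loading them with the new default (True) inflates the
# parameter count with experts that don't actually exist.
@dataclass
class ModelConfig:
    n_layers: int = 12
    d_model: int = 768
    use_moe: bool = True   # new flag; defaults to True
    n_experts: int = 4

def count_params(cfg: ModelConfig) -> int:
    # Toy count: attention + FFN weights per layer; MoE multiplies the FFN.
    attn = 4 * cfg.d_model * cfg.d_model
    ffn = 8 * cfg.d_model * cfg.d_model
    if cfg.use_moe:
        ffn *= cfg.n_experts
    return cfg.n_layers * (attn + ffn)

# Fix: pin use_moe=False when loading a pre-MoE checkpoint, so the count
# matches the weights that are actually there.
old_count = count_params(ModelConfig(use_moe=False))
new_count = count_params(ModelConfig())  # inflated by the MoE default
```

The fix is just making the flag explicit (or stamping it into old checkpoints) instead of trusting the default.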

One more thing I did was run the data gathering pipeline, saving all 122GB of training data to an external drive. The issue is that to train the new model (which would be 2.3B parameters, by the way), I can't just use my M4 MBA, since it would probably blow up (although that would be cool)… So I'm using cloud GPUs (4x RTX 5090, 128GB VRAM total), but they can only store 32GB worth of files, and the training file alone is 122GB, never mind the checkpoints. To get around this, I'm gonna use the Flask framework to host the data, probably on my old iMac (24GB RAM, so large ahh chunks of data); if not, I'll use my Raspberry Pi 3B (only 1GB RAM). This will definitely make training slower, but at least it'll train… right?
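A minimal sketch of that Flask idea, with made-up route and path names: the host machine serves the big training file in fixed-size chunks, so the cloud box can stream data it can't store locally.

```python
# Hypothetical chunk-serving endpoint; DATA_PATH and CHUNK_SIZE are
# placeholders, not the real setup.
import os
from flask import Flask, Response, abort

app = Flask(__name__)
DATA_PATH = os.environ.get("NOUS_DATA", "train.bin")  # assumed location
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per request

def read_chunk(path: str, idx: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read the idx-th fixed-size chunk of a file; empty bytes past EOF."""
    with open(path, "rb") as f:
        f.seek(idx * chunk_size)
        return f.read(chunk_size)

@app.route("/chunk/<int:idx>")
def serve_chunk(idx: int):
    data = read_chunk(DATA_PATH, idx)
    if not data:
        abort(404)  # requested chunk is past the end of the file
    return Response(data, mimetype="application/octet-stream")
```

The training loop on the GPU side would then just request `/chunk/0`, `/chunk/1`, … and delete each chunk after it's consumed.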

Also, don't worry about the model's responses below; that's normal for this one, since it has probably never seen anything like what I was prompting it with in its training data.

Attachment
1

Comments

chefpenguino
chefpenguino about 2 months ago

nice UI, looks good

albert

Finished ViT frontend interpretation for training.

Used the Flask API to build the backend + frontend routing for ViT multimodal training. This also shows the total parameter count, including the parameters gained from the ViT.
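The "total parameters including the ViT" readout boils down to summing tensor sizes. A hedged sketch with made-up shapes (not Nous's real architecture); in a real PyTorch model this would just be `sum(p.numel() for p in model.parameters())`.

```python
# Toy parameter counter over named weight shapes; every shape below is a
# made-up example, only the counting logic is the point.
from math import prod

def total_params(shapes: dict) -> int:
    """Sum the element counts of every weight tensor by its shape."""
    return sum(prod(s) for s in shapes.values())

text_backbone = {
    "tok_emb": (50257, 768),
    "lm_head": (768, 50257),
}
vit_addon = {
    "patch_embed": (768, 3 * 16 * 16),   # projection of flattened patches
    "vit_block.qkv": (3 * 768, 768),
}

base = total_params(text_backbone)
with_vit = total_params({**text_backbone, **vit_addon})
```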

Attachment
0
albert

Bug fixes and documentation changes.

Basically, I tested the app to make sure everything in the MoE, ViT, and everything else works as expected. No errors found so far; anything I did find has been fixed and documented.

I have also done some documentation refactoring: I simplified the README to contain only the completely essential starter information and added an EXPLANATION document to hold the extra “beef” that once lived in the README.

Attachment
0
albert

Started (and almost finished) a new part of my model… ViT! Nous can now interpret images. I’ve added a patch embeddings class, which creates the image embeddings; the ViT encoder, which processes those patches; and cross-attention (not just self-attention), which can handle and cross-reference text embeddings with image embeddings. Together, these three let the model build connections between text and images.
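The patch-embedding step above can be sketched in a few lines of NumPy: split the image into non-overlapping patches, flatten each, and project to the embedding dim. The projection matrix here is random for illustration (the real model learns it), and position embeddings are omitted.

```python
# Minimal patchify sketch; sizes are examples, not Nous's actual config.
import numpy as np

def patchify(img: np.ndarray, patch: int) -> np.ndarray:
    """(H, W, C) image -> (num_patches, patch*patch*C) flattened patches."""
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)           # (H/p, W/p, p, p, C)
    return x.reshape(-1, patch * patch * C)  # (N, p*p*C)

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32, 3))
patches = patchify(img, 16)                        # 4 patches, 768 values each
W_embed = rng.standard_normal((16 * 16 * 3, 128))  # hypothetical embed dim
embeddings = patches @ W_embed                     # (4, 128) patch embeddings
```

These embeddings are what the ViT encoder processes, and what the text stream later attends to via cross-attention.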

I’ve also been working on some organization improvements (side note: why is Pylint so picky? I don’t wanna make sure my imports are in alphabetical order), especially in documentation and code readability. The main reason is that one of my functions has literally >20 lines of arguments, and I wanna make sure it all makes sense.

Now, the only things left to do are to complete the multimodal trainer (which will be a pain), create the pipeline and the data preparation module, and make sure I actually CAN use the ViT with text, in what order, and so on. I have a lot of work to do…

There’s still one more week of holidays, let’s grind together!

Attachment
0
albert

Just finished the GUI implementation for the Mixture of Experts! It’s fully complete and committed to GitHub, and it also looks pretty good! See for yourselves below:

Attachment
1

Comments

seifotefy75
seifotefy75 3 months ago

Perfect Work

albert

Finished the MoE implementation for Nous!! I now have to fix some dumahh shape bugs and some weird JIT compilation errors, but it’s going well! The next step is going to be giving the model some insane parameter counts (I’m aiming for >=1bn), then training it. Not sure what monster GPU I’ll need for that, but I’ll manage! (Oh, also don’t worry about my memory pressure, I’m just training a 2bn parameter model…)
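For context, the core of an MoE layer is top-k routing. A small NumPy sketch of the idea (not Nous's actual code; names, shapes, and the linear experts are all assumptions): a gate scores experts per token, only the top-k run, and their outputs are mixed by renormalized gate weights.

```python
# Toy top-k MoE routing; experts are plain linear maps for illustration.
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """x: (tokens, d); gate_w: (d, n_experts); experts: list of (d, d) mats."""
    logits = x @ gate_w                         # (tokens, n_experts) gate scores
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        w = np.exp(logits[t, sel])
        w /= w.sum()                            # softmax over selected experts only
        for wi, e in zip(w, sel):
            out[t] += wi * (x[t] @ experts[e])  # weighted mix of expert outputs
    return out

rng = np.random.default_rng(0)
d, n_exp = 8, 4
x = rng.standard_normal((5, d))
gate_w = rng.standard_normal((d, n_exp))
experts = [rng.standard_normal((d, d)) for _ in range(n_exp)]
y = moe_forward(x, gate_w, experts, k=2)
```

The shape bugs mentioned above are the classic failure mode here: the gather/scatter between per-token expert selections and the batched expert weights.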

Attachment
0
albert

I’ve started working on a different training pipeline for the model, as I want to expand it to accept image input, do text recognition, and more cool stuff with images! I have to use pretraining text data for this, since the current model’s natural-language skills are not so good. Might also add an MoE.

Attachment
0