Activity

0681691

Shipped this project!

Hours: 60.16
Cookies: 🍪 1717
Multiplier: 23.79 cookies/hr

A complete repository containing my development of miniGPT, nanoGPT and nano-MGPT, coupled with usable examples and my trained models (trained on corbt/all-recipes) in the releases! It tested my knowledge of Transformer-architecture ML models; a good refresher for my brain. Recipe data is good for basic training because of its predictable structure, and it is easy to evaluate with a larger LLM for hallucinations, so I suggest y’all who want to learn start with corbt/all-recipes or roneneldan/TinyStories (a more generic one). I am satisfied with how it turned out; my only regret is that my local computer could not train a nano-MGPT, so optimising that is on my to-do list. Next time I will train more complex LLMs, e.g. dedicated RPG-type LLMs, and make my structure more efficient, potentially moving away from a generic GPT type to a more compact or efficient Transformer-based architecture. <3 from 0681691

0681691

Final devlog before ship!
Not much to say, just tidied stuff up. But gosh, it took hella long.

Attachment
0681691

Did a bit of documentation. Removed some sus lines and redundancies. Overall, minor updates. I have yet to upload binaries; once I publish the nanoGPT training .ipynb and those binaries, I think I’ll be done.

Also, I attached two images showing the specific flowcharts of the mini/nano/nano-M GPT logic; the first covers miniGPT and nanoGPT, the second nano-MGPT. They should be self-explanatory if you’ve read my other devlogs, but I added them (as a sneak peek from one of my newer draft papers on prompt-injection-based memory-retention techniques in training GPTs) so it all makes more sense. <3

Attachment
Attachment
0681691

Added a logo. Just pixelated the cook emoji, but I like it, so that’s that. I also uploaded everything to GitHub. The memory model is also finished! The key logic works like this.

You have a database in JSON or SQLite format that you can add to and store in, formatted as an RPG-like “inventory” system, i.e. it stores something like {'apple': 2}. Then there’s the context: key-value pairs for general facts. And there’s the history, a log of events/actions.

Before the model generates a single token, the MemoryAwareGenerator performs a memory injection. It converts the current MemoryState into a string (e.g. “Inventory: apple (2). Context: location is kitchen. —”), prefixing it to the user’s prompt. The model receives the augmented prompt as if it were part of the conversation history.
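The injection step can be sketched roughly like this (a minimal sketch, assuming field names like `inventory`/`context`/`history` for MemoryState; the real class may differ):

```python
from dataclasses import dataclass, field

@dataclass
class MemoryState:
    # RPG-style inventory: item name -> quantity, e.g. {"apple": 2}
    inventory: dict = field(default_factory=dict)
    # Key-value pairs for general facts, e.g. {"location": "kitchen"}
    context: dict = field(default_factory=dict)
    # Chronological log of events/actions
    history: list = field(default_factory=list)

def inject_memory(state: MemoryState, user_prompt: str) -> str:
    """Serialise the memory state into a string and prefix it to the prompt."""
    inv = ", ".join(f"{k} ({v})" for k, v in state.inventory.items())
    ctx = ". ".join(f"{k} is {v}" for k, v in state.context.items())
    return f"Inventory: {inv}. Context: {ctx}. --- {user_prompt}"

state = MemoryState(inventory={"apple": 2}, context={"location": "kitchen"})
print(inject_memory(state, "What can I cook?"))
# Inventory: apple (2). Context: location is kitchen. --- What can I cook?
```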

The most unique part is what I call the GenParUp loop (different from GPU), wherein the model is prompted to output special text commands like set_context(key, val) or add_item(name, qty). The output is parsed by CommandParser, which uses regular expressions to find these commands, and then removes them so the user doesn’t see the code. Because the ‘memory’ has changed, the generator often restarts the generation process with the new, updated memory injected into the prompt to ensure consistency.
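A regex-based parse step in that spirit might look like this (a hedged sketch, not the actual CommandParser; the command grammar and state layout are assumptions):

```python
import re

# Matches set_context(key, val) and add_item(name, qty) in raw model output.
CMD_RE = re.compile(r"(set_context|add_item)\(\s*([^,()]+?)\s*,\s*([^()]+?)\s*\)")

def parse_commands(text: str, state: dict):
    """Apply embedded memory commands, then strip them from the visible output."""
    changed = False
    for cmd, a, b in CMD_RE.findall(text):
        if cmd == "set_context":
            state["context"][a] = b
        else:  # add_item: accumulate quantities in the inventory
            state["inventory"][a] = state["inventory"].get(a, 0) + int(b)
        changed = True
    clean = CMD_RE.sub("", text).strip()
    # If changed, the generator may restart with the updated memory injected.
    return clean, changed

state = {"context": {}, "inventory": {}}
out, changed = parse_commands("You pick up apples. add_item(apple, 2)", state)
# out is "You pick up apples." and state["inventory"] now holds {"apple": 2}
```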

Sorry, I don’t have an MWE trained yet since it’s too big for my computer. I’ll see if I can get an MWE working on Colab, but I guess this is enough for my ship!

My final task is to zip all the models up so I can release them as binaries.

Attachment
0681691

Made a new architecture type. I found that small locally trained models (on a consumer laptop, i.e. a 2021 MacBook) trained within a reasonable time frame (i.e. within 14 hours) often hallucinate a lot. Notably for this project, a recipe has a basic structure:

  • First, the ingredients
  • Building recursively on the ingredients, step 1
  • Building recursively on EVERYTHING before, step 2

And so on. Supporting this directly would mean a complete, complex restructuring of my architecture, which is too difficult. I tried training the same data on the original distilgpt2 model on Colab, and the effects were the same, so either my training dataset was too small or not ideal, or my training was too short. All of these are inconvenient, so I added a structure on top of the existing nanoGPT framework: a text-based memory system. I will provide more details once I have trained a successful model. The Discord bot works tho lol.

Attachment
0681691

Began work on an encoder from scratch. I just wanted to get rid of the GPT-2 encoding and use my own, but although it seems easy, so far it is pretty hard. The encoding block is not just one component, I realised, but a pipeline like this:
tokens -> embeddings -> [attention + MLP] × N -> hidden states
So far I’ve got the embeddings done (it was easy: split into chars and assign IDs based on Unicode). The attention, MLP and N-layer matrix calculations, though, are pretty challenging to formalise. Especially special chars such as space: though it is in my vocab, it cannot be identified right now.
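The char-split-plus-Unicode-assignment idea can be sketched like this (an illustrative sketch, not the actual encoder; note that building the vocab from the corpus is what guarantees space and other special chars get IDs):

```python
class CharTokenizer:
    """Character-level tokenizer: vocab from the corpus, IDs in Unicode order."""

    def __init__(self, corpus: str):
        chars = sorted(set(corpus))  # sorted() orders by Unicode code point; includes space
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    def encode(self, text: str) -> list:
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list) -> str:
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("mix the flour and water")
ids = tok.encode("flour")
assert tok.decode(ids) == "flour"   # round-trip is lossless
```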

Attachment
Attachment
0681691

Finally, the website is complete. Not much to add on top of the previous devlog, but I was able to properly format the output by calling .replace(/\n/g, "<br>") on the recipe, since divs don’t render \n characters. I also added the /api_query page for documentation and made a small demo video. Hunted around for some bugs, so here it is. I will upload it to GitHub in the examples/website/ folder.

0681691

Did some major work on the online website. This part went the quickest of all, since I can at least claim to be quite acquainted with basic web dev, unlike LLM dev (that’s why I took so long in actually producing the model lol!)

Used quite a classic but simple setup: a stack of divs, each with its own class. Of course, the basic “sections” (all with unique class names, though) look the same, with the same padding, margin, etc., and I settled on this colour scheme: #f9d5d3, #493535, #f4e5dd, for a nice mild look that’s more interesting than black, grey and white. I also made a loading CSS animation so users don’t panic when their request takes a bit. Now I plan to work on formatting the recipe, documenting the API and committing everything to GitHub.

Attachment
Attachment
Attachment
0681691

Made it public on Hugging Face!
After a lot of work, I published the first 124M model on Hugging Face Spaces. You should definitely have a try; I made it my demo!
I also resolved a loading bug with miniGPT. It turned out that loading in a Hugging Face Gradio Space was a bit different from transformers, so I had to write a bit of “carry-on” code to make the configuration in miniGPT understand the Hugging Face parameters so the model would actually load. For example, in my original architecture using GPT-2 encoding, my saved models had a very simple config.json, but the update needed a lot more parameter information, introducing some random stuff like “_num_labels” and whatnot. Got it all resolved, though.

I am planning to make it into an API so it becomes a usable chat interface, and to publish it using Vercel or something of the sort.

Attachment
0681691

Major update:
The NanoGPT architecture prototype is complete, known as miniGPT-0.2r2 and featuring the new sdsGPT architecture (super duper small GPT): a production-quality GPT-style transformer language model with several optimisations in fewer than 700 lines. Wherein:

Core architecture:
A decoder-only transformer that includes the regular components:

  • Embeddings: input text (from the training dataset) is mapped to d-dimensional vectors, where d is an adjustable hyperparameter controlling how intricately the semantics of the training data are captured, with optional rotary embeddings (RoPE) for better positional information.
  • Transformer blocks: a stack of identical layers (default 12), each containing multi-head self-attention with causal masking (can’t attend to future tokens), a feed-forward MLP (4x expansion in the hidden dimension), and layer normalisation with residual connections.
  • Language head: a linear projection from the embedding dimension to the vocab size.

Optimisations for efficiency are also present, to live up to the name NanoGPT as a compute-friendly alternative to bulkier GPT models. Some key optimisations include:

  • Gradient checkpointing: reduces memory usage through recomputation rather than storage.
  • torch.compile: JIT-compiles the model for faster execution.
  • Scaled residual projections: output projections are initialised inversely to the layer count to stabilise deep networks.
  • Flash attention: uses the optimised scaled_dot_product_attention rather than a hand-rolled attention implementation for faster, more memory-efficient attention.
  • A prototypical repetition-penalty multiplier to reduce hallucinations, using superpositioned entropy Gaussians.

The model can be trained with mixed precision, which accelerates training further on CUDA devices. Training is also checkpointed to allow saving during longer sessions.
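The transformer block described above can be sketched in PyTorch roughly as follows (an illustrative sketch, not the actual sdsGPT code; dimensions and names are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """Pre-norm transformer block: causal multi-head attention + 4x MLP."""

    def __init__(self, d_model: int = 768, n_head: int = 12):
        super().__init__()
        self.n_head = n_head
        self.ln1 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused q/k/v projection
        self.proj = nn.Linear(d_model, d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),          # 4x hidden expansion
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(self.ln1(x)).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) for multi-head attention
        q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
                   for t in (q, k, v))
        # fused flash-style attention with causal masking baked in
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        x = x + self.proj(y)            # residual connection
        x = x + self.mlp(self.ln2(x))   # residual connection
        return x

x = torch.randn(1, 8, 768)
assert Block()(x).shape == x.shape  # shape is preserved through the block
```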

Using NanoGPT I trained a 54M model and a 128M model on corbt/all-recipes.

Attachment
0681691

Bug hunting for the ship for recertification. Final mp4 uploaded here, showcasing the learning functions.

0681691

Revised update to 0.2.2. Implemented a menu scroll bar and migrated some functions from main_window to logic/utils to reduce code concentration in main_window alone; unified thematic styles in lists. Introduced a recording browser and Learning; a concept for an educ mode and resource hub is underway.

0681691

Trained two models: distilGPT-2 (81.9M) and a 124M-parameter miniGPT model (1-model124M). Began rough work on documenting them, and I also improved the miniGPT architecture from miniGPT-0.1 to miniGPT-0.2, which I am going to train the next ones on.

MiniGPT-0.2 is a major production variant of miniGPT-0.1 featuring stabilised Hugging Face usage and execution and whatnot, with minor improvements in architecture: top-k sampling, an anti-repetition bias, and dual weight/model loading features.
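The top-k sampling and anti-repetition bias combination can be sketched like this (a hypothetical helper, not the actual miniGPT-0.2 code; the penalty scheme shown is the common divide-positive/multiply-negative convention):

```python
import torch
import torch.nn.functional as F

def sample_top_k(logits: torch.Tensor, k: int = 40,
                 repetition_penalty: float = 1.2, prev_ids=()) -> int:
    """Sample the next token from the top-k logits, penalising repeats."""
    logits = logits.clone()
    for t in set(prev_ids):                 # bias against already-seen tokens
        if logits[t] > 0:
            logits[t] /= repetition_penalty
        else:
            logits[t] *= repetition_penalty
    topv, topi = torch.topk(logits, k)      # keep only the k most likely tokens
    probs = F.softmax(topv, dim=-1)         # renormalise over the top-k
    return topi[torch.multinomial(probs, 1)].item()

logits = torch.randn(100)                   # fake vocab of 100 tokens
next_id = sample_top_k(logits, k=5, prev_ids=[3, 7])
```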

I plan to make a UI to upload the better ones to.

Attachment
Attachment
Attachment
0681691

Trained a couple of different models using different batch sizes, comparing the coherency of their output.

Attachment
0681691

I tried using a locally built GPT (a simple generative Transformer model), and it turned out pretty smooth. Pretty happy with the result. Made a ~50M model and trained it on my Mac in 15 minutes, so I will probably expand training to 2 epochs and extend the training dataset from 20,000 to ~30,000 recipes, while adding more embeddings and layers (640, 12) to grow it into a ~100M model with a training time of up to 1 hour. Turns out an M1 can do alright.
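For a rough sense of where a (640-dim, 12-layer) config lands, the parameter count of a GPT-style model can be estimated from the embedding and per-layer weights (a back-of-envelope formula that ignores biases and layer norms; the GPT-2 vocab size and 1024 context are assumptions for illustration):

```python
def gpt_params(d_model: int, n_layer: int, vocab_size: int, n_ctx: int = 1024) -> int:
    """Approximate parameter count for a GPT-style decoder-only transformer."""
    embed = vocab_size * d_model + n_ctx * d_model  # token + position embeddings
    per_layer = 12 * d_model ** 2                   # ~4d^2 attention + 8d^2 MLP
    return embed + n_layer * per_layer

# GPT-2 small for reference:
print(f"{gpt_params(768, 12, 50257) / 1e6:.0f}M")   # 124M
# the planned 640-dim, 12-layer config with a GPT-2-sized vocab:
print(f"{gpt_params(640, 12, 50257) / 1e6:.0f}M")   # 92M
```

So (640, 12) with a GPT-2-sized vocab comes out around 92M, in the right ballpark for the ~100M target.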

Attachment
0681691

As requested, I compiled a binary and uploaded it to GitHub. Now that should address the accessibility criteria!
I also fixed a few bugs related to compiling the binaries. Everything I verified works, insofar as the functionality is concerned; user data may have to go through an update. Minor stylistic update: uploaded a little logo.
This (v0.2.1) is intended as a supplementary update to the major v0.2.0.

Attachment
0681691

Shipped this project!

Hours: 10.94
Cookies: 🍪 105
Multiplier: 9.64 cookies/hr

Hi all!

I am part of the PART (Project for Accessible Radio Telescopes) initiative, and this is our data-recording and processing interface! This application is part of our mission to produce a completely working, high-end radio telescope design and open-source it on the web so that anybody, particularly rural students, can build and use it (with this application!). We are going to make a few of these and distribute them to a few rural schools. Hobbyists, space enthusiasts, and in general anybody else are also welcome to use our design.

So far it only works with RTL-SDR-based telescopes (as per our design), but I hope that in the near future I will be able to support a longer list of SDRs.

To support our outlook, please consider:

  1. Starring and watching this project on Github! (or contributing, that’s even better!)
  2. Following us on https://www.instagram.com/p.a.r.t._/ so you don’t miss out on our updates!

Again, thank you all and I hope that my dish is competent enough for your tastes!

0681691

Final practicality update:
This is now the first experimentally verified working version!
Updates from last time:

  1. Introduced RTL-SDR data collection logic
  2. Introduced a signal visualisation window
  3. Finalised the spreadsheet functionality
  4. Verified that RTL-SDR connection + data collection and logic works
  5. Introduced a Lesson Wizard for teaching
  6. Introduced a settings function
  7. Now you can also generate your own random sources to use if you don’t have a dongle.
  8. Among other things

This is still technically a pre-release, but I will ship it since all further updates will just be slight improvements/debugging.

0681691

Ran it locally with the Colab-trained model; I can at least publish it.

Attachment
0681691

Created the model in Colab, complementing my existing efforts. Access to a T4 was nice… until it ran out. Now I have a 300 MB .pth file that is probably corrupted (I probably truncated the file somewhere because my internet is slow). Next time I’ll probably train on a local M1 Pro. Anyhow, I can probably deploy a working model tomorrow.

Attachment
0681691

As stated, I redid DishFlow’s structure based on a simple Transformer. Sorry, it’s still primitive, so I need to refine it further.

Attachment
0681691

Implemented another system for now; we’ll see how it goes.

Attachment
0681691

Basic setup

Attachment
0681691

Did a bit of UI

Attachment
0681691

Wrote a test application to complement it; wanna implement SDRpy in the next commit.

Attachment
0681691

I fixed it with a bit of external help (thanks, Copilot agent… it turns out to be useful when I need it), and I realised you could bind labels as buttons, so I did that and now it looks fine.

Attachment
0681691

Initial GUI setup with the custom menu. I need to be able to change the fricking tkinter buttons.

Attachment
0681691

I’m working on my first project! This is so exciting. I can’t wait to share more updates as I build.

Attachment