Activity

albert

9.5 hours.

That’s how long it took me to finally get the full simulation up and running. Now it’s not perfect, and it’s got some gaps, but honestly, I love the way it’s looking.

What I’ve done:

  • Built a state machine that stores molecule templates, instances, object scenes, and the entire environment
  • Built a geometry helper file with all sorts of functions to help with placing molecules in an object’s bounding box space, so that molecule instances don’t intersect with one another.
  • Made a placement file that manages the placement of each molecule using a frontier approach, where once a new molecule is placed, more stem from that one.
  • Built the full rendering file that uses the placement file to actually render the atoms and bonds together, rotating them in space.
  • Tied them all together to get this.
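
The frontier idea in the list above can be sketched roughly like this (all names hypothetical; the real placement file surely also handles rotations and per-molecule radii):

```python
import random

def frontier_place(bbox_min, bbox_max, radius, max_count, seed=0):
    """Place non-overlapping spheres inside a bounding box: each new
    sphere is spawned next to a previously placed one (the 'frontier'),
    so the fill spreads outward from the seed molecule."""
    rng = random.Random(seed)

    def inside(p):
        return all(lo + radius <= c <= hi - radius
                   for c, lo, hi in zip(p, bbox_min, bbox_max))

    def collides(p, placed):
        return any(sum((a - b) ** 2 for a, b in zip(p, q)) < (2 * radius) ** 2
                   for q in placed)

    start = tuple((lo + hi) / 2 for lo, hi in zip(bbox_min, bbox_max))
    placed, frontier = [start], [start]
    while frontier and len(placed) < max_count:
        base = frontier.pop(rng.randrange(len(frontier)))
        for _ in range(8):  # try a few candidate neighbors around the base
            d = [rng.gauss(0, 1) for _ in range(3)]
            n = sum(x * x for x in d) ** 0.5 or 1.0
            cand = tuple(b + 2 * radius * x / n for b, x in zip(base, d))
            if inside(cand) and not collides(cand, placed):
                placed.append(cand)
                frontier.append(cand)
                if len(placed) >= max_count:
                    break
    return placed
```

Each accepted molecule becomes a new frontier point, so growth stems from existing placements instead of relying on pure rejection sampling over the whole box.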

So you may notice that the bonds aren’t just regular bonds. They’re electron density regions. Remember, I set out to be as accurate as possible, so here it is: every single one of those dots is a possible location of an electron in its bond.

There are no simple single/double bonds; instead, I went all out. Single bonds are SIGMA bonds, and double bonds are one sigma + one pi bond.
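
As a rough illustration of how such dot clouds can be sampled (a hand-wavy geometric stand-in, not real orbital math: a sigma cloud is cylindrically symmetric about the bond axis, while a pi bond adds lobes above and below it):

```python
import random

def sigma_dots(p0, p1, n=200, spread=0.15, seed=0):
    """Sample points along the bond axis with Gaussian radial spread:
    a crude stand-in for sigma-bond electron density."""
    rng = random.Random(seed)
    dots = []
    for _ in range(n):
        t = rng.random()  # position along the bond axis
        axis = [a + t * (b - a) for a, b in zip(p0, p1)]
        dots.append(tuple(c + rng.gauss(0, spread) for c in axis))
    return dots

def pi_dots(p0, p1, up=(0, 0, 1), n=200, lobe=0.5, spread=0.1, seed=0):
    """Two lobes offset above and below the bond axis for a pi bond."""
    rng = random.Random(seed)
    dots = []
    for i in range(n):
        t = rng.random()
        sign = 1 if i % 2 == 0 else -1  # alternate between the two lobes
        axis = [a + t * (b - a) for a, b in zip(p0, p1)]
        dots.append(tuple(c + sign * lobe * u + rng.gauss(0, spread)
                          for c, u in zip(axis, up)))
    return dots
```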

If you’re wondering about the efficiency and smoothness of running this many entities in Panda3D, it’s actually basically perfect. I don’t see any lag or sketchy jittering.

Next up is making it even more accurate!

Attachment
0
albert

My bad for not logging for a while, there just wasn’t much to show for outputs, and tbh, there still isn’t too much.

So, I finally got the details for all molecules working. The PubChem REST API is now pulling in coordinates, bond information, and chemical formulas for everything, and I’ve got 7 unique molecules across 37 annotated objects in the test scene.
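
For reference, the 3D records come from PUG REST endpoints like `https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/water/record/JSON?record_type=3d`. Here’s a hedged sketch of pulling atoms, coordinates, and bonds out of such a record (assuming PubChem’s usual `PC_Compounds` JSON layout; not my exact code):

```python
def parse_pubchem_record(record):
    """Extract atoms, 3D coordinates, and bonds from a PubChem PUG REST
    compound record (record_type=3d)."""
    compound = record["PC_Compounds"][0]
    elements = compound["atoms"]["element"]  # atomic numbers, per atom
    conf = compound["coords"][0]["conformers"][0]
    coords = list(zip(conf["x"], conf["y"],
                      conf.get("z", [0.0] * len(conf["x"]))))
    bonds = list(zip(compound["bonds"]["aid1"],
                     compound["bonds"]["aid2"],
                     compound["bonds"]["order"]))
    return elements, coords, bonds
```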

After that, I started building out the actual simulation side. First step was getting a single molecule to render in Panda3D with proper CPK coloring and bond visualization. That worked surprisingly well. You can see it in the second image; that’s the single molecule.

Next, I tried just stacking them one on top of the other, spamming them in XYZ directions, but it didn’t really look realistic, so now I have to go the long route. I needed a way to place thousands of molecule instances without duplicating all the coordinate data, so I built a whole state machine system. There’s now a MoleculeTemplate class that holds the immutable chemical structure, and MoleculeInstance that just tracks position and rotation. One template can have hundreds of instances pointing to it.
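
A minimal sketch of that template/instance split (hypothetical field names; the real classes hold more):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MoleculeTemplate:
    """Immutable chemical structure, shared by every instance."""
    name: str
    elements: tuple   # atomic numbers
    coords: tuple     # local-frame (x, y, z) per atom
    bonds: tuple      # (atom_i, atom_j, order)

@dataclass
class MoleculeInstance:
    """Lightweight placement: a reference plus position and rotation."""
    template: MoleculeTemplate
    position: tuple = (0.0, 0.0, 0.0)
    rotation_hpr: tuple = (0.0, 0.0, 0.0)  # Panda3D heading/pitch/roll
```

The flyweight pattern is the whole point here: hundreds of instances all point at the same frozen template, so the heavy coordinate data exists exactly once.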

Then I wrapped that in an ObjectState class to handle each annotated object from the scene (like “books” or “desk”), and an Environment class to hold all of them together.

The idea is you load the bounding boxes from the annotations, and for each object, you populate its volume with molecule instances based on the material composition.

I also spent way too long refactoring type annotations into a central file and making a constants module for element colors and radii. The CPK color mappings were duplicated in three places and it was lowkey pmo.

Next up:

Need to make the molecule placement actually realistic. Need to add randomization, collision checking between neighboring instances, and probably some basic force field approximations to make molecules settle into reasonable configurations.

Attachment
Attachment
0
albert

Finally got the details for all molecules

I ran into some issues when trying to get the formal names for each molecule, since Qwen was constantly giving me the conventional names (like cellulose instead of cellobiose), so I had to debug by changing the prompt, the model, and even the whole infrastructure, until I finally settled on using a cloud model, DeepSeek V3.2, with chat mode on.

After that, I managed to get the SMILES strings and use RDKit to get all the 3D details for each molecule. There are some inaccuracies, so I’m gonna switch to the PubChem REST API (AGAIN), since they have better (AND MORE) data.

*PubChem boutta become my best friend sob-hole*

Attachment
0
albert

Made the aggregation script

This time I added aggregation capabilities so that once you have annotated your scene with the objects and their materials, it runs a local Ollama model (in my case, I chose qwen2.5:7b) and returns the composition of each object, putting them into a JSON file.
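
One fiddly part of this kind of pipeline is that local models often wrap their JSON in a code fence or surrounding chatter. A small helper along these lines (hypothetical, not the exact script) keeps the parsing robust:

```python
import json
import re

def parse_composition(model_text):
    """Pull the first JSON object out of a model reply, tolerating
    ```json fences and extra prose around it."""
    match = re.search(r"\{.*\}", model_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in model output")
    return json.loads(match.group(0))
```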

Attachment
0
albert

Made the manual annotation work

It’s been a while since I’ve worked on this, but it’s nice to be back! I’ve managed to make the manual annotation pipeline, since the AI one was horrible.

Now, you make a 2D bounding box first, then you extrude it to give it height. You can then save it as an object, and by the end, they all get stored into a JSON file.

It works fairly similarly to the AI one, except you have to do these ones yourself.
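
The extrude-then-save flow can be sketched like this (hypothetical field names, not the exact annotation schema):

```python
import json

def extrude_bbox(x_min, y_min, x_max, y_max, base_z, height, label):
    """Turn a 2D bounding box plus an extruded height into a 3D box."""
    return {
        "label": label,
        "min": [x_min, y_min, base_z],
        "max": [x_max, y_max, base_z + height],
    }

def save_annotations(boxes, path):
    """Write all annotated objects to a single JSON file."""
    with open(path, "w") as f:
        json.dump({"objects": boxes}, f, indent=2)
```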

The UI definitely needs some work, but this is already a long time to go without a devlog.

Attachment
Attachment
0
albert

Shipped this project!

Hours: 15.42
Cookies: 🍪 234
Multiplier: 15.17 cookies/hr

SHIPPED!!!

I built a desktop code editor that is more lightweight than VS Code, supports most, if not all, of the icon/theme packs (with a few code changes to point to the right filepaths), and has some syntax highlighting.

This project right now is in its most basic state, so don’t expect any fancy LSP, AI integration, Git, or anything like that. It’s essentially a text editor that looks good, runs with <100MB of RAM, and has complete access to your filesystem.

I made this in Rust, which I have learned to NEVER use again. Although it was an experience, for sure, to learn this language, I do not recommend it as a passion language, since it is just so WEIRD.

Anyways, had some challenges in making the releases, but the macOS release uses Homebrew, and the Windows one has a simple .exe file.

Happy Coding!

albert

Ok, so there are a few things that I tried doing. Even after making the switch to the grounding DINO model, which perfectly predicted the bboxes of each object, the 3D projections still look like ahhh.

I know the issue is somewhere in the raycasting code, but I can’t lie ts frying me already. I tried making a few changes, but literally nothing worked. So now, I’m gonna switch to manual annotation. It’ll be more tedious than before, but at least it’ll work (hopefully).

Changelog

Attachment
0
albert

After looking at the projections of the Gemma-made bboxes in the environment, I was not having a good time. So, I decided to switch to a second model for bboxes while keeping object detection on Gemma.

I used IDEA-Research’s Grounding DINO model, which, given an object name, estimates a bounding box around it. I took the object names from Gemma’s output and put them into the classifier, creating this beautiful JSON you see below. Haven’t run the raycasting algorithm on them yet.

Changelog

Attachment
0
albert

Once I got the bounding boxes from Gemma, I only had to make the raycasting pipeline to project the 2D bounding boxes into 3D. The first thing I did was calculate camera poses using COLMAP, then I used those camera poses to project rays from each image, recording all the data in dictionaries.

I actually managed to learn a whole lot about looping through dictionaries and their intrinsic qualities, which was really fun. The only drawback is that the time complexity is not favorable: O(n^2), where n is the number of frames/images.

The only bug to fix is in the COLMAP database, where the initial pair of images is not being calculated. This is probably because there is too small of a difference between the camera poses in each frame. I’m not sure yet if this is true or how to even fix this.
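
For anyone curious, the core back-projection step looks roughly like this (a sketch, assuming `R` is already the camera-to-world rotation; COLMAP actually stores world-to-camera poses, so you’d invert/transpose first):

```python
import math

def pixel_ray(u, v, fx, fy, cx, cy, R, cam_pos):
    """Back-project pixel (u, v) through the intrinsics (fx, fy, cx, cy),
    then rotate the camera-frame direction into world space with the
    camera-to-world rotation R. Returns (ray_origin, unit_direction)."""
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]
    d_world = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d_world))
    return tuple(cam_pos), tuple(c / norm for c in d_world)
```

Casting one ray per bounding-box corner per frame and intersecting them across frames is what drives the O(n^2) cost mentioned above.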


Changelog

Attachment
0
albert

Switched fully to Gemma for the frame predictions because it’s lowkey WAY more accurate with its text output. Also gave it context in the prompt from past frames so that it predicts things across frames more consistently.

Below is an image of the JSON file it outputs for all 450 images.

Changelog

Attachment
0
albert

I have changed the prompt to estimate bounding boxes for each object in the 2D image. Now, I have to project that into the 3D environment and make these bounding boxes actually three-dimensional. To do this, I will use raycasting and camera poses to estimate the distance to objects and therefore their positions in 3D space.

I’m also thinking of switching to Gemma instead of Nemotron, since Gemma is not just a VLM but fully multimodal, so it will often have better text/JSON outputs.

Changelog

Attachment
0
albert

On the movement and environment side of things I added the fabled zoom mechanic (it was really easy) and some speed control.

The most important new thing is that I am now using NVIDIA’s Nemotron with OpenRouter to analyze images and output the object it most likely is and the materials it’s made of.

I initially used Qwen, but it just wasn’t precise enough for my use case.

Next up, I’ll have to find a way to create bounding boxes around specific objects in the image so that I know where to start and stop each material. Afterwards comes the actually fun part: making the lookup table that links each material to a specific molecule.

Changelog

Attachment
0
albert

Good news! I was able to load the OBJ into Panda3D and create some basic movement capabilities, including WASD and mouse movement. The only things that I have to do before I get to the physics are adding zoom and maybe re-recording the video (since it’s kinda low quality, which messes up how the 3D scan looks).

Changelog

Attachment
0
albert

So I made the pipeline that turns videos into .obj files through the intermediary .usdz (Apple’s proprietary 3D file format).
I first had to take the video from many different angles, then turn it into a folder of images sampled every two frames, and then use some SwiftUI code to run Apple’s RealityConverter.
Then, I decided to use Aspose.3D to convert to OBJ, but ran into some coloring issues: since the USDZ is a zip archive, I had to extract the texture images from it and insert them into the OBJ.
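
Since a USDZ is just a zip archive, the texture extraction can be done with the standard library alone (a sketch with hypothetical names, not the exact pipeline code):

```python
import os
import zipfile

def extract_textures(usdz_path, out_dir):
    """Pull the texture images out of a USDZ (a zip archive) so they can
    be referenced from the converted OBJ/MTL files."""
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(usdz_path) as z:
        for name in z.namelist():
            if name.lower().endswith((".png", ".jpg", ".jpeg")):
                target = os.path.join(out_dir, os.path.basename(name))
                with z.open(name) as src, open(target, "wb") as dst:
                    dst.write(src.read())
                extracted.append(target)
    return extracted
```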

The result is actually not too far from what it really is!

Attachment
0
albert

Cmd+T is here

So basically, now, instead of having to manually select a file to open from the file explorer, you can just press Cmd+T and it opens the file you want in a new tab. I feel like it looks kinda nice, but it still needs some work.

I think next ima work on some animations so that everything feels nice and smooth.

Attachment
0
albert

File searching mechanic!

Now, I added a way to look through all the files in the workspace for a certain word or phrase. It uses fuzzy finding to make it actually smart.

I did run into some goofy issues on the way there though. One of them was that whenever I would search for ANYTHING it would look through literally the WHOLE workspace, instead of just the files that I cared about. Turns out, all I had to do was add a .gitignore flag and make it “true”.
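
A minimal version of subsequence-style fuzzy matching looks something like this (an illustrative Python sketch, the editor itself is in Rust; scoring rules are my own stand-in):

```python
def fuzzy_score(query, text):
    """Subsequence fuzzy match: every query char must appear in order;
    consecutive hits score higher. Returns None when there is no match."""
    query, text = query.lower(), text.lower()
    score, pos, last = 0, 0, -2
    for ch in query:
        pos = text.find(ch, pos)
        if pos == -1:
            return None
        score += 2 if pos == last + 1 else 1  # bonus for adjacent hits
        last, pos = pos, pos + 1
    return score

def search(query, lines):
    """Return matching lines, best score first."""
    hits = [(fuzzy_score(query, ln), ln) for ln in lines]
    return [ln for s, ln in sorted(
        ((s, ln) for s, ln in hits if s is not None), reverse=True)]
```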

Attachment
0
albert

Design improvements and MARKDOWN PREVIEW!!!!

I managed to make the app have all of its corners rounded (ts had me crashing out for like a whole hour), but more importantly, I added markdown preview support, which honestly has some bugs, but I have NO CLUE how to fix them.

Changelog

Attachment
0
albert

Syntax Highlighting

Added custom syntax highlighting to the app. It works in some file types, but in others it’s kinda ahh. You can see in the example below how Rust actually looks good, but Python looks dull and grey…

In Python files, for some reason, it only highlights the first couple of lines, then just gives up. I’ll have to investigate further, but for now, the editor’s coming along pretty well!

One thing I will have to be sure of, though, is that it doesn’t just become another VS Code clone.

Changelog

Attachment
Attachment
1

Comments

kashsuks
kashsuks 2 months ago

ts tuff

albert

SIDEBAR

I made the sidebar resizable, even going so far as to make the cursor have that horizontal-resize look. Now you can view your files at whatever size you’d like!

Changelog

Attachment
0
albert

The editor is functional!

YAY!!

So, to make this editor actually usable, I first of all made it look nice, with tab organization, good-looking file and folder icons, and a relatively simple UI (I find).

My goal of an uncluttered editor has officially been achieved.

VS Code has just too many options sometimes… all those little icons that you can find in the sidebar can seem overwhelming, so I’ve solved that through this minimalist approach. It might not be for everyone, but personally, I find it nice!

Changelog

Attachment
Attachment
0
albert

Looks even better! I added file icon support for JSON files following a certain template. You can see this template in the code, but it’s essentially the same one as in VS Code, since I legit just took a VS Code icon pack and put it into this editor… oops…

Credit to jonathanharty.gruvbox-material-icon pack!

Changelog

Attachment
0
albert

The editor actually looks great and works now! I’ve added a tab system so that the user can have multiple files open at once and simply switch through tabs. Sadly, I haven’t yet gotten to the keyboard shortcuts such as Cmd+W or Cmd+T. I’ve also added some more Apple-esque styling with rounded corners and the editor sort of “hovering”. The font is also much nicer, I must say, since I’m using IBM Plex Sans. That really pulls it together, I find. Any opinions on what’s missing or what could look better?

Changelog

DOCS: Make a small edit to README
FEAT: Start adding icons to files and folders for it to look better
FEAT: Add tab functionality to editor to be able to open and close multiple tabs. Still needs some styling work, but the basic version is functional
STYLE: Add IBM Plex Sans to fonts and use that font throughout editor.
STYLE: Make the editor look better with a status bar, rounded corners, and overall better visuals

Attachment
0
albert

Made the actual filetree! It is complete, being able to open directories, expand and collapse them, as well as display files in the editor.

P.S. It’s also collapsible with Cmd+R!

Attachment
0
albert

Ok, so now, I’ve actually made the “backend” for the filesystem. It doesn’t look the way it should since I haven’t yet integrated it into the actual visuals of the app, it just exists as an unused module for the moment. It is able to detect if an entity is a file or a directory, and act accordingly depending on what it is. It holds data about what’s expanded, it sorts the entities alphabetically, and it even ignores some directories!

Changelog

Attachment
0
albert

Made a starting base for the code editor, with a simple ahh text editor. The downside: I had to manually code all of the cmd+backspace, alt+backspace, and other stuff like that cause they weren’t working… I also added line and column tracking, so that’s pretty cool

Attachment
0
albert

Soooooo I switched from a Cocoa infrastructure to iced… I did this because I wanted this application to be cross-platform, and Cocoa is macOS-exclusive, since it works through an Objective-C bridge. However, I was able to get the window back up and running again!!

Attachment
0
albert

Started learning Rust x Cocoa for the first time! Right now, it’s at a very early stage in development, with it being only a text-editor (which also barely works :(), but this is just the base of the editor. I have great plans for this one!

Attachment
1

Comments

kashsuks
kashsuks 2 months ago

its so peak

albert

Shipped this project!

Hours: 20.45
Cookies: 🍪 287
Multiplier: 14.03 cookies/hr

Shipped!!! This is the second iteration of Nous, now with a ViT (it has the ability to interpret images, in theory) and an MoE implementation (basically more parameters). Sadly, this version doesn’t ship with the fully trained 2.3B model, since it’s gonna take a while to train that one, but you get to play around with the codebase!! Honestly, the most challenging thing was figuring out how to store 122GB worth of training data on a cloud GPU that offers only 32GB (MAX) of storage. I was able to get over this challenge by putting the data on a server which the GPU can access.

albert

Fixed some goofy issues with the new MoE implementation. The old model (actually the only one) was made before MoE was added, so it doesn’t have the MoE flag set to False, and the default was True. This led to the generate function thinking it had 196M parameters (I WISH) instead of the 77M it actually has. One thing led to another, and the next thing you know, it’s printing absolute nonsense.

One more thing I did is run the data gathering pipeline, saving all 122GB of training data on an external drive. The issue is that to train the new model (which would be 2.3B parameters, by the way), I can’t just use my M4 MBA, since it would probably blow up (although that would be cool)… So, I’m using cloud GPUs (4x RTX 5090, 128GB total VRAM), but they can only store 32GB worth of files, and the training file alone is 122GB, never mind the checkpoints. To get around this, I’m gonna use the Flask framework to host the data, probably on my old iMac (24GB of RAM, so large ahh chunks of data); if not, I’ll use my Raspberry Pi 3B (only 1GB of RAM). This will definitely make training slower, but at least it’ll train… right?
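
The serving side could be as simple as handing out fixed-size byte ranges; here’s a sketch of the chunk-reading half (hypothetical names; a Flask route would just wrap `read_chunk` and the training worker would request offsets over HTTP):

```python
def read_chunk(path, offset, size):
    """Return `size` bytes starting at `offset`, so a remote training
    worker can stream slices of a huge file instead of copying all of it."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

def iter_chunks(path, chunk_size=64 * 1024 * 1024):
    """Yield the whole file in fixed-size chunks (64 MB by default)."""
    offset = 0
    while True:
        chunk = read_chunk(path, offset, chunk_size)
        if not chunk:
            return
        yield chunk
        offset += len(chunk)
```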

Also don’t worry about the model’s responses below, that’s normal for this one, since it probably has never seen anything like what I was prompting it in its training data.

Attachment
1

Comments

chefpenguino
chefpenguino 2 months ago

nice UI, looks good

albert

Finished ViT frontend interpretation for training.

Used the Flask API to make backend + frontend routing for ViT multimodal training. This also shows total parameters, including those gained from the ViT.

Attachment
0
albert

Bug fixes and documentation changes.

Basically I tested the app to make sure everything in the MoE, ViT, and all else works as expected. For now, no errors found. Anything I have found has been fixed and documented.

I have also done some documentation refactoring, simplifying the README to contain only completely essential starter information and adding an EXPLANATION document to hold the extra “beef” that once lived in the README.

Attachment
0
albert

Started (and almost finished) a new part of my model… a ViT! Nous can now interpret images. I’ve added a patch-embeddings class, which creates the image embeddings; the ViT encoder, which processes those patches; and cross-attention (not just self-attention), which can handle and cross-reference text embeddings with image embeddings. These three together will let the model create connections between text and images.

I’ve also been working on some organization improvements (side note: why is Pylint so picky? I don’t wanna make sure my imports are in alphabetical order), especially in documentation and code readability. The only reason for this is that one of my functions has literally >20 lines of arguments, and I wanna make sure it all makes sense.

Now, the only thing left to do is to complete the multimodal trainer (which will be a pain), create the pipeline, the data preparation module, make sure I actually CAN use the ViT with text, in what order, etc. I have a lot of work to do…
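
The patchification step at the front of a ViT can be sketched like this (a pure-Python illustration without the learned projection matrix or the cross-attention described above):

```python
def patchify(image, patch):
    """Split an H x W image (nested lists) into flattened patch vectors,
    the first step of a ViT's patch-embedding layer."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    patches = []
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            patches.append([image[py + dy][px + dx]
                            for dy in range(patch)
                            for dx in range(patch)])
    return patches
```

In the real model each flattened patch would then be multiplied by an embedding matrix (plus a position embedding) before entering the encoder.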

There’s still one more week of holidays, let’s grind together!

Attachment
0
albert

Just finished the GUI implementation for the Mixture of Experts! It’s fully complete and committed to GitHub, and it also looks pretty good! See for yourselves below:

Attachment
1

Comments

seifotefy75
seifotefy75 3 months ago

Perfect Work

albert

Finished the MoE implementation for Nous!! I now have to fix some dumahh shape bugs and some weird JIT compilation errors, but it’s going well! Next step is giving the model some insane parameter counts (I’m looking for >=1bn), then training it. Not sure what monster GPU I’ll need for that, but I’ll manage! (Oh also, don’t worry about my memory pressure, just training a 2bn parameter model…)

Attachment
0
albert

I’ve started working on using a different training pipeline for the model, as I want to expand it to be able to use image input, have text recognition, and more cool stuff to do with images! I have to use pretraining text data for this, since the current model’s natural language skills are not so good. Might also add a MoE.

Attachment
0