Activity

albert

Shipped this project!

Hours: 15.42
Cookies: 🍪 234
Multiplier: 15.17 cookies/hr

SHIPPED!!!

I built a desktop code editor that is more lightweight than VS Code, supports most, if not all, of the icon/theme packs (with a few code changes to point to the right filepaths), and has some syntax highlighting.

This project right now is in its most basic state, so don’t expect any fancy LSP, AI integration, Git, or anything like that. It’s essentially a text editor that looks good, runs with <100MB of RAM, and has complete access to your filesystem.

I made this in Rust, which I have learned to NEVER use again. Although it was an experience, for sure, to learn this language, I do not recommend it as a passion language, since it is just so WEIRD.

Anyways, had some challenges in making the releases, but the macOS release uses Homebrew, and the Windows one has a simple .exe file.

Happy Coding!

albert

Ok, so there are a few things that I tried doing. Even after making the switch to the grounding DINO model, which perfectly predicted the bboxes of each object, the 3D projections still look like ahhh.

I know the issue is somewhere in the raycasting code, but I can’t lie ts frying me already. I tried making a few changes, but literally nothing worked. So now, I’m gonna switch to manual annotation. It’ll be more tedious than before, but at least it’ll work (hopefully).

Changelog

Attachment
albert

After looking at the projections of the Gemma-made bboxes in the environment, I was not having a good time. So, I decided to switch to a second model for bboxes while keeping object detection on Gemma.

I used IDEA-Research’s Grounding DINO model, which, given an object name, estimates a bounding box around it. I took the object names from Gemma’s output and fed them into the detector, creating this beautiful JSON you see below. Haven’t run the raycasting algorithm on them yet.
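
The glue between the two models is simple: for each frame, feed every Gemma object name to the detector as a text prompt and keep the boxes that clear a confidence threshold. A rough sketch of that JSON assembly (the `detect_boxes` stub stands in for the actual Grounding DINO call, and all names and values here are made up):

```python
import json

def detect_boxes(image_path, object_name):
    """Hypothetical stand-in for the Grounding DINO call: given an image and
    an object-name prompt, return (box, score) pairs. Swap in real inference."""
    # Dummy output so the sketch runs end to end.
    return [((0.1, 0.2, 0.4, 0.6), 0.91)]

def build_frame_json(image_path, object_names, threshold=0.35):
    """Collect per-object detections for one frame into a JSON-ready dict."""
    detections = []
    for name in object_names:
        for (x0, y0, x1, y1), score in detect_boxes(image_path, name):
            if score >= threshold:
                detections.append({
                    "label": name,
                    "box": [x0, y0, x1, y1],  # normalized xyxy corners
                    "score": round(score, 3),
                })
    return {"image": image_path, "objects": detections}

frame = build_frame_json("frames/frame_0001.png", ["chair", "table"])
print(json.dumps(frame, indent=2))
```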

Changelog

Attachment
albert

Once the bounding boxes came back from Gemma, I only had to build the raycasting pipeline to project each 2D bounding box onto a 3D plane. The first thing I did was calculate camera poses using COLMAP; then I used those poses to project rays onto each image, recording all the data in dictionaries.

I actually managed to learn a whole lot about looping through dictionaries and their intrinsic qualities, which was really fun. The only drawback is that the time complexity is not favorable: O(n^2), where n is the number of frames/images.

The only bug to fix is in the COLMAP database, where the initial pair of images is not being calculated. This is probably because there is too small of a difference between the camera poses in each frame. I’m not sure yet if this is true or how to even fix this.
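
For reference, the per-pixel ray construction boils down to a pinhole back-projection; a minimal sketch with made-up intrinsics (note that COLMAP stores world-to-camera poses, so in practice the rotation has to be inverted to camera-to-world first):

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy, R, cam_center):
    """Back-project pixel (u, v) into a world-space ray using a pinhole model.
    R is the camera-to-world rotation (3x3 nested lists) and cam_center the
    camera position in world coordinates, e.g. derived from COLMAP poses."""
    # Direction in camera coordinates (z points forward).
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]
    # Rotate into world coordinates: d_world = R @ d_cam.
    d_world = [sum(R[i][k] * d_cam[k] for k in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d_world))
    d_world = [c / norm for c in d_world]
    return cam_center, d_world

# Sanity check: the ray through the principal point is the camera's forward axis.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
origin, direction = pixel_to_ray(320, 240, 500, 500, 320, 240, identity, [0, 0, 0])
```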


Changelog

Attachment
albert

Switched fully to Gemma for the frame predictions cause it’s lowkey WAY more accurate with its text output. Also gave it context in the prompt from past frames so that it predicts things across frames more consistently.

Below is an image of the JSON file it outputs for all 450 images.
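
The context trick is just rolling the last few frames’ outputs into the next prompt; a minimal sketch of the builder (the prompt wording and window size here are made up):

```python
def build_prompt(frame_name, history, max_context=3):
    """Assemble a prompt that includes the model's outputs for the last few
    frames so predictions stay consistent across the video."""
    context = history[-max_context:]
    lines = ["You are labeling objects in consecutive video frames.",
             "Previous frame outputs (keep names consistent with these):"]
    for past_frame, past_json in context:
        lines.append(f"- {past_frame}: {past_json}")
    lines.append(f"Now list every object in {frame_name} as JSON.")
    return "\n".join(lines)

history = [("frame_0001.png", '{"objects": ["chair"]}')]
prompt = build_prompt("frame_0002.png", history)
```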

Changelog

Attachment
albert

I have changed the prompt to estimate bounding boxes for each object in the 2D image. Now, I have to project that into the 3D environment, and make these bounding boxes actually three-dimensional. To do this I will use raycasting and camera pose to estimate distance to objects and therefore their positions in 3D space.
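
The distance estimate itself is a ray-plane intersection: once a pixel’s ray is known, solve for the t where it crosses the supporting plane, and the 3D point is origin + t * direction. A sketch, assuming simple list-based vectors:

```python
def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Distance t along the ray where it hits the plane, or None if the ray
    is parallel to (or points away from) the plane."""
    denom = sum(n * d for n, d in zip(plane_normal, direction))
    if abs(denom) < 1e-9:
        return None  # ray parallel to the plane
    num = sum(n * (p - o) for n, p, o in zip(plane_normal, plane_point, origin))
    t = num / denom
    return t if t >= 0 else None

# Camera 2 units above a ground plane, ray pointing straight down hits at t = 2.
t = ray_plane_intersection([0, 2, 0], [0, -1, 0], [0, 0, 0], [0, 1, 0])
```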

I’m also thinking of switching to Gemma instead of Nemotron, since Gemma is not just a VLM but fully multimodal, so it will often have better text/JSON outputs.

Changelog

Attachment
albert

On the movement and environment side of things I added the fabled zoom mechanic (it was really easy) and some speed control.

The most important new thing is that I am now using NVIDIA’s Nemotron with OpenRouter to analyze images and output the object it most likely is and the materials it’s made of.

I initially used Qwen, but it js wasn’t precise enough for my use case.

Next up, I’ll have to find a way to create bounding boxes around specific objects in the image so that I know where to start and stop each material. Afterwards, it’s the actually fun part of making the lookup table so that each material can be linked to a specific molecule.
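
The OpenRouter call mentioned above speaks the OpenAI-style chat completions format, so sending a frame to the model is mostly payload assembly; a sketch (the model slug and prompt text are placeholders, not necessarily what the project uses):

```python
import base64

def build_vision_payload(image_bytes, model_slug):
    """Build an OpenAI-style chat payload for OpenRouter with one inline image.
    model_slug should be whatever Nemotron variant OpenRouter actually lists."""
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": model_slug,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Name the main object and the materials it is made of, as JSON."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

payload = build_vision_payload(b"\x89PNG fake bytes", "nvidia/nemotron-placeholder")
# POST this as JSON to https://openrouter.ai/api/v1/chat/completions
# with an "Authorization: Bearer <OPENROUTER_API_KEY>" header.
```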

Changelog

Attachment
albert

Good news! I was able to load the OBJ into Panda3D and create some basic movement capabilities, including WASD and mouse movement. The only things I have to do before I get to the physics are adding zoom and maybe re-recording the video (since it’s kinda low quality, which messes up how the 3D scan looks).

Changelog

Attachment
albert

So I made the pipeline that turns videos into .obj files through the intermediary .usdz (Apple’s zip-packaged take on Pixar’s USD format).
I first had to take the video from many different angles, then turn it into a folder of images sampled every two frames, and then use some SwiftUI code to run Apple’s RealityConverter.
Then I decided to use Aspose.3D to convert to OBJ, but ran into some coloring issues: since a USDZ is just a zip archive, I had to extract the texture images from it and insert them into the OBJ.
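
Since a USDZ is just a zip, the texture pull can be done with the stdlib; a self-contained sketch, using a tiny in-memory archive in place of the real scan:

```python
import io
import tempfile
import zipfile

def extract_textures(usdz_file, out_dir):
    """Pull the texture images out of a USDZ (which is a zip archive)."""
    extracted = []
    with zipfile.ZipFile(usdz_file) as z:
        for name in z.namelist():
            if name.lower().endswith((".png", ".jpg", ".jpeg")):
                z.extract(name, out_dir)
                extracted.append(name)
    return extracted

# Fake USDZ standing in for the real scan output.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("scan.usdc", b"geometry")
    z.writestr("0/texture_0.png", b"\x89PNG fake bytes")
buf.seek(0)

textures = extract_textures(buf, tempfile.mkdtemp())
```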

The result is actually not too far from what it really is!

Attachment
albert

Cmd+T is here

So basically, now, instead of having to manually select a file to open from the file explorer, you can just press Cmd+T and it opens the file you want in a new tab. I feel like it looks kinda nice, but it still needs some work.

I think next ima work on some animations so that everything feels nice and smooth.

Attachment
albert

File searching mechanic!

Now, I added a way to look through all the files in the workspace for a certain word or phrase. It uses fuzzy finding to make it actually smart.

I did run into some goofy issues on the way there though. One of them was that whenever I would search for ANYTHING it would look through literally the WHOLE workspace, instead of just the files that I cared about. Turns out, all I had to do was add a .gitignore flag and make it “true”.
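
The fuzzy matching itself is a small in-order character scan with an adjacency bonus; a rough Python sketch of the scoring idea (real fuzzy finders do much smarter weighting, and the editor itself is in Rust, so this is just the concept):

```python
def fuzzy_score(query, text):
    """Score how well query fuzzy-matches text: every query character must
    appear in order; consecutive hits earn a bonus. None means no match."""
    query, text = query.lower(), text.lower()
    score, ti, last_hit = 0, 0, -2
    for ch in query:
        ti = text.find(ch, ti)
        if ti == -1:
            return None
        score += 2 if ti == last_hit + 1 else 1  # adjacency bonus
        last_hit = ti
        ti += 1
    return score

files = ["src/main.rs", "README.md", "src/editor/tabs.rs"]
hits = sorted((f for f in files if fuzzy_score("tabs", f) is not None),
              key=lambda f: -fuzzy_score("tabs", f))
```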

Attachment
albert

Design improvements and MARKDOWN PREVIEW!!!!

I managed to make the app have all of its corners rounded (ts had me crashing out for like a whole hour), but more importantly, I added markdown preview support, which honestly has some bugs, but I have NO CLUE how to fix them.

Changelog

Attachment
albert

Syntax Highlighting

Added custom syntax highlighting to the app. It works in some file types, but in others it’s kinda ahh. You can see in the example below how Rust actually looks good, but Python looks dull and grey…

In python files, for some reason, it only highlights the first couple of lines, then js gives up. I’ll have to investigate further, but for now, the editor’s coming along pretty well!

One thing I will have to be sure of, though, is that it doesn’t just become another VS Code clone.

Changelog

Attachment
Attachment

Comments

kashsuks about 1 month ago

ts tuff

albert

SIDEBAR

I made the sidebar resizable, even going so far as to make the cursor have that horizontal resize look. Now, you can see your files however large you’d like!

Changelog

Attachment
albert

The editor is functional!

YAY!!

So, to make this editor actually usable, I first of all made it look nice, with tab organization, good-looking file and folder icons, and a relatively simple UI (I find).

My goal of not being cluttered has officially been achieved.

VS Code has just too many options sometimes… all those little icons that you can find in the sidebar can seem overwhelming, so I’ve solved that through this minimalist approach. It might not be for everyone, but personally, I find it nice!

Changelog

Attachment
Attachment
albert

Looks even better! I added file icon support for JSON files following a certain template. You can see this template in the code, but it’s essentially the same one as in VS Code, since I legit just took a VS Code icon pack and put it into this editor… oops…

Credit to jonathanharty.gruvbox-material-icon pack!

Changelog

Attachment
albert

The editor actually looks great and works now! I’ve added a tab system so that the user can have multiple files open at once, and simply switch through tabs. Sadly, I haven’t yet gotten to the keyboard shortcuts such as Cmd+W or Cmd+T. I’ve also added some more Apple-esque styling with rounded corners and the editor sort of “hovering”. The font is also much nicer, I must say, since I’m using IBM Plex Sans. That really pulls it together, I find. Any opinions on what’s missing or what could look better?

Changelog

DOCS: Make a small edit to README
FEAT: Start adding icons to files and folders for it to look better
FEAT: Add tab functionality to editor to be able to open and close multiple tabs. Still needs some styling work, but the basic version is functional
STYLE: Add IBM Plex Sans to fonts and use that font throughout editor.
STYLE: Make the editor look better with a status bar, rounded corners, and overall better visuals

Attachment
albert

Made the actual filetree! It’s complete: it can open directories, expand and collapse them, and display files in the editor.

P.S. It’s also collapsible with Cmd+R!

Attachment
albert

Ok, so now I’ve actually made the “backend” for the filesystem. It doesn’t look the way it should, since I haven’t yet integrated it into the actual visuals of the app; it just exists as an unused module for the moment. It can detect whether an entity is a file or a directory and act accordingly. It holds data about what’s expanded, it sorts the entities alphabetically, and it even ignores some directories!
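
The listing logic is essentially one sorted pass per directory; a Python sketch of the sort/ignore behavior (the real module is Rust, and the ignore set here is just an example):

```python
import tempfile
from pathlib import Path

IGNORED = {".git", "target", "node_modules"}  # example ignore set

def list_entries(directory):
    """A directory's visible children: directories first, then files,
    each group alphabetical, with ignored names skipped."""
    children = [p for p in Path(directory).iterdir() if p.name not in IGNORED]
    return sorted(children, key=lambda p: (p.is_file(), p.name.lower()))

# Tiny demo tree.
root = Path(tempfile.mkdtemp())
(root / "src").mkdir()
(root / ".git").mkdir()        # should be ignored
(root / "main.rs").touch()
(root / "Cargo.toml").touch()

names = [p.name for p in list_entries(root)]
```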

Changelog

Attachment
albert

Made a starting base for the code editor, with a simple ahh text editor. The downside: I had to manually code all of the Cmd+Backspace, Alt+Backspace, and other stuff like that cause they weren’t working… I also added line and column tracking, so that’s pretty cool

Attachment
albert

Soooooo I switched from a Cocoa infrastructure to iced… I did this because I wanted this application to be cross-platform, and Cocoa is macOS-exclusive, since it works through an Objective-C bridge. However, I was able to get the window back up and running again!!

Attachment
albert

Started learning Rust x Cocoa for the first time! Right now, it’s at a very early stage in development, with it being only a text-editor (which also barely works :(), but this is just the base of the editor. I have great plans for this one!

Attachment

Comments

kashsuks about 2 months ago

its so peak

albert

Fixed some goofy issues with the new MoE implementation. The old model (actually the only one) was made before MoE was added, so it doesn’t have the MoE parameter, whose default was True. This led to the generate function thinking the model had 196M parameters (I WISH) instead of the 77M it actually has. One thing led to another, and the next thing you know, it’s printing absolute nonsense.
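
The root cause is the classic missing-key default; a toy sketch of the load-time fix (the parameter counts and MoE arithmetic here are made up purely to illustrate the default):

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    dense_params: int = 77_000_000
    use_moe: bool = False  # default False so pre-MoE checkpoints stay dense

    def reported_params(self, n_experts=4, expert_share=0.5):
        """Illustrative only: MoE multiplies the expert slice of the weights."""
        if not self.use_moe:
            return self.dense_params
        shared = self.dense_params * (1 - expert_share)
        return int(shared + n_experts * self.dense_params * expert_share)

def load_config(saved):
    # Old checkpoints predate MoE and have no "use_moe" key; .get() with a
    # False default keeps them dense instead of silently enabling MoE.
    return ModelConfig(use_moe=saved.get("use_moe", False))

old_model = load_config({})                  # pre-MoE checkpoint
new_model = load_config({"use_moe": True})   # current MoE run
```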

One more thing I did is run the data gathering pipeline, saving all 122GB of training data on an external drive. The issue is that to train the new model (which would be 2.3B parameters, by the way), I can’t just use my M4 MBA, since it would probably blow up (although that would be cool)… So, I’m using cloud GPUs (4x RTX 5090, 128GB VRAM total), but they can only store 32GB worth of files, and the training file alone is 122GB, never mind the checkpoints. To get around this, I’m gonna use the Flask framework to host the data, probably on my old iMac (24GB RAM, so large ahh chunks of data); if not, I’ll use my Raspberry Pi 3B (only 1GB RAM). This will definitely make training slower, but at least it’ll train… right?
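
The serving side can be as simple as streaming the big file in fixed-size byte chunks, one per request, so the GPU box never holds the whole 122GB; a stdlib sketch of the chunking (the Flask route would just wrap this, and the chunk size is made up):

```python
import tempfile

CHUNK = 64 * 1024 * 1024  # 64 MB per request (an arbitrary example size)

def iter_chunks(path, chunk_size=CHUNK):
    """Stream a huge training file in fixed-size chunks; a hosting endpoint
    would return one of these per request instead of the whole file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk

# Tiny demo file standing in for the real dataset.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10)
    demo_path = tmp.name

chunks = list(iter_chunks(demo_path, chunk_size=4))
```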

Also don’t worry about the model’s responses below, that’s normal for this one, since it probably has never seen anything like what I was prompting it in its training data.

Attachment

Comments

chefpenguino about 2 months ago

nice UI, looks good

albert

Finished ViT frontend interpretation for training.

Used the Flask API to make backend + frontend routing for ViT multimodal training. This also shows total parameters, including those gained from the ViT.

Attachment
albert

Bug fixes and documentation changes.

Basically I tested the app to make sure everything in the MoE, ViT, and all else works as expected. For now, no errors found. Anything I have found has been fixed and documented.

I’ve also done some documentation refactoring: I simplified the README to contain only the completely essential starter information and added an EXPLANATION document to hold the extra “beef” that once lived in the README.

Attachment
albert

Started (and almost finished) a new part of my model… ViT! Nous can now interpret images. I’ve added a patch embeddings class, which creates the image embeddings, the ViT encoder, which processes those patches, and cross-attention (not just self-attention), which is able to handle and cross-reference text embeddings with image embeddings. These three together will let the model be able to create connections between text and images.
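
For a sense of the shapes involved, the patch-embedding bookkeeping is simple arithmetic; a sketch with typical ViT sizes (assumed here, not necessarily Nous’s actual config):

```python
def patch_grid(image_size=224, patch_size=16, channels=3):
    """Shape bookkeeping for ViT patch embeddings: how many patches an image
    splits into, and how many raw values each patch flattens to before the
    linear projection into the model dimension."""
    assert image_size % patch_size == 0, "patches must tile the image exactly"
    per_side = image_size // patch_size
    n_patches = per_side * per_side                  # encoder sequence length
    patch_dim = patch_size * patch_size * channels   # flattened patch values
    return n_patches, patch_dim

n_patches, patch_dim = patch_grid()  # e.g. a 224x224 RGB image, 16x16 patches
```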

I’ve also been working on some organization improvements (side note: why is Pylint so picky? I don’t wanna make sure my imports are in alphabetical order), especially in documentation and code readability. The reason is that one of my functions has literally >20 lines of arguments, and I wanna make sure it all makes sense.

Now, the only thing left to do is to complete the multimodal trainer (which will be a pain), create the pipeline, the data preparation module, make sure I actually CAN use the ViT with text, in what order, etc. I have a lot of work to do…

There’s still one more week of holidays, let’s grind together!

Attachment
albert

Just finished the GUI implementation for the Mixture of Experts! It’s fully complete and committed to GitHub, and also looks pretty good! See for yourselves below:

Attachment

Comments

seifotefy75 3 months ago

Perfect Work

albert

Finished the MoE implementation for Nous!! I now have to fix some dumahh shape bugs and some weird JIT compilation errors, but it’s going well! Next step is giving the model some insane parameter numbers (I’m looking for >=1bn), then training it. Not sure what monster GPU I’ll need for that, but I’ll manage! (Oh also don’t worry about my memory pressure, just training a 2bn parameter model…)
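
The routing step at the heart of an MoE layer is just top-k gating with renormalized softmax weights; a minimal sketch (the expert count and logits are made up, and this is the standard technique rather than Nous’s exact code):

```python
import math

def route_topk(gate_logits, k=2):
    """Pick the top-k experts for a token and renormalize their softmax
    weights over just those experts: standard top-k MoE routing."""
    top = sorted(range(len(gate_logits)), key=lambda i: -gate_logits[i])[:k]
    exps = {i: math.exp(gate_logits[i]) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]

# Four experts; the token is routed to the two highest-scoring ones.
weights = route_topk([2.0, -1.0, 0.5, 2.0], k=2)
```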

Attachment
albert

I’ve started working on using a different training pipeline for the model, as I want to expand it to be able to use image input, have text recognition, and more cool stuff to do with images! I have to use pretraining text data for this, since the current model’s natural language skills are not so good. Might also add a MoE.

Attachment