
Seer

18 devlogs
51h 49m 46s

Ever wished you could zoom in? What if you could zoom in so much you would want it to stop? Well, here you go. With Seer, you can now take a 3D scan of your room, open it up on your device, and zoom in all the way down to the molecular level to see what all the objects in your room actually look like. Then, you can turn on time to see those molecules vibrate and move just as they would normally. It was made with the utmost attention to physical accuracy.

This project uses AI

I used AI to help me with some new libraries. It did not write any of the code in my codebase; it simply taught me the syntax for those specific libraries.
I also used AI to help plan the building of the app.
The app icon and installer background were made with Gemini’s Nano Banana Pro.

Demo Repository


albert

Finally got time working and it lowkey cooked me.

Tuned the chunks first: fixed a bug where every chunk was seeding molecules from the world origin instead of its own bounding box center, so everything was spawning as one giant sphere. Classic. Also added a toroidal world so walking off one edge brings you back from the other.
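
Roughly what the wrap looks like, as a sketch (not the real chunk code; names and the example world size are made up):

```python
# Minimal sketch of toroidal wrapping: positions that leave the world extent
# re-enter from the opposite side.
def wrap_position(pos, world_min, world_size):
    """Wrap a 3D position into [world_min, world_min + world_size) per axis."""
    return [
        world_min[i] + (pos[i] - world_min[i]) % world_size[i]
        for i in range(3)
    ]

# e.g. with a 100-unit cube centred at the origin:
# wrap_position((55.0, 0.0, -51.0), (-50.0,) * 3, (100.0,) * 3) -> [-45.0, 0.0, 49.0]
```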

Then I spent wayyyy too long on MACE. Got the full force evaluation pipeline working, then turned it on and organic molecules flew off to hundreds of thousands of Angstroms in one step. The timestep was just too large for C-H bonds.

I ended up replacing MACE with harmonic bond springs for now. 200 N/m between each bonded pair, 1 fs internal timestep, and the molecules actually vibrate correctly with ~0.14 Å bond fluctuations. Shape is fully preserved. MACE is still in the codebase but the harmonic model genuinely looks better for visualization anyway.
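
For anyone curious, here's a minimal sketch of that kind of harmonic-bond integrator. This is not the actual Seer code; k, dt, masses and rest lengths are illustrative, and the units just have to be consistent (e.g. Å, fs, amu with k converted to match):

```python
import numpy as np

def bond_forces(positions, bonds, rest_lengths, k):
    """Harmonic spring forces F = -k (|r| - r0) r_hat on each bonded atom pair."""
    forces = np.zeros_like(positions)
    for (i, j), r0 in zip(bonds, rest_lengths):
        d = positions[j] - positions[i]
        dist = np.linalg.norm(d)
        f = k * (dist - r0) * (d / dist)   # pulls the pair back toward its rest length
        forces[i] += f
        forces[j] -= f
    return forces

def velocity_verlet_step(positions, velocities, masses, bonds, rest_lengths, k, dt):
    """One velocity-Verlet step; masses is a per-atom array."""
    f = bond_forces(positions, bonds, rest_lengths, k)
    velocities += 0.5 * dt * f / masses[:, None]
    positions += dt * velocities
    f_new = bond_forces(positions, bonds, rest_lengths, k)
    velocities += 0.5 * dt * f_new / masses[:, None]
    return positions, velocities
```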

Also wired up the full MD UI: time toggle, temperature slider, speed slider (live, no restart needed), and an electron clouds toggle that swaps to stick bonds during dynamics. Sticks update in-place by writing vertex positions directly into the GeomNode so it doesn’t fry the frame rate.
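
The in-place update is basically the standard Panda3D recipe; here's a hedged sketch (the `stick_node` handle and the assumption that geom 0 holds the sticks are illustrative, not the real code):

```python
from panda3d.core import GeomVertexWriter

def update_stick_vertices(stick_node, new_points):
    """Overwrite vertex positions of an existing GeomNode without rebuilding it."""
    vdata = stick_node.modifyGeom(0).modifyVertexData()
    writer = GeomVertexWriter(vdata, 'vertex')
    for x, y, z in new_points:
        writer.setData3(x, y, z)   # writes the current row and advances
```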

I’m probably gonna ship this next devlog…

albert

Completed the connection between macro and molecular level

Got the full zoom transition working, finally.

After locking the mouse, I still had to actually build out the environment that opens when you zoom in:

The room now fully unloads when you enter molecular mode, and when you zoom back out, everything restores exactly: same camera position, same FOV, same everything. There’s also a bit of a background fade, so as you scroll in past a certain FOV, the color gradually shifts from the natural background to pure black.
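
The fade is just a lerp driven by FOV. A sketch with made-up thresholds and colors, not the exact values in the app:

```python
def background_color_for_fov(fov, fade_start_fov=20.0, fade_end_fov=5.0,
                             natural=(0.53, 0.66, 0.78)):
    """Blend the natural background toward black as the FOV drops."""
    if fov >= fade_start_fov:
        return natural
    t = max(0.0, min(1.0, (fade_start_fov - fov) / (fade_start_fov - fade_end_fov)))
    return tuple(c * (1.0 - t) for c in natural)   # t = 1 -> pure black

# applied each frame, e.g.:
# base.setBackgroundColor(*background_color_for_fov(lens.getFov()[0]))
```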

I also wired up the raycast picker. When you zoom in, it fires a ray from the center of the screen, hits an object, and locks onto it. That way the molecules that load are actually from the material you’re looking at, not just random.
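
The picker follows the usual Panda3D collision-ray recipe; a hedged sketch, not the exact Seer code:

```python
from panda3d.core import (CollisionHandlerQueue, CollisionNode, CollisionRay,
                          CollisionTraverser, GeomNode)

def pick_center_object(base):
    """Fire a ray from the centre of the screen and return the nearest hit NodePath."""
    ray = CollisionRay()
    ray.setFromLens(base.camNode, 0, 0)                 # (0, 0) = screen centre
    cnode = CollisionNode('picker')
    cnode.addSolid(ray)
    cnode.setFromCollideMask(GeomNode.getDefaultCollideMask())  # hit visible geometry
    picker_np = base.camera.attachNewNode(cnode)
    queue = CollisionHandlerQueue()
    traverser = CollisionTraverser()
    traverser.addCollider(picker_np, queue)
    traverser.traverse(base.render)
    picker_np.removeNode()
    if queue.getNumEntries() == 0:
        return None
    queue.sortEntries()                                 # nearest hit first
    return queue.getEntry(0).getIntoNodePath()
```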

The biggest thing I added was the chunk streaming system. Instead of placing a fixed cluster of molecules and calling it a day, the whole molecular world is now an infinite 3D grid. As you move around, chunks generate in front of you and the distant ones drop off. You can literally walk forever in any direction and it just keeps going. Each chunk is seeded deterministically so they’re always consistent.
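
The deterministic seeding idea, sketched (the constants and the per-chunk RNG helper are made up, not the real generator):

```python
import random

def chunk_rng(cx, cy, cz, world_seed=1234):
    """Same chunk coordinates -> same RNG -> same molecules every visit."""
    # hash() is salted per process in Python, so build a stable seed by hand
    seed = (world_seed * 73856093) ^ (cx * 19349663) ^ (cy * 83492791) ^ (cz * 2654435761)
    return random.Random(seed & 0xFFFFFFFF)

rng = chunk_rng(3, -1, 7)
offsets = [(rng.uniform(0, 1), rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(5)]
```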

Also threw in the atom scale slider from before. It only shows up in molecular mode, and it lets you slide between covalent and van der Waals sizes.
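
The slider is just a blend between the two radius sets; something like:

```python
def display_radius(covalent_r, vdw_r, t):
    """t in [0, 1]: 0 = covalent radius, 1 = van der Waals radius."""
    return (1.0 - t) * covalent_r + t * vdw_r
```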

Next up is tuning the chunk parameters. Way too many particles right now, it’s cooked. After that, actual temporal dynamics. YAYAYYYAYYYY

albert

Managed to lock the user’s mouse when they are in molecule mode

I added the state tracking capabilities to seer_app.py, and the main challenge here was just being able to lock the mouse’s movement when zoomed in.

Previously, when zoomed in, if the user moved their mouse, it could exit the bbox of the object. As a fix, I decided to completely disable mouse movement past a certain FOV, re-enabling it when zooming back out.

The next challenge that I must take on is allowing the mouse to move inside the molecular environment while staying stationary in the real-world one. My fix to this may be slightly complex, but it is the best way I can think of to do it.

When movement is locked, I plan on “disabling” the world environment and just duplicating the background the user is seeing; the new environment is then no longer in the room, but its own dedicated viewport. This also improves efficiency, since I don’t have to keep the room rendered along with the molecules.

albert

Wired up the environment into a single module

I basically refactored the whole code so that instead of having an environment.py file which stored my entire environment, it would just contain the helper functions to construct the environment. Then, I made a seer_app.py file to orchestrate everything.

Next up, I still have to bring in the stuff I did in zoom_controller.py and integrate it with the molecule arrangement module so that instead of the blank screen of colour you can see in the images, you see an actual lattice or something.

Wish me luck!!

albert

OH MY GAWDDDD WHY IS THIS SO BEAUTIFUL!!!!

Yo im not even trolling this is actually so FREAKING COOL!!!

Here I’m rendering 1000 total molecules, which may seem like a lot, and that’s cause it is. It’s running at like 2 frames per second when it’s zoomed in, and when it’s not, it’s running at just about 10 seconds PER FRAME. Makes sense ig…

I added the ability to render multiple molecules at once, added a scale slider, and made sure everything is in metres instead of the goofy normalized Panda3D units. Now, I just have to put it into the real-world environment.

albert

9.5 hours.

That’s how long it took me to finally get the full simulation up and running. Now it’s not perfect, and it’s got some gaps, but honestly, I love the way it’s looking.

What I’ve done:

  • Built a state machine that stores molecule templates, instances, object scenes, and the entire environment
  • Built a geometry helper file with all sorts of functions to help the placement of molecules in an object’s bounding box space, so that molecule instances don’t intersect with one another.
  • Made a placement file that manages the placement of each molecule using a frontier approach, where once a new molecule is placed, more stem from that one (see the sketch after this list).
  • Built the full rendering file that uses the placement file to actually render the atoms and bonds together, rotating them in space.
  • Tied them all together to get this.
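
Here’s a rough sketch of that frontier placement idea (not the actual placement file; `inside` and `overlaps` stand in for helpers from the geometry module):

```python
import random
from collections import deque

def frontier_place(bbox_min, bbox_max, spacing, max_count, inside, overlaps):
    """Grow molecule positions outward from a seed point inside the bounding box."""
    start = tuple((lo + hi) / 2 for lo, hi in zip(bbox_min, bbox_max))
    placed, frontier = [start], deque([start])
    while frontier and len(placed) < max_count:
        base = frontier.popleft()
        for _ in range(6):   # try a few random neighbours around this molecule
            offset = [random.uniform(-spacing, spacing) for _ in range(3)]
            cand = tuple(b + o for b, o in zip(base, offset))
            if inside(cand, bbox_min, bbox_max) and not overlaps(cand, placed, spacing):
                placed.append(cand)
                frontier.append(cand)
    return placed
```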

So you may notice that the bonds aren’t just regular bonds. They’re electron density regions. Remember, I set out to be as accurate as possible, so here it is. Every single one of those dots is a possible location of an electron in its bond.

There are no simple single/double bonds; instead, I went all out. Single bonds are SIGMA bonds, and double bonds are one sigma + one pi bond.
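
To illustrate the idea (this is a sketch, not the real renderer): sigma dots scatter around the bond axis, pi dots sit in two lobes offset from it.

```python
import numpy as np

def sigma_bond_dots(p1, p2, n=200, spread=0.15):
    """Dots along the bond axis with a Gaussian spread around it."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    t = np.random.rand(n)[:, None]                  # position along the bond
    axis_points = p1 + t * (p2 - p1)
    return axis_points + np.random.normal(0.0, spread, (n, 3))

def pi_bond_dots(p1, p2, lobe_dir, n=200, offset=0.5, spread=0.15):
    """Two lobes above and below the bond axis along lobe_dir."""
    lobe_dir = np.asarray(lobe_dir, float)
    lobe = np.where(np.random.rand(n)[:, None] < 0.5, 1.0, -1.0)
    return sigma_bond_dots(p1, p2, n, spread) + lobe * offset * lobe_dir
```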

If you’re wondering about the efficiency and smoothness of running this many entities in Panda3D, it’s actually basically perfect. I don’t see any lag or sketchy jittering.

Next up is making it even more accurate!

albert

My bad for not logging for a while, there just wasn’t much to show for outputs, and tbh, there still isn’t too much.

So, I finally got the details for all molecules working. The PubChem REST API is now pulling in coordinates, bond information, and chemical formulas for everything, and I’ve got 7 unique molecules across 37 annotated objects in the test scene.
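
For reference, the PubChem side boils down to PUG REST calls along these lines (a sketch; the exact endpoints and parsing in the repo may differ):

```python
import requests

def fetch_3d_record(name):
    """Fetch the 3D record for a compound by name from PubChem PUG REST."""
    url = (f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
           f"{name}/record/JSON?record_type=3d")
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    compound = resp.json()["PC_Compounds"][0]
    coords = compound["coords"][0]["conformers"][0]   # x/y/z lists per atom
    bonds = compound.get("bonds", {})                 # aid1/aid2/order lists
    return coords, bonds

coords, bonds = fetch_3d_record("water")
```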

After that, I started building out the actual simulation side. First step was getting a single molecule to render in Panda3D with proper CPK coloring and bond visualization. That worked surprisingly well. You can see it in the second image; that’s the single molecule.

Next, I tried just stacking them one on top of the other, spamming them in XYZ directions, but it didn’t really look realistic, so now I have to go the long route. I needed a way to place thousands of molecule instances without duplicating all the coordinate data, so I built a whole state machine system. There’s now a MoleculeTemplate class that holds the immutable chemical structure, and MoleculeInstance that just tracks position and rotation. One template can have hundreds of instances pointing to it.

Then I wrapped that in an ObjectState class to handle each annotated object from the scene (like “books” or “desk”), and an Environment class to hold all of them together.
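
Sketched out, the hierarchy looks something like this (field names are illustrative, not the real class definitions):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class MoleculeTemplate:
    name: str
    atoms: tuple          # element symbols
    coords: tuple         # reference 3D coordinates
    bonds: tuple          # (atom_i, atom_j, order)

@dataclass
class MoleculeInstance:
    template: MoleculeTemplate
    position: tuple       # world-space offset
    rotation: tuple       # e.g. quaternion or Euler angles

@dataclass
class ObjectState:
    label: str            # e.g. "books", "desk"
    bbox: tuple
    instances: list = field(default_factory=list)

@dataclass
class Environment:
    objects: dict = field(default_factory=dict)   # label -> ObjectState
```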

The idea is you load the bounding boxes from the annotations, and for each object, you populate its volume with molecule instances based on the material composition.

I also spent way too long refactoring type annotations into a central file and making a constants module for element colors and radii. The CPK color mappings were duplicated in three places and it was lowkey pmo.

Next up:

Need to make the molecule placement actually realistic. Need to add randomization, collision checking between neighboring instances, and probably some basic force field approximations to make molecules settle into reasonable configurations.

albert

Finally got the details for all molecules

I ran into some issues when trying to get the formal names for each molecule, since Qwen was constantly giving me the conventional names (like cellulose instead of cellobiose), so I had to debug by changing the prompt, the model, and even the whole infrastructure, until I finally settled on using a cloud model, Deepseek V3.2, with chat mode on.

After that, I managed to get the SMILES strings and use rdkit to get all the 3D details for each molecule. There are some inaccuracies, so I’m gonna switch to a PubChem REST API (AGAIN), since they have better (AND MORE) data.
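
The SMILES-to-3D step with RDKit is roughly this (a sketch of the approach, not the exact code):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def smiles_to_3d(smiles):
    """Build a rough 3D geometry for a molecule from its SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    mol = Chem.AddHs(mol)                       # explicit hydrogens for a real geometry
    AllChem.EmbedMolecule(mol, randomSeed=42)   # generate 3D coordinates
    AllChem.MMFFOptimizeMolecule(mol)           # quick force-field cleanup
    conf = mol.GetConformer()
    return [(a.GetSymbol(), conf.GetAtomPosition(a.GetIdx())) for a in mol.GetAtoms()]
```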

*PubChem boutta become my best friend sob-hole *

albert

Made the aggregation script

This time I added aggregation capabilities so that once you have annotated your scene with the objects and their materials, it runs a local Ollama model (in my case, I chose qwen2.5:7b) and returns the composition of each object, putting them into a JSON file.
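
The call to the local model is along these lines (a sketch; the real prompt and output schema differ):

```python
import json
import requests

def object_composition(object_name, materials):
    """Ask a local Ollama model for the molecular composition of an object."""
    prompt = (f"Object: {object_name}. Materials: {', '.join(materials)}. "
              "Return a JSON object mapping each material to its main molecules.")
    resp = requests.post("http://localhost:11434/api/generate",
                         json={"model": "qwen2.5:7b", "prompt": prompt,
                               "format": "json", "stream": False},
                         timeout=120)
    resp.raise_for_status()
    return json.loads(resp.json()["response"])
```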

albert

Made the manual annotation work

It’s been a while since I’ve worked on this, but it’s nice to be back! I’ve managed to make the manual annotation pipeline, since the AI one was horrible.

Now, you make a 2D bounding box first, then you extrude it to give it height. You can then save it as an object, and by the end, they all get stored into a JSON file.

It works fairly similarly to the AI one, except you have to do these ones yourself.

The UI definitely needs some work, but this is already a long time to go without a devlog.

albert

Ok, so there are a few things that I tried doing. Even after making the switch to the Grounding DINO model, which perfectly predicted the bboxes of each object, the 3D projections still look like ahhh.

I know the issue is somewhere in the raycasting code, but I can’t lie ts frying me already. I tried making a few changes, but literally nothing worked. So now, I’m gonna switch to manual annotation. It’ll be more tedious than before, but at least it’ll work (hopefully).

albert

After looking at the projections of the Gemma-made bboxes in the environment, I was not having a good time. So, I decided to switch to a second model for bboxes while keeping object detection on Gemma.

I used IDEA-Research’s Grounding DINO model, which, given an object name, estimates a bounding box around it. I took the object names from Gemma’s output and put them into the detector, creating this beautiful JSON you see below. Haven’t run the raycasting algorithm on them yet.
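
For reference, the Hugging Face route to Grounding DINO looks roughly like this (a sketch; the actual setup and frame paths may differ):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-base"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id)

image = Image.open("frame_0001.jpg")    # hypothetical frame path
text = "a desk. books. a monitor."      # object names from Gemma, period-separated
inputs = processor(images=image, text=text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# boxes come back in pixel coordinates for the original image size
results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids, target_sizes=[image.size[::-1]])
```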

albert

Once the bounding boxes were obtained from Gemma, I only had to make the raycasting pipeline to project the 2D bounding box onto a 3D plane. The first thing I did was calculate camera poses using COLMAP, then I used those camera poses to project rays onto each image, recording all data in dictionaries.
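
The core of that projection is turning a pixel into a world-space ray from the COLMAP pose; roughly (a sketch, not the actual code):

```python
import numpy as np

def pixel_to_world_ray(u, v, K, R, t):
    """Pixel (u, v) -> world-space ray, given intrinsics K and a COLMAP-style
    world-to-camera pose (R, t)."""
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-project into camera space
    cam_center = -R.T @ t                              # ray origin in world coordinates
    d_world = R.T @ d_cam                              # rotate direction into world space
    return cam_center, d_world / np.linalg.norm(d_world)
```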

I actually managed to learn a whole lot about looping through dictionaries and their intrinsic qualities, which was really fun. The only drawback is that the time complexity is not favorable: O(n^2), where n is the number of frames/images.

The only bug to fix is in the COLMAP database, where the initial pair of images is not being calculated. This is probably because there is too small of a difference between the camera poses in each frame. I’m not sure yet if this is true or how to even fix this.


albert

Switched fully to Gemma for the frame predictions because it’s lowkey WAY more accurate with its text output. Also gave it context in the prompt from past frames so that it predicts things across frames more consistently.

Below is an image of the JSON file it outputs for all 450 images.

albert

I have changed the prompt to estimate bounding boxes for each object in the 2D image. Now, I have to project that into the 3D environment, and make these bounding boxes actually three-dimensional. To do this I will use raycasting and camera pose to estimate distance to objects and therefore their positions in 3D space.

I’m also thinking of switching to Gemma instead of Nemotron, since Gemma is not just a VLM but fully multimodal, so it will often have better text/JSON outputs.

albert

On the movement and environment side of things I added the fabled zoom mechanic (it was really easy) and some speed control.

The most important new thing is that I am now using NVIDIA’s Nemotron with OpenRouter to analyze images and output the object it most likely is and the materials it’s made of.
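
An OpenRouter vision request is just the OpenAI-style chat format with an image attached; a sketch (the model slug is a placeholder, not necessarily the Nemotron variant used):

```python
import base64
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def describe_frame(image_path, api_key, model_slug):
    """Ask a vision model (via OpenRouter) what the object is and its materials."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": model_slug,   # placeholder, e.g. whichever Nemotron vision model was used
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Name the main object in this frame and the materials it is made of. Reply as JSON."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }
    resp = requests.post(OPENROUTER_URL, json=payload,
                         headers={"Authorization": f"Bearer {api_key}"}, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```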

I initially used Qwen, but it just wasn’t precise enough for my use case.

Next up, I’ll have to find a way to create bounding boxes around specific objects in the image so that I know where to start and stop each material. Afterwards comes the actually fun part: making the lookup table so that each material can be linked to a specific molecule.

albert

Good news! I was able to load the OBJ into Panda3D and create some basic movement capabilities, including WASD and mouse movement. The only things that I have to do before I get to the physics are adding zoom and maybe re-recording the video (since it’s kinda low quality, which messes up how the 3D scan looks).

albert

So I made the pipeline that turns videos into .obj files through the intermediary .usdz (Apple’s proprietary 3D file format).
I first had to take the video from many different angles, then turn it into a folder full of images sampled every two frames, and then use some SwiftUI code to drive Apple’s RealityConverter.
Then, I decided to use aspose 3d to convert to OBJ, but ran into some coloring issues: since the USDZ is a zip archive, I had to extract the texture images from it and insert them into the OBJ.
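
The texture extraction part is straightforward since USDZ is a zip; a sketch (paths are made up):

```python
import zipfile
from pathlib import Path

def extract_textures(usdz_path, out_dir):
    """Pull the embedded texture images out of a .usdz (which is a zip archive)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(usdz_path) as z:
        for name in z.namelist():
            if name.lower().endswith((".png", ".jpg", ".jpeg")):
                (out / Path(name).name).write_bytes(z.read(name))
```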

The result is actually not too far from what it really is!
