Harbor Browser

20 devlogs
77h 58m 43s

A web browser made from scratch in Rust with a custom HTTP Client, HTML+CSS parser and renderer, and JS interpreter.

tathya is aweomse

Changes


  • 8f47563: I refactored the code base so that text renderers have their own vertex buffer objects which they write to. Should hopefully improve performance when rendering multiple text objects.
  • ef4a97f: Did some more refactoring, but this time to improve the layout structure I have. I refactored text renderers a little more so they’re now font-specific rather than text-specific. Also the layout system allows for multiple available fonts and sizes via separate text renderers.
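A minimal sketch of what font-specific renderers could look like, assuming one renderer per (font, size) pair that owns its own vertex data. `FontKey`, `TextRenderer`, and `TextManager` are hypothetical names, and a plain `Vec` stands in for the GPU vertex buffer:

```rust
use std::collections::HashMap;

/// Hypothetical key: one renderer per (font family, pixel size) pair.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct FontKey {
    pub family: String,
    pub size_px: u32,
}

/// Stand-in for a renderer that owns its own vertex data.
/// A real renderer would wrap a GPU buffer; a Vec is used here instead.
#[derive(Default)]
pub struct TextRenderer {
    pub vertices: Vec<[f32; 2]>,
}

impl TextRenderer {
    /// Queue two placeholder vertices per character, laid out left to right.
    pub fn queue_text(&mut self, text: &str, x: f32, y: f32, advance: f32) {
        for (i, _ch) in text.chars().enumerate() {
            let left = x + i as f32 * advance;
            self.vertices.push([left, y]);
            self.vertices.push([left + advance, y]);
        }
    }
}

/// Owns every renderer and hands out the one matching a font and size.
#[derive(Default)]
pub struct TextManager {
    renderers: HashMap<FontKey, TextRenderer>,
}

impl TextManager {
    /// Return the renderer for this font and size, creating it on first use.
    pub fn renderer_for(&mut self, family: &str, size_px: u32) -> &mut TextRenderer {
        let key = FontKey { family: family.to_string(), size_px };
        self.renderers.entry(key).or_default()
    }

    pub fn renderer_count(&self) -> usize {
        self.renderers.len()
    }
}
```

Text drawn in the same font and size lands in the same buffer, so it could be submitted in a single draw call.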

Next Steps


  1. The layout system is kind of like absolute positioning right now. I want to implement a more flexible layout system that allows for relative positioning and alignment.
tathya is aweomse

(Where’d all the time come from?)

Changes


  • aa54f4f: Lots of quality of life improvements relating to text rendering:
    • Moved text rendering to a separate text rendering manager
    • Added support for CMAP Subtable Format 12
    • Spent way too much time trying to advance a line height properly
    • Added anti-aliasing support for text rendering
    • Added support for composite glyphs, which included implementing scaling and translation of glyph components
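For reference, a CMAP format 12 lookup can be sketched as below. This is an illustration, not the project's code: the function and struct names are made up, and `data` is assumed to start at the first group, after the subtable header. Each group is three big-endian `u32` values (start character code, end character code, start glyph ID), and the groups are sorted by character code.

```rust
use std::cmp::Ordering;

/// One sequential map group from a cmap format 12 subtable.
#[derive(Clone, Copy, Debug)]
pub struct SequentialMapGroup {
    pub start_char_code: u32,
    pub end_char_code: u32,
    pub start_glyph_id: u32,
}

/// Parse `num_groups` 12-byte groups; returns None if the data is too short.
pub fn parse_groups(data: &[u8], num_groups: usize) -> Option<Vec<SequentialMapGroup>> {
    let mut groups = Vec::with_capacity(num_groups);
    for i in 0..num_groups {
        let chunk = data.get(i * 12..i * 12 + 12)?;
        let read = |o: usize| {
            u32::from_be_bytes([chunk[o], chunk[o + 1], chunk[o + 2], chunk[o + 3]])
        };
        groups.push(SequentialMapGroup {
            start_char_code: read(0),
            end_char_code: read(4),
            start_glyph_id: read(8),
        });
    }
    Some(groups)
}

/// Binary-search the sorted groups; glyph 0 (".notdef") means "not mapped".
pub fn glyph_id(groups: &[SequentialMapGroup], c: char) -> u32 {
    let code = c as u32;
    let found = groups.binary_search_by(|g| {
        if code < g.start_char_code {
            Ordering::Greater
        } else if code > g.end_char_code {
            Ordering::Less
        } else {
            Ordering::Equal
        }
    });
    match found {
        Ok(i) => groups[i].start_glyph_id + (code - groups[i].start_char_code),
        Err(_) => 0,
    }
}
```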

Next Steps


  1. Move the vertex buffer object to be a property of the text renderer, to make it easier to render multiple things in the future
  2. Try to add some filling to the shapes being drawn

Note


I feel like I didn’t spend THAT much time on this devlog, but looking back at it I did add a whole lot of features. I have no idea why I didn’t commit more often, but it might have something to do with the fact that work on this commit started at 1 AM.

tathya is aweomse

(My favorite devlog so far)

Changes


  • 7c7cefe: I added the HDMX table - just for funsies.
  • 5eb379b: I did some basic work on the render pipeline so I could actually start rendering glyphs. I did a really messy merge right before this, so I can’t really explain the details of the changes.
  • c227220: I fixed the parsing for the glyf table. There was barely anything done correctly when I first wrote it, but I had no way of telling until I tried rendering. This took WAY TOO LONG. I think the core of the issue was that x- and y-coordinates are stored as deltas, not absolute values. I refactored the codebase to hold contours rather than raw points, so that may have somehow solved part of the issue too.
  • bf3d0c9: I realized I needed to use a LineList rather than a LineStrip to render the contours properly. A LineStrip joins every consecutive pair of vertices, so it would connect the last point of one contour to the first point of the next, which is not what I wanted.
  • efbef46: I completely forgot that I had to handle Bézier curves in the glyph rendering, so I added that. It seems to be working fine now. I still need to find a way to dynamically calculate a decent precision for the curves.
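The three ideas in these commits (delta-encoded coordinates, explicit line segments per contour, and curve flattening) can be sketched roughly as follows. The function names are hypothetical, and the fixed `segments` count is an assumption standing in for the dynamic precision mentioned above:

```rust
pub type Point = (f32, f32);

/// glyf stores each coordinate as a delta from the previous point,
/// so absolute positions are a running sum.
pub fn deltas_to_absolute(dx: &[i16], dy: &[i16]) -> Vec<(i32, i32)> {
    let (mut x, mut y) = (0i32, 0i32);
    dx.iter()
        .zip(dy)
        .map(|(&ddx, &ddy)| {
            x += ddx as i32;
            y += ddy as i32;
            (x, y)
        })
        .collect()
}

/// Build line-list vertices for one closed contour: every segment is an
/// explicit (start, end) pair, so separate contours are never joined.
pub fn contour_to_line_list(contour: &[Point]) -> Vec<Point> {
    let mut out = Vec::with_capacity(contour.len() * 2);
    for i in 0..contour.len() {
        out.push(contour[i]);
        // Wrap around so the last point connects back to the first.
        out.push(contour[(i + 1) % contour.len()]);
    }
    out
}

/// Flatten a quadratic Bézier (p0, control p1, p2) into `segments`
/// straight pieces, returning `segments + 1` points.
pub fn flatten_quadratic(p0: Point, p1: Point, p2: Point, segments: usize) -> Vec<Point> {
    let n = segments.max(1);
    (0..=n)
        .map(|i| {
            let t = i as f32 / n as f32;
            let u = 1.0 - t;
            (
                u * u * p0.0 + 2.0 * u * t * p1.0 + t * t * p2.0,
                u * u * p0.1 + 2.0 * u * t * p1.1 + t * t * p2.1,
            )
        })
        .collect()
}
```

One possible way to pick `segments` dynamically would be to scale it with the on-screen size of the curve, though that is only a guess at an approach.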

Next Steps


  1. Fill in the glyphs so they’re not just outlines.
  2. Implement rendering of composite glyphs.
  3. Implement rendering multiple glyphs (i.e. words, sentences).
tathya is aweomse

Changes


  • db4c07f, cca929f: I implemented the post and loca tables. I distinctly remember enjoying writing the post table’s code, I don’t know why. loca was pretty boring.
  • 4016b3e, 399c66a: I added the most important table, glyf. This table was a pain to implement. I had to individually implement both simple and composite glyphs. Composites were especially tricky because there wasn’t a well defined structure for them in the docs.
  • 5681a6a, 80b8865: I implemented 5 more tables, cvt, fpgm, prep, gasp and meta. I’m almost done adding tables - there’s only a couple left (4 more that I plan on implementing).
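As an illustration of how loca ties into glyf, here is a minimal sketch of decoding the offsets. The function names are hypothetical; `index_to_loc_format` is the value of the `indexToLocFormat` field in the head table (0 for short offsets stored divided by two, 1 for long offsets), and the glyph count comes from maxp:

```rust
/// Decode the loca table into byte offsets into glyf.
/// There are `num_glyphs + 1` entries so every glyph has a start and an end.
pub fn parse_loca(data: &[u8], num_glyphs: usize, index_to_loc_format: i16) -> Option<Vec<u32>> {
    let count = num_glyphs + 1;
    let mut offsets = Vec::with_capacity(count);
    for i in 0..count {
        let offset = match index_to_loc_format {
            // Short format: u16 values holding the real offset divided by 2.
            0 => {
                let b = data.get(i * 2..i * 2 + 2)?;
                u32::from(u16::from_be_bytes([b[0], b[1]])) * 2
            }
            // Long format: u32 values holding the real offset.
            1 => {
                let b = data.get(i * 4..i * 4 + 4)?;
                u32::from_be_bytes([b[0], b[1], b[2], b[3]])
            }
            _ => return None,
        };
        offsets.push(offset);
    }
    Some(offsets)
}

/// Byte range of a glyph inside glyf, or None when the glyph has no outline
/// (equal start and end offsets, as is typical for the space character).
pub fn glyph_range(offsets: &[u32], glyph_id: usize) -> Option<(u32, u32)> {
    let start = *offsets.get(glyph_id)?;
    let end = *offsets.get(glyph_id + 1)?;
    if end > start { Some((start, end)) } else { None }
}
```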

Next Steps


  1. Implement the remaining tables: hdmx, kern, vhea, vmtx.
  2. Consider adding support for AAT (Apple Advanced Typography) tables like bsln, feat, etc. (probably won’t)
  3. Start with the rendering engine - this will be a big task.
tathya is aweomse

(Micro devlog)

Changes


  • b753d4c: I added parsing for OS/2 and name tables, though the name table parsing only supports version 0 currently.

Next Steps


  1. Finish post table parsing.

Note


I really only made this devlog because it pushes me over the 60h mark for this project.

Comments

SeradedStripes 3 days ago

its looking fire

tathya is aweomse

Changes


  • b3661e7: I completed CMAP format 6 parsing - it was actually pretty easy. I’ve learned that font parsing is WAY easier than HTML parsing because font files have such a strict format. I can literally just read bytes in order and know exactly what they mean.
  • 5d11d0f: Added the font header table head parsing.
  • 0f25790: I added parsing for hhea, maxp and hmtx tables. This was actually a lot harder than I expected because the hmtx table’s length depends on variables defined in the maxp and hhea tables - but it’s not guaranteed that those tables will come before hmtx in the file. So I had to set up a deferred parsing system which allowed me to defer parsing of certain tables until their dependencies were parsed.
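One way a deferred parsing system like this could work is sketched below. It is an assumption about the approach rather than the actual implementation, and `TableSpec` and `parse_order` are hypothetical names. Tables are visited in file order, and any table whose dependencies have not been parsed yet is pushed to a later pass:

```rust
use std::collections::HashSet;

/// A table tag plus the tags it needs parsed first.
pub struct TableSpec {
    pub tag: &'static str,
    pub deps: &'static [&'static str],
}

/// Work out a parse order from the order tables appear in the file.
/// Returns Err with the stuck tags if some dependency never becomes available.
pub fn parse_order(file_order: &[TableSpec]) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut parsed: HashSet<&'static str> = HashSet::new();
    let mut order = Vec::new();
    let mut pending: Vec<&TableSpec> = file_order.iter().collect();

    while !pending.is_empty() {
        let before = pending.len();
        let mut still_deferred = Vec::new();
        for spec in pending {
            if spec.deps.iter().all(|d| parsed.contains(d)) {
                // A real parser would parse the table here.
                parsed.insert(spec.tag);
                order.push(spec.tag);
            } else {
                still_deferred.push(spec);
            }
        }
        // No progress in a full pass means a missing or circular dependency.
        if still_deferred.len() == before {
            return Err(still_deferred.iter().map(|s| s.tag).collect());
        }
        pending = still_deferred;
    }
    Ok(order)
}
```

With hmtx listed first in the file and depending on hhea and maxp, it is simply parsed last.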

Next Steps


  1. Complete the remaining required tables name, OS/2, post.
  2. Start parsing of tables required for glyph outlines - loca and glyf.
tathya is aweomse

Changes


  • 375abc7: I decided that I’m going to write my own font parsing library. I saw Sebastian Lague’s video on font rendering and it inspired me to try it out myself. I created some basic structs and functions that will help me parse font files and extract glyph data. I’m using macOS’s /System/Library/Fonts/Times.ttc as my test font file for now.
  • ad91bfd: I implemented basic parsing, so I can now parse the TTC data to get a header and table directories. Each table directory is further parsed to get table records. I’m currently working on parsing the ‘cmap’ table to get character-to-glyph mappings. I’ve completed cmap subtable formats 0 and 4 so far.
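To show what "reading bytes in order" looks like for a table directory, here is a small sketch. The struct and function names are hypothetical. The layout is the standard one: a 12-byte directory header with the table count at byte offset 4, followed by 16-byte records (tag, checksum, offset, length), all big-endian:

```rust
/// One 16-byte table record from a table directory.
#[derive(Debug, PartialEq, Eq)]
pub struct TableRecord {
    pub tag: [u8; 4],
    pub checksum: u32,
    pub offset: u32,
    pub length: u32,
}

fn be_u16(data: &[u8], at: usize) -> Option<u16> {
    let b = data.get(at..at + 2)?;
    Some(u16::from_be_bytes([b[0], b[1]]))
}

fn be_u32(data: &[u8], at: usize) -> Option<u32> {
    let b = data.get(at..at + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parse the table directory starting at `dir_offset`.
/// In a TTC file, `dir_offset` would come from the TTC header's offset list.
pub fn parse_table_directory(data: &[u8], dir_offset: usize) -> Option<Vec<TableRecord>> {
    // Header: sfntVersion (4 bytes), numTables (2), then 6 bytes of search hints.
    let num_tables = be_u16(data, dir_offset + 4)? as usize;
    let mut records = Vec::with_capacity(num_tables);
    for i in 0..num_tables {
        let at = dir_offset + 12 + i * 16;
        let tag = data.get(at..at + 4)?;
        records.push(TableRecord {
            tag: [tag[0], tag[1], tag[2], tag[3]],
            checksum: be_u32(data, at + 4)?,
            offset: be_u32(data, at + 8)?,
            length: be_u32(data, at + 12)?,
        });
    }
    Some(records)
}
```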

Next Steps


  1. Complete parsing of the cmap table by implementing formats 2, 6, 8, 10, 12, 13, and 14. I’m considering only doing formats 4 and 12 for now since they cover most use cases.
  2. Start work on next table to parse, likely ‘glyf’ table to get glyph data.
tathya is aweomse

(Small, but extremely significant change)

Changes


  • 1bc896d: I implemented the _adoption_agency function, which means <a> tags (along with all the other formatting tags) are now being properly serialized. Honestly I’m super sceptical about this implementation but it seems to be working fine for now. I’m too scared to test it.

Next Steps


  1. Possibly a serializer to turn the DOM back into HTML.
  2. Actually start on rendering the DOM?

Comments

tathya is aweomse

If you can’t read the last lines, here’s what they say:
“Document Tree:
If printed, the DOM would be 254647 characters long.
Extra dev note: I manually went through the DOM and can confirm it looks correct.”

2143727 5 days ago

WHOAAAA THIS IS COOL
I’ve done this before (make custom web browser), and it is hard!

Something cool you can do because it’s Rust: compile it to WASM, and have a browser in a browser :D

Gavan Bess 4 days ago

wow this has already come such a long way in such a short time. Awesome!

tathya is aweomse

(This devlog is mostly refactoring)

Changes


  • 839d015: I fixed a small bug in the code where certain elements were lacking parents and node documents.
  • 00861d6: Refactor of all time 2, electric boogaloo: I refactored Attrs to hold Rc<RefCell<Element>> instead of Element directly. This allows for better management of element references and avoids ownership issues. It was also a massive pain in the ass to refactor.
  • 871232f: The tree constructor can now properly construct the tree of the dummy HTML document I’ve been feeding it. It was very rewarding to see the child nodes populate correctly.
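The general shape of a tree built on Rc<RefCell<...>> is sketched below, with a `Weak` back-pointer to the parent so parent and child do not keep each other alive forever. `Node`, `new_node`, `append_child`, and `parent_name` are illustrative names, not the project's actual types:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

pub type NodeRef = Rc<RefCell<Node>>;

pub struct Node {
    pub name: String,
    /// Weak so a child does not keep its parent alive (no reference cycle).
    pub parent: Option<Weak<RefCell<Node>>>,
    pub children: Vec<NodeRef>,
}

pub fn new_node(name: &str) -> NodeRef {
    Rc::new(RefCell::new(Node {
        name: name.to_string(),
        parent: None,
        children: Vec::new(),
    }))
}

/// Attach `child` under `parent`, setting the back-pointer first.
pub fn append_child(parent: &NodeRef, child: NodeRef) {
    child.borrow_mut().parent = Some(Rc::downgrade(parent));
    parent.borrow_mut().children.push(child);
}

/// Name of the parent, or None for a root or a parent that was dropped.
pub fn parent_name(node: &NodeRef) -> Option<String> {
    let parent = node.borrow().parent.as_ref()?.upgrade()?;
    let name = parent.borrow().name.clone();
    Some(name)
}
```

The trade-off is that every access goes through `borrow()` or `borrow_mut()`, and borrow rules are checked at runtime instead of compile time.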

Next Steps


  1. I hate the In Body state but I have to finish it anyway

Note


Every commit in this devlog had at least some work on the tree constructor even if not explicitly mentioned in the change notes.

tathya is aweomse

Changes


  • e1a25d6: I had left out the list of active formatting elements, so I finally added it here. It was a lot more annoying than I thought it would be.
  • afb2530: I refactored most of my codebase to make use of Rc<RefCell<>> instead of Box<> and other shitty workarounds. The code is a lot… fatter? now. But it is also a lot cleaner and easier to work with. I also got started with the 7th state and it’s going well. I noticed a bug from the refactor but I couldn’t be bothered to fix it as of now.

Next Steps


  1. Complete the 7th state (the “in body” state).
  2. Trace the bug that I noticed after the refactor and fix.

Note

The attachment is ~90 lines of an almost 300 line long tree structure that I use to represent the DOM!

tathya is aweomse

(I keep forgetting to devlog)

Changes


  • f0f4b470: I decided that my old implementation of some of the structs in the HTML specification was not very good, so I rewrote them. This should make future changes easier. Along with this, I started working on the HTML parser again - this time on the tree construction stage.
  • 951e074e: I completed the first 2 insertion modes in the tree construction stage of the HTML parser.
  • db893cd: Not very eventful, I just added a few more insertion modes to the HTML parser’s tree construction stage.
  • 19f26fe: I added 2 more insertion modes (6 total now), and also properly implemented a function that I’ve been meaning to implement for a while now (_appropriate_insertion_place). Refactoring my code to support this function was very satisfying.
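To give a feel for how insertion modes chain together, here is a heavily simplified sketch that only tracks which mode the parser ends up in after a start tag. It is not the spec algorithm: it ignores every token type except start tags, skips the "in head noscript" mode, and creates no nodes. `mode_after_start_tag` is a made-up helper.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InsertionMode {
    Initial,
    BeforeHtml,
    BeforeHead,
    InHead,
    AfterHead,
    InBody,
}

/// Process one start tag. When a mode cannot handle the tag, it switches to
/// the next mode and the same tag is reprocessed there (the loop), which is
/// how missing <html>, <head>, and <body> tags end up implied.
pub fn mode_after_start_tag(mut mode: InsertionMode, tag: &str) -> InsertionMode {
    use InsertionMode::*;
    loop {
        mode = match (mode, tag) {
            (Initial, _) => BeforeHtml,
            (BeforeHtml, "html") => return BeforeHead,
            (BeforeHtml, _) => BeforeHead,
            (BeforeHead, "head") => return InHead,
            (BeforeHead, _) => InHead,
            (InHead, "title" | "meta" | "link" | "style" | "script") => return InHead,
            (InHead, _) => AfterHead,
            (AfterHead, "body") => return InBody,
            (AfterHead, _) => InBody,
            (InBody, _) => return InBody,
        };
    }
}
```

Feeding a bare "p" tag from the initial mode walks through every mode and lands in InBody, which matches how a document without explicit html, head, or body tags still gets all three.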

Next Steps


  1. Continue implementing more insertion modes for the HTML parser’s tree construction stage.
tathya is aweomse

(This was a LONG coding session)

Changes


The core theme of this devlog is the implementation of the HTML Tokenization spec - comprising an absolutely enormous state machine with 80 different possible states, at least 40 different kinds of errors, and extremely poor documentation on specifics. Commit-wise breakdowns are pretty uneventful but out of habit I’ll do it anyway - but with no details.
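For a sense of the structure, here is a toy version of such a state machine with 5 states instead of 80. It is a sketch only: attributes are skipped rather than parsed, and comments, character references, self-closing tags, and all error reporting are left out. The type and function names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Token {
    StartTag(String),
    EndTag(String),
    Character(char),
}

#[derive(Clone, Copy)]
enum State {
    Data,
    TagOpen,
    EndTagOpen,
    TagName,
    SkipAttributes,
}

pub fn tokenize(input: &str) -> Vec<Token> {
    let chars: Vec<char> = input.chars().collect();
    let (mut state, mut tokens) = (State::Data, Vec::new());
    let (mut name, mut is_end_tag) = (String::new(), false);
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        i += 1;
        match state {
            State::Data => match c {
                '<' => state = State::TagOpen,
                _ => tokens.push(Token::Character(c)),
            },
            State::TagOpen => match c {
                '/' => state = State::EndTagOpen,
                c if c.is_ascii_alphabetic() => {
                    is_end_tag = false;
                    name.push(c.to_ascii_lowercase());
                    state = State::TagName;
                }
                _ => {
                    // Not a tag after all: emit '<' and reconsume c as data.
                    tokens.push(Token::Character('<'));
                    state = State::Data;
                    i -= 1;
                }
            },
            State::EndTagOpen => {
                if c.is_ascii_alphabetic() {
                    is_end_tag = true;
                    name.push(c.to_ascii_lowercase());
                    state = State::TagName;
                } else {
                    // Simplification: drop malformed end tags.
                    state = State::Data;
                }
            }
            State::TagName | State::SkipAttributes => match c {
                '>' => {
                    let finished = std::mem::take(&mut name);
                    tokens.push(if is_end_tag {
                        Token::EndTag(finished)
                    } else {
                        Token::StartTag(finished)
                    });
                    state = State::Data;
                }
                c if c.is_ascii_whitespace() => state = State::SkipAttributes,
                _ => {
                    if let State::TagName = state {
                        name.push(c.to_ascii_lowercase());
                    }
                }
            },
        }
    }
    tokens
}
```

The "reconsume" step (stepping the index back by one) is the pattern that shows up all over the real tokenizer: a state decides the character is not its business and hands it to another state.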


Next Steps


  1. I’ve got my tokenizer completed, but I still need to construct a Node tree using the emitted tokens. That’s what I’ll be working on in the next devlog, along with implementing more parts of the HTML spec as is required.

Note


This was probably the most boring coding session I’ve ever gone through. The code, like the specification, is repetitive, and can probably be modularized to oblivion. But if you think I’m voluntarily touching those 2600 lines of code EVER again, you’re crazy.


(Edit: Changed a commit link that was broken after commit reword)

tathya is aweomse

Changes


  • f7eec2d: I added some serialization functions to all the new URL-related structures, refactored the URLPath structure to be more representative of the specification, and integrated the new URL infrastructure into my existing HTTP client code.

Next Steps

  1. Implement the HTML parser (for real this time)

Note


The next devlog is probably gonna be a BIG one, like 5+ hours minimum (most likely) - given the fact that the HTML parser specification is just so LONG. I’ve been reading through it and the main state machine literally has 80 states and that’s not even the entire specification because there’s probably a billion utility functions and side quests I’ll have to go on to implement it right.

tathya is aweomse

(This was a LONG coding session)

Changes


  • 5fb1292: As promised, I worked on IPv4, opaque and domain parsing. That concludes the host parsing algorithm, which was a LOT lengthier than expected.
  • 7728d02: I completed the ENTIRE URL parsing specification’s implementation. This took WAY TOO LONG. Seriously, there’s no way parsing a URL is THAT deep. For reference, the entire implementation spans almost 1500 lines, consisting of 4 different individual parsers (IPv6, IPv4, Opaque, Host), 5 different encoding sets, and 27 different custom errors, which all come together with a 600-line state machine to parse a URL.
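As a taste of why this gets deep, here is a sketch of just the IPv4 piece, loosely following the URL Standard's IPv4 parser: each part may be decimal, octal (leading 0), or hex (leading 0x), and the last part may fill more than one byte. It is an approximation under those assumptions, with hypothetical function names and no validation-error reporting:

```rust
/// Parse one dot-separated part as decimal, octal (leading 0) or hex (0x).
fn parse_ipv4_number(s: &str) -> Option<u64> {
    if s.is_empty() {
        return None;
    }
    let (digits, radix) = if let Some(hex) = s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        (hex, 16)
    } else if s.len() > 1 && s.starts_with('0') {
        (&s[1..], 8)
    } else {
        (s, 10)
    };
    if digits.is_empty() {
        return Some(0); // a bare "0x" counts as zero
    }
    // Reject signs and stray characters that from_str_radix might accept.
    if !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    u64::from_str_radix(digits, radix).ok()
}

/// Parse an IPv4 host into its 32-bit value, or None on failure.
pub fn parse_ipv4(input: &str) -> Option<u32> {
    let mut parts: Vec<&str> = input.split('.').collect();
    // A single trailing dot is tolerated.
    if parts.len() > 1 && parts.last() == Some(&"") {
        parts.pop();
    }
    if parts.is_empty() || parts.len() > 4 {
        return None;
    }
    let numbers = parts
        .iter()
        .map(|p| parse_ipv4_number(p))
        .collect::<Option<Vec<u64>>>()?;
    let (last, init) = numbers.split_last()?;
    if init.iter().any(|&n| n > 255) {
        return None;
    }
    // The last part fills all bytes not claimed by the earlier parts.
    let remaining_bytes = 4 - init.len() as u32;
    if *last >= 256u64.pow(remaining_bytes) {
        return None;
    }
    let mut addr = *last;
    for (i, &n) in init.iter().enumerate() {
        addr += n << (8 * (3 - i));
    }
    Some(addr as u32)
}
```

So "0x7f.1" is a valid way to write 127.0.0.1. In the standard, the host parser first decides whether the input should be treated as IPv4 at all before an algorithm like this runs; that check is not shown here.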

Next Steps


  1. sleep
tathya is aweomse

Changes


  • 47f712d: Little bit of work on the HTML spec.
  • a849fb1: I was trying to implement the URL specification but turns out I need to implement a host spec and for that I need to implement IPv4, IPv6 and a couple other things. So I’ve implemented IPv6, but this devlog has been mostly just busy work that doesn’t have much immediate benefit to the project.

Next Steps


  1. IPv4, opaque and domain parsing (basically, finish up host parsing as a whole)
  2. URL parsing (I need to finish host parsing for this)
tathya is aweomse

Changes


  • f351b9c: I moved from calling DNS resolution functions independently to having a DNS resolver which caches resolved addresses and invalidates cache entries after a TTL has elapsed.
  • d443140: Added automatic redirect following, so responses with status code 3XX automatically trigger a redirect check
  • abea778: Started work on the HTML stuff by adding some structures I saw in the official HTML5 specification.
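A caching resolver along these lines could look roughly like the sketch below. The `DnsCache` name and its API are assumptions. The actual lookup is passed in as a closure, and the current time is passed in as an argument, so that expiry can be exercised without waiting:

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

pub struct DnsCache {
    ttl: Duration,
    /// Host name -> (resolved addresses, expiry time).
    entries: HashMap<String, (Vec<IpAddr>, Instant)>,
}

impl DnsCache {
    pub fn new(ttl: Duration) -> Self {
        DnsCache { ttl, entries: HashMap::new() }
    }

    /// Return cached addresses while they are fresh; otherwise call
    /// `resolve`, store the result, and return it.
    pub fn lookup<F>(&mut self, host: &str, now: Instant, resolve: F) -> Option<Vec<IpAddr>>
    where
        F: FnOnce(&str) -> Option<Vec<IpAddr>>,
    {
        if let Some((addrs, expires)) = self.entries.get(host) {
            if now < *expires {
                return Some(addrs.clone());
            }
        }
        // Missing or expired: resolve again and overwrite the old entry.
        let addrs = resolve(host)?;
        self.entries
            .insert(host.to_string(), (addrs.clone(), now + self.ttl));
        Some(addrs)
    }
}
```

In real use the closure might wrap `std::net::ToSocketAddrs`, and `now` would be `Instant::now()`. This sketch uses one fixed TTL for every entry rather than any per-record value.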

Next Steps


  1. Ignore most of the remaining structures/traits (and the stuff I marked as TODO), and go straight into the HTML parser

Notes


Maybe trying to start with expressing the entire HTML spec in code wasn’t the best idea. I don’t like leaving things mid-way but I don’t really see another way out here, at least for now.

tathya is aweomse

Changes


  • bc11bca, 076345e: I added basic DNS resolution so you don’t have to type in IP addresses when sending requests, and can just write a domain/hostname
  • 17ad97c: I abstracted streams away into a trait allowing me to implement a new kind of stream which supports TLS, which means I can now send and receive responses with HTTPS!
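The stream abstraction might look something like this sketch. `Transport`, `send_get`, and `MockStream` are hypothetical names. Anything that implements both `Read` and `Write` qualifies, so a plain `TcpStream`, a TLS wrapper, or an in-memory test double can all be passed to the same request code:

```rust
use std::io::{self, Read, Write};

/// Anything that can carry HTTP bytes in both directions.
pub trait Transport: Read + Write {}
impl<T: Read + Write> Transport for T {}

/// Send a minimal GET request and read the whole reply.
/// The caller decides what kind of stream sits underneath.
pub fn send_get(stream: &mut dyn Transport, host: &str, path: &str) -> io::Result<Vec<u8>> {
    write!(
        stream,
        "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        path, host
    )?;
    stream.flush()?;
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    Ok(response)
}

/// In-memory stand-in for a socket: records what was sent and
/// replays a canned reply.
pub struct MockStream {
    pub sent: Vec<u8>,
    pub reply: io::Cursor<Vec<u8>>,
}

impl Read for MockStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reply.read(buf)
    }
}

impl Write for MockStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.sent.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

A TLS stream would implement the same two traits by encrypting in `write` and decrypting in `read`, leaving `send_get` unchanged.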

Next Steps


  1. DNS caching and cache invalidation
  2. Maybe some work on the rendering?
tathya is aweomse

Changes


  • 9eae3a1: I added response decoding using a state machine system that populates a response object as it’s fed data. I designed the system to be robust enough that even if it receives data in chunks of 8 bytes, it is able to properly construct a response object.
  • c9c0f02: Added a little helper function to get a nice display of the response
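The idea of a decoder that survives arbitrary chunk boundaries can be sketched as below. This is a simplified assumption of the design, with hypothetical names: it buffers bytes until a full line is available, and treats everything after the blank line as the body, without handling Content-Length or chunked encoding.

```rust
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Response {
    pub status: u16,
    pub headers: Vec<(String, String)>,
    pub body: Vec<u8>,
}

#[derive(Clone, Copy)]
enum State {
    StatusLine,
    Headers,
    Body,
}

pub struct ResponseDecoder {
    state: State,
    buffer: Vec<u8>,
    response: Response,
}

impl ResponseDecoder {
    pub fn new() -> Self {
        ResponseDecoder { state: State::StatusLine, buffer: Vec::new(), response: Response::default() }
    }

    /// Feed any number of bytes; incomplete lines wait in the buffer.
    pub fn feed(&mut self, chunk: &[u8]) {
        self.buffer.extend_from_slice(chunk);
        loop {
            if let State::Body = self.state {
                self.response.body.append(&mut self.buffer);
                return;
            }
            // Wait until a complete CRLF-terminated line has arrived.
            let end = match self.buffer.windows(2).position(|w| w == b"\r\n") {
                Some(end) => end,
                None => return,
            };
            let line: Vec<u8> = self.buffer.drain(..end + 2).take(end).collect();
            let line = String::from_utf8_lossy(&line).into_owned();
            match self.state {
                State::StatusLine => {
                    // "HTTP/1.1 200 OK" -> the second field is the status code.
                    self.response.status =
                        line.split(' ').nth(1).and_then(|s| s.parse().ok()).unwrap_or(0);
                    self.state = State::Headers;
                }
                State::Headers if line.is_empty() => self.state = State::Body,
                State::Headers => {
                    if let Some((name, value)) = line.split_once(':') {
                        self.response
                            .headers
                            .push((name.trim().to_string(), value.trim().to_string()));
                    }
                }
                State::Body => unreachable!("handled before reading a line"),
            }
        }
    }

    pub fn response(&self) -> &Response {
        &self.response
    }
}
```

Because partial lines simply stay in the buffer, it does not matter whether a CRLF pair is split across two chunks.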

Next Steps


  1. Actually look at HTTPS
tathya is aweomse

Changes


  • 0784aeb: Started work on the HTTP client that will be sending requests and receiving responses. The next commit should have the actual sending moved into a separate function so that different HTTP protocol versions can send requests differently.

  • ba895e8: I added integrity checks to requests, different send methods depending on protocol, basic response decoding (only enough for HTTP/0.9 for now), and a basic Client struct so you can specify an address once and send requests to it multiple times


Next Steps


  1. Implement response handling for HTTP/1.0 and HTTP/1.1
  2. Look into adding support for HTTPS
tathya is aweomse

Harbor Browser

Harbor Browser is going to be a browser where I write every major service myself. This means everything from:

  • HTTP client
  • HTML/CSS parser and renderer
  • JS interpreter

And anything else a traditional browser would have must be written by me. I’m trying to limit myself to as few dependencies as possible.

Progress

I decided to use winit and wgpu as the windowing and GPU abstraction layers. They’ll be doing a lot of the heavy lifting for this project, I suspect. I got a basic window opening and cleared with a color. It doesn’t sound like much but that was 250 lines of code :|

Comments

Gavan Bess 16 days ago

sounds like you have a long road ahead of you. im excited to see where this goes!

cestlaviet8438 16 days ago

you’re a trooper for this project 🫡