lexepub

7 devlogs
7h 2m 11s

A Rust library for parsing EPUBs FAST and memory-efficiently.

It provides asynchronous streaming, metadata validation, and asset extraction across Rust, C/C++, and WebAssembly from a single core implementation.

This project uses AI

No AI used AT ALL :3

NellowTCS

I PUBLISHED ITTTTTT!!!

This was the final cleanup sprint before shipping lexepub v0.1.0, and it ended up being a surprisingly big set of changes across docs, CI, workflows, and the README. Nothing glamorous, but all the stuff that makes a project feel real instead of “a folder on my computer.”

Docs Cleanup + Expansion

I went through the entire docs site and updated everything:

  • Rust adapter docs now include resource APIs, TOC APIs, and link normalization notes.
  • WASM adapter docs now list the full API surface, including get_resource() and the chapter‑relative resource helpers.
  • C/C++ adapter docs got updated with the new functions and a clearer API reference.
  • Quickstart now includes useful commands, WASM build instructions, and resource‑loading examples.
  • The homepage got a new “Resource + TOC Utilities” card and a clearer adapter parity section.

Basically: the docs now actually reflect what lexepub can do.

README Overhaul

The README got the cleanup it always deserved (and needed):

  • Proper formatting
  • Clearer feature list
  • NPM + crates.io links
  • Better installation instructions
  • Cleaner demo section
  • Updated build instructions

GitHub Pages Workflow Refactor

I rewrote (ahem, copied from Tamaru) the entire Pages deployment workflow:

  • fixed Node version
  • fixed root/base paths
  • fixed demo paths
  • fixed docs paths
  • fixed the deploy directory structure
  • fixed the index redirect

Release CI

I added proper release workflows:

  • cargo publish (with dry‑run option)
  • wasm-pack build
  • C header generation
  • release tarball creation
  • GitHub Release upload

It even handles platform‑specific artifacts cleanly and doesn’t hardcode paths anymore (I fixed that… again).

Resource API + WASM Updates

I added:

  • get_resource(path)
  • better resource resolution
  • normalized internal paths
  • improved TOC extraction
  • updated TypeScript definitions

The WASM adapter is now fully capable of powering the demo without hacks.

I Published It!!!!!!!!!!!!!!

Yep!
lexepub is now on crates.io and npm!

NellowTCS

lexepub now has a proper browser demo, powered by the WASM build, that actually renders EPUBs: images, CSS, TOC, everything!!!!
I didn’t plan on building a renderer for v0.1.0, but here we are.

HTMLReader‑borrowed Demo

I put together a small demo, yanked from HTMLReader (hey, again, why reinvent the wheel when I already have a good modular UI?), that loads EPUBs through the WASM adapter and renders them directly in the browser. It works well!!!
(And it shows off what lexepub can do without needing any external libraries.)

It now supports:

  • chapter images
  • proper table of contents (using inferred chapter titles instead of raw filenames)
  • CSS from <link> tags
  • internal resource loading
  • normalized paths
  • and a clean UI for navigating chapters

Honestly, it’s starting to feel like a real reader, which feels insane…
I showed it to a friend and they actually thought it was epub.js or something haha, not my own thing!!!

Chapter Images + Resource Loading

I added get_chapter_resource and resolve_chapter_resource_path to the WASM API so the demo can fetch images and other linked assets.
This required adding:

  • path normalization
  • chapter-relative resolution
  • fallback logic for weird EPUB structures (why is this such a badly standardized format omg… I have ANOTHER project idea now, a new ebook format sob)

Images now display correctly inside chapters.
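The devlog doesn’t spell out what the fallback logic actually does. The following is a hypothetical sketch of one plausible strategy: try an exact match, then a case-insensitive match, then a unique file-name match anywhere in the archive. `find_resource` and its signature are made up for illustration, and the sketch assumes the requested path has already been normalized.

```rust
use std::collections::HashSet;

/// Hypothetical fallback lookup. `requested` is assumed to be an already
/// normalized, archive-relative path; `archive_paths` holds every entry name.
fn find_resource<'a>(requested: &str, archive_paths: &'a HashSet<String>) -> Option<&'a str> {
    // 1. Exact match.
    if let Some(p) = archive_paths.get(requested) {
        return Some(p.as_str());
    }
    // 2. Case-insensitive match (some EPUBs were authored on case-insensitive filesystems).
    let lower = requested.to_ascii_lowercase();
    if let Some(p) = archive_paths.iter().find(|p| p.to_ascii_lowercase() == lower) {
        return Some(p.as_str());
    }
    // 3. File-name match anywhere in the archive, accepted only if it is unique.
    let name = requested.rsplit('/').next()?;
    let mut hits = archive_paths
        .iter()
        .filter(|p| p.rsplit('/').next() == Some(name));
    match (hits.next(), hits.next()) {
        (Some(p), None) => Some(p.as_str()),
        _ => None, // zero matches, or ambiguous
    }
}
```

Requiring the file-name match to be unique keeps the last step from silently picking the wrong `cover.jpg` when an archive contains two.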

Proper TOC Support

Chapters now have a title field inferred from:

  1. <h1>, <h2>, or <title> in the AST
  2. otherwise the first non-empty text
  3. otherwise the filename stem
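Here is a minimal sketch of that three-step inference over a hypothetical, simplified AST. The `Node` enum is invented for this example; lexepub’s real AST types aren’t shown here. The sketch also assumes the tags are tried in priority order (`h1`, then `h2`, then `title`) rather than in document order, which the list above doesn’t pin down.

```rust
/// Hypothetical, simplified AST used only for this sketch.
enum Node {
    Element { tag: String, children: Vec<Node> },
    Text(String),
}

/// Concatenated text of a node and all of its descendants.
fn text_of(node: &Node) -> String {
    match node {
        Node::Text(t) => t.clone(),
        Node::Element { children, .. } => children.iter().map(text_of).collect(),
    }
}

/// Depth-first search for the first `wanted` element that has non-empty text.
fn find_tag(node: &Node, wanted: &str) -> Option<String> {
    match node {
        Node::Text(_) => None,
        Node::Element { tag, children } => {
            if tag == wanted {
                let text = text_of(node);
                let text = text.trim();
                if !text.is_empty() {
                    return Some(text.to_string());
                }
            }
            children.iter().find_map(|c| find_tag(c, wanted))
        }
    }
}

/// First non-empty text node in document order.
fn first_text(node: &Node) -> Option<String> {
    match node {
        Node::Text(t) if !t.trim().is_empty() => Some(t.trim().to_string()),
        Node::Text(_) => None,
        Node::Element { children, .. } => children.iter().find_map(first_text),
    }
}

/// Heading or <title>, then first text, then the filename stem.
fn infer_title(root: &Node, file_path: &str) -> String {
    ["h1", "h2", "title"]
        .iter()
        .find_map(|tag| find_tag(root, tag))
        .or_else(|| first_text(root))
        .unwrap_or_else(|| {
            let name = file_path.rsplit('/').next().unwrap_or(file_path);
            name.rsplit_once('.').map_or(name, |(stem, _)| stem).to_string()
        })
}
```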

The WASM adapter exposes get_toc() and get_toc_json(), and the demo uses this to build a real table of contents.

CSS from <link> Tags

Linked CSS files now load and apply correctly.
This required:

  • resolving relative paths
  • reading CSS resources
  • parsing them
  • applying them to the AST before rendering

It’s still a simple CSS engine, but it’s enough for EPUBs.

Internal Path Normalization

EPUBs love weird paths (../, ./, backslashes, nested folders), so I added a normalize_internal_path() helper and integrated it into:

  • resource resolution
  • AST link normalization
  • href/src rewriting

This fixed a bunch of rendering bugs and other random issues.
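Roughly, a normalizer like normalize_internal_path() has to do the following. This is a sketch under assumptions: only the function name comes from the devlog, and the body is hypothetical. It unifies backslashes, drops `.` and empty segments, and resolves `..` without escaping the archive root. `resolve_relative` is an invented helper that shows how chapter-relative href/src values could be resolved on top of it. Fragments (`#id`) and percent-encoding are not handled.

```rust
/// Hypothetical body for a normalize_internal_path()-style helper.
fn normalize_internal_path(path: &str) -> String {
    let unified = path.replace('\\', "/");
    let mut parts: Vec<&str> = Vec::new();
    for seg in unified.split('/') {
        match seg {
            "" | "." => {}            // skip empty and current-dir segments
            ".." => {
                parts.pop();          // go up, but never above the archive root
            }
            s => parts.push(s),
        }
    }
    parts.join("/")
}

/// Hypothetical helper: resolve `href` relative to the file that references it.
fn resolve_relative(base_file: &str, href: &str) -> String {
    if href.starts_with('/') {
        return normalize_internal_path(href); // already archive-rooted
    }
    let base = normalize_internal_path(base_file);
    let dir = base.rsplit_once('/').map(|(d, _)| d).unwrap_or("");
    normalize_internal_path(&format!("{dir}/{href}"))
}
```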

TODO Updates

Added, then checked off:

  • chapter images
  • proper TOC
  • CSS from <link>
  • demo added
  • internal path fixes

All that’s really left is cleanup, publishing, docs, and so on. Nothing major, honestly; everything else on the TODO is beyond v0.1.0.


Comments

Cyclic(John fire department) not FD

This is cool

NellowTCS

I added a basic demo!
The demo is based on HTMLReader, which I modified by stripping out epub.js and replacing it with the WASM build of lexepub. It works cleanly, but there’s no formatting (sob) since lexepub doesn’t have a proper renderer yet haha.

NellowTCS

Today (well, yesterday; I was tired, okay?) turned into a pretty productive set of commits. Nothing too dramatic, but a lot of important groundwork and cleanup that makes lexepub feel more complete and consistent across all adapters.

API Parity Across Rust, WASM, and C/C++

I finally checked off the “1‑1 API functionality” item, then unchecked it when I added CSS, then rechecked it in the same commit loll
This mostly meant adding the missing sync wrappers (get_metadata_sync, has_cover_sync, cover_image_sync) and wiring them into the C‑FFI layer.
(Dealing with Diplomat’s restrictions was very annoying. Saikuro, the thing I made, is so much betterrrr: much cleaner, and more languages are supported automatically.)
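Sync wrappers like get_metadata_sync essentially drive the async version to completion. The devlog doesn’t say which executor lexepub uses, so this is a std-only sketch with a tiny hand-written `block_on`. `Book` and its method body are hypothetical stand-ins. This approach only suits futures that don’t depend on a particular runtime; a future that relies on a specific runtime’s I/O driver needs that runtime instead.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: poll, and park the thread until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // park() may wake spuriously; the loop simply polls again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for the parser type.
struct Book {
    title: String,
}

impl Book {
    async fn get_metadata(&self) -> String {
        self.title.clone()
    }

    /// Sync wrapper in the style of get_metadata_sync.
    fn get_metadata_sync(&self) -> String {
        block_on(self.get_metadata())
    }
}
```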

Minimal CSS Parser that turned into a pretty average simple one

I added a small, hand-rolled CSS parser (Servo’s cssparser is difficult to deal with; future project: a better API for it 👀) that does just enough to handle basic selectors, declarations, and at‑rules.
It’s simple, sadly, but it works well for EPUB‑level CSS.
There’s a tiny AST (Stylesheet, CssRule, StyleRule), comment removal, declaration parsing, and some tests to make sure it doesn’t defy my expectations for CSS (copied from an ebook btw).
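The following small sketch illustrates the comment-removal and declaration-parsing steps. The `StyleRule` field layout is a guess (only the type names come from the devlog), and at-rules are skipped. Splitting on `;` would also break on values like `url("a;b")`. Treat it as an illustration rather than lexepub’s actual parser.

```rust
/// Hypothetical mirror of the tiny AST; the real field layout isn't shown.
#[derive(Debug, PartialEq)]
struct StyleRule {
    selectors: Vec<String>,
    declarations: Vec<(String, String)>,
}

/// Removes /* ... */ comments; an unterminated comment drops the rest of the input.
fn strip_comments(css: &str) -> String {
    let mut out = String::with_capacity(css.len());
    let mut rest = css;
    while let Some(start) = rest.find("/*") {
        out.push_str(&rest[..start]);
        match rest[start + 2..].find("*/") {
            Some(end) => rest = &rest[start + 2 + end + 2..],
            None => return out,
        }
    }
    out.push_str(rest);
    out
}

/// Parses "prop: value; prop2: value2" into (lowercased property, value) pairs.
fn parse_declarations(block: &str) -> Vec<(String, String)> {
    block
        .split(';')
        .filter_map(|decl| {
            let (prop, value) = decl.split_once(':')?;
            let (prop, value) = (prop.trim(), value.trim());
            (!prop.is_empty() && !value.is_empty())
                .then(|| (prop.to_ascii_lowercase(), value.to_string()))
        })
        .collect()
}

/// Parses plain `selector, selector { declarations }` rules (no at-rules, no nesting).
fn parse_rules(css: &str) -> Vec<StyleRule> {
    let css = strip_comments(css);
    let mut rules = Vec::new();
    let mut rest = css.as_str();
    while let Some(open) = rest.find('{') {
        let Some(close) = rest[open..].find('}').map(|i| open + i) else { break };
        let selectors = rest[..open]
            .split(',')
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect();
        let declarations = parse_declarations(&rest[open + 1..close]);
        rules.push(StyleRule { selectors, declarations });
        rest = &rest[close + 1..];
    }
    rules
}
```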

Documentation Pass

I added docs!

  • Rust adapter page includes full API references, sync wrappers, and CSS/AST behavior.
  • C/C++ adapter page lists the full generated API and includes a full example.
  • WASM adapter page has a proper API reference and example usage.
  • Quickstart covers optional features, sync wrappers, and convenience functions.

README Cleanup

The README got a big trim.
Most of the detailed examples and API references moved into the docs site, which is where they belong.
The README is now much cleaner and points people to the proper documentation. (Though I don’t know what late-night me was thinking, formatting the links as code?!?)

TODO Updates

Checked off:

  • API parity
  • CSS parsing + application
  • Streaming cover image support

Most of the TODOs left are probably future stuff haha, but I do need to make a small HTMLReader-but-using-lexepub demo.

NellowTCS

Just a tiny devlog this time, I’m learning :O

But I did add my default Docs (with a capital D) setup and the accompanying CI, and added streaming cover image support.

What I Did

I added docs.
And yes, I did borrow the folder structure from other projects.
I’m not reinventing the wheel when I already invented it twice (Tamaru, Saikuro, S-eco, need I name more?).
Please respect my efficiency.

And THEN, ONE MORE TODO: I implemented streaming cover image support.
As in:
cover_image_to_writer
Zero allocations.
Direct streaming.
AsyncWrite.
The whole thing.

It works, streams, and is like actually really nice!!!
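lexepub’s cover_image_to_writer is async (AsyncWrite), and its exact signature isn’t shown here. The core idea can still be shown with a synchronous analog: bytes flow through one fixed-size stack buffer straight into the writer, so the image is never collected into a heap `Vec`. `copy_cover` is a hypothetical name, and in sync code `std::io::copy` does essentially the same job.

```rust
use std::io::{self, Read, Write};

/// Synchronous analog of streaming a cover image: one 8 KiB stack buffer,
/// no intermediate Vec holding the whole image. Hypothetical signature.
fn copy_cover<R: Read, W: Write>(mut cover: R, mut out: W) -> io::Result<u64> {
    let mut buf = [0u8; 8 * 1024];
    let mut total = 0u64;
    loop {
        let n = match cover.read(&mut buf) {
            Ok(0) => break, // end of the cover entry
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        out.write_all(&buf[..n])?;
        total += n as u64;
    }
    out.flush()?;
    Ok(total)
}
```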

TODO Updates

The TODO list shrank again, two more things down…
I’m starting to worry I’m going to run out of TODOs and have to invent new ones.

NellowTCS

Chaos Devlog time!!!

I added EPUB version detection.
Like, real version detection.
lexepub now looks at <package version="3.0"> and goes “oh okay cool” instead of staring blankly like a goldfish.
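The following is a naive sketch of that version check, which scans the `<package>` tag as a plain string. `detect_epub_version` is hypothetical. A real implementation should use an XML parser so that namespace prefixes (e.g. `<opf:package>`), entities, and odd formatting are handled properly.

```rust
/// Hypothetical sketch: read the version attribute of the OPF <package> element.
fn detect_epub_version(opf: &str) -> Option<&str> {
    // Locate the opening <package ...> tag (unprefixed element name only).
    let start = opf.find("<package")?;
    let tag_end = start + opf[start..].find('>')?;
    let tag = &opf[start..tag_end];
    // Accept `version=` only when preceded by whitespace, so an attribute
    // like `my-version=` is not mistaken for it.
    let attr = tag
        .match_indices("version=")
        .map(|(i, _)| i)
        .find(|&i| tag[..i].ends_with(|c: char| c.is_ascii_whitespace()))?
        + "version=".len();
    // Attribute values may use either quote style.
    let quote = tag[attr..].chars().next()?;
    if quote != '"' && quote != '\'' {
        return None;
    }
    let value = &tag[attr + 1..];
    let end = value.find(quote)?;
    Some(&value[..end])
}
```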

Then I added cover image format detection!
The manifest used to be a cute little HashMap<String, String> and now it’s a full (href, media-type) tuple because I decided lexepub should know MIME types like a sommelier knows wine.
This broke EVERYTHING.
Every. Thing.
Every .join(href) became .join(&href.0) aaaagh.
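The sketch below shows roughly what the new manifest shape buys you. The `Manifest` alias and both helpers are hypothetical. It also leaves out how the cover id is found in the first place, whether from EPUB 3’s `properties="cover-image"` on a manifest item or from EPUB 2’s `<meta name="cover">`.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// id -> (href, media-type), the tuple shape described above.
type Manifest = HashMap<String, (String, String)>;

/// MIME type of the cover item, e.g. "image/jpeg".
fn cover_format<'a>(manifest: &'a Manifest, cover_id: &str) -> Option<&'a str> {
    manifest.get(cover_id).map(|(_, media_type)| media_type.as_str())
}

/// Archive path of the cover; the href is tuple field .0, hence `.join(&href.0)`.
fn cover_path(opf_dir: &Path, manifest: &Manifest, cover_id: &str) -> Option<PathBuf> {
    manifest.get(cover_id).map(|href| opf_dir.join(&href.0))
}
```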

And then I fixed WASM.
Not “fixed WASM” like “haha a typo,”
I mean FIXED WASM like “rewrote half the bindings because Past Me was clearly having a moment” (second time I’ve said that today haha).
Everything returns proper Result<T, JsValue> now.
Metadata serializes.
Chapters serialize.
Cover extraction works.

Oh and AST parsing?
Yeah that’s real now.
extract_ast() actually does AST things instead of returning ast: None like a liar.
WASM uses it too.
ParsedChapter is serializable.
Chapter is serializable (but I skipped the raw bytes because I’m not a monster).
This was supposed to be a “later” thing.
It is no longer a “later” thing.

Now we got:

EPUB Version: 3.0
Has Cover: true
Cover Format: image/jpeg

And the TODO list?
Oh my god the TODO list.
I chugged through TODOs fast as fluff.
WASM support? Done.
AST parsing? Done.
Version detection? Done.
Cover format detection? Done.
I swear the TODO list is shrinking faster than my sanity.

I also updated integration tests because apparently I’m responsible now.
They actually check MIME types and cover presence and error cases and everything.
Who am I.

Anyway.
I love how lexepub is turning out.
It started as a tiny little “haha unzip EPUB” thing and now it’s a full parsing engine with metadata, ASTs, WASM bindings, cover extraction, version detection, and a manifest that actually knows what it’s doing.

NellowTCS

SHOOOOOOOOOOOOT
I’m so good at forgetting projects existed sighhh.

So there are a lot fewer hours logged here than I actually spent (most of it was googling HOW EPUBs work, and why they’re so… hard to parse), but I guess two and a half hours is fine, whatever…

Okay, so you all get the everything-I-did-up-to-now devlog. Warning: this will be insanely rambly.

So. EPUBs.
Right.

You’d think “oh it’s just a zip file with some HTML in it” and you would be CORRECT but also WRONG because the way they organize everything is kind of a nightmare and I spent way too long just figuring out the file structure before I wrote a single line of Rust.

Okay so the gist: an EPUB is a zip file, inside that zip file is a META-INF/container.xml which points you to an OPF file (like OEBPS/content.opf or wherever), and THAT file has all the metadata AND a manifest (list of all files) AND a spine (the reading order). So you can’t just iterate the zip entries in order, you have to parse the OPF spine to figure out what order chapters actually go in.
Which is fun.
Very fun.
Super fun.
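The two lookups described above are sketched here with naive string scanning. The function names are hypothetical, and a real implementation should parse the XML properly. First, container.xml gives you the OPF path. Then the spine’s idrefs are mapped through the manifest to get chapter hrefs in reading order. In this sketch, spine entries whose idref is missing from the manifest are simply skipped.

```rust
use std::collections::HashMap;

/// Step 1: META-INF/container.xml names the OPF via <rootfile full-path="...">.
fn rootfile_path(container_xml: &str) -> Option<&str> {
    const ATTR: &str = "full-path=\"";
    let start = container_xml.find(ATTR)? + ATTR.len();
    let len = container_xml[start..].find('"')?;
    Some(&container_xml[start..start + len])
}

/// Step 2: spine idrefs -> manifest hrefs, in spine (reading) order.
/// Zip entry order plays no part in this.
fn reading_order<'a>(manifest: &'a HashMap<String, String>, spine_idrefs: &[&str]) -> Vec<&'a str> {
    spine_idrefs
        .iter()
        .filter_map(|id| manifest.get(*id).map(String::as_str))
        .collect()
}
```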

So I built the whole thing in layers basically. There’s an EpubExtractor at the bottom that just knows how to open a zip and read files out of it, and it can do this from a file path, from raw bytes, OR from a streaming async reader. That last one was kind of annoying to get right because async_zip has opinions about what traits your reader needs to implement and I had to do some fun (/sarc) stuff to get it to work.

Then on top of that there’s the actual parsing layer, ContainerParser for container.xml, OpfParser for the OPF file (metadata + spine + manifest), and ChapterParser / extract_text_content for turning the XHTML chapters into actual readable text using the scraper crate.

The main LexEpub struct is what you actually use and it caches chapters and metadata so you’re not re-parsing the whole thing every time you call get_metadata() twice.
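One way to get that “parse once, reuse afterwards” behavior is a hypothetical sketch using `std::cell::OnceCell`. The devlog doesn’t say how LexEpub actually stores its caches. The `CachedBook` type, its placeholder “parse,” and the `parse_count` field are all invented for illustration; `parse_count` exists only to demonstrate that the cache works.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical stand-in showing lazy, cached metadata parsing.
struct CachedBook {
    raw_opf: String,
    metadata: OnceCell<String>,
    parse_count: Cell<u32>, // demo-only counter
}

impl CachedBook {
    fn new(raw_opf: &str) -> Self {
        Self {
            raw_opf: raw_opf.to_string(),
            metadata: OnceCell::new(),
            parse_count: Cell::new(0),
        }
    }

    /// The first call runs the (placeholder) parse; later calls return the cached value.
    fn get_metadata(&self) -> &str {
        self.metadata.get_or_init(|| {
            self.parse_count.set(self.parse_count.get() + 1);
            // Placeholder parse: a real implementation would run the OPF parser here.
            self.raw_opf.trim().to_string()
        })
    }
}
```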

Oh, also there’s a lowmem feature flag that swaps out the scraper-based HTML parser for a dumb little hand-rolled state machine that just strips tags manually. It’s not as good at handling block elements and whitespace, but it doesn’t build a full DOM tree, which is the point. Theoretically useful for embedded targets.
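Here is a sketch of what such a two-state tag stripper can look like. It is hypothetical and not lexepub’s actual lowmem code. It copies text outside `<...>`, skips everything inside, and collapses runs of whitespace. It does not decode entities or skip `<script>`/`<style>` contents. Adjacent block elements also run together, which is exactly the block-element weakness described above.

```rust
/// Two-state tag stripper: outside a tag we copy text, inside a tag we skip.
fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    let mut pending_space = false; // collapse whitespace runs into one space
    for c in html.chars() {
        match (in_tag, c) {
            (false, '<') => in_tag = true,
            (true, '>') => in_tag = false,
            (true, _) => {}
            (false, ch) if ch.is_whitespace() => pending_space = true,
            (false, ch) => {
                if pending_space && !out.is_empty() {
                    out.push(' ');
                }
                pending_space = false;
                out.push(ch);
            }
        }
    }
    out
}
```

The last case in the test shows the trade-off: with no DOM, `<p>One</p><p>Two</p>` comes out as `OneTwo`.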

The streaming story is… partially done. ChapterStream implements futures::Stream so you can consume chapters one at a time without loading all of them into memory at once. The benchmarks use jemalloc to actually measure heap allocation delta per operation which I’m pretty happy about as a setup.

For some reason I thought making benchmarks would be fun, so there are Criterion benches for from_bytes, from_reader, and extract_text_only. The CI runs cargo bloat to check binary size, which is something I actually care about for once because the end goal is for this to be usable in WASM and potentially on embedded stuff (ESP32, ahem).

WASM bindings exist in theory (src/wasm.rs), but they’re kind of broken right now: extract_with_ast(), has_cover(), and cover_image() don’t exist. Those are all in the TODO. The C FFI via Diplomat is in better shape and actually generates a real header file.

The test suite is… extensive? Like, maybe embarrassingly extensive for something this early. There are unit tests, integration tests, API tests, edge case tests, streaming tests, performance tests, and a memory threshold test that reads /proc/self/status to check RSS delta. I went a little overboard. The edge case tests especially are kind of a placeholder graveyard right now: most of them are just “open the test epub and hope nothing crashes,” because testing edge cases properly requires mock EPUBs and I have not built those yet.
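The RSS part of that memory test boils down to reading the `VmRSS:` line from /proc/self/status. This is Linux-only, and the value is reported in kB. Below is a minimal sketch with hypothetical function names. It takes the file contents as a string so that the parsing itself can be tested on any OS.

```rust
/// Parses the VmRSS line of a /proc/self/status dump into kB.
fn vm_rss_kib(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|n| n.parse().ok())
}

/// Live value on Linux; None on other platforms or if the read fails.
fn current_rss_kib() -> Option<u64> {
    std::fs::read_to_string("/proc/self/status")
        .ok()
        .as_deref()
        .and_then(vm_rss_kib)
}
```

To get the kind of RSS delta the test checks, sample `current_rss_kib()` before and after an operation and subtract the two values.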

The big things left:

  • A lot (just kidding, there’s a TODO.md)

Anyway, that’s the summary and stuff. Thanks for coming to my TED Talk, hope you enjoy.
