I originally built this program to redact sensitive information in the demo videos I post for another project, but I thought it might be useful to other people as well, so I'm sharing it here.
I've already finished the proof of concept (PoC), and with a few final touches it works well enough.
The program extracts each frame of the video, then uses the PaddleOCR library to recognize the words in it along with their bounding boxes.
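Once the OCR step returns word boxes, each box that needs hiding has to become a rectangle that can be painted over. The sketch below covers only that geometry step. It assumes each box is a quadrilateral of four `[x, y]` corner points, which is the shape PaddleOCR 2.x uses in its `[box, (text, confidence)]` result lines. The helper names `quad_to_rect` and `merge_rects` and the `pad` margin are hypothetical and not taken from the project.

```python
import math
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1) in pixel coordinates, x1/y1 exclusive
Rect = Tuple[int, int, int, int]


def quad_to_rect(quad: Sequence[Sequence[float]], pad: int, width: int, height: int) -> Rect:
    """Turn a 4-point OCR box into a padded, axis-aligned rectangle clamped to the frame.

    Hypothetical helper: `quad` is assumed to be four [x, y] corner points.
    """
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    x0 = max(0, math.floor(min(xs)) - pad)
    y0 = max(0, math.floor(min(ys)) - pad)
    x1 = min(width, math.ceil(max(xs)) + pad)
    y1 = min(height, math.ceil(max(ys)) + pad)
    return (x0, y0, x1, y1)


def merge_rects(rects: List[Rect]) -> Rect:
    """Bounding rectangle of several word boxes, e.g. for a phrase split across words."""
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


if __name__ == "__main__":
    # Two neighbouring word boxes on a 100x100 frame, padded by 2 px each
    a = quad_to_rect([[10, 20], [50, 20], [50, 40], [10, 40]], pad=2, width=100, height=100)
    b = quad_to_rect([[60, 20], [88, 20], [88, 42], [60, 42]], pad=2, width=100, height=100)
    print(a, b, merge_rects([a, b]))
```

Padding a few pixels beyond the detected box is a judgment call. OCR boxes tend to hug the glyphs tightly, so a small margin avoids leaving partially visible characters at the edges.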
I'm using a kind of prefix tree (trie) to match sensitive text across the returned words, and the rapidfuzz library to fuzzy-match any words that the prefix-tree pass misses.
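Here is one possible reading of that two-pass matching, written as a sketch rather than the project's actual code. Each sensitive phrase is split into lowercase tokens and stored in a word-level trie, so a phrase the OCR returned as several separate words can still be matched by walking consecutive words. Any word not covered by a trie match is then compared against every secret with a similarity ratio. The project uses rapidfuzz for this second pass. Here, `difflib.SequenceMatcher` stands in for it so the sketch runs with only the standard library. `rapidfuzz.fuzz.ratio` gives a comparable score on a 0 to 100 scale, so the threshold would need to be scaled if you swap it in. All function names and the `0.85` threshold are hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    secret: Optional[str] = None  # set when a complete sensitive phrase ends at this node


def build_trie(secrets: List[str]) -> TrieNode:
    """Index each sensitive phrase by its lowercase, whitespace-separated tokens."""
    root = TrieNode()
    for secret in secrets:
        node = root
        for token in secret.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.secret = secret
    return root


def trie_matches(words: List[str], root: TrieNode) -> List[Tuple[int, int, str]]:
    """Return (start, end, secret) for the longest phrase starting at each word index."""
    lowered = [w.lower() for w in words]
    matches = []
    for start in range(len(lowered)):
        node: Optional[TrieNode] = root
        best = None
        for end in range(start, len(lowered)):
            node = node.children.get(lowered[end])
            if node is None:
                break  # no secret continues with this word
            if node.secret is not None:
                best = (start, end + 1, node.secret)
        if best is not None:
            matches.append(best)
    return matches


def fuzzy_matches(
    words: List[str], secrets: List[str], covered: Set[int], threshold: float = 0.85
) -> List[Tuple[int, str, float]]:
    """Fallback pass: score each uncovered word against every secret (difflib stand-in)."""
    hits = []
    for i, word in enumerate(words):
        if i in covered:
            continue  # already caught by the trie pass
        for secret in secrets:
            score = SequenceMatcher(None, word.lower(), secret.lower()).ratio()
            if score >= threshold:
                hits.append((i, secret, score))
                break
    return hits


if __name__ == "__main__":
    secrets = ["UCabc123xyz", "my handle"]
    # OCR misread "1" as "l" in the ID, and split the handle into two words
    words = ["channel:", "UCabcl23xyz", "aka", "my", "handle"]
    exact = trie_matches(words, build_trie(secrets))
    covered = {i for start, end, _ in exact for i in range(start, end)}
    print(exact, fuzzy_matches(words, secrets, covered))
```

In this example the trie catches `my handle` as words 3 and 4. The misread channel ID at index 1 only gets through the fuzzy pass, with a ratio of about 0.91. Running the cheap exact pass first keeps the more expensive pairwise comparisons limited to the words that are left over.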
There are still plenty of improvements to be made, and I'll log my progress here!
Here's a snippet of the program's output from my latest devlog demo, with my YouTube channel ID, handle, and WebSub callback URL redacted.