I did not add devlogs here for this project.
I'm really sorry about that, but I did take screenshots throughout the build and documented the process, which I'll publish in a blog post, hopefully in a few days.
To reiterate, this is a very basic proof-of-concept MVP. Here are more details on how I built it:
First of all, I tried building my own CNN model from scratch by following a bunch of tutorials, but failed. (That's easy to put in one sentence, but it took me a LONG TIME, plus a bunch of unlogged hours, since most of the work was on Google Colab.)
Then I realised I could use something called 'transfer learning', so I researched the best base models and settled on ShuffleNetV2: it's only around 6 MB and is pretrained on a large number of classes (objects it can detect). I only needed 47 classes (digits, capital letters, and some lowercase letters, as provided by the EMNIST dataset).
So I imported the model, 'unfroze' its layers, and re-trained it on the EMNIST data using Google Colab's free GPU.
The next step was building an OpenCV pipeline to preprocess the input image. I already had some experience with OpenCV, so getting started was easy; on top of that, I studied a bunch of other implementations and tutorials for this.
The accuracy is still poor and there are many potential improvements, but with the approach described above I was able to get the model under 2 MB.