I worked on shifting the network over to MNIST.
I updated the training logic to shuffle the training data first. Before, it trained on all the zeros first, then the ones, the twos… and the nines last. With MNIST having thousands of training samples per digit, the network would 'forget' everything it had learned except what it saw near the end, so it classified almost everything as a nine. Now it shuffles everything before each pass and it works much better.
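The key detail is shuffling the images and their labels with the *same* permutation so the pairs stay aligned. A minimal sketch of how that could look with numpy (the function name and the toy data here are my own, not the project's actual code):

```python
import numpy as np

def shuffle_in_unison(images, labels, rng=None):
    """Shuffle images and labels with one shared permutation so pairs stay aligned."""
    rng = rng or np.random.default_rng()
    perm = rng.permutation(len(images))
    return images[perm], labels[perm]

# toy stand-in for a digit-sorted training set: all zeros first, then all nines
images = np.arange(10).reshape(10, 1)   # pretend each row is an image
labels = np.array([0] * 5 + [9] * 5)

images, labels = shuffle_in_unison(images, labels)
```

Shuffling each array independently would scramble which label belongs to which image, which is why a single shared permutation is used.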
MNIST has 60,000 training images, so I didn't even bother trying to upload all of that to GitHub.
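Instead, the dataset directory can just be excluded from the repo. A `.gitignore` sketch, assuming the files live somewhere like a local `data/` folder (the paths are my assumption, not the project's actual layout):

```gitignore
# keep the MNIST files out of the repo; download them locally instead
data/
*.idx3-ubyte
*.idx1-ubyte
```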
After that I trained it on the entire dataset for 64 epochs, which took FOREVER!
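The overall loop is just: reshuffle, walk the data in batches, repeat 64 times. A rough sketch of that structure, where `train_step` stands in for whatever updates the network on one batch (all names here are illustrative, not the project's actual API):

```python
import numpy as np

def run_epochs(images, labels, train_step, epochs=64, batch_size=32, seed=0):
    """Run `epochs` full passes over the data, reshuffling before each pass."""
    rng = np.random.default_rng(seed)
    for epoch in range(epochs):
        perm = rng.permutation(len(images))          # fresh order every epoch
        for start in range(0, len(images), batch_size):
            idx = perm[start:start + batch_size]     # one mini-batch of indices
            train_step(images[idx], labels[idx])
```

With 60,000 images, 64 epochs means nearly four million training examples seen in total, which explains why it takes so long.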
I also made some graphics for the README, I guess.