So today I worked more on the representation learning part. I created embeddings by running the music through an autoencoder and taking the output of the bottleneck layer. I then clustered the embeddings with DBSCAN (I avoided K-Means because it assumes roughly spherical clusters and needs the number of clusters fixed up front, and I did not want to make those assumptions about the data).
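The pipeline above can be sketched in a few lines. This is a minimal stand-in, not my actual model: a tiny one-hidden-layer autoencoder (scikit-learn's `MLPRegressor` trained to reconstruct its own input) on synthetic data instead of real audio features, with the bottleneck activations used as embeddings and fed to DBSCAN.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# synthetic stand-in for audio feature frames: two loose groups in 20-D
X = np.vstack([rng.normal(0, 0.3, (50, 20)),
               rng.normal(3, 0.3, (50, 20))])

# autoencoder: train the network to reconstruct X from X,
# squeezing it through a 4-unit bottleneck on the way
ae = MLPRegressor(hidden_layer_sizes=(4,), activation="relu",
                  max_iter=2000, random_state=0)
ae.fit(X, X)

# embeddings = bottleneck activations (ReLU of the first layer)
Z = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

# density-based clustering: no cluster count assumed, -1 marks noise
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(Z)
```

The `eps` and `min_samples` values here are placeholders; in practice they need tuning to the scale of the embedding space.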
The plan for tomorrow is to work on improving the clustering. Right now I am thinking of deep clustering. The idea: first cluster all the embeddings, then train the model not only to reconstruct the input but also to predict each embedding's cluster, multi-task style. How does this help? Training on both objectives should pull elements of the same cluster closer together and push different clusters apart (hopefully). We then recluster and repeat the cycle several times; in the end we get a neural network whose embeddings have clusterable structure. After getting the patterns, I have some interesting plans! (follow the project for more updates)
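Here is a rough numpy sketch of that cluster-then-retrain cycle, under simplifying assumptions: a linear autoencoder and a linear classifier head stand in for the real network, synthetic data stands in for the audio embeddings, and the loss weight `lam` is an arbitrary placeholder.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# synthetic stand-in for audio features: two loose groups in 20-D
X = np.vstack([rng.normal(0, 0.3, (60, 20)),
               rng.normal(3, 0.3, (60, 20))])
n, d, k = X.shape[0], X.shape[1], 4

We = rng.normal(0, 0.1, (d, k))   # encoder weights
Wd = rng.normal(0, 0.1, (k, d))   # decoder weights
lr, lam = 1e-3, 0.1               # learning rate, classification loss weight

for cycle in range(3):
    # step 1: cluster the current embeddings
    Z = X @ We
    labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(Z)
    C = labels.max() + 1                      # clusters found (-1 = noise)
    Wc = rng.normal(0, 0.1, (k, max(C, 1)))   # fresh classifier head per cycle
    mask = labels >= 0                        # drop noise points from cls loss
    Y = np.eye(max(C, 1))[labels[mask]] if C > 0 else None

    # step 2: multi-task training, reconstruction + cluster classification
    for step in range(200):
        Z = X @ We
        Xh = Z @ Wd
        dXh = 2 * (Xh - X) / n                # grad of reconstruction loss
        dWd = Z.T @ dXh
        dZ = dXh @ Wd.T
        if C > 1:                             # cross-entropy on cluster labels
            logits = Z[mask] @ Wc
            P = np.exp(logits - logits.max(axis=1, keepdims=True))
            P /= P.sum(axis=1, keepdims=True)
            dlog = lam * (P - Y) / mask.sum()
            dZ[mask] += dlog @ Wc.T
            Wc -= lr * (Z[mask].T @ dlog)
        We -= lr * (X.T @ dZ)
        Wd -= lr * dWd

Z = X @ We  # final embeddings, hopefully with tighter clusters
```

Re-initializing the classifier head each cycle avoids carrying over a head whose output size no longer matches the new cluster count; the guard on `C > 1` skips the classification loss when DBSCAN finds fewer than two clusters.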
yours,
Raj
Here is what I did today:
I finished the audio encoding part. I first split the audio into small segments, then applied a Constant-Q transform (CQT) to each segment to turn it into a format a neural network can process. The images are visualizations of two songs (one of them is Never Gonna Give You Up, so you just got rickrolled).
What is this project about, you might ask? Well, I'll try to explain. What I'm trying to do is extract the patterns that can be found in music and use those patterns to create tabular data. The end goal is to crank out as many datasets as possible so that I can become a Grandmaster on Kaggle!