Movie Recommendation System

techwizard shipped Movie Recommendation System

5 months ago

Shipped this project!

Hours: 0.7

Cookies: 🍪 3

Multiplier: 4.73 cookies/hr

I just completed a movie recommender using Python, Pandas, and Scikit-Learn. The system uses Natural Language Processing (Bag of Words) to convert movie tags into vectors and calculates similarity scores to find the closest matches in n-dimensional space.

The Build: I processed the TMDB 5000 dataset, cached the processed data and similarity matrix with pickle so the app loads without recomputing anything, and integrated the TMDB API to fetch real-time movie posters on the frontend.

The Lesson: I learned a ton about vectorization and CountVectorizer. It was fascinating to see how the angle between two vectors can work as a proxy for how similar two movies are, and how well that lines up with human taste.

techwizard worked on Movie Recommendation System

5 months ago

0h 41m logged

Ever finished a movie and wondered, “What should I watch next?” I spent this week building a Content-Based Recommendation Engine to answer exactly that.
Here is how I built it, day by day.
📅 Day 1: The Setup
I skipped the classic MovieLens dataset (ratings-based) and chose the TMDB 5000 Dataset to focus on content analysis. Using Pandas, I merged the movie and credits files and filtered for the essentials: genres, keywords, overview, cast, and crew.
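
A minimal sketch of the merge-and-filter step. The two tiny frames stand in for the TMDB 5000 movie and credits files, and the join key (title) and exact column names are assumptions for illustration, not confirmed details of the project.

```python
import pandas as pd

# Tiny stand-in frames; a real run would read the two TMDB 5000 CSV files instead.
movies = pd.DataFrame({
    "title": ["Alpha", "Beta"],
    "overview": ["A space crew drifts.", "A chef opens a cafe."],
    "genres": ['[{"id": 878, "name": "Science Fiction"}]',
               '[{"id": 35, "name": "Comedy"}]'],
    "keywords": ['[{"id": 1, "name": "space"}]', '[{"id": 2, "name": "food"}]'],
    "budget": [100, 50],  # example of a column that is not needed later
})
credits = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["Alpha", "Beta"],
    "cast": ['[{"name": "Ann Lee"}]', '[{"name": "Bo Cruz"}]'],
    "crew": ['[{"job": "Director", "name": "Cy Dole"}]',
             '[{"job": "Director", "name": "Di Eng"}]'],
})

# Join the two files on title (assumed key), then keep only the content columns.
keep = ["movie_id", "title", "overview", "genres", "keywords", "cast", "crew"]
merged = movies.merge(credits, on="title")[keep]
```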

📅 Day 2: Wrangling Data
Data cleaning was the heavy lifting. Columns like genres were stored as JSON strings (e.g., [{"id": 28, "name": "Action"}]).

The Fix: Used ast.literal_eval to parse them into Python lists.
Feature Engineering: I extracted the top 3 actors and the director. I also collapsed spaces (e.g., “Science Fiction” → “sciencefiction”) so that each multi-word name counts as one token instead of being split into unrelated words.
Result: A single “Super Column” called tags that summarizes the entire movie.
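
The Day 2 steps can be sketched roughly like this. The helper names (names, director, collapse, build_tags) and the sample strings are made up for illustration; only ast.literal_eval, the top-3 cast rule, the director lookup, and the space-collapsing come from the notes above.

```python
import ast

def names(raw, limit=None):
    """Parse a stringified list of dicts and return the 'name' values."""
    picked = [item["name"] for item in ast.literal_eval(raw)]
    return picked if limit is None else picked[:limit]

def director(raw):
    """Return the director's name as a one-item list, or [] if none is listed."""
    for member in ast.literal_eval(raw):
        if member.get("job") == "Director":
            return [member["name"]]
    return []

def collapse(words):
    """Remove internal spaces so multi-word names become single tokens."""
    return [word.replace(" ", "") for word in words]

def build_tags(overview, genres, keywords, cast, crew):
    """Combine all content fields into one lowercase tag string."""
    parts = (
        overview.split()
        + collapse(names(genres))
        + collapse(names(keywords))
        + collapse(names(cast, limit=3))  # top 3 actors only
        + collapse(director(crew))
    )
    return " ".join(parts).lower()

tags = build_tags(
    "A crew drifts in space.",
    '[{"id": 878, "name": "Science Fiction"}]',
    '[{"id": 1, "name": "space travel"}]',
    '[{"name": "Ann Lee"}, {"name": "Bo Cruz"}, {"name": "Cy Dole"}, {"name": "Di Eng"}]',
    '[{"job": "Editor", "name": "Ed Fox"}, {"job": "Director", "name": "Gia Hart"}]',
)
print(tags)
# a crew drifts in space. sciencefiction spacetravel annlee bocruz cydole giahart
```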
📅 Day 3: The Math (Vectorization)
To measure similarity, I needed to turn text into numbers.
Vectorization: Used Scikit-Learn’s CountVectorizer (Bag of Words) to convert each movie’s tags into a 5,000-dimensional count vector, with stop words removed.
Similarity: Used Cosine Similarity to measure the angle between vectors. This produced a square matrix holding a similarity score for every pair of movies.
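
A small sketch of the vectorize-then-compare step on three made-up tag strings. Passing max_features=5000 and stop_words="english" is my guess at settings that would match the description above, not a confirmed copy of the project's configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up tag strings standing in for the real "tags" column.
tags = [
    "space crew alien sciencefiction",
    "space station alien sciencefiction",
    "chef cafe comedy romance",
]

# Cap the vocabulary at 5,000 words and drop English stop words (assumed settings).
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(tags)

# Square matrix: similarity[i][j] is the cosine of the angle between movies i and j.
similarity = cosine_similarity(vectors)
```

Here the first two movies share three of their four words, so their score is 3 / (2 × 2) = 0.75, while the third movie shares no words with either and scores 0.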
📅 Day 4: The Interface
I used Streamlit to build a frontend.
Logic: The user selects a movie → App finds its index → Sorts that movie’s row of the similarity matrix by score → Returns the top 5 matches.
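
The lookup logic, sketched without the Streamlit layer. The recommend function, the sample titles, and the scores are hypothetical, and skipping the selected movie itself is an assumption I added so it never recommends its own title.

```python
def recommend(title, titles, similarity, top_n=5):
    """Return the top_n titles most similar to the given one."""
    index = titles.index(title)                  # find the movie's row
    scores = list(enumerate(similarity[index]))  # (movie index, score) pairs
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    # Skip the selected movie itself, then keep the best top_n.
    return [titles[i] for i, _ in ranked if i != index][:top_n]

titles = ["Alpha", "Beta", "Gamma", "Delta"]
similarity = [
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.3],
    [0.5, 0.4, 0.3, 1.0],
]
print(recommend("Alpha", titles, similarity, top_n=2))  # ['Beta', 'Delta']
```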
📅 Day 5: API Integration
Text-only lists are boring. I signed up for the TMDB API and wrote a script to fetch real-time movie posters. Displaying them side-by-side made the app feel like a real product.
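
Turning an API response into a poster link might look something like this. The poster_path field and the image.tmdb.org base URL follow TMDB's image URL scheme as I understand it; treat both, along with the w500 size and the poster_url helper name, as assumptions rather than the exact code in the app.

```python
# Assumed TMDB image base URL; "w500" is one of several available widths.
IMAGE_BASE_URL = "https://image.tmdb.org/t/p/w500"

def poster_url(movie_json):
    """Build a full poster URL from a movie-details dictionary, or None if missing."""
    path = movie_json.get("poster_path")
    if not path:
        return None  # some movies have no poster
    return f"{IMAGE_BASE_URL}{path}"

print(poster_url({"poster_path": "/abc123.jpg"}))
# https://image.tmdb.org/t/p/w500/abc123.jpg
```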
📅 Day 6: Optimization
Re-calculating the model on every reload was too slow.
Solution: I used pickle to save the processed data and similarity matrix. The app now loads pre-computed files instantly.
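
A rough sketch of the save-once, load-later pattern. The file name, the temporary directory, and the save and load helpers are placeholders; the real app would write the files once after building the model and only load them at startup.

```python
import pickle
import tempfile
from pathlib import Path

def save(obj, path):
    """Serialize obj to disk so it does not have to be recomputed."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load(path):
    """Read a previously pickled object back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Placeholder location and data for the demo.
cache_dir = Path(tempfile.mkdtemp())
similarity = [[1.0, 0.75], [0.75, 1.0]]

save(similarity, cache_dir / "similarity.pkl")     # done once, after building the model
restored = load(cache_dir / "similarity.pkl")      # done on every app start
```

One caveat worth knowing: only unpickle files you created yourself, since loading a pickle can run arbitrary code.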

Comments

Chibueze Benneth 5 months ago

Oh, that’s really cool! One thing I want to learn is how to properly integrate APIs into my workflow, so I am impressed you scaled your project to more than just a text-based model. Good job!

techwizard about 2 months ago

Hey Chibueze! API integration is super fun once you get the hang of it. If you use Python, my go-to workflow usually looks like this:

Get the API Key: Sign up for the service to get your unique key.

Use the requests library: It makes fetching data so easy. You just set up your URL and pass your key.

Fetch the data: Run response = requests.get(url).

Convert to JSON: Use data = response.json() to turn the API’s response into a standard Python dictionary.

Extract: From there, you just dig into the dictionary to grab exactly what you need (like the movie poster URLs in this project!).
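
Put together, those steps look roughly like this. The fetch_json helper is just a name I made up for the sketch, and the timeout and raise_for_status() check are extras I would add so a bad key or a slow server fails loudly instead of silently.

```python
import requests

def fetch_json(url, params=None, timeout=10):
    """GET a URL and return the decoded JSON body as a Python dictionary."""
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.json()

# Example call (not run here; the URL and key are placeholders):
# data = fetch_json("https://api.example.com/movie/42", params={"api_key": "YOUR_KEY"})
# poster = data.get("poster_path")
```

Passing the key through params keeps it out of hand-built URL strings, and reading it from an environment variable keeps it out of your code.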
