Developed as a University Project for Vivekananda Global University (VGU)
Netflix hosts thousands of movies and TV shows, making it difficult for users to find content that matches their interests. This project presents a Content-Based Recommendation System that recommends similar movies and TV shows based on their content characteristics.
The recommendation engine analyzes information such as:
- 🎭 Genre
- 📝 Description
- 🎬 Director
- 🎤 Cast Members
- 📅 Release Year
Using Natural Language Processing (NLP) and Machine Learning techniques (TF-IDF vectorization and cosine similarity), the system measures how similar titles are to one another and recommends the closest matches.
Project objectives:
- Build a content-based recommendation system for Netflix titles
- Apply Natural Language Processing techniques
- Implement TF-IDF Vectorization
- Calculate similarity using Cosine Similarity
- Develop an interactive Streamlit application
- Improve content discovery for users
System workflow:
```text
Netflix Dataset
│
▼
Data Preprocessing
│
▼
Feature Engineering
│
▼
TF-IDF Vectorization
│
▼
Cosine Similarity Matrix
│
▼
Recommendation Engine
│
▼
Streamlit Web Application
```
Key features:
- 🎬 Content-Based Recommendation System
- 🤖 Machine-Learning-Powered Recommendations
- ⚡ Fast Lookups from a Precomputed Similarity Matrix
- 🎨 Interactive Streamlit Interface
- 📊 Clean and Modular Project Structure
- 🔍 Recommendations Based on Movie and TV Show Metadata
Tech stack:

| Technology | Purpose |
|---|---|
| Python | Programming Language |
| Pandas | Data Processing |
| NumPy | Numerical Operations |
| Scikit-Learn | Machine Learning |
| Streamlit | Web Application |
| TF-IDF | Text Vectorization |
| Cosine Similarity | Recommendation Engine |
| Pickle | Model Serialization |
The project uses the Netflix Titles dataset (`data/netflix_titles.csv`), which contains metadata about Netflix movies and TV shows. Key fields include:
- Title
- Genre
- Description
- Director
- Cast
- Release Year
- Rating
Data preprocessing:
- Handle missing values
- Clean textual data
- Prepare dataset for analysis
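A minimal pandas sketch of these preprocessing steps is shown below. It assumes the column names of the public Kaggle `netflix_titles.csv` (for example `listed_in` for genres); the function name `preprocess` and the exact cleaning rules are illustrative assumptions, not the project's actual `src/preprocessing.py`.

```python
import pandas as pd

# Assumed column names (as in the public Kaggle netflix_titles.csv); adjust to your file.
FEATURE_COLUMNS = ["director", "cast", "listed_in", "description"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: handle missing values and normalize text fields."""
    df = df.copy()
    # A recommendation needs a title to look up, so drop rows without one.
    df = df.dropna(subset=["title"])
    # Keep one row per title so each title maps to a single similarity-matrix row.
    df = df.drop_duplicates(subset="title").reset_index(drop=True)
    for col in FEATURE_COLUMNS:
        # Missing director/cast entries become empty strings instead of NaN;
        # the text is then trimmed and lower-cased for consistent tokens.
        df[col] = df[col].fillna("").astype(str).str.strip().str.lower()
    return df


# Usage (assumed path from the project structure):
# df = preprocess(pd.read_csv("data/netflix_titles.csv"))
```

Titles are left in their original case so they can still be displayed and looked up as users type them.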
Feature engineering: important features such as genres, descriptions, directors, and cast members are combined into a single text feature.
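One way to build that combined text feature, sketched under the same assumed column names (`listed_in`, `description`, `director`, `cast`); `combine_features` is a hypothetical helper name:

```python
import pandas as pd

# Assumed column names for genres, description, director and cast.
FEATURE_COLUMNS = ["listed_in", "description", "director", "cast"]


def combine_features(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: merge the feature columns into one text field per title."""
    combined = df[FEATURE_COLUMNS].fillna("").astype(str).apply(" ".join, axis=1)
    # Collapse the repeated spaces left behind by empty fields.
    return combined.str.replace(r"\s+", " ", regex=True).str.strip()
```

If multi-word names should only match as a whole (so "Tom Hanks" does not partially match "Tom Cruise"), a common variant removes the spaces inside each name before joining; that is a design choice, not something this project states.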
TF-IDF vectorization: the combined text is converted into numerical vectors using scikit-learn's `TfidfVectorizer`.
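TF-IDF weights a term by how often it appears in one title's text and down-weights terms that appear across many titles. A minimal sketch, assuming `TfidfVectorizer` with English stop words removed (the `stop_words` setting and the `vectorize` helper are assumptions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def vectorize(texts):
    """Hypothetical helper: fit TF-IDF on the combined text of every title."""
    # stop_words="english" drops very common words such as "the" and "in".
    vectorizer = TfidfVectorizer(stop_words="english")
    # Sparse matrix of shape (n_titles, vocabulary_size); rows are L2-normalized by default.
    matrix = vectorizer.fit_transform(texts)
    return vectorizer, matrix


# Made-up combined-feature texts standing in for three titles.
texts = [
    "crime drama about a detective in london",
    "romantic comedy in paris",
    "detective thriller",
]
vectorizer, matrix = vectorize(texts)
```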
Similarity computation: cosine similarity is used to measure how similar each pair of titles is, based on their TF-IDF vectors.
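For two vectors A and B, cosine similarity is A·B / (‖A‖ ‖B‖). Because TF-IDF weights are non-negative, scores fall between 0 (no shared terms) and 1 (same direction). A minimal sketch using scikit-learn's `cosine_similarity` on made-up sample texts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up combined-feature texts standing in for three titles.
texts = [
    "crime drama detective",
    "romantic comedy paris",
    "detective thriller",
]
matrix = TfidfVectorizer().fit_transform(texts)

# Dense (n_titles, n_titles) array: entry [i, j] is the similarity of titles i and j.
similarity = cosine_similarity(matrix)
```

Since `TfidfVectorizer` L2-normalizes rows by default, `sklearn.metrics.pairwise.linear_kernel(matrix)` gives the same values while skipping the normalization. Either way the full matrix has n² entries, so memory grows quadratically with the number of titles.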
Recommendation: for a selected title, the system returns the movies or TV shows with the highest similarity scores.
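A minimal sketch of the lookup step, assuming the titles are held in a list whose order matches the rows of the similarity matrix; `recommend` and the sample data are hypothetical:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend(title, titles, similarity, top_n=5):
    """Hypothetical helper: return the top_n titles most similar to `title`."""
    # Map each title to its row in the similarity matrix.
    index = {name: i for i, name in enumerate(titles)}
    if title not in index:
        raise KeyError(f"Unknown title: {title!r}")
    row = index[title]
    # Pair every other title's index with its score, skipping the query title itself.
    scores = [(i, score) for i, score in enumerate(similarity[row]) if i != row]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in scores[:top_n]]


titles = ["Crime Story", "Paris Love", "Detective Night", "Crime Detective"]
texts = [
    "crime drama detective",
    "romantic comedy paris",
    "detective thriller night",
    "crime detective thriller",
]
similarity = cosine_similarity(TfidfVectorizer().fit_transform(texts))
print(recommend("Crime Story", titles, similarity, top_n=2))
# ['Crime Detective', 'Detective Night']
```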
Project structure:
```text
Netflix-Recommendation-System/
│
├── data/
│   └── netflix_titles.csv
│
├── notebooks/
│   └── EDA.ipynb
│
├── src/
│   ├── preprocessing.py
│   ├── feature_engineering.py
│   ├── vectorizer.py
│   ├── recommender.py
│   └── utils.py
│
├── models/
│   ├── similarity.pkl
│   └── tfidf.pkl
│
├── app/
│   ├── app.py
│   ├── recommendation.py
│   └── poster_fetcher.py
│
├── model.py
├── requirements.txt
├── README.md
└── .gitignore
```
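The `models/` directory holds pickled artifacts (`tfidf.pkl`, `similarity.pkl`). A minimal sketch of saving and reloading them with Python's standard `pickle` module follows; the helpers `save_artifacts` and `load_artifacts` are hypothetical, and whether `model.py` stores exactly these objects is an assumption.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def save_artifacts(vectorizer, similarity, model_dir="models"):
    """Hypothetical helper: pickle the fitted vectorizer and similarity matrix."""
    path = Path(model_dir)
    path.mkdir(parents=True, exist_ok=True)
    with open(path / "tfidf.pkl", "wb") as f:
        pickle.dump(vectorizer, f)
    with open(path / "similarity.pkl", "wb") as f:
        pickle.dump(similarity, f)


def load_artifacts(model_dir="models"):
    """Hypothetical helper: reload both artifacts (only unpickle files you trust)."""
    path = Path(model_dir)
    with open(path / "tfidf.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    with open(path / "similarity.pkl", "rb") as f:
        similarity = pickle.load(f)
    return vectorizer, similarity


# Round-trip demo in a throwaway directory (the project itself would use "models").
texts = ["crime drama detective", "romantic comedy paris"]
vectorizer = TfidfVectorizer()
similarity = cosine_similarity(vectorizer.fit_transform(texts))
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(vectorizer, similarity, tmp)
    loaded_vectorizer, loaded_similarity = load_artifacts(tmp)
```

Precomputing the matrix once lets the web app answer requests with a lookup instead of refitting TF-IDF; note that `pickle.load` can execute arbitrary code, so artifacts should come only from trusted sources.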
To set up and run the project locally:
```bash
git clone https://github.com/mukeshsharma99/Netflix-Recommendation-System.git
cd Netflix-Recommendation-System
python -m venv venv

# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate

pip install -r requirements.txt
python model.py
streamlit run app/app.py
```
Future enhancements:
- Movie Poster Integration using TMDB API
- Personalized Recommendations
- Trending Content Section
- User Authentication
- AWS Cloud Deployment
- Advanced Filtering Options
This project provided hands-on experience with:
- Data Preprocessing
- Feature Engineering
- Natural Language Processing (NLP)
- TF-IDF Vectorization
- Cosine Similarity
- Recommendation Systems
- Streamlit Development
- End-to-End Machine Learning Projects
Author:
Mukesh Kumar
B.Tech – Computer Science & Engineering
Vivekananda Global University (VGU)
GitHub: https://github.com/mukeshsharma99
If you found this project useful, please consider giving it a Star ⭐ on GitHub.