Developed as a University Project for Vivekananda Global University (VGU)
Netflix hosts thousands of movies and TV shows, making it difficult for users to find content that matches their interests. This project presents a Content-Based Recommendation System that recommends similar movies and TV shows based on their content characteristics.
The recommendation engine analyzes information such as:
- Genre
- Description
- Director
- Cast Members
- Release Year
Using Natural Language Processing (NLP) and Machine Learning, the system converts this metadata into TF-IDF vectors and recommends the titles with the highest cosine similarity to the one selected.
- Build an intelligent recommendation system
- Apply Natural Language Processing techniques
- Implement TF-IDF Vectorization
- Calculate similarity using Cosine Similarity
- Develop an interactive Streamlit application
- Improve content discovery for users
Netflix Dataset
        │
        ▼
Data Preprocessing
        │
        ▼
Feature Engineering
        │
        ▼
TF-IDF Vectorization
        │
        ▼
Cosine Similarity Matrix
        │
        ▼
Recommendation Engine
        │
        ▼
Streamlit Web Application
- Content-Based Recommendation System
- Machine Learning Powered Recommendations
- Fast Similarity Search
- Interactive Streamlit Interface
- Clean and Modular Project Structure
- Recommendation Based on Movie Metadata
| Technology | Purpose |
|---|---|
| Python | Programming Language |
| Pandas | Data Processing |
| NumPy | Numerical Operations |
| Scikit-Learn | Machine Learning |
| Streamlit | Web Application |
| TF-IDF | Text Vectorization |
| Cosine Similarity | Recommendation Engine |
| Pickle | Model Serialization |
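As a rough illustration of the serialization role Pickle plays in the table above, the sketch below saves a similarity matrix to disk and loads it back. The helper names `save_artifact` and `load_artifact`, the toy matrix, and the temporary directory are assumptions made for the example; the project's own `model.py` may organize this differently.

```python
import pickle
import tempfile
from pathlib import Path

# Toy 2x2 similarity matrix standing in for the real one (assumption).
similarity = [[1.0, 0.6], [0.6, 1.0]]

def save_artifact(obj, path: Path) -> None:
    """Pickle obj to path, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)

def load_artifact(path: Path):
    """Load a previously pickled object from path."""
    with path.open("rb") as fh:
        return pickle.load(fh)

# A temporary directory keeps the example from touching real project files.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "models" / "similarity.pkl"
    save_artifact(similarity, target)
    restored = load_artifact(target)
```

Precomputing and pickling the matrix lets the web app start without recomputing similarities. Only unpickle files you created or trust, since loading a pickle can execute arbitrary code.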
The project uses the Netflix Titles Dataset containing metadata about Netflix movies and TV shows.
- Title
- Genre
- Description
- Director
- Cast
- Release Year
- Rating
- Handle missing values
- Clean textual data
- Prepare dataset for analysis
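The preprocessing steps above could look roughly like the following pandas sketch. The sample rows, the column names (`genre`, `description`, `director`, `cast`), and the `preprocess` helper are assumptions for illustration; the real CSV may name its columns differently.

```python
import pandas as pd

# Hypothetical sample standing in for data/netflix_titles.csv.
raw = pd.DataFrame({
    "title": ["Show A", "Movie B"],
    "genre": ["Dramas", "Comedies"],
    "description": ["  A tense FAMILY drama. ", "A light comedy!"],
    "director": ["Jane Doe", None],
    "cast": [None, "John Roe, Ann Poe"],
})

TEXT_COLUMNS = ["genre", "description", "director", "cast"]

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: no missing text values, normalized strings."""
    df = df.copy()  # leave the caller's DataFrame untouched
    for col in TEXT_COLUMNS:
        df[col] = (
            df[col]
            .fillna("")                                      # missing value -> empty string
            .str.lower()                                     # case-insensitive matching
            .str.replace(r"[^a-z0-9\s,]", " ", regex=True)   # drop punctuation, keep commas
            .str.replace(r"\s+", " ", regex=True)            # collapse repeated whitespace
            .str.strip()
        )
    return df

clean = preprocess(raw)
```

Replacing missing values with empty strings, rather than dropping rows, keeps every title available for recommendation even when a director or cast entry is absent.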
Important features such as genres, descriptions, directors, and cast members are combined into a single text feature.
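One way to build that single text feature is to join the metadata columns row by row. This is a minimal sketch; the column names and the `combine_features` helper are hypothetical.

```python
import pandas as pd

# Hypothetical, already-cleaned rows; column names are assumptions.
titles = pd.DataFrame({
    "title": ["Show A", "Movie B"],
    "genre": ["dramas", "comedies"],
    "description": ["a tense family drama", "a light comedy"],
    "director": ["jane doe", ""],
    "cast": ["", "john roe, ann poe"],
})

FEATURE_COLUMNS = ["genre", "description", "director", "cast"]

def combine_features(df: pd.DataFrame) -> pd.Series:
    """Join the metadata columns of each row into one space-separated string."""
    combined = df[FEATURE_COLUMNS].fillna("").agg(" ".join, axis=1)
    # Empty fields leave doubled spaces behind; collapse them.
    return combined.str.replace(r"\s+", " ", regex=True).str.strip()

titles["combined"] = combine_features(titles)
```

Each row now carries one string that the vectorizer can treat as a single document.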
The combined text is converted into numerical vectors using TF-IDF Vectorizer.
Cosine Similarity between the TF-IDF vectors is used to measure how closely two titles match.
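To make the measure concrete, here is a from-scratch sketch of cosine similarity for two plain Python vectors: the dot product divided by the product of the vector lengths. The function name `cosine_sim` is hypothetical. In practice, scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` computes the same quantity for every pair of rows at once.

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a title with no terms is treated as similar to nothing
    return dot / (norm_a * norm_b)

same_direction = cosine_sim([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
no_overlap = cosine_sim([1.0, 0.0], [0.0, 1.0])
```

Because TF-IDF weights are never negative, the score for two titles falls between 0 (no shared terms) and 1 (identical term profile), regardless of how long each description is.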
The system returns the most similar movies or TV shows based on the selected title.
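The lookup itself can be sketched as a ranking over one row of the similarity matrix. The titles, the hand-written matrix values, and the `recommend` function below are illustrative assumptions, not the project's actual code.

```python
import numpy as np

# Toy data: titles and a symmetric similarity matrix (values are made up).
titles = ["Show A", "Movie B", "Movie C", "Show D"]
similarity = np.array([
    [1.00, 0.10, 0.60, 0.30],
    [0.10, 1.00, 0.20, 0.70],
    [0.60, 0.20, 1.00, 0.05],
    [0.30, 0.70, 0.05, 1.00],
])

def recommend(title, top_n=2):
    """Return up to top_n titles most similar to title, best match first."""
    if title not in titles:
        return []  # unknown title: nothing to recommend
    idx = titles.index(title)
    # Sort indices by similarity score, highest first.
    ranked = np.argsort(similarity[idx])[::-1]
    # Skip the selected title itself, then keep the best top_n.
    return [titles[i] for i in ranked if i != idx][:top_n]

print(recommend("Show A"))
```

Excluding the selected title matters because every title has a similarity of 1 with itself and would otherwise always rank first.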
Netflix-Recommendation-System/
│
├── data/
│   └── netflix_titles.csv
│
├── notebooks/
│   └── EDA.ipynb
│
├── src/
│   ├── preprocessing.py
│   ├── feature_engineering.py
│   ├── vectorizer.py
│   ├── recommender.py
│   └── utils.py
│
├── models/
│   ├── similarity.pkl
│   └── tfidf.pkl
│
├── app/
│   ├── app.py
│   ├── recommendation.py
│   └── poster_fetcher.py
│
├── model.py
├── requirements.txt
├── README.md
└── .gitignore
git clone https://github.com/mukeshsharma99/Netflix-Recommendation-System.git
cd Netflix-Recommendation-System
python -m venv venv

Windows:
venv\Scripts\activate

Linux/macOS:
source venv/bin/activate

pip install -r requirements.txt
python model.py
streamlit run app/app.py

- Movie Poster Integration using TMDB API
- Personalized Recommendations
- Trending Content Section
- User Authentication
- AWS Cloud Deployment
- Advanced Filtering Options
This project provided hands-on experience with:
- Data Preprocessing
- Feature Engineering
- Natural Language Processing (NLP)
- TF-IDF Vectorization
- Cosine Similarity
- Recommendation Systems
- Streamlit Development
- End-to-End Machine Learning Projects
Mukesh Kumar
B.Tech - Computer Science & Engineering
Vivekananda Global University (VGU)
GitHub: https://github.com/mukeshsharma99
If you found this project useful, please consider giving it a Star ⭐ on GitHub.