This repository contains two Machine Learning projects completed as part of Month 2 of the ML Internship.
- Task 3 — Housing Price Prediction
- Task 4 — Iris Flower Classification
- Libraries Used
- How to Run
- Project Structure
Using the California Housing Dataset, a Linear Regression model is built to predict median house prices based on location, number of rooms, income, and population.
- File: `housing.csv`
- Target Variable: `median_house_value`
- Key Features:

| Feature | Description |
|---|---|
| `median_income` | Median income of the area |
| `ocean_proximity` | Location category (one-hot encoded) |
| `total_rooms` | Total rooms in the block |
| `housing_median_age` | Age of the houses |
| `households` | Number of households |
| `population` | Population of the area |

- Filled missing `total_bedrooms` values with the median
- Applied One-Hot Encoding on the `ocean_proximity` column
- Applied StandardScaler for feature scaling
- 80/20 Train-Test Split
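
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on a tiny synthetic DataFrame, not the notebook's actual code: the sample values, the variable names, and `random_state=42` for the split are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny synthetic stand-in for housing.csv (values are made up for illustration).
df = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 3.1, 2.1, 6.4, 1.9, 4.5],
    "total_rooms": [880, 7099, 1467, 1274, 1627, 919, 2535, 3104, 696, 2202],
    "total_bedrooms": [129, 1106, np.nan, 235, 280, 213, np.nan, 687, 191, 434],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND", "NEAR OCEAN",
                        "INLAND", "NEAR OCEAN", "NEAR BAY", "INLAND", "NEAR OCEAN"],
    "median_house_value": [452600, 358500, 352100, 341300, 342200,
                           269700, 299200, 241400, 226700, 261100],
})

# 1. Fill missing total_bedrooms values with the column median.
df["total_bedrooms"] = df["total_bedrooms"].fillna(df["total_bedrooms"].median())

# 2. One-hot encode the categorical ocean_proximity column (as 0.0/1.0 floats).
df = pd.get_dummies(df, columns=["ocean_proximity"], dtype=float)

# 3. Separate the features from the target.
X = df.drop(columns=["median_house_value"])
y = df["median_house_value"]

# 4. 80/20 train-test split (random_state=42 is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 5. Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

In this sketch the scaler is fitted after the split so that test-set statistics do not leak into training. The notebook may apply the steps in a different order.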

| Feature | Correlation |
|---|---|
| `median_income` | 0.688 |
| `ocean_proximity_NEAR BAY` | 0.160 |
| `ocean_proximity_NEAR OCEAN` | 0.142 |
| `total_rooms` | 0.134 |
| `housing_median_age` | 0.106 |

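
A ranking like the correlation table above can be produced with pandas' `DataFrame.corr`. The sketch below uses a small synthetic frame whose values are made up, so its numbers will not match the table. It only illustrates the technique of correlating each feature with `median_house_value` and sorting.

```python
import pandas as pd

# Synthetic stand-in for the preprocessed housing frame (values are made up).
df = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 3.1, 2.1, 6.4],
    "housing_median_age": [41, 21, 52, 52, 30, 18, 44, 25],
    "total_rooms": [880, 7099, 1467, 1274, 1627, 919, 2535, 3104],
    "median_house_value": [452600, 358500, 352100, 341300,
                           342200, 269700, 299200, 401400],
})

# Pearson correlation of every column with the target.
correlations = df.corr()["median_house_value"]

# Drop the target's self-correlation and sort from the strongest positive value down.
top_features = correlations.drop("median_house_value").sort_values(ascending=False)
print(top_features.head(10))
```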
- Linear Regression
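
As a rough illustration of the model, the sketch below fits scikit-learn's `LinearRegression` to synthetic data. The data, seed, and coefficients are assumptions chosen so the fit is easy to check. The notebook fits the same estimator to the preprocessed housing features instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: two features and a noise-free linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

# Ordinary least squares fit, then predictions for the first five rows.
model = LinearRegression().fit(X, y)
predictions = model.predict(X[:5])
print(model.coef_, model.intercept_)
```

Because the target here has no noise, the fitted coefficients and intercept recover the values used to generate it. On real housing data the fit is only approximate.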
| Metric | Value |
|---|---|
| MSE | 4,908,476,721 |
| R² Score | 0.6254 |
📌 R² of 0.625 means the model explains ~62.5% of the variance in house prices.
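
For readers who want to see how these two metrics are defined, here is a small pure-Python sketch. The toy values are made up for illustration, and the function names are this sketch's own. The last line takes the square root of the reported MSE, which gives an RMSE of about 70,060 in the units of `median_house_value`.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy values, made up for illustration only.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

toy_mse = mse(y_true, y_pred)
toy_r2 = r2_score(y_true, y_pred)

# RMSE implied by the MSE reported above, in the target's own units.
reported_rmse = math.sqrt(4_908_476_721)
print(toy_mse, toy_r2, round(reported_rmse))
```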
- Scatter Plot: Actual vs Predicted house values
- Bar Chart: Top 10 features affecting house price (by correlation)
Using the Iris Dataset, two classification models are built to classify iris flowers into 3 species — Setosa, Versicolor, and Virginica — based on petal and sepal measurements.
- File: `iris.csv`
- Target Variable: Species (3 classes)
- Features Used:

| Feature | Description |
|---|---|
| `sepal_length` | Length of the sepal |
| `sepal_width` | Width of the sepal |
| `petal_length` | Length of the petal |
| `petal_width` | Width of the petal |

- Applied LabelEncoder on target (species names → numbers)
- Applied StandardScaler for feature scaling
- 80/20 Train-Test Split
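
The steps above might look roughly like this. The sketch assumes scikit-learn's bundled copy of the Iris data as a stand-in for `iris.csv`, and `random_state=42` for the split. Both are assumptions rather than details confirmed by the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Assumption: scikit-learn's bundled Iris data stands in for iris.csv.
iris = load_iris()
X = iris.data
species = iris.target_names[iris.target]  # string labels such as "setosa"

# Encode species names as integers (classes are sorted alphabetically).
encoder = LabelEncoder()
y = encoder.fit_transform(species)

# 80/20 split (random_state=42 is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize the features using statistics from the training split only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

`encoder.inverse_transform` maps predicted integers back to species names when results need to be reported.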
- `random_state = 42`
- Default parameters
- `max_iter = 200`
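
A minimal sketch of training and scoring the two classifiers follows. It assumes that `random_state = 42` with otherwise default parameters belongs to the Random Forest and that `max_iter = 200` belongs to Logistic Regression. It also uses scikit-learn's bundled Iris data in place of `iris.csv`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: bundled Iris data and random_state=42 for the 80/20 split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale using training statistics only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Assumed mapping of the listed settings to the two models.
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=200),
}

# Fit each model and record its accuracy on the held-out test split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test_s))
    print(f"{name}: {accuracies[name]:.2%}")
```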
| Model | Accuracy |
|---|---|
| Random Forest | 100% |
| Logistic Regression | 100% |
✅ Both models achieved 100% accuracy on the 20% test split of the Iris dataset, a well-balanced, clean dataset well suited to classification benchmarking.
- Confusion Matrix: Random Forest predictions heatmap
- Bar Chart: Accuracy comparison between both models
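
The numbers behind a confusion-matrix heatmap can be computed as in the sketch below. It again assumes scikit-learn's bundled Iris data and `random_state=42`, and it skips feature scaling for brevity. Plotting is left out so the example stays self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: bundled Iris data and random_state=42 for the 80/20 split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1, 2])
print(cm)
```

Passing `cm` to `seaborn.heatmap(cm, annot=True)` is one way to draw the heatmap. Correct predictions sit on the diagonal.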
- numpy
- pandas
- matplotlib
- seaborn
- scikit-learn
- Clone the repository:
  `git clone https://github.com/hamza93-ai/ML-Internship-Month-2.git`
- Open the notebook in Google Colab or Jupyter:
  `jupyter notebook ML_Internship_Month2_Tasks.ipynb`
- Upload the required CSV file (`housing.csv` or `iris.csv`) when prompted and run all cells.
ML-Internship-Month-2/
│
├── ML_Internship_Month2_Tasks.ipynb # Main notebook (Task 3 + Task 4)
├── ML_Internship_Month2_Report.pdf # Project report
└── README.md # Project documentation
Hamza Asif ML Internship — Month 2 | Task 3 & Task 4