
hamza93-ai/ML_Internship_Tasks_-Month_2-


🏠🌸 ML Internship Tasks — Month 2 (Task 3 & Task 4)

This repository contains two Machine Learning projects completed as part of Month 2 of the ML Internship.



Task 3 — Housing Price Prediction

📌 Overview

A Linear Regression model is trained on the California Housing dataset to predict each block's median house value from features describing its location, number of rooms, median income, and population.


📂 Dataset

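  • File: housing.csv (each row describes one census block group)
  • Target variable: median_house_value (median house value in the block, in US dollars)
  • Key features:

| Feature | Description |
|---|---|
| median_income | Median household income in the block |
| ocean_proximity | Location relative to the ocean (categorical, one-hot encoded) |
| total_rooms | Total number of rooms in the block |
| housing_median_age | Median age of the houses in the block |
| households | Number of households in the block |
| population | Number of people living in the block |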

⚙️ Data Preprocessing

  • Filled missing total_bedrooms values with the column median
  • One-hot encoded the categorical ocean_proximity column
  • Standardized the features with StandardScaler (zero mean, unit variance)
  • Split the data 80/20 into training and test sets
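The preprocessing steps above can be sketched with pandas and scikit-learn as follows. This is a minimal sketch, not the notebook's code: the helper name `preprocess_housing`, `random_state=42` for the split, and fitting the scaler on the training rows only (so test-set statistics do not leak into training) are all assumptions. The toy frame at the bottom uses the real column names with made-up values, only to exercise the function.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess_housing(df, target="median_house_value", test_size=0.2, seed=42):
    """Impute, one-hot encode, split (80/20 by default) and scale the housing data.

    Hypothetical helper; the seed and the scale-after-split order are assumptions.
    """
    df = df.copy()
    # Fill missing total_bedrooms values with the column median
    df["total_bedrooms"] = df["total_bedrooms"].fillna(df["total_bedrooms"].median())
    # One-hot encode ocean_proximity into ocean_proximity_<CATEGORY> columns
    df = pd.get_dummies(df, columns=["ocean_proximity"], dtype=float)
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    # Fit the scaler on the training rows only, then reuse it for the test rows
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test, list(X.columns)


# Toy frame with the real column names (values are made up, for illustration only)
rng = np.random.default_rng(0)
n = 20
toy = pd.DataFrame({
    "median_income": rng.uniform(1, 10, n),
    "total_rooms": rng.integers(500, 5000, n),
    "total_bedrooms": rng.integers(100, 1000, n).astype(float),
    "housing_median_age": rng.integers(1, 52, n),
    "households": rng.integers(50, 1500, n),
    "population": rng.integers(100, 4000, n),
    "ocean_proximity": rng.choice(["INLAND", "NEAR BAY", "NEAR OCEAN", "<1H OCEAN"], n),
    "median_house_value": rng.uniform(50_000, 500_000, n),
})
toy.loc[[0, 5], "total_bedrooms"] = np.nan  # simulate missing values

X_train, X_test, y_train, y_test, feature_names = preprocess_housing(toy)
print(X_train.shape, X_test.shape)
```

With the real file in the working directory, the call would be `preprocess_housing(pd.read_csv("housing.csv"))`.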

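📈 Feature Importance (Top Correlations with median_house_value)

Features are ranked by their correlation with the target. Correlation only captures linear, one-feature-at-a-time association, so it is a rough guide to importance rather than a measure of each feature's contribution to the fitted model.

| Feature | Correlation |
|---|---|
| median_income | 0.688 |
| ocean_proximity_NEAR BAY | 0.160 |
| ocean_proximity_NEAR OCEAN | 0.142 |
| total_rooms | 0.134 |
| housing_median_age | 0.106 |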
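A ranking like this can be produced by correlating every numeric column with the target and sorting. A minimal sketch, assuming pandas' default (Pearson) correlation and a sort by signed value, largest first, so strongly negative features land at the bottom rather than the top; `top_correlations` is a hypothetical helper and the data below is synthetic.

```python
import numpy as np
import pandas as pd


def top_correlations(df, target="median_house_value", k=10):
    """Return the k features with the highest correlation to `target`.

    Assumes every column is numeric (e.g. after one-hot encoding).
    Sorted by signed value, so strongly negative features rank last.
    """
    corr = df.corr()[target].drop(target)
    return corr.sort_values(ascending=False).head(k)


# Synthetic data: the target depends strongly on x1, weakly on x2, not on x3
rng = np.random.default_rng(0)
x1, x2, x3 = rng.normal(size=(3, 500))
toy = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "x3": x3,
    "median_house_value": 3 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=500),
})
ranking = top_correlations(toy, k=3)
print(ranking.round(3))
```

On the real data the input would be the one-hot encoded frame; standardizing first would not change the result, since correlation is unaffected by shifting and rescaling a column.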

🤖 Model Used

  • Linear Regression
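Fitting the regressor is a single `fit` call on the scaled training features. A minimal sketch on synthetic data rather than housing.csv (the names `f0`, `f1`, `f2` are placeholders): because the inputs are standardized, each coefficient reads as the change in the prediction per one-standard-deviation change in that feature, holding the other features fixed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled housing features (illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)

X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression()
model.fit(X_scaled, y)

# One coefficient per feature: change in the prediction per 1 standard
# deviation of that feature, with the other features held fixed
for name, coef in zip(["f0", "f1", "f2"], model.coef_):
    print(f"{name}: {coef:+.2f}")
print(f"intercept: {model.intercept_:.2f}")
```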

📊 Results

| Metric | Value |
|---|---|
| MSE | 4,908,476,721 |
| R² Score | 0.6254 |

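📌 An R² of 0.6254 means the model explains about 62.5% of the variance in median house values; the remaining ~37.5% is not captured by a linear fit of these features. The MSE is in squared dollars, so its square root is easier to read: RMSE ≈ 70,060, i.e. predictions deviate from the actual values by roughly $70,000 in the root-mean-square sense.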
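Both metrics follow directly from their definitions: MSE is the mean of the squared residuals, and R² = 1 - SS_res / SS_tot, where SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below computes them by hand and cross-checks against scikit-learn's `mean_squared_error` and `r2_score`; the arrays are made-up numbers, not the project's predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted house values in dollars (illustration only)
y_true = np.array([200_000.0, 150_000.0, 320_000.0, 90_000.0, 410_000.0])
y_pred = np.array([210_000.0, 140_000.0, 300_000.0, 120_000.0, 380_000.0])

residuals = y_true - y_pred
mse = np.mean(residuals ** 2)                    # squared dollars
rmse = np.sqrt(mse)                              # back in dollars
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                         # share of variance explained

# The hand-rolled values should agree with scikit-learn's implementations
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
print(f"MSE={mse:,.0f}  RMSE={rmse:,.0f}  R2={r2:.3f}")
```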


📉 Visualizations

  • Scatter Plot: Actual vs Predicted house values
  • Bar Chart: top 10 features ranked by correlation with median_house_value
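The two housing charts can be sketched with matplotlib. Assumptions: the Agg backend so the code runs without a display, synthetic actual/predicted values in place of the notebook's variables, only the five correlations listed in the table above rather than ten, and `savefig` to a hypothetical `housing_plots.png` instead of `plt.show()`. The dashed diagonal marks perfect predictions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic actual/predicted values (illustration only)
rng = np.random.default_rng(1)
y_test = rng.uniform(50_000, 500_000, 300)
y_pred = y_test + rng.normal(scale=70_000, size=300)

# Correlations copied from the table in this README (top 5 only)
corr = pd.Series({
    "median_income": 0.688,
    "ocean_proximity_NEAR BAY": 0.160,
    "ocean_proximity_NEAR OCEAN": 0.142,
    "total_rooms": 0.134,
    "housing_median_age": 0.106,
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Actual vs predicted: points close to the diagonal are accurate predictions
ax1.scatter(y_test, y_pred, alpha=0.4, s=10)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax1.plot(lims, lims, "r--", label="perfect prediction")
ax1.set_xlabel("Actual median_house_value")
ax1.set_ylabel("Predicted median_house_value")
ax1.legend()

# Horizontal bars, largest correlation on top
corr.sort_values().plot.barh(ax=ax2)
ax2.set_xlabel("Correlation with median_house_value")

fig.tight_layout()
fig.savefig("housing_plots.png", dpi=100)
```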


Task 4 — Iris Flower Classification

📌 Overview

Two classification models, Random Forest and Logistic Regression, are trained on the Iris dataset to classify flowers into three species (Setosa, Versicolor, and Virginica) from their sepal and petal measurements.


📂 Dataset

  • File: iris.csv
  • Target variable: species (3 classes, 50 flowers each)
  • Features used:

| Feature | Description |
|---|---|
| sepal_length | Sepal length (cm) |
| sepal_width | Sepal width (cm) |
| petal_length | Petal length (cm) |
| petal_width | Petal width (cm) |

⚙️ Data Preprocessing

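  • Encoded the species names as integers with LabelEncoder (setosa → 0, versicolor → 1, virginica → 2)
  • Standardized the features with StandardScaler (this helps Logistic Regression converge; Random Forest is insensitive to feature scale)
  • Split the data 80/20 into training and test sets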
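The same three steps for Iris, sketched with scikit-learn. As a stand-in for iris.csv this uses `sklearn.datasets.load_iris`, which ships the classic 150-flower dataset; converting its integer targets back to species names only serves to show what LabelEncoder does with a text column, and `random_state=42` for the split is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

iris = load_iris()
X = iris.data                                # sepal/petal length and width (cm)
species = iris.target_names[iris.target]     # string labels, like a CSV column

# LabelEncoder sorts class names alphabetically: setosa=0, versicolor=1, virginica=2
encoder = LabelEncoder()
y = encoder.fit_transform(species)

# 80/20 split (seed is an assumption), then scale using training statistics only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print(list(encoder.classes_), X_train.shape, X_test.shape)
```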

🤖 Models Used

1. Random Forest Classifier

  • random_state = 42
  • All other hyperparameters left at the scikit-learn defaults (e.g. n_estimators = 100)

2. Logistic Regression

  • max_iter = 200 (the default is 100), giving the solver more iterations to converge
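Training and scoring both classifiers with the settings listed above could look like this. A sketch, assuming `load_iris` as a stand-in for iris.csv and `random_state=42` for the split; the accuracies it prints depend on that split and are not the project's reported numbers.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# load_iris stands in for iris.csv; the split seed is an assumption
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Hyperparameters as listed above; everything else stays at the defaults
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=200),
}
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.2%}")
```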

📊 Results

| Model | Accuracy |
|---|---|
| Random Forest | 100% |
| Logistic Regression | 100% |

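✅ Both models classified every flower in the test split correctly. Iris is small, clean, and balanced (50 flowers per species), and Setosa is linearly separable from the other two species, so perfect scores on a test set of about 30 flowers are common; they show that the benchmark is easy rather than guaranteeing perfect accuracy on new data.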
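Because a single small test split gives a noisy estimate, k-fold cross-validation is a steadier check. A sketch using `cross_val_score` with 5 folds (the fold count is an assumption) and `load_iris` as the data source; wrapping the scaler and model in a pipeline refits the scaler inside each fold, so no statistics from the held-out fold leak into training.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

cv_results = {}
for name, clf in {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=200),
}.items():
    # The scaler is refit on the training folds of every split
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
    cv_results[name] = scores
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f} over {len(scores)} folds")
```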


📉 Visualizations

  • Confusion Matrix: Random Forest predictions heatmap
  • Bar Chart: Accuracy comparison between both models
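Both Iris charts can be sketched with scikit-learn and matplotlib. Assumptions: `load_iris` and a seeded split as in the earlier sketches, scikit-learn's `ConfusionMatrixDisplay` standing in for the seaborn heatmap (`seaborn.heatmap(cm, annot=True)` would draw an equivalent figure), and the Agg backend with `savefig` to a hypothetical `iris_plots.png` for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
lr = LogisticRegression(max_iter=200).fit(X_train, y_train)
acc = {
    "Random Forest": accuracy_score(y_test, rf.predict(X_test)),
    "Logistic Regression": accuracy_score(y_test, lr.predict(X_test)),
}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))

# Rows are true species, columns are predicted species; off-diagonal cells are errors
cm = confusion_matrix(y_test, rf.predict(X_test))
ConfusionMatrixDisplay(cm, display_labels=iris.target_names).plot(ax=ax1, cmap="Blues")
ax1.set_title("Random Forest confusion matrix")

# One bar per model, y-axis fixed to [0, 1.05] so differences are not exaggerated
ax2.bar(list(acc), list(acc.values()), color=["tab:green", "tab:blue"])
ax2.set_ylim(0, 1.05)
ax2.set_ylabel("Test accuracy")
ax2.set_title("Model comparison")

fig.tight_layout()
fig.savefig("iris_plots.png", dpi=100)
```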


🛠️ Libraries Used

  • numpy
  • pandas
  • matplotlib
  • seaborn
  • scikit-learn

🚀 How to Run

  1. Clone the repository and enter it:
     git clone https://github.com/hamza93-ai/ML-Internship-Month-2.git
     cd ML-Internship-Month-2
  2. Open the notebook in Jupyter (or upload it to Google Colab):
     jupyter notebook ML_Internship_Month2_Tasks.ipynb
  3. When prompted, upload the CSV file each task needs (housing.csv for Task 3, iris.csv for Task 4), then run all cells.

📁 Project Structure

ML-Internship-Month-2/
│
├── ML_Internship_Month2_Tasks.ipynb   # Main notebook (Task 3 + Task 4)
├── ML_Internship_Month2_Report.pdf    # Project report
└── README.md                          # Project documentation

👤 Author

Hamza Asif
ML Internship, Month 2 | Task 3 & Task 4

