This project explores consumer behavior analytics by building a Decision Tree classification model to predict whether a prospective customer will purchase a premium digital device (iPhone). By leveraging a rule-based algorithm, this analysis aims to provide marketing teams with highly interpretable "if-then" demographic thresholds to optimize targeted digital ad campaigns.
The dataset maps consumer demographics and financial status against their historical purchase decisions.
- Dataset Size: ~400 records, 4 variables.
- Target Variable: `Purchase Iphone` (Binary: 0 = Not Purchased, 1 = Purchased)
- Key Features: `Gender`, `Age`, `Salary`.
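
As a rough illustration of the schema described above, the sketch below builds a tiny stand-in DataFrame with the four columns and separates features from the target. The rows are invented for illustration and are not records from the dataset. The `"Male"`/`"Female"` labels and the `Gender_Male` column name are assumptions, since the raw encoding of `Gender` is not stated here.

```python
import pandas as pd

# Invented rows, for illustration only: not records from the real dataset.
df = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Female", "Male"],
        "Age": [19, 35, 47, 52],
        "Salary": [19000, 20000, 105000, 138000],
        "Purchase Iphone": [0, 0, 1, 1],
    }
)

# scikit-learn trees need numeric inputs, so map Gender to a 0/1 indicator.
# The "Male"/"Female" labels are an assumption about the raw values.
df["Gender_Male"] = (df["Gender"] == "Male").astype(int)

X = df[["Gender_Male", "Age", "Salary"]]  # feature matrix
y = df["Purchase Iphone"]                 # binary target

print(X.shape, y.mean())  # feature matrix shape and share of buyers
```
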
- Language: Python
- Data Manipulation: `pandas`, `numpy`
- Machine Learning: `scikit-learn` (Decision Tree Classifier)
- Data Visualization: `matplotlib`, `seaborn`
- Exploratory Data Analysis (EDA):
  - Generated a count plot of the target to establish the baseline conversion rate.
  - Used box plots by class and correlation visualizations to confirm that high `Salary` and higher `Age` are the primary drivers of premium device purchases. `Gender` showed no clear relationship with the purchase decision.
- Train/Test Split: An 80/20 train-test split was applied to evaluate the model on unseen consumer data.
- Model Selection (Decision Tree): A Decision Tree Classifier was chosen specifically for its "white-box" nature. Unlike distance-based algorithms, Decision Trees do not require feature scaling, because each split compares a single feature against a threshold. They also generate highly interpretable decision logic that non-technical business stakeholders can easily understand.
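
The split and model steps above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the data is synthetic, the labelling rule is made up so the example has some signal, and `max_depth=3`, `random_state=0`, `stratify=y`, and the 0/1 encoding of `Gender` are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (NOT the real dataset): 400 rows, same column names.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame(
    {
        "Gender": rng.integers(0, 2, n),            # assumed 0/1 encoding
        "Age": rng.integers(18, 61, n),
        "Salary": rng.integers(15_000, 150_001, n),
    }
)
# Made-up labelling rule so the example has signal to learn.
df["Purchase Iphone"] = ((df["Age"] > 42) | (df["Salary"] > 90_000)).astype(int)

X = df[["Gender", "Age", "Salary"]]
y = df["Purchase Iphone"]

# 80/20 split; stratify (an assumption) keeps the buyer share similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# No feature scaling step: each split compares one feature against a threshold.
# max_depth=3 is an assumed setting that keeps the printed rules short.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))

# Text dump of the learned "if-then" rules, one line per split or leaf.
rules = export_text(tree, feature_names=list(X.columns))
print(rules)
```

The output of `export_text` is the kind of threshold listing a marketing team could read directly, which is the main reason for choosing a single tree here.
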
The model established a solid baseline for consumer profiling, balancing accuracy with high interpretability.
- Overall Accuracy Score: `0.79` (79%)
- Confusion Matrix Breakdown (Test Size = 76):
  - True Positives (23): Correctly identified 23 actual buyers.
  - True Negatives (37): Correctly identified 37 non-buyers.
  - False Positives (11): Predicted 11 people would buy, but they did not (Overestimation/Wasted Ad Spend).
  - False Negatives (5): Missed 5 actual buyers.
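
The accuracy figure can be re-derived from the four confusion matrix counts listed above. The short sketch below does that arithmetic and also computes precision, recall, and F1; those three values are not reported in this write-up and simply follow from the counts.

```python
# Counts taken from the confusion matrix breakdown.
tp, tn, fp, fn = 23, 37, 11, 5

total = tp + tn + fp + fn          # 76 test records
accuracy = (tp + tn) / total       # 60 / 76
precision = tp / (tp + fp)         # 23 / 34: share of predicted buyers who bought
recall = tp / (tp + fn)            # 23 / 28: share of actual buyers who were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# prints: accuracy=0.789 precision=0.676 recall=0.821 f1=0.742
```

Precision of roughly 0.68 against recall of roughly 0.82 puts a number on the over-prediction of buyers: about one in three predicted buyers did not purchase.
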
- The Accuracy vs. Interpretability Trade-Off: While 79% accuracy is functional, the 11 False Positives (against 23 True Positives) indicate that the model's rigid "if-then" splitting rules are somewhat too aggressive in predicting a purchase. In a live marketing campaign, this translates to minor inefficiencies (targeting individuals who won't actually convert).
- Strategic Value: The primary value of this specific model is not raw predictive power, but insight extraction. Marketing teams can view the exact Age and Salary thresholds the tree created to build generalized, data-backed buyer personas.
- Next Steps: Because single Decision Trees are highly prone to variance and rigid thresholds, the immediate next step to improve accuracy (and reduce false positives) is to upgrade