Code for our peer-reviewed paper published at IEEE ADICS 2024. 📄 Paper: https://ieeexplore.ieee.org/document/10533585
A comparative study of machine-learning and deep-learning methods for detecting text-based cyberbullying on social media. We benchmark classical ML pipelines against a deep-learning embedding model to identify the most accurate and efficient detector for multi-class harassment text.
~47,000 labeled tweets across 6 classes: religion, age, gender, ethnicity, other-cyberbullying, and not-cyberbullying (public Twitter cyberbullying dataset; source: https://www.kaggle.com/datasets/andrewmvd/cyberbullying-classification).
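A minimal loading sketch for the dataset follows. It assumes the Kaggle CSV has a `tweet_text` column and a `cyberbullying_type` label column, with the six classes spelled with underscores (`other_cyberbullying`, `not_cyberbullying`); check these against the file you download. The label check is a hypothetical sanity step, not necessarily what the notebooks do.

```python
import io

import pandas as pd

# Assumed column names in the Kaggle CSV; verify against the downloaded file.
TEXT_COL = "tweet_text"
LABEL_COL = "cyberbullying_type"

# The six classes listed above, in the underscore spelling assumed for the CSV.
EXPECTED_CLASSES = {
    "religion", "age", "gender", "ethnicity",
    "other_cyberbullying", "not_cyberbullying",
}


def load_dataset(source) -> pd.DataFrame:
    """Read the CSV from a path or file-like object and validate its labels."""
    df = pd.read_csv(source)
    # Drop rows missing either the tweet text or its label.
    df = df.dropna(subset=[TEXT_COL, LABEL_COL])
    unknown = set(df[LABEL_COL]) - EXPECTED_CLASSES
    if unknown:
        raise ValueError(f"Unexpected labels: {sorted(unknown)}")
    return df.reset_index(drop=True)


if __name__ == "__main__":
    # Tiny in-memory stand-in; in practice pass the path of the downloaded CSV.
    sample = io.StringIO(
        "tweet_text,cyberbullying_type\n"
        "have a nice day,not_cyberbullying\n"
        "you are too old for this,age\n"
    )
    df = load_dataset(sample)
    print(df[LABEL_COL].value_counts())
```

Printing `value_counts()` on the real file is a quick way to see how balanced the six classes are before training.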
- TF-IDF + Support Vector Classifier (SVC) — pipeline method
- TF-IDF + Multinomial Naive Bayes (MNB) — pipeline method
- GloVe word embeddings + deep-learning model (separate approach)
- Metrics: accuracy, precision, recall, F1, specificity, training time
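The two TF-IDF pipelines and their metrics can be sketched with scikit-learn as below. All hyperparameters are library defaults (an assumption; the notebooks may tune them), and precision, recall, and F1 use weighted averaging. Weighted averaging is an assumption too, though it is consistent with the results table, where recall equals accuracy in every row: weighted recall always reduces to accuracy. Training time is measured around `fit` only.

```python
import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_pipelines():
    """TF-IDF features feeding each classifier; default hyperparameters (assumed)."""
    return {
        "SVC + TF-IDF": Pipeline([("tfidf", TfidfVectorizer()), ("clf", SVC())]),
        "MNB + TF-IDF": Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())]),
    }


def evaluate(pipe, X_train, y_train, X_test, y_test):
    """Fit the pipeline, time the fit, and score it on the test split."""
    start = time.perf_counter()
    pipe.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start
    y_pred = pipe.predict(X_test)
    # Weighted averaging across classes is an assumption about the paper's setup.
    avg = {"average": "weighted", "zero_division": 0}
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, **avg),
        "recall": recall_score(y_test, y_pred, **avg),
        "f1": f1_score(y_test, y_pred, **avg),
        "train_seconds": train_seconds,
    }


if __name__ == "__main__":
    # Toy stand-in for the tweets; use a proper train/test split on the real data.
    texts = [
        "you are too old", "old people are slow",
        "your religion is a joke", "that faith is stupid",
        "have a nice day", "lovely weather today",
    ]
    labels = ["age", "age", "religion", "religion",
              "not_cyberbullying", "not_cyberbullying"]
    for name, pipe in build_pipelines().items():
        print(name, evaluate(pipe, texts, labels, texts, labels))
```

Wrapping the vectorizer and classifier in one `Pipeline` keeps the TF-IDF vocabulary fitted on training data only, so the test split does not leak into the features.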
| Approach | Accuracy | Precision | Recall | F1 | Specificity |
|---|---|---|---|---|---|
| SVC + TF-IDF | 79.8% | 0.826 | 0.798 | 0.796 | 99.45% |
| MNB + TF-IDF | 70.9% | 0.841 | 0.709 | 0.746 | 93.56% |
| GloVe (DL) | 73.3% | 0.755 | 0.733 | 0.742 | 98.37% |
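scikit-learn has no built-in multi-class specificity, so the sketch below shows one common way to compute it: treat each class one-vs-rest, take TN / (TN + FP) from the confusion matrix, and macro-average. The one-vs-rest macro averaging is an assumption; the paper's exact definition is not stated here. With six classes, each class's negatives are most of the data, which may help explain why the specificity column is much higher than accuracy.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def macro_specificity(y_true, y_pred, labels=None):
    """One-vs-rest specificity per class, TN / (TN + FP), and its macro average."""
    # Rows are true classes, columns are predicted classes.
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class k but actually another class
    fn = cm.sum(axis=1) - tp  # actually class k but predicted as another class
    tn = total - tp - fp - fn
    denom = tn + fp
    # Guard against classes with no negatives (denominator of zero).
    per_class = np.divide(tn, denom, out=np.zeros(len(tn), dtype=float), where=denom > 0)
    return float(per_class.mean()), per_class


if __name__ == "__main__":
    mean_spec, per_class = macro_specificity(["a", "a", "b", "c"], ["a", "b", "b", "c"])
    print(mean_spec, per_class)
```

In the toy call, one "a" tweet is misclassified as "b", so only class "b" picks up a false positive: its specificity is 2/3 while "a" and "c" stay at 1, giving a macro average of 8/9.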
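Finding: the SVC + TF-IDF pipeline achieved the best accuracy, recall, F1, and specificity of the three approaches and outperformed the GloVe deep-learning model on every metric in the table, at a small training-time cost (7.8 s vs 5.77 s). MNB + TF-IDF had the highest precision (0.841) but the lowest accuracy.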
- `SVM-NaiveBayes.ipynb`: TF-IDF + SVC / MNB pipelines
- `Glove.ipynb`: GloVe embedding deep-learning model
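The GloVe notebook's key preprocessing step, turning pretrained vectors into an embedding matrix, can be sketched in NumPy as below. It assumes the standard GloVe text format (a token followed by its space-separated components) and a 1-based word index with row 0 reserved for padding, as produced by Keras' legacy `Tokenizer`. The file name and vector dimension are assumptions, and the downstream network architecture is not shown.

```python
import io

import numpy as np


def load_glove(lines):
    """Parse GloVe's text format: a token followed by its vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.array([float(x) for x in parts[1:]], dtype=np.float32)
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with index i; row 0 is padding.

    Words without a pretrained vector (or with the wrong size) stay all-zero.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    hits = 0
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and vec.shape[0] == dim:
            matrix[idx] = vec
            hits += 1
    return matrix, hits


if __name__ == "__main__":
    # In practice: with open("glove.6B.100d.txt", encoding="utf-8") as f: ...
    glove = load_glove(io.StringIO("the 0.1 0.2 0.3\nold 0.4 0.5 0.6\n"))
    matrix, hits = build_embedding_matrix({"the": 1, "old": 2, "zzz": 3}, glove, dim=3)
    print(matrix.shape, hits)
```

Such a matrix is typically used to initialise an embedding layer, for example Keras' `Embedding` with `embeddings_initializer=tf.keras.initializers.Constant(matrix)`; whether the notebook freezes or fine-tunes those weights is not stated here.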
- Install the dependencies: `pip install scikit-learn pandas numpy tensorflow` (plus GloVe vectors for the DL notebook)
- Download the dataset (link above) and place it in the repo root
- Open the notebooks in Jupyter and run them top to bottom
N. P. S. Pendela et al., "Enhancing Cyberbullying Detection: A Multi-Algorithmic Approach," 2024 IEEE ADICS. DOI: 10.1109/ADICS58448.2024.10533585
Naga Prem Sai Pendela (first author) and team — Vardhaman College of Engineering.