premsai-pendela/Enhancing_Cyberbullying_detection

Enhancing Cyberbullying Detection: A Multi-Algorithmic Approach

Code for our peer-reviewed paper published at IEEE ADICS 2024. 📄 Paper: https://ieeexplore.ieee.org/document/10533585

Overview

A comparative study of machine-learning and deep-learning methods for detecting text-based cyberbullying on social media. We benchmark classical ML pipelines against a deep-learning embedding model to find the most accurate, efficient detector for multi-class harassment text.

Dataset

~47,000 labeled tweets across 6 classes: religion, age, gender, ethnicity, other-cyberbullying, and not-cyberbullying. (Public Twitter cyberbullying dataset — Source: https://www.kaggle.com/datasets/andrewmvd/cyberbullying-classification.)
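As a quick sanity check after downloading, the class distribution can be counted with the standard library. This is a minimal sketch, not the notebooks' loading code: the column names `tweet_text` and `cyberbullying_type` are assumptions to verify against the CSV header, and the in-memory sample below is a made-up stand-in for the real file.

```python
import csv
import io
from collections import Counter

# Assumed column names; check them against the header of the downloaded CSV.
TEXT_COL = "tweet_text"
LABEL_COL = "cyberbullying_type"


def load_tweets(handle):
    """Read (text, label) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    texts, labels = [], []
    for row in reader:
        texts.append(row[TEXT_COL])
        labels.append(row[LABEL_COL])
    return texts, labels


# Tiny in-memory stand-in for the real file, for illustration only.
sample = io.StringIO(
    "tweet_text,cyberbullying_type\n"
    "example tweet one,not_cyberbullying\n"
    "example tweet two,age\n"
    "example tweet three,age\n"
)
texts, labels = load_tweets(sample)
class_counts = Counter(labels)
print(class_counts)
```

With the real dataset, pass `open(path, newline="", encoding="utf-8")` instead of the sample; the counter should then show the six classes listed above.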

Methods

  • TF-IDF + Support Vector Classifier (SVC) — pipeline method
  • TF-IDF + Multinomial Naive Bayes (MNB) — pipeline method
  • GloVe word embeddings + deep-learning model (separate approach)
  • Metrics: accuracy, precision, recall, F1, specificity, training time
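The two pipeline methods above can be sketched with scikit-learn's `Pipeline`. This is an illustrative sketch under stated assumptions, not the notebooks' exact code: the default `TfidfVectorizer` settings, the linear SVC kernel, and the toy training texts are all placeholders chosen for the example.

```python
import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_pipelines():
    """Return one TF-IDF + classifier pipeline per classical method."""
    return {
        # kernel="linear" is an assumption; the README does not name a kernel.
        "SVC + TF-IDF": Pipeline(
            [("tfidf", TfidfVectorizer()), ("clf", SVC(kernel="linear"))]
        ),
        "MNB + TF-IDF": Pipeline(
            [("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())]
        ),
    }


# Toy placeholder data, only so the sketch runs end to end.
train_texts = [
    "you are too old to post here",
    "old people should stay offline",
    "have a great day everyone",
    "thanks for sharing this article",
]
train_labels = ["age", "age", "not_cyberbullying", "not_cyberbullying"]

pipelines = build_pipelines()
predictions = {}
train_seconds = {}
for name, pipeline in pipelines.items():
    start = time.perf_counter()
    pipeline.fit(train_texts, train_labels)  # vectorize, then fit the classifier
    train_seconds[name] = time.perf_counter() - start
    predictions[name] = list(pipeline.predict(train_texts))
    print(name, predictions[name], f"{train_seconds[name]:.4f}s")
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary fitted on training data only, and timing `fit` on the whole pipeline gives a training-time figure that includes vectorization.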

Results

Approach        Accuracy  Precision  Recall  F1     Specificity
SVC + TF-IDF    79.8%     0.826      0.798   0.796  99.45%
MNB + TF-IDF    70.9%     0.841      0.709   0.746  93.56%
GloVe (DL)      73.3%     0.755      0.733   0.742  98.37%

Finding: the SVC + TF-IDF pipeline outperformed the GloVe deep-learning model on every reported metric, including accuracy (79.8% vs 73.3%) and specificity (99.45% vs 98.37%), at a small training-time cost (7.8 s vs 5.77 s).
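Specificity is less standard than the other metrics for multi-class problems, so a worked example may help. The sketch below computes it one-vs-rest per class as TN / (TN + FP) and then takes an unweighted (macro) average; the README does not state which averaging the paper used, so the macro average is an assumption, and the labels are toy values.

```python
def per_class_specificity(y_true, y_pred):
    """One-vs-rest specificity, TN / (TN + FP), for every class label."""
    labels = sorted(set(y_true) | set(y_pred))
    result = {}
    for label in labels:
        # Negatives for this class are all samples whose true label differs.
        tn = sum(1 for t, p in zip(y_true, y_pred) if t != label and p != label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        result[label] = tn / (tn + fp) if (tn + fp) else float("nan")
    return result


# Toy labels for illustration only.
y_true = ["age", "age", "gender", "religion"]
y_pred = ["age", "gender", "gender", "religion"]

spec = per_class_specificity(y_true, y_pred)
macro_specificity = sum(spec.values()) / len(spec)  # assumed averaging scheme
print(spec, macro_specificity)
```

Here one "age" tweet is wrongly predicted as "gender", so only the "gender" class has a false positive and its specificity drops to 2/3 while the other classes stay at 1.0.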

Repository

  • SVM-NaiveBayes.ipynb — TF-IDF + SVC / MNB pipelines
  • Glove.ipynb — GloVe embedding deep-learning model

How to run

  1. pip install scikit-learn pandas numpy tensorflow (for the DL notebook, also download pre-trained GloVe vectors)
  2. Download the dataset (link above) and place it in the repo root
  3. Open the notebooks in Jupyter and run top-to-bottom
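For the GloVe vectors mentioned in step 1, the pre-trained files are plain text with one word per line followed by its space-separated vector components. A minimal sketch of turning such a file into an embedding matrix follows; the helper names, the 3-dimensional fake vectors, and the convention of reserving row 0 for padding are assumptions for illustration and may differ from what Glove.ipynb does.

```python
import io

import numpy as np


def load_glove(handle):
    """Parse GloVe text lines ("word v1 v2 ...") into a word -> vector dict."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Stack vectors by token id; unknown words and row 0 stay all-zero."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix


# Fake 3-dimensional vectors standing in for a real GloVe file.
fake_glove = io.StringIO("hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
vectors = load_glove(fake_glove)

# Hypothetical tokenizer output mapping words to ids starting at 1.
word_index = {"hello": 1, "unknown": 2, "world": 3}
matrix = build_embedding_matrix(word_index, vectors, dim=3)
print(matrix.shape)
```

A matrix built this way can be passed as the initial weights of an embedding layer, so that words found in GloVe start from their pre-trained vectors.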

Citation

N. P. S. Pendela et al., "Enhancing Cyberbullying Detection: A Multi-Algorithmic Approach," 2024 IEEE ADICS. DOI: 10.1109/ADICS58448.2024.10533585

Authors

Naga Prem Sai Pendela (first author) and team — Vardhaman College of Engineering.

About

Spearheaded a major project focused on enhancing cyberbullying detection through a multi-algorithmic approach, leveraging Support Vector Machines (SVM), Multinomial Naive Bayes, a pipeline method combining two estimators, and a pre-trained GloVe model. Conducted extensive performance evaluations on a large dataset and deployed the code on GitHub.
