elaheghiyabi96/breast-cancer-histopathology-dataset-exploration

breast-cancer-histopathology-dataset-exploration

Exploratory analysis and visualization of a breast cancer histopathology image dataset, including binary and multiclass classification structures across multiple magnification levels.

Breast Cancer Histopathology Dataset – Exploration & Visualization

📌 Overview

This repository provides an exploratory analysis and structured overview of a breast cancer histopathology image dataset. The goal is to clearly document the dataset organization, classification tasks, and image characteristics before applying machine learning or deep learning models.

The dataset contains microscopic images of breast tissue acquired at multiple magnification levels and supports both binary and multiclass classification problems.


🧬 Dataset Structure

The dataset is organized into two main classification tasks:

1️⃣ Binary Classification

Objective: Distinguish between benign and malignant breast tissue samples.

classificacao_binaria/
├── 40X/
│   ├── benign/
│   └── malignant/
├── 100X/
├── 200X/
└── 400X/

Classes:

  • benign
  • malignant
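As a rough illustration of how this hierarchy can be inspected, the sketch below counts image files per magnification and class. It assumes that every magnification folder contains class subfolders like the one shown for `40X/`, that images sit directly inside those class folders, and that they use one of the listed extensions. The function name `count_images` and the extension set are illustrative and not taken from the repository.

```python
from collections import Counter
from pathlib import Path

# Assumption: the images use one of these extensions; adjust to the real files.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def count_images(task_root):
    """Return a Counter keyed by (magnification, class_name).

    Expects the layout <task_root>/<magnification>/<class_name>/<image file>.
    """
    counts = Counter()
    for image_path in Path(task_root).glob("*/*/*"):
        if image_path.is_file() and image_path.suffix.lower() in IMAGE_EXTENSIONS:
            class_name = image_path.parent.name
            magnification = image_path.parent.parent.name
            counts[(magnification, class_name)] += 1
    return counts
```

Calling `count_images("classificacao_binaria")` from the dataset root would give one count per magnification and class pair, which makes class imbalance visible before any modeling.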

2️⃣ Multiclass Classification

Objective: Classify breast tissue into specific histopathological tumor subtypes.

classificacao_multiclasse/
├── 40X/
│   ├── adenosis
│   ├── ductal_carcinoma
│   ├── fibroadenoma
│   ├── lobular_carcinoma
│   ├── mucinous_carcinoma
│   ├── papillary_carcinoma
│   ├── phyllodes_tumor
│   └── tubular_adenoma
├── 100X/
├── 200X/
└── 400X/

Classes (8 total):

  • Adenosis
  • Ductal Carcinoma
  • Fibroadenoma
  • Lobular Carcinoma
  • Mucinous Carcinoma
  • Papillary Carcinoma
  • Phyllodes Tumor
  • Tubular Adenoma
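A simple structural check can confirm that a local copy matches the hierarchy described above. The sketch below is a minimal example that assumes each of the four magnification folders contains the same eight class folders shown for `40X/`. The helper name `find_missing_class_dirs` is hypothetical.

```python
from pathlib import Path

MAGNIFICATIONS = ("40X", "100X", "200X", "400X")

# Folder names as listed in the multiclass tree.
MULTICLASS_DIRS = (
    "adenosis",
    "ductal_carcinoma",
    "fibroadenoma",
    "lobular_carcinoma",
    "mucinous_carcinoma",
    "papillary_carcinoma",
    "phyllodes_tumor",
    "tubular_adenoma",
)


def find_missing_class_dirs(task_root, magnifications=MAGNIFICATIONS,
                            class_dirs=MULTICLASS_DIRS):
    """Return 'magnification/class' strings for every expected folder that is absent."""
    root = Path(task_root)
    return [
        f"{magnification}/{class_dir}"
        for magnification in magnifications
        for class_dir in class_dirs
        if not (root / magnification / class_dir).is_dir()
    ]
```

An empty list from `find_missing_class_dirs("classificacao_multiclasse")` means all 32 expected folders are present. Passing `class_dirs=("benign", "malignant")` applies the same check to the binary task.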

🔬 Magnification Levels

Images are provided at four different magnifications:

  • 40X – Low magnification (global tissue structure)
  • 100X – Intermediate magnification
  • 200X – Higher cellular detail
  • 400X – Fine-grained cellular morphology

Having the same tasks available at four magnifications makes it possible to analyze how magnification affects visual features and model performance.
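When comparing magnifications in code, note that sorting the folder names as plain strings places `40X` last. The sketch below parses the numeric part so the levels can be processed from lowest to highest. The helper `magnification_value` is illustrative and assumes folder names of the form `<number>X`.

```python
def magnification_value(folder_name):
    """Parse a folder name such as '40X' into the integer 40."""
    name = folder_name.strip().upper()
    if not name.endswith("X") or not name[:-1].isdecimal():
        raise ValueError(f"Unexpected magnification folder name: {folder_name!r}")
    return int(name[:-1])


# Plain string sorting yields this order, with '40X' at the end.
folders = sorted(["40X", "100X", "200X", "400X"])

# Sorting by the parsed value restores the intended low-to-high order.
ordered = sorted(folders, key=magnification_value)
print(ordered)  # ['40X', '100X', '200X', '400X']
```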


🖼️ Sample Visualization

To facilitate qualitative inspection, this repository includes a visualization script that:

  • Randomly selects representative images from each class
  • Covers both binary and multiclass tasks
  • Displays samples across different magnification levels

This step is crucial for:

  • Understanding intra-class variability
  • Identifying visual differences between classes
  • Verifying dataset integrity before training models
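The repository's own script is not reproduced here. As a hedged sketch of what such a sampling-and-display step could look like, the example below picks random images per class with the standard library and shows them in a grid with Matplotlib and Pillow. The function names, the extension set, and the grid layout are assumptions made for illustration.

```python
import random
from pathlib import Path

# Assumption: the images use one of these extensions; adjust to the real files.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def sample_images(task_root, magnification, per_class=1, seed=None):
    """Pick up to `per_class` random image paths from each class folder."""
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    samples = {}
    magnification_dir = Path(task_root) / magnification
    for class_dir in sorted(p for p in magnification_dir.iterdir() if p.is_dir()):
        images = sorted(
            f for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
        samples[class_dir.name] = rng.sample(images, min(per_class, len(images)))
    return samples


def show_samples(samples, title=""):
    """Display sampled images in a grid with one row per class."""
    # Imported here so that sample_images works without the plotting libraries.
    import matplotlib.pyplot as plt
    from PIL import Image

    n_rows = len(samples)
    n_cols = max((len(paths) for paths in samples.values()), default=0)
    if n_rows == 0 or n_cols == 0:
        return
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(3 * n_cols, 3 * n_rows), squeeze=False)
    for row, (class_name, paths) in enumerate(samples.items()):
        for col in range(n_cols):
            ax = axes[row][col]
            ax.axis("off")
            if col < len(paths):
                with Image.open(paths[col]) as img:
                    ax.imshow(img.convert("RGB"))
                ax.set_title(class_name)
    fig.suptitle(title)
    plt.tight_layout()
    plt.show()
```

For example, `show_samples(sample_images("classificacao_multiclasse", "200X", per_class=3, seed=0), title="200X")` would display three images for each of the eight subtypes at one magnification.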

🎯 Purpose of This Repository

This repository focuses only on dataset understanding, including:

  • Folder hierarchy documentation
  • Class definitions
  • Magnification levels
  • Sample image visualization

⚠️ No model training is performed here.
Future repositories may build upon this dataset for classification, segmentation, or representation learning tasks.


🛠️ Tools & Libraries

  • Python
  • os (Python standard library module for file system handling)
  • Pillow (PIL)
  • Matplotlib

📎 Notes

  • Images are organized in a format compatible with most deep learning frameworks (e.g., PyTorch, TensorFlow).
  • The dataset is suitable for benchmarking binary vs. multiclass classification approaches.
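To make the compatibility note concrete: folder-per-class loaders such as torchvision's `ImageFolder` typically derive labels from subfolder names sorted alphabetically. Because the class folders here sit one level below the magnification folder, such a loader would be pointed at a single magnification directory (for example `classificacao_binaria/40X`). The standard-library sketch below mimics that convention without depending on any framework. The helper `build_index` is hypothetical.

```python
from pathlib import Path

# Assumption: the images use one of these extensions; adjust to the real files.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def build_index(magnification_dir):
    """Return (items, class_to_idx) for one magnification directory.

    items is a list of (image_path, label_index) pairs; label indices follow
    the alphabetical order of the class folder names.
    """
    root = Path(magnification_dir)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: index for index, name in enumerate(class_names)}
    items = []
    for name in class_names:
        for path in sorted((root / name).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                items.append((path, class_to_idx[name]))
    return items, class_to_idx
```

Under this convention the binary task maps `benign` to 0 and `malignant` to 1, while the multiclass task maps `adenosis` to 0 through `tubular_adenoma` to 7.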

📬 Next Steps

Possible extensions include:
