Parameter-Efficient Fine-Tuning of BERT for binary sentiment classification using QLoRA (Quantized Low-Rank Adaptation), achieving competitive accuracy while reducing trainable parameters by ~99% and GPU memory by ~70% compared to full fine-tuning.
- Overview
- Workflow
- Project Structure
- Key Techniques
- Dataset
- Model Architecture
- Training Configuration
- Results
- Efficiency Analysis
- Getting Started
- W&B Integration (Optional)
- License
This project demonstrates how QLoRA enables fine-tuning of large pre-trained language models on consumer-grade hardware. By combining 4-bit NF4 quantization (via bitsandbytes) with Low-Rank Adaptation (via peft), we fine-tune bert-base-uncased on the IMDb sentiment dataset while:
- Training only ~0.5% of total parameters (LoRA adapters on Q & V projections)
- Reducing GPU memory footprint by ~70% vs full fp32 fine-tuning
- Using the 8-bit paged AdamW optimizer for further memory savings
- Optionally logging experiments to Weights & Biases (works without it too)
flowchart TD
A([Start]) --> B[Environment Setup\nInstall dependencies\ntransformers · peft · bitsandbytes]
B --> C[Data Preparation\nLoad IMDb 20k dataset\nfrom HuggingFace Hub]
C --> D[Tokenization\nbert-base-uncased\nmax_length=256]
D --> E[Train / Val / Test Split\n80% · 10% · 10%]
E --> F[4-bit Quantization Config\nNF4 · fp16 compute\nDouble Quantization]
F --> G[Load Base Model\nbert-base-uncased\nQuantized via BitsAndBytes]
G --> H[Prepare for k-bit Training\nFreeze base layers\nCast norms to fp32]
H --> I[Apply QLoRA Adapters\nLoRA rank r=16 · alpha=32\nTarget: query & value projections]
I --> J[Training Configuration\nEpochs=3 · LR=2e-4 · fp16\npaged_adamw_8bit · cosine LR]
J --> K[W&B Logging\nOptional · skipped if unavailable]
K --> L[Fine-Tuning\nHuggingFace Trainer\nEarly Stopping patience=2]
L --> M{Epoch Complete?}
M -- No --> L
M -- Yes --> N[Evaluation\nTest Set: Accuracy · F1\nConfusion Matrix]
N --> O[Efficiency Analysis\nParameter reduction\nMemory savings vs Full FT]
O --> P[Visualizations saved to Data/\nTraining curves · Comparison charts]
P --> Q([✅ Done])
style A fill:#4CAF50,color:#fff
style Q fill:#4CAF50,color:#fff
style F fill:#2196F3,color:#fff
style I fill:#9C27B0,color:#fff
style K fill:#9E9E9E,color:#fff
style L fill:#FF9800,color:#fff
style N fill:#00BCD4,color:#fff
The full `.mmd` source is at `Flow/workflow.mmd`.
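The quantize, prepare, and adapt steps in the diagram can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the helper name `build_qlora_model` and the two settings dictionaries are hypothetical, while the values come from the configuration documented in this README. Running it assumes `torch`, `transformers`, `peft`, and `bitsandbytes` are installed and a CUDA GPU is available.

```python
# Settings taken from this README; the dictionary names are illustrative only.
LORA_SETTINGS = dict(
    r=16,
    lora_alpha=32,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
)
QUANT_SETTINGS = dict(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


def build_qlora_model(model_name="bert-base-uncased", num_labels=2):
    """Hypothetical helper: load a 4-bit base model and wrap it with LoRA adapters."""
    # Heavy imports stay local so the settings above can be inspected without a GPU stack.
    import torch
    from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.float16, **QUANT_SETTINGS)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=num_labels,
        quantization_config=bnb_config,
    )
    # Freezes the base weights and casts normalization layers to fp32.
    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, **LORA_SETTINGS)
    return get_peft_model(model, lora_config)
```

Calling `build_qlora_model().print_trainable_parameters()` on a suitable machine reports how many parameters remain trainable after the adapters are attached.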
FineTuning/
├── QLoRA_BERT_TextClassification.ipynb   # Main notebook
├── Flow/
│   └── workflow.mmd                      # Mermaid workflow diagram source
├── Data/                                 # Generated plots (auto-created at runtime)
│   ├── dataset_stats.png
│   ├── confusion_matrix.png
│   ├── training_curves.png
│   ├── efficiency_comparison.png
│   └── lora_rank_sensitivity.png
├── requirements.txt                      # Python dependencies
├── .env.example                          # Environment variable template
├── .gitignore
└── LICENSE
| Technique | Detail |
|---|---|
| 4-bit NF4 Quantization | Weights stored in 4-bit Normal Float format via bitsandbytes |
| Double Quantization | Quantizes the quantization constants for extra memory savings |
| LoRA Adapters | Low-rank matrices injected into Q & V attention projections |
| fp16 Compute | Mixed-precision training: faster compute with minimal accuracy loss |
| Gradient Checkpointing | Trades compute for memory during backprop |
| Paged AdamW 8-bit | 8-bit optimizer whose states are paged to CPU memory when GPU memory runs short |
| Cosine LR Schedule | Smooth learning rate decay over training |
| Early Stopping | Stops training if validation accuracy doesn't improve for 2 epochs |
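To make the LoRA row concrete, the sketch below computes the adapted projection `W x + (alpha / r) * B (A x)` in plain Python. It is an illustration of the general LoRA update, not code from the notebook or from `peft`: the helper names and the toy sizes are assumptions, and the zero initialization of `B` follows the usual LoRA convention.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, r, alpha):
    """Return W x + (alpha / r) * B (A x) for a frozen W and trainable A, B."""
    base = matvec(W, x)                # frozen pre-trained path
    update = matvec(B, matvec(A, x))   # low-rank path: d_in -> r -> d_out
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


rng = random.Random(0)
d, r, alpha = 8, 2, 4  # toy sizes; this project uses d=768, r=16, alpha=32 (same alpha / r = 2)
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]     # frozen weight, d x d
A = [[rng.gauss(0, 0.02) for _ in range(d)] for _ in range(r)]  # trainable, r x d, random init
B = [[0.0] * r for _ in range(d)]                               # trainable, d x r, zero init
x = [rng.gauss(0, 1) for _ in range(d)]

# With B = 0 the adapter contributes nothing, so training starts from the pre-trained output.
assert lora_forward(W, A, B, x, r, alpha) == matvec(W, x)
```

Only `A` and `B` receive gradients, which is why each adapted projection adds just `r * (d_in + d_out)` trainable parameters.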
- Source: `dipanjanS/imdb_sentiment_finetune_dataset20k` on HuggingFace Hub
- Task: Binary sentiment classification (Negative / Positive)
- Size: 20,000 IMDb movie reviews
- Splits: 80% train · 10% validation · 10% test (carved from the original train split)
- Tokenizer: `bert-base-uncased` with `max_length=256`, dynamic padding via `DataCollatorWithPadding`
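Dynamic padding means each batch is padded only to its own longest sequence rather than to the global `max_length=256`. The sketch below illustrates the idea in plain Python; it is not the implementation of `DataCollatorWithPadding`, the helper name `pad_batch` is hypothetical, and the token ids are arbitrary placeholders for tokenizer output.

```python
def pad_batch(batch, pad_id=0):
    """Pad every sequence to the longest one in this batch and build attention masks."""
    longest = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in batch]
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Arbitrary placeholder ids; real ids would come from the bert-base-uncased tokenizer.
batch = [[101, 2023, 102], [101, 2023, 3185, 2001, 102]]
padded = pad_batch(batch)

print(padded["input_ids"][0])       # [101, 2023, 102, 0, 0]
print(padded["attention_mask"][0])  # [1, 1, 1, 0, 0]
```

Here the batch is padded to length 5 instead of 256, so short reviews cost far less compute per step.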
Base model: bert-base-uncased (~110M parameters)
QLoRA configuration:
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                               # LoRA rank
    lora_alpha=32,                      # Scaling factor (alpha / r = 2)
    target_modules=["query", "value"],  # Attention projections
    lora_dropout=0.1,
    bias="none",
)

BitsAndBytes config:
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Per-device batch size | 16 |
| Gradient accumulation steps | 2 (effective batch = 32) |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 10% |
| Weight decay | 0.01 |
| Optimizer | paged_adamw_8bit |
| Precision | fp16 |
| Gradient checkpointing | Yes |
| Early stopping patience | 2 epochs |
| Best model metric | Validation Accuracy |
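The learning-rate rows above combine into a warmup-then-cosine curve. The sketch below reproduces the standard shape in plain Python so the numbers can be checked by hand. It assumes a single GPU and that all three epochs run (no early stop); the function name `lr_at` is hypothetical.

```python
import math


def lr_at(step, total_steps, warmup_steps, peak_lr=2e-4):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


train_examples = 16_000                               # 80% of the 20,000 reviews
effective_batch = 16 * 2                              # per-device batch x accumulation steps
steps_per_epoch = train_examples // effective_batch   # 500 optimizer steps
total_steps = steps_per_epoch * 3                     # 1,500 over 3 epochs
warmup_steps = total_steps // 10                      # 10% warmup = 150 steps

for step in (0, 75, 150, 825, 1500):
    print(step, lr_at(step, total_steps, warmup_steps))
```

The rate climbs linearly to 2e-4 over the first 150 steps, passes about 1e-4 halfway through the decay (step 825), and reaches zero at step 1,500.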
Evaluated on the held-out test set using Accuracy and weighted F1 Score.
| Metric | Score |
|---|---|
| Test Accuracy | ~93–94% |
| Weighted F1 | ~0.93–0.94 |
Exact values depend on hardware and random seed.
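For reference, accuracy and weighted F1 can be computed as in the sketch below, where weighted F1 is the per-class F1 averaged by class support. This is a plain-Python illustration with made-up labels, not the notebook's evaluation code and not real results; the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Made-up toy labels (0 = Negative, 1 = Positive), for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

print(accuracy(y_true, y_pred))     # 0.75
print(weighted_f1(y_true, y_pred))  # approximately 0.75
```

On a balanced test set the two metrics tend to track each other closely, which is consistent with the matching ranges in the table.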
All plots are saved to the Data/ folder at runtime:
- `dataset_stats.png`: label distribution & token length histogram
- `confusion_matrix.png`: test set confusion matrix
- `training_curves.png`: train/val loss and validation accuracy over steps
- `efficiency_comparison.png`: parameter & memory comparison charts
- `lora_rank_sensitivity.png`: LoRA rank vs trainable parameter count
| Metric | Full Fine-Tuning | QLoRA |
|---|---|---|
| Trainable parameters | ~110M (100%) | ~0.5M (~0.5%) |
| Model weights memory | ~440 MB (fp32) | ~55 MB (4-bit) |
| Estimated total GPU memory | ~1.7 GB | ~0.5 GB |
| Memory savings | n/a | ~70% |
| Memory reduction factor | n/a | ~3–4× |
These are theoretical estimates. Actual GPU usage includes activations, optimizer states, and framework overhead.
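The table's figures can be reproduced with back-of-the-envelope arithmetic, sketched below. It assumes the standard bert-base shape (hidden size 768, 12 encoder layers), a round 110M total parameter count, and that the classification head is trained alongside the adapters; like the table, it ignores quantization constants and layers that stay in higher precision.

```python
hidden = 768   # bert-base hidden size (assumed standard architecture)
layers = 12    # bert-base encoder layers
rank = 16      # LoRA rank r
targets = 2    # adapted projections per layer: query and value

per_adapter = rank * (hidden + hidden)          # A is r x d_in, B is d_out x r
lora_params = per_adapter * targets * layers    # 589,824
head_params = hidden * 2 + 2                    # 2-class linear head: weights + biases
trainable = lora_params + head_params           # 591,362

total_params = 110_000_000                      # rounded size of bert-base-uncased
trainable_pct = 100 * trainable / total_params  # about 0.54%

fp32_mb = total_params * 4 / 1e6                # 4 bytes per weight -> 440 MB
nf4_mb = total_params * 0.5 / 1e6               # 4 bits per weight  -> 55 MB

savings_pct = 100 * (1 - 0.5 / 1.7)             # from the table's 1.7 GB vs 0.5 GB
reduction = 1.7 / 0.5                           # about 3.4x

print(trainable, round(trainable_pct, 2), fp32_mb, nf4_mb, round(savings_pct, 1), round(reduction, 1))
```

The result, roughly 0.59M trainable parameters or about 0.54% of the model, lines up with the ~0.5% figure quoted above.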
- Python 3.10+
- CUDA-capable GPU (recommended: 8GB+ VRAM)
- CUDA 11.8 or 12.x
git clone https://github.com/SANJAI-s0/bert-qlora-text-classification.git
cd bert-qlora-text-classification
pip install -r requirements.txt

Install PyTorch with the correct CUDA version for your system first:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
Open and run the notebook cell by cell:
jupyter notebook QLoRA_BERT_TextClassification.ipynb

Or use Google Colab / Kaggle (free GPU tier works with this setup).
W&B logging is fully optional: the notebook trains and evaluates without it. If W&B is unavailable or the login fails, it silently falls back to local-only logging.
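One way such a fallback could work is sketched below. This is an assumption about the general pattern, not the notebook's actual logic: the helper `resolve_report_to` is hypothetical, and it only checks that the `wandb` package is importable and that `WANDB_API_KEY` is set (it does not attempt a login). The returned string is meant for the `report_to` argument of `TrainingArguments`, where `"none"` disables external loggers.

```python
import importlib.util
import os


def resolve_report_to(env=None, wandb_installed=None):
    """Return "wandb" when W&B looks usable, otherwise "none" (local-only logging)."""
    env = os.environ if env is None else env
    if wandb_installed is None:
        # Detect the package without importing it.
        wandb_installed = importlib.util.find_spec("wandb") is not None
    has_key = bool(env.get("WANDB_API_KEY", "").strip())
    return "wandb" if (wandb_installed and has_key) else "none"


# No key configured: logging stays local even if wandb is installed.
print(resolve_report_to(env={}, wandb_installed=True))  # none
```

Passing `env` and `wandb_installed` explicitly keeps the helper easy to check without touching the real environment.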
To enable W&B:
- Create a free account at wandb.ai
- Copy `.env.example` to `.env` and add your API key from wandb.ai/authorize
- The notebook auto-detects the key and logs metrics, curves, and the confusion matrix
cp .env.example .env
# Edit .env and set WANDB_API_KEY=your_key_here

This project is licensed under the MIT License.