
Overfitting in VideoMAE Model Fine-Tuning for Binary Classification on Home Camera Footage #129

@tgcandido

Description:
I'm fine-tuning a VideoMAE model for binary classification on home camera footage to distinguish between two actions. Here’s a summary of my setup and the issues I’m facing:

Dataset & Variations:
I have two primary datasets:

  • Small Dataset: ~120 clips for quicker iteration.
  • Full Dataset: ~3k clips.

All videos are 6 seconds long, though I've also tested with 3-second clips. I've also created variations with blurred or blacked-out backgrounds to help with recognition.

Model & Configuration:
The model classifies actions using 16 uniformly sampled frames per video.
I’ve tried several pretrained checkpoints, including small, base, and large, as well as checkpoints fine-tuned on Something-Something V2 (SSV2) and Kinetics.
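
For reference, here is a minimal sketch of what uniform sampling of 16 frames means in this setup. It is illustrative only and not the notebook's exact code: the helper name `uniform_frame_indices` is hypothetical, and the 30 fps frame rate in the usage example is an assumption, since the clips' actual frame rate isn't stated above.

```python
from typing import List


def uniform_frame_indices(num_frames_in_clip: int, num_samples: int = 16) -> List[int]:
    """Return `num_samples` frame indices spread evenly over the clip.

    The first index is always 0 and the last is always the final frame.
    If the clip has fewer frames than `num_samples`, some indices repeat.
    """
    if num_frames_in_clip <= 0:
        raise ValueError("num_frames_in_clip must be positive")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = num_frames_in_clip - 1
    # Integer arithmetic avoids floating-point rounding surprises at the end of the clip.
    return [i * last // (num_samples - 1) for i in range(num_samples)]


if __name__ == "__main__":
    # Assumption: a 6-second clip recorded at 30 fps has 180 frames.
    print(uniform_frame_indices(6 * 30, 16))
```

With only 16 frames per clip, a 6-second clip is sampled much more sparsely in time than a 3-second one, which may matter if the two actions differ mainly in short motions.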
Hyperparameters tested:

  • Batch sizes: 2, 4, and 8.
  • Epochs: 4 to 16.
  • Learning rate: 5e-5.

I removed the RandomCrop transformation because it was cropping the person out of the frame entirely.

I'm using the Hugging Face Video Classification Colab Notebook as a starting point: Training Notebook.

Problem: Despite these variations, the model overfits immediately. I also ran the same code on the UCF101 dataset to rule out dataset-specific issues and got results similar to the Hugging Face VideoMAE Colab, so the code seems fine.

Request: Any advice on addressing this overfitting issue would be greatly appreciated. Specifically, I'm looking for guidance on:

  • Additional hyperparameter adjustments.
  • Potential model architecture changes (if applicable).
  • Dataset augmentation techniques that might improve generalization.

Thank you for any help or insights you can provide!
