Ranaam21/dgn-gesture-recognition

Deep Geometric Network (DGN) for Gesture Recognition

Geometry-First Visual Intelligence: using differential geometry instead of raw pixels for hand gesture recognition.

Paper: "Geometry-First Visual Intelligence: Deep Geometric Networks and Quantum Geometric Networks for Gesture Recognition"
Author: Amit Rana, Independent Researcher, Santa Clara, CA
ORCID: 0009-0008-5998-6560
Preprint: https://doi.org/10.5281/zenodo.19842048


What This Is

Standard gesture recognition feeds raw video frames into deep neural networks — the system learns pixel statistics with no geometric understanding. DGN takes the opposite approach:

Before any neural network sees the data, we compute explicit differential-geometric properties directly from hand landmarks:

| Feature Group | Dimensions | What It Captures |
| --- | --- | --- |
| Ricci curvature of finger trajectories | 32-D | How sharply each joint bends over time |
| Bézier motion arc coefficients | 32-D | The shape of the path each fingertip traces |
| Joint angular velocities | 32-D | How fast each joint rotates |
| Skeletal topological weights | 32-D | Connectivity structure of the hand graph |
| Total | 128-D | Per-frame geometric snapshot |

These 128 scalar values are not learned; they follow from mathematical definitions. Every dimension is named and interpretable. A logistic regression on these features alone reaches 52.61% validation accuracy across 27 gesture classes (about 14× the 3.7% random-chance rate), with no deep learning at all.
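To make "follows from a mathematical definition" concrete, here is a minimal sketch of one way to turn a fingertip trajectory into Bézier coefficients: a least-squares fit of a cubic Bézier curve. The cubic degree, the uniform time parameterization, and the function names are assumptions for illustration. This README does not state how the paper's 32 Bézier dimensions are actually computed.

```python
import numpy as np

def bernstein3(t):
    """Cubic Bernstein basis evaluated at parameters t, shape (T, 4)."""
    t = np.asarray(t, dtype=float)
    return np.stack(
        [(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3],
        axis=1,
    )

def fit_cubic_bezier(points):
    """Least-squares cubic Bezier fit to an ordered trajectory.

    points: array of shape (T, D), one fingertip position per frame.
    Returns the 4 control points, shape (4, D).
    """
    points = np.asarray(points, dtype=float)
    # Assumption: frames are evenly spaced, so t is uniform on [0, 1].
    t = np.linspace(0.0, 1.0, len(points))
    ctrl, *_ = np.linalg.lstsq(bernstein3(t), points, rcond=None)
    return ctrl

# Synthetic 36-frame fingertip path sampled from a known cubic Bezier curve.
true_ctrl = np.array([[0, 0, 0], [1, 2, 0], [3, 2, 1], [4, 0, 1]], dtype=float)
path = bernstein3(np.linspace(0.0, 1.0, 36)) @ true_ctrl

ctrl = fit_cubic_bezier(path)  # recovers the control points of the synthetic path
```

The fit yields 4 control points per trajectory (12 scalars for a 3D path), each of which has a direct geometric reading: start point, end point, and two points that pull the arc.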

Adding a temporal encoder over 36-frame sequences reaches 65.69% with a BiLSTM and 65.77% with a Mamba SSM, competitive with skeleton-based state-of-the-art while operating on a representation 5,000× more compact than raw video.
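As a rough picture of what a temporal encoder over these sequences looks like, the sketch below runs a bidirectional LSTM over a batch of 36 × 128 feature sequences and maps the result to 27 class logits. The hidden size, the mean-pooling over time, and the class name `BiLSTMClassifier` are assumptions; the trained models and their exact architecture are not part of this repo.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Hypothetical BiLSTM head over per-frame geometric features."""

    def __init__(self, feat_dim=128, hidden=64, num_classes=27):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        # Forward and backward hidden states are concatenated: 2 * hidden.
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        # x: (batch, frames, feat_dim), e.g. (batch, 36, 128)
        out, _ = self.lstm(x)            # (batch, frames, 2 * hidden)
        pooled = out.mean(dim=1)         # assumption: mean-pool over time
        return self.head(pooled)         # (batch, num_classes)

model = BiLSTMClassifier()
logits = model(torch.randn(8, 36, 128))  # random stand-in for real features
```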


Key Results

Static classifiers (single-frame, mean-pooled features)

| Model | Validation Accuracy |
| --- | --- |
| Logistic Regression | 52.61% |
| Neural Network Baseline | 61.70% |
| DGN (geometric MLP) | 61.64% |
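A static baseline of this kind can be sketched in a few lines: average each sequence over its 36 frames, then fit a linear classifier on the resulting 128-D vectors. The data below is synthetic, and the scaler, solver settings, and variable names are assumptions. This is not the contents of `training/run_static_classifiers.py`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_classes = 27

# Synthetic stand-in for real features: 20 sequences per class, 36 frames, 128-D.
y = np.repeat(np.arange(n_classes), 20)
X_seq = rng.normal(size=(len(y), 36, 128))
X_seq[:, :, 0] += y[:, None]          # inject a class-dependent signal

# Mean-pool over the time axis: (N, 36, 128) -> (N, 128).
X = X_seq.mean(axis=1)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)

train_acc = clf.score(X, y)           # training accuracy on the toy data
chance = 1.0 / n_classes              # about 0.037 for 27 balanced classes
```

On real data, accuracy should be measured on a held-out validation split rather than on the training set as in this toy run.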

Temporal classifiers (36-frame sequences)

| Model | Validation Accuracy |
| --- | --- |
| DGN + Transformer | 60.28% |
| DGN + bidirectional LSTM | 65.69% |
| DGN + Mamba (best) | 65.77% |

Dataset: Jester — 148,092 videos, 27 gesture classes.


Repository Structure

dgn-gesture-recognition/
├── training/
│   ├── colab_temporal_feature_extraction.ipynb   ← Feature extraction pipeline (run on Colab)
│   ├── run_static_classifiers.py                 ← Table I: logistic regression + MLP baselines
│   ├── run_flattened_mlp.py                      ← Flat MLP baseline
│   ├── augment_flow_features.py                  ← 192-D flow-augmented features
│   └── eval_temporal_checkpoints.py              ← Checkpoint evaluation utility
├── results/
│   ├── static_results_verified.json              ← Table I numbers (verified)
│   └── flow_results.json                         ← Flow feature ablation results
└── paper/
    ├── generate_paper.py                          ← Generates IEEE-formatted .docx
    ├── generate_figures.py                        ← Generates all 4 paper figures
    ├── sections/                                  ← Paper section text files
    └── figures/                                   ← Pre-generated PNG figures

Not included in this repo (by design):

  • Raw Jester video data (download from the 20BN website)
  • Pre-extracted feature NPZ files (~700 MB — too large for GitHub)
  • Trained model checkpoints
  • Quantum extension (QGN) — described in the paper, not released here

Getting Started

1. Install dependencies

pip install mediapipe opencv-python numpy scipy scikit-learn torch python-docx matplotlib

2. Extract geometric features (Google Colab recommended — GPU speeds up MediaPipe)

Open training/colab_temporal_feature_extraction.ipynb in Google Colab.

  • Mount your Google Drive and point it at the Jester dataset frames
  • Outputs: temporal_ricci_bezier_instance_0.npz — shape (148092, 36, 128)
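Before training on the extracted file, it can help to confirm what it contains. The sketch below lists every array in an NPZ archive with its shape and dtype. The array names inside `temporal_ricci_bezier_instance_0.npz` are not documented here, so the helper `describe_npz` reads them from the archive, and the demo uses a small stand-in file with a hypothetical key `features`.

```python
import os
import tempfile

import numpy as np

def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an NPZ archive."""
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }

# Demo on a tiny stand-in file; the key "features" is hypothetical.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_features.npz")
np.savez(demo_path, features=np.zeros((4, 36, 128), dtype=np.float32))

summary = describe_npz(demo_path)
print(summary)
```

For the real output, the main array is expected to have shape (148092, 36, 128), as stated above.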

3. Run static classifiers

python training/run_static_classifiers.py

4. Generate paper figures

cd paper && python generate_figures.py

5. Generate paper document

cd paper && python generate_paper.py

The Geometry Pipeline

Raw Video Frames
      ↓
MediaPipe Hand Landmarks  (21 keypoints × 3D)
      ↓
┌──────────────────────────────────────────┐
│  Differential Geometry Extraction        │
│  • Ricci curvature (finger trajectories) │
│  • Bézier arc parameterization           │
│  • Angular velocities (joint rotations)  │
│  • Topological connectivity weights      │
└──────────────────────────────────────────┘
      ↓
128-D Geometric Feature Vector (per frame)
      ↓
36-Frame Temporal Sequence
      ↓
BiLSTM / Mamba SSM Encoder
      ↓
27-Class Gesture Output

Every feature has a name and a mathematical definition. Unlike convolutional embeddings, these representations can be inspected, composed with symbolic rules, and reasoned about directly — making DGN a natural front-end for Neuro-Symbolic AI systems.
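As an example of a feature with a direct definition, the sketch below computes the bend angle at one joint from three landmarks and differentiates it over time to get an angular velocity. Indices 5, 6, and 7 are MediaPipe's index-finger MCP, PIP, and DIP landmarks. The frame rate, the finite-difference scheme, and the function names are assumptions, and the paper's exact feature definitions may differ.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in radians at joint b formed by points a-b-c; inputs have shape (..., 3)."""
    u = a - b
    v = c - b
    cos = np.sum(u * v, axis=-1) / (
        np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1) + 1e-12
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))

def angular_velocity(angles, fps):
    """Finite-difference time derivative of an angle series, in radians per second."""
    return np.gradient(angles, 1.0 / fps, axis=0)

# Synthetic landmarks: (frames, 21 keypoints, 3 coordinates).
fps = 12.0                                  # assumed frame rate
t = np.arange(36) / fps
phi = 0.2 + 0.5 * t                         # DIP direction rotates at 0.5 rad/s
landmarks = np.zeros((36, 21, 3))
landmarks[:, 5] = [-1.0, 0.0, 0.0]          # index MCP
landmarks[:, 6] = [0.0, 0.0, 0.0]           # index PIP (the joint being measured)
landmarks[:, 7] = np.stack([np.cos(phi), np.sin(phi), np.zeros_like(phi)], axis=1)

angles = joint_angle(landmarks[:, 5], landmarks[:, 6], landmarks[:, 7])
vel = angular_velocity(angles, fps)         # about -0.5 rad/s: the joint is closing
```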


Why Geometry First?

  1. Interpretability — "Dimension 7 is the Ricci curvature of the index fingertip trajectory" is a statement that can be verified and composed. A 512-D CNN embedding cannot say the same.

  2. Compactness — 128 scalars vs ~1M+ pixels per frame. 5,000× smaller. Runs on CPU.

  3. Quantum-readiness — Geometric scalars (angles, curvatures) map directly to quantum rotation gate parameters (θ, φ on the Bloch sphere). This is the native input format for variational quantum circuits — no forced encoding, no information loss. Explored further in the companion paper.
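The angle-to-qubit mapping mentioned in item 3 can be illustrated with plain NumPy: a pair of angles (θ, φ) defines the single-qubit state cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩ on the Bloch sphere. The linear rescaling in `to_angle` and both function names are hypothetical. The QGN extension is not released in this repo, so this is only a sketch of the general idea, not the paper's circuit.

```python
import numpy as np

def to_angle(x, lo, hi, max_angle=np.pi):
    """Linearly rescale a scalar from [lo, hi] to [0, max_angle] (hypothetical scaling)."""
    return max_angle * (np.clip(x, lo, hi) - lo) / (hi - lo)

def bloch_state(theta, phi):
    """Single-qubit state cos(theta/2)|0> + exp(i*phi) sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

# Two geometric scalars, assumed to be normalized to [0, 1] beforehand.
theta = to_angle(0.5, 0.0, 1.0)                 # polar angle in [0, pi]
phi = to_angle(0.25, 0.0, 1.0, 2 * np.pi)       # azimuthal angle in [0, 2*pi]

state = bloch_state(theta, phi)
norm = np.linalg.norm(state)                    # a valid state has unit norm
```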


Citation

Preprint available on Zenodo. Citation will be updated upon journal publication.

@article{rana2026dgn,
  title   = {Geometry-First Visual Intelligence: Deep Geometric Networks and
             Quantum Geometric Networks for Gesture Recognition},
  author  = {Rana, Amit},
  year    = {2026},
  doi     = {10.5281/zenodo.19842048},
  url     = {https://doi.org/10.5281/zenodo.19842048},
  note    = {Preprint, Zenodo}
}

License

Code released under the MIT License. The paper text and figures are copyright Amit Rana. All rights reserved.
