Deep Geometric Network (DGN) for Gesture Recognition

Geometry-First Visual Intelligence: using differential geometry instead of raw pixels for hand gesture recognition.

Paper: "Geometry-First Visual Intelligence: Deep Geometric Networks and Quantum Geometric Networks for Gesture Recognition"
Author: Amit Rana, Independent Researcher, Santa Clara, CA
ORCID: 0009-0008-5998-6560
Preprint: https://doi.org/10.5281/zenodo.19842048

What This Is

Standard gesture recognition feeds raw video frames into deep neural networks, so the system learns pixel statistics with no explicit geometric understanding. DGN takes the opposite approach: before any neural network sees the data, it computes explicit differential-geometric properties directly from hand landmarks:

Feature Group	Dimensions	What It Captures
Ricci curvature of finger trajectories	32-D	How sharply each joint bends over time
Bézier motion arc coefficients	32-D	The shape of the path each fingertip traces
Joint angular velocities	32-D	How fast each joint rotates
Skeletal topological weights	32-D	Connectivity structure of the hand graph
Total	128-D	Per-frame geometric snapshot
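
To make the table concrete, a per-frame vector can be sliced into its four named groups. The sketch below assumes the groups are stored contiguously and in the order the table lists them; that layout and the names `GROUPS` and `split_features` are illustrative assumptions, not something the repository confirms.

```python
import numpy as np

# Assumed layout (hypothetical): four contiguous 32-D blocks, in table order.
GROUPS = ("ricci_curvature", "bezier_coefficients",
          "angular_velocities", "topological_weights")
GROUP_DIM = 32

def split_features(frame_vec):
    """Split a 128-D per-frame vector into named 32-D groups."""
    frame_vec = np.asarray(frame_vec)
    if frame_vec.shape[-1] != GROUP_DIM * len(GROUPS):
        raise ValueError("expected a 128-D feature vector")
    return {name: frame_vec[..., i * GROUP_DIM:(i + 1) * GROUP_DIM]
            for i, name in enumerate(GROUPS)}

# Demo on a dummy vector whose entries equal their own index.
parts = split_features(np.arange(128.0))
print({name: block.shape for name, block in parts.items()})
```

Because the slicing acts on the last axis, the same function also works on a full `(36, 128)` sequence or a batch of sequences.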

These 128 scalar values are not learned; they follow from mathematical definitions. Every dimension is named and interpretable. A logistic regression on these features alone reaches 52.61% validation accuracy across 27 gesture classes (about 14× the 3.7% random-chance rate), with no deep learning at all.
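
The chance-level comparison can be checked with a few lines of arithmetic. Uniform guessing over 27 classes is right about 3.7% of the time, so the logistic-regression result sits a little over 14 times above it. The variable names are illustrative.

```python
NUM_CLASSES = 27
LOGREG_ACC = 52.61  # validation accuracy (%) quoted in the text

chance = 100.0 / NUM_CLASSES   # accuracy (%) of uniform random guessing
ratio = LOGREG_ACC / chance    # how many times above chance

print(f"chance = {chance:.2f}%, ratio = {ratio:.1f}x")
```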

Adding a temporal encoder over 36-frame sequences raises validation accuracy to 65.69% with a BiLSTM and 65.77% with a Mamba SSM. That is competitive with skeleton-based state-of-the-art while operating on a representation about 5,000× more compact than raw video.
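
As a rough illustration of what a temporal encoder over these sequences could look like, here is a minimal PyTorch sketch of a bidirectional LSTM classifier. The class name, the hidden size, and the mean-over-time pooling are assumptions made for the example; the paper's actual architecture and hyperparameters may differ.

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Hypothetical temporal head: BiLSTM over frames, then a linear layer."""

    def __init__(self, feat_dim=128, hidden=64, num_classes=27):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):             # x: (batch, frames, feat_dim)
        out, _ = self.lstm(x)         # out: (batch, frames, 2 * hidden)
        pooled = out.mean(dim=1)      # average the hidden states over time
        return self.head(pooled)      # (batch, num_classes) class logits

model = BiLSTMClassifier()
logits = model(torch.randn(4, 36, 128))   # 4 random 36-frame sequences
print(logits.shape)
```

Mean pooling over time is only one option; taking the final hidden state or adding attention pooling are common alternatives.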

Key Results

Static classifiers (single-frame, mean-pooled features)

Model	Validation Accuracy
Logistic Regression	52.61%
Neural Network Baseline	61.70%
DGN — geometric MLP	61.64%
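
The static models see one 128-D vector per video rather than a sequence. A plausible reading of "mean-pooled" is an average over the 36 frames, sketched below on synthetic data. The standardization step is a common preprocessing choice added here as an assumption, not a documented part of `run_static_classifiers.py`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for (videos, frames, features).
X_seq = rng.normal(size=(8, 36, 128))

# Average over the frame axis: one 128-D vector per video.
X_static = X_seq.mean(axis=1)

# Standardize each feature column (assumed preprocessing step).
mu = X_static.mean(axis=0)
sigma = X_static.std(axis=0) + 1e-8
X_std = (X_static - mu) / sigma

print(X_static.shape, X_std.shape)
```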

Temporal classifiers (36-frame sequences)

Model	Validation Accuracy
DGN + Transformer	60.28%
DGN + bidirectional LSTM	65.69%
DGN + Mamba (best)	65.77%

Dataset: Jester — 148,092 videos, 27 gesture classes.

Repository Structure

dgn-gesture-recognition/
├── training/
│   ├── colab_temporal_feature_extraction.ipynb   ← Feature extraction pipeline (run on Colab)
│   ├── run_static_classifiers.py                 ← Table I: logistic regression + MLP baselines
│   ├── run_flattened_mlp.py                      ← Flat MLP baseline
│   ├── augment_flow_features.py                  ← 192-D flow-augmented features
│   └── eval_temporal_checkpoints.py              ← Checkpoint evaluation utility
├── results/
│   ├── static_results_verified.json              ← Table I numbers (verified)
│   └── flow_results.json                         ← Flow feature ablation results
└── paper/
    ├── generate_paper.py                          ← Generates IEEE-formatted .docx
    ├── generate_figures.py                        ← Generates all 4 paper figures
    ├── sections/                                  ← Paper section text files
    └── figures/                                   ← Pre-generated PNG figures

Not included in this repo (by design):

Raw Jester video data (download from the 20BN website)
Pre-extracted feature NPZ files (~700 MB — too large for GitHub)
Trained model checkpoints
Quantum extension (QGN) — described in the paper, not released here

Getting Started

1. Install dependencies

pip install mediapipe opencv-python numpy scipy scikit-learn torch python-docx matplotlib

2. Extract geometric features (Google Colab recommended — GPU speeds up MediaPipe)

Open training/colab_temporal_feature_extraction.ipynb in Google Colab.

Mount your Google Drive and point the notebook at the Jester dataset frames
Output: temporal_ricci_bezier_instance_0.npz, containing an array of shape (148092, 36, 128), i.e. videos × frames × features
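
After extraction it is worth confirming that the file holds what you expect. The helper below lists every array in an `.npz` file with its shape. The array names inside the real output are not documented here, so the sketch iterates over whatever keys exist; `describe_npz` and the demo file are illustrative.

```python
import os
import tempfile
import numpy as np

def describe_npz(path):
    """Return {array_name: shape} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}

# Demo on a tiny synthetic file; the real feature file is far larger.
tmp = os.path.join(tempfile.mkdtemp(), "demo_features.npz")
np.savez_compressed(tmp, features=np.zeros((5, 36, 128), dtype=np.float32))

shapes = describe_npz(tmp)
print(shapes)
```

For the real file, the same call would be `describe_npz("temporal_ricci_bezier_instance_0.npz")`, and one of the reported shapes should be `(148092, 36, 128)`.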

3. Run static classifiers

python training/run_static_classifiers.py

4. Generate paper figures

cd paper && python generate_figures.py

5. Generate paper document

cd paper && python generate_paper.py

The Geometry Pipeline

Raw Video Frames
      ↓
MediaPipe Hand Landmarks  (21 keypoints × 3D)
      ↓
┌──────────────────────────────────────────┐
│  Differential Geometry Extraction        │
│  • Ricci curvature (finger trajectories) │
│  • Bézier arc parameterization           │
│  • Angular velocities (joint rotations)  │
│  • Topological connectivity weights      │
└──────────────────────────────────────────┘
      ↓
128-D Geometric Feature Vector (per frame)
      ↓
36-Frame Temporal Sequence
      ↓
BiLSTM / Mamba SSM Encoder
      ↓
27-Class Gesture Output

Every feature has a name and a mathematical definition. Unlike convolutional embeddings, these representations can be inspected, composed with symbolic rules, and reasoned about directly — making DGN a natural front-end for Neuro-Symbolic AI systems.
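
To show how one such named feature can be computed from landmarks, the sketch below derives the bend angle of the index finger's PIP joint and its finite-difference angular velocity. It is an illustration under stated assumptions, not the paper's feature definition. The indices 5, 6 and 7 follow MediaPipe's hand landmark numbering (index MCP, PIP, DIP), while the function names and the `fps` default are hypothetical.

```python
import numpy as np

INDEX_MCP, INDEX_PIP, INDEX_DIP = 5, 6, 7   # MediaPipe hand landmark indices

def joint_angle(a, b, c):
    """Angle at b (radians) between segments b->a and b->c; inputs (..., 3)."""
    u, v = a - b, c - b
    denom = np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
    cos = np.sum(u * v, axis=-1) / np.maximum(denom, 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def pip_angular_velocity(landmarks, fps=12.0):
    """Finite-difference angular velocity (rad/s) of the index PIP joint.

    landmarks: array of shape (T, 21, 3); fps is an assumed frame rate.
    """
    angles = joint_angle(landmarks[:, INDEX_MCP],
                         landmarks[:, INDEX_PIP],
                         landmarks[:, INDEX_DIP])
    return np.diff(angles) * fps

rng = np.random.default_rng(0)
pose = rng.normal(size=(21, 3))
static_clip = np.repeat(pose[None], 36, axis=0)   # same pose in every frame
omega = pip_angular_velocity(static_clip)
print(omega.shape, np.abs(omega).max())
```

A hand that does not move yields zero angular velocity at every step, which makes a convenient sanity check before running on real landmarks.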

Why Geometry First?

Interpretability: "Dimension 7 is the Ricci curvature of the index fingertip trajectory" is a statement that can be verified and composed with other statements. The individual dimensions of a 512-D CNN embedding carry no such meaning.
Compactness: 128 scalars per frame versus ~1M+ pixels, a representation roughly 5,000× smaller that is light enough to run on CPU.
Quantum-readiness: Geometric scalars (angles, curvatures) map directly to quantum rotation gate parameters (θ, φ on the Bloch sphere), a natural input format for variational quantum circuits that needs no additional learned encoding step. This is explored further in the companion paper.
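
The mapping from geometric scalars to rotation angles can be illustrated without any quantum library. The sketch below rescales features to [0, π] and writes out the single-qubit state that an RY rotation produces from |0⟩. It is a generic angle-encoding example under assumed feature bounds, not the QGN circuit from the paper, and the function names are hypothetical.

```python
import numpy as np

def to_angles(x, lo, hi):
    """Linearly rescale features from [lo, hi] to rotation angles in [0, pi]."""
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    return np.pi * (x - lo) / (hi - lo)

def ry_state(theta):
    """Single-qubit state RY(theta)|0> = [cos(theta/2), sin(theta/2)]."""
    theta = np.asarray(theta, dtype=float)
    return np.stack([np.cos(theta / 2), np.sin(theta / 2)], axis=-1)

features = np.array([-1.0, 0.0, 1.0])         # made-up scalar features
thetas = to_angles(features, lo=-1.0, hi=1.0)
states = ry_state(thetas)                     # one qubit per feature
print(np.round(states, 3))
```

Values outside the assumed `[lo, hi]` range are clipped in this sketch, so the bounds should be chosen from the actual feature statistics.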

Citation

Preprint available on Zenodo. Citation will be updated upon journal publication.

@article{rana2026dgn,
  title   = {Geometry-First Visual Intelligence: Deep Geometric Networks and
             Quantum Geometric Networks for Gesture Recognition},
  author  = {Rana, Amit},
  year    = {2026},
  doi     = {10.5281/zenodo.19842048},
  url     = {https://doi.org/10.5281/zenodo.19842048},
  note    = {Preprint, Zenodo}
}
