🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5×~3×↑🎉 vs SDPA, up to 430T🎉 on H200.
Updated Jun 12, 2026 - Python
Gemma 4 Agent Skills
Autonomous AI Agent Skill for Google AI Edge Gallery - self-evolving, 100% offline
The first native Swift inference engine for Gemma 4 on Apple Silicon / iOS
Reproducible recipe: serve abliterated Gemma-4-12B (gemma4_unified) at 50-118 tok/s on no-NVLink Blackwell (SM120) via vLLM nightly + ModelOpt FP8/NVFP4 + MTP spec-decode.
Run Gemma 4 locally with Gemma-4 Omni-Desktop. A free, offline AI agent optimized for RTX 30xx & 8GB Macs. Features native vision, voice & OpenClaw support.
Gemma 4 31B Abliterated — quality-preserving guardrail removal for Google's most capable open model. Apache 2.0. Runs on Apple Silicon via MLX.
PDF redaction that runs entirely on your device. Five detection layers including Gemma 4 contextual PHI. No uploads, no servers, no leaks.
Domain-specialized Gemma 2 27B + Gemma 4 31B for SEC filings — fine-tuned on TPU v6e-8 with PyTorch/XLA FSDPv2, plus a Vertex AI Vector Search RAG demo (69 tickers × 381 filings). Same LoRA recipe, +3.5% / +5.8% BERTScore F1.
Run Google's Gemma 4 entirely in your browser via WebGPU. Multimodal chat, E2B vs E4B arena, ONNX conversion toolkit.
On-device LLM client for Android. Fork of Google AI Edge Gallery
HarmonyOS NEXT native Gemma 4 MNN on-device LLM chat demo
Using Google's Gemma 4 open weights for predicting and analyzing dermatology images, diagnosis, and medication
Ash — offline survival assistant for iOS. Gemma 4 E2B/E4B fully on-device (text · image · voice) with RAG-grounded answers over 56 emergency-response packs. Built for the Kaggle Gemma 4 Good Hackathon.
Offline subtitle translator with 2026 LLMs (Gemma 4, Qwen 3.5, Hunyuan-MT, Llama 4 Scout) via Ollama. Web UI + CLI, 17 languages, video subtitle extraction, persistent Translation Memory, LLM-as-judge quality estimation, genre-aware prompts. No API keys, no cloud.
Distributed systems for automated proposal generation. Orchestrates LLM agents for requirement extraction, semantic inventory matching, and competitive pricing analysis
An exploration of the contradictions of Google's family of LLMs
AI-powered offline health assistant built with React Native. On-device Gemma 4 4B via llama.rn for private, internet-free medical Q&A. Multi-agent AI (drug info, first aid, symptom triage, nearby doctors). Firebase sync, biometric auth, medication insights.
Use vision-language models to perform alignment and registration of histological brain sections to any BrainGlobe atlas in any orientation plane.