Skip to content

wprudencio/awesome-ai-cpu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 

Repository files navigation

🧠 Awesome AI Projects for CPU

Awesome MIT License PRs Welcome

A curated list of open source AI projects that run well on CPU, no GPU required.
Perfect for makers, indie developers, and local AI experiments.

Last updated: June 2026 — featuring projects from 2025–2026


📑 Table of Contents


🧠 Language Models (Chatbots / Text)

  • Ollama — "Docker for local LLMs." Run Llama, Mistral, Gemma, DeepSeek with one command. Excellent CPU support, easy CLI, model library.
  • GPT4All— Simple interface to run LLMs locally with several CPU-optimized models included.
  • LM Studio — Beautiful GUI app for local LLMs, automatically supports CPU execution.
  • Text Generation WebUI — Powerful web interface for LLMs with extensive CPU mode options.
  • Jan.ai — ChatGPT-like interface that runs 100% offline with a clean, modern UI.
  • LocalAI— OpenAI-compatible API for running local models. Drop-in replacement that also supports vision, voice, image gen — no GPU required.
  • Kobold.cpp — Lightweight inference engine for GGUF models with built-in web UI.
  • Open WebUI — Self-hosted, offline ChatGPT-style interface for Ollama, with RAG, web search, and multi-user support.

⚡ Inference Engines & Runtimes

  • llama.cpp— The gold standard for CPU-optimized LLM inference in C/C++. Powers Ollama, LM Studio, and most local LLM tools.
  • llamafile— Mozilla's single-file LLM executable. Distribute and run LLMs as a standalone binary — no installation, no dependencies, no GPU required. Built on llama.cpp, supports CPU inference out of the box.
  • mistral.rs — Fast, flexible LLM inference engine in Rust, built on Candle. Run any Hugging Face model or GGUF file with zero config — prebuilt CPU binaries for Linux/Windows and CPU Docker images mean no GPU or CUDA toolkit needed. Smart in-situ quantization (GGUF, GPTQ, AWQ) and hardware-aware tuning optimize for your CPU.
  • BitNet— Microsoft's official inference framework for 1-bit LLMs. Extremely efficient on CPU.
  • eLLM— Rust-based inference engine that claims to run LLMs faster on CPU than on GPU through aggressive optimization.
  • Krasis— Hybrid LLM runtime focusing on efficient execution of larger models on consumer hardware (CPU + limited VRAM).
  • IPEX-LLM— Accelerate local LLM inference on Intel CPUs, iGPUs, and NPUs. Seamless integration with llama.cpp, Ollama, HF Transformers.
  • ONNX Runtime — Cross-platform ML inference acceleration with CPU-optimized execution providers (OpenVINO, XNNPACK, CoreML).
  • OpenVINO — Intel's optimization toolkit for high-performance CPU inference across vision, language, and audio models.
  • LLM-D — Achieves state-of-the-art inference performance with innovative architecture design.
  • CTranslate2 — Fast inference engine for Transformer models. Powers Faster Whisper, optimized for CPU with Intel MKL and ONNX.
  • Trillim— Local AI stack for CPUs: CLI, Python SDK, and FastAPI server for BitNet and Bonsai (1-bit/ternary) bundles. Includes speech-to-text, text-to-speech, and image generation support.

🖼️ Image Generation and Editing

  • Stable Diffusion (CPU mode) — Image generation model that works on CPU (slower but functional).
  • Diffusion Bee — User-friendly GUI for Stable Diffusion on macOS, fully CPU compatible.
  • InvokeAI — Professional Stable Diffusion interface with excellent CPU support.
  • ComfyUI — Node-based UI for image AI pipelines, supports CPU workflow.
  • Fooocus — Simplified Stable Diffusion, easier to use than ComfyUI.
  • Real-ESRGAN — AI image upscaler, fast on CPU with great results.
  • GFPGAN — Restores and improves old/blurry faces in photos, runs efficiently on CPU.
  • Upscayl — Cross-platform AI image upscaler with simple GUI. Works great on CPU.

🎤 Voice and Audio

  • Whisper.cpp— Highly optimized Whisper (OpenAI) for CPU speech recognition. The fastest Whisper implementation for CPU.
  • Faster Whisper — Up to 4x faster than original Whisper using CTranslate2. Excellent CPU performance.
  • Piper TTS— Fast, local text-to-speech with small voice models (5-20MB). Note: archived but still functional.
  • Sherpa-ONNX— Comprehensive speech processing toolkit powered by ONNX Runtime. Speech-to-text, TTS, speaker diarization, VAD, keyword spotting — all on CPU. Cross-platform (x86, ARM, RISC-V, Android, iOS, Raspberry Pi).
  • Supertonic— Lightning-fast, on-device, multilingual TTS running natively via ONNX. Python, JS, Rust, Swift bindings.
  • MOSS-TTS-Nano— Ultra-compact (0.1B params) multilingual TTS from OpenMOSS. Runs realtime on a 4-core CPU, supports Chinese + English + more, with ONNX CPU inference and voice cloning. Apache-2.0.
  • Coqui TTS— Open source text-to-speech engine with many voices and languages. CPU efficient.
  • CosyVoice — Multi-lingual large voice generation model from FunAudioLLM. Supports voice cloning.
  • Amphion— Open-MMLab's toolkit for Audio, Music, and Speech Generation. Reproducible research with CPU mode.
  • Vosk — Offline speech recognition, very lightweight (50MB models).
  • Bark (Suno)— Realistic voice generation from text with CPU mode available.
  • Qwen3-TTS— Pure C inference engine for Qwen3-TTS. No Python, no PyTorch — just C and BLAS. Supports 0.6B/1.7B models.
  • RVC (Voice Conversion) — Real-time voice conversion, CPU compatible.
  • Demucs — Separate music into vocals/instruments (CPU mode available).
  • MusicGen — Generate music from text descriptions (CPU mode supported).
  • MusicGPT — Generate music based on natural language prompts. Runs locally on CPU.
  • acestep.cpp— Local AI music generation server with browser UI, powered by GGML. Describe a song + optional lyrics and get stereo 48kHz audio. Runs on CPU via BLAS-accelerated GGML backend with a dedicated CPU build script.
  • FunMusic — Fundamental toolkit for music generation, part of the FunAudioLLM ecosystem.

👁️ Computer Vision

  • OpenCV + DNN — Industry-standard vision framework with neural networks, fully CPU capable.
  • Ultralytics YOLO— YOLOv8, v9, v10+ with --device cpu. Real-time object detection on CPU.
  • MediaPipe — Google's library for hand, face, pose, and body tracking on CPU.
  • FaceX— Full face stack running entirely in the browser via WebAssembly. Detection, 576-point 3D mesh, recognition, anti-spoof. Zero server needed.
  • ONNX Models — Collection of pre-trained, state-of-the-art ONNX models for vision, text, and audio.

🔬 Small Models (Perfect for CPU)

💬 Language Models (< 7B parameters)

🖼️ Image Models

  • Stable Diffusion 1.5 — Classic version, lighter than v2/XL.
  • TinySD — Distilled version, 50% smaller than SD 1.5.
  • SSD-1B — 1B parameter SD model, 60% faster than SD 1.5.

🎤 Audio Models

👁️ Vision Models

🧠 Embedding Models (for RAG/Search)


🤖 AI Assistants & Agents

  • Cline— Autonomous coding agent as an SDK, IDE extension, or CLI assistant. Works with local LLMs via Ollama/LM Studio.
  • smolagents— HuggingFace's barebones library for agents that think in code. Supports local transformers and Ollama models, runs entirely on CPU.
  • Open Interpreter — Code-executing AI assistant (works with local LLMs).
  • AutoGPT — Autonomous AI agent (supports local models).
  • CrewAI— Multi-agent orchestration framework. Deploy autonomous agents that collaborate on complex tasks.
  • LangGraph — Stateful, graph-based agent orchestration framework from LangChain.
  • Dify— Production-ready platform for agentic workflow development. Visual builder + built-in RAG.
  • Flowise— Drag-and-drop visual tool to build LLM apps and AI agents. Self-host with Ollama.
  • RAGFlow— Leading open-source RAG engine with agent capabilities. Deep document understanding.
  • AnythingLLM — Chat with your documents (PDFs, text), supports local models.
  • PrivateGPT — Ask questions to your documents 100% offline.
  • CrewAI— Multi-agent orchestration for role-playing AI teams.

💻 Coding Assistants

  • Cline— Autonomous coding agent. VS Code extension + CLI + SDK. Supports Ollama, LM Studio, and any OpenAI-compatible local backend.
  • Continue.dev — Open-source VS Code / JetBrains copilot. Use local models (Qwen 2.5 Coder, DeepSeek Coder) for autocomplete and chat.
  • Aider — Terminal-based AI pair programming with git integration. Works with local LLMs.
  • Tabby — Self-hosted GitHub Copilot alternative. Code completion on CPU with StarCoder models.
  • OpenCode — The most-starred open-source AI coding agent of 2026. Designed for fast local development workflows.
  • Crush — Terminal-based agentic coding assistant from Charm. Auto-discovers local models from Ollama, LM Studio, litellm, and any OpenAI-compatible backend — run it fully offline on CPU. LSP-enhanced, MCP-extensible, cross-platform (macOS, Linux, Windows, BSD).

📚 Document & Knowledge (RAG)

  • RAGFlow— Deep document understanding RAG engine. PDF, DOCX, Excel — with agentic retrieval.
  • Dify— Full-featured LLM app platform with built-in RAG pipeline, knowledge base, and agentic workflow.
  • AnythingLLM — All-in-one desktop app for document-grounded conversations and private knowledge bases.
  • PrivateGPT — Offline Q&A over your documents (PDFs, text, code).
  • MinerU — Transforms complex documents (PDF, HTML, scans) into clean Markdown/JSON for RAG pipelines.
  • Docling— IBM's document understanding library. Parses PDF, DOCX, PPTX, images and more into structured Markdown/JSON with layout preservation. Runs fully on CPU via ONNX Runtime with dedicated CPU-only installation.
  • VelociRAG — Lightning-fast RAG for AI agents. ONNX-powered, 4-layer fusion, MCP server. No PyTorch needed.
  • RAG-Anything — All-in-one RAG framework with multiple retrieval strategies.
  • LightRAG— Graph-based retrieval-augmented generation system. Indexes text into entity-relation graphs for efficient retrieval. Uses lightweight local embedding models and works with any local LLM backend (Ollama, llama.cpp). [EMNLP 2025].
  • LlamaIndex — Data framework for connecting LLMs to external data sources (APIs, databases, documents).

🔄 Agentic Workflows & Platforms

  • Dify— Production-ready platform for building AI agents and workflows. Visual pipeline builder, RAG, MCP support, multi-model.
  • Flowise— Low-code/no-code platform to build LLM apps, chatbots, and agents visually.
  • n8n — Advanced workflow automation with native AI capabilities and MCP nodes.
  • Langflow — Visual framework for building multi-agent and RAG applications.
  • Haystack — End-to-end NLP framework for building search, QA, and RAG pipelines.

📊 Embeddings & Vector Databases

  • Chroma — Lightweight, embedded vector database. Runs entirely on CPU, perfect for local RAG.
  • Weaviate — Open-source vector search engine with hybrid search (vector + keyword). Runs on CPU.
  • Qdrant — High-performance vector database with rich filtering. CPU-friendly for moderate scale.
  • FAISS — Meta's library for efficient similarity search and dense vector clustering. CPU-optimized.
  • Voyager — Spotify's approximate nearest neighbor search library. Lightweight and fast on CPU.
  • zvec — Alibaba's lightweight, in-process vector database. Blazing-fast similarity search with dense + sparse vectors, full-text search, and hybrid retrieval. Embedded library — no servers, no config, runs on CPU anywhere your code runs. Python, Node.js, Go, Rust SDKs.

💡 Creative AI & Miscellaneous

  • Amphion— Audio, music, and speech generation toolkit. TTS, SVC, music gen — all on CPU.
  • MusicGen — Generate music from text descriptions (CPU mode supported).
  • FunMusic — Music generation toolkit from FunAudioLLM.
  • Diarize— Speaker diarization — "who spoke when?" CPU-only, no API keys, 8x faster than real-time.
  • llama.cpp — CPU-optimized inference for LLaMA and compatible models.
  • Roop — One-click face swap tool (CPU compatible).

🛠️ Development Tools

  • Transformers (Hugging Face) — Load and run any model with CPU backend (device="cpu").
  • Transformers.js— HuggingFace's Transformers for the browser. Run NLP, vision, and audio models directly in JavaScript — no server, no GPU. Powered by ONNX Runtime WebAssembly.
  • ONNX Runtime — Accelerate ML inference on CPU with optimizations (XNNPACK, OpenVINO, CoreML).
  • OpenVINO — Intel's optimization toolkit for CPU inference across any model.
  • BitNet— Official framework for 1-bit LLM inference. Revolutionary efficiency.
  • LMDeploy — Model compression and deployment toolkit for efficient CPU serving.
  • CTranslate2 — Fast transformer inference on CPU. Powers Faster Whisper and many production systems.
  • MLX — Apple's ML framework optimized for Apple Silicon (M-series CPUs). Excellent for local inference.
  • Candle— HuggingFace's minimalist ML framework for Rust with CPU-first design. Run LLMs, vision models, and more locally with zero GPU dependency.
  • llmfit— Rust CLI tool that detects your hardware and finds the best LLMs for your RAM, CPU, and GPU. One command to right-size models — scores quality, speed, fit, and context for hundreds of models. Supports Ollama, llama.cpp, MLX, LM Studio backends.

⚙️ Tips for Running on CPU

🎯 General Optimization

  • 🔧 Use smaller model versions (tiny, small, mini, nano)
  • ⚡ Apply quantization (Q4, Q5, Q8) to reduce RAM usage by 50-75%
  • 🧩 Use optimized runtimes: GGUF/GGML, ONNX Runtime, or OpenVINO
  • 🚀 Enable multi-threading to utilize all CPU cores
  • 📉 Reduce resolution/steps in image generation for faster results
  • 🔄 Use batch size 1 for CPU inference (larger batches don't help)

📊 Model Size Guide

Size Parameters RAM Needed Speed on CPU Use Case
Tiny < 1B 2-4GB ⚡⚡⚡⚡ Testing, edge devices
Small 1-3B 4-8GB ⚡⚡⚡ Daily use, chatbots
Medium 3-7B 8-16GB ⚡⚡ Quality balance
Large 7B+ 16GB+ Best quality (slower)

🔄 Quantization Formats

  • GGUF (Q4_K_M) — Best for llama.cpp/Ollama (4-bit), excellent quality/size ratio
  • GPTQ — Good compression with decent inference speed
  • AWQ — Better quality than GPTQ at the same model size
  • ONNX — Cross-platform optimization, works with many frameworks
  • BitNet (1-bit) — Next-gen extreme quantization, 90% size reduction

💡 2026 Tips & Trends

  • 1-bit LLMs are here: Microsoft's BitNet delivers surprisingly good quality at 1-bit precision. Runs 10x faster on CPU.
  • MoE models save RAM: Mixture-of-Experts (DeepSeek, Qwen3-MoE) activate only a fraction of parameters per token.
  • Apple Silicon is a CPU powerhouse: Use MLX or llama.cpp Metal backend on M1/M2/M3/M4 Macs for near-GPU speeds.
  • Hybrid CPU/GPU runtimes: Tools like Krasis automatically split models across available hardware.
  • WebAssembly AI: Run models directly in the browser (FaceX, Transformers.js) — zero install.

🚀 Quick Start Example

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Run a small model
ollama run phi3:mini

# For transcription with Whisper
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make
./main -m models/ggml-base.bin -f audio.wav

# For local coding assistant (Continue)
# Install the VS Code extension, point it to Ollama/local model

🤝 Contributing

Contributions are welcome! Please read the contribution guidelines first.

  • Add new projects that work well on CPU
  • Fix broken links or outdated information
  • Improve documentation and examples
  • Keep star counts and descriptions up to date

📜 License

CC0

This list is licensed under CC0 1.0 Universal and follows the Awesome format.


⭐ Star History

If you find this list helpful, please consider giving it a star on GitHub!


Made with ❤️ by the community | Last updated: June 2026

About

A curated list of open-source AI projects that run efficiently on CPU, no GPU required.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors