A web application that takes any uploaded image containing text and seamlessly converts it into clean, structured Markdown using powerful Vision LLMs (Google Gemini 2.0 & Llama 4 Scout via Groq).
Built with Next.js, Vercel AI SDK, and Tailwind CSS, featuring a bold, Gumroad-inspired Neubrutalism user interface.
- Accurate Text Extraction: Recognizes paragraphs, heading hierarchies, bullet points, and code blocks directly from screenshots/images.
- Multiple AI Models: Switch between Gemini 2.0 Flash (via Google) and Llama 4 Scout (via Groq) on the fly.
- Neubrutalist UI: A fun, highly tactile design featuring heavy black borders, hard block shadows, vibrant colors, and click-depth animations.
- Drag & Drop: Easily drop images directly into the browser.
- 1-Click Copy: Copy the generated markdown to your clipboard instantly.
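
The model switch described in the feature list could be backed by a small lookup table. The following is a minimal sketch, not the project's actual code: the `ModelChoice` keys, the `resolveModel` helper, and the exact model ID strings are assumptions made for illustration.

```typescript
// Hypothetical registry: keys, labels, and model IDs are illustrative
// assumptions and are not taken from this repository.
type ModelChoice = "gemini" | "llama";

interface ModelConfig {
  provider: "google" | "groq";
  modelId: string;
  label: string;
}

const MODELS: Record<ModelChoice, ModelConfig> = {
  gemini: {
    provider: "google",
    modelId: "gemini-2.0-flash",
    label: "Gemini 2.0 Flash",
  },
  llama: {
    provider: "groq",
    modelId: "meta-llama/llama-4-scout-17b-16e-instruct",
    label: "Llama 4 Scout",
  },
};

// Resolve a value coming from the UI (for example a toggle) to a known model,
// falling back to a default when the value is missing or unrecognized.
function resolveModel(
  choice: string | undefined,
  fallback: ModelChoice = "gemini",
): ModelConfig {
  if (
    choice !== undefined &&
    Object.prototype.hasOwnProperty.call(MODELS, choice)
  ) {
    return MODELS[choice as ModelChoice];
  }
  return MODELS[fallback];
}
```

Keeping the mapping in one place means the UI only passes a short key, and unknown values degrade to the default instead of reaching a provider call.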
Ensure you have the following installed on your machine:
git clone https://github.com/khalidkhankakar/image-to-md
cd image-to-md
pnpm install

Create a file named .env.local in the root of the project:
touch .env.local

Open it and add your API keys:
# Get this from Google AI Studio (https://aistudio.google.com/)
GOOGLE_GENERATIVE_AI_API_KEY=your_gemini_api_key_here
# Get this from Groq Cloud (https://console.groq.com/keys)
GROQ_API_KEY=your_groq_api_key_here

pnpm run dev

Open your browser and navigate to http://localhost:3000 (or whichever port Next.js assigns if 3000 is taken) to use the app.
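
A missing or placeholder key in .env.local is easy to overlook. Here is a minimal sketch of a check for the two variables named above. The `findMissingKeys` helper is hypothetical and not part of this repository, and treating values that start with `your_` as unset is an assumption based on the placeholder text in the example.

```typescript
// The two variable names come from the .env.local example above;
// the helper itself is a hypothetical illustration.
const REQUIRED_KEYS = ["GOOGLE_GENERATIVE_AI_API_KEY", "GROQ_API_KEY"] as const;

type Env = Record<string, string | undefined>;

// Return the names of required keys that are unset, blank,
// or still hold the "your_..." placeholder value.
function findMissingKeys(env: Env): string[] {
  return REQUIRED_KEYS.filter((name) => {
    const value = env[name]?.trim();
    return !value || value.startsWith("your_");
  });
}
```

In server-side code this could be called as `findMissingKeys(process.env)`, logging a warning when the returned list is not empty.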
- Framework: Next.js (App Router)
- Styling: Tailwind CSS v4
- AI Integration: Vercel AI SDK
- Providers: @ai-sdk/google & @ai-sdk/groq
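
To illustrate how these pieces might fit together, here is a dependency-free sketch of the multimodal request an image-to-Markdown call could send. The local types approximate the text and image parts of a Vercel AI SDK user message. The `buildMessages` function and the prompt wording are hypothetical, since the project's real prompt is not shown here.

```typescript
// Local types approximating the text and image parts of an AI SDK user
// message, declared here so the sketch compiles without the SDK installed.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; image: string }; // base64 data or data URL
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Hypothetical prompt, written only for this example.
const PROMPT =
  "Convert the text in this image to clean, structured Markdown. " +
  "Preserve headings, lists, and code blocks.";

// Build a single user message that pairs the instruction with the image.
function buildMessages(imageBase64: string): UserMessage[] {
  if (imageBase64.length === 0) {
    throw new Error("imageBase64 must not be empty");
  }
  return [
    {
      role: "user",
      content: [
        { type: "text", text: PROMPT },
        { type: "image", image: imageBase64 },
      ],
    },
  ];
}
```

With the SDK installed, the result would typically be passed as the `messages` option of `generateText` from the `ai` package, together with a model created by @ai-sdk/google or @ai-sdk/groq.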
