G2 Reviews Scraper

Extract public G2 reviews and product data at scale — with ranked top‑10 competitors per product and an LLM‑ready markdown field for direct RAG ingestion. Runs on Apify.

Run on Apify · License: MIT

This repo is the developer entry point for the G2 Reviews Scraper actor: the output shape, copy‑paste API snippets, a full field dictionary, and a short how‑to. The actor itself runs on Apify — no login, proxy, or anti‑bot setup required.

Run it on Apify →

Screenshot: sample review row with 32 structured fields.


What it extracts

Two modes, one actor:

  • Reviews mode — give it G2 product URLs (or bare slugs like slack) and get every public review as a clean, structured row: 32 fields including 6 sub‑ratings, structured pros / cons / problemsSolved, switching history with named competitors, reviewer industry / role / company size / country, and an LLM‑ready markdownContent field.
  • Products mode — give it a keyword (e.g. CRM, communication, project management) and get the top matching products with metadata — competitor discovery before you pull reviews.

Two things you won't find in other G2 scrapers

🏆 Ranked top‑10 competitors per product — mined from each reviewer's switching‑from data and resolved to real product names (not opaque IDs). Battlecard‑ready, no aggregation code.

🤖 LLM‑ready markdownContent per review — a self‑contained markdown block, ready for direct vector‑DB / RAG ingestion with zero formatting work.

Screenshots: ranked top‑10 competitors per product; LLM‑ready markdownContent field.
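As a rough illustration of the RAG hand-off, the sketch below turns dataset rows into generic id / text / metadata records that most vector stores can ingest. It only reads fields named in this README (markdownContent, reviewTitle, overallRating). The helper name to_rag_documents and the content-hash ID are assumptions made for the example, not part of the actor.

```python
import hashlib


def to_rag_documents(rows):
    """Turn dataset rows into {id, text, metadata} records for a vector DB."""
    docs = []
    for row in rows:
        text = (row.get("markdownContent") or "").strip()
        if not text:
            continue  # nothing to embed for this row
        docs.append({
            # Content hash used as a stable ID. This is an assumption:
            # this README does not name a review ID field.
            "id": hashlib.sha256(text.encode("utf-8")).hexdigest()[:16],
            "text": text,
            "metadata": {
                "reviewTitle": row.get("reviewTitle"),
                "overallRating": row.get("overallRating"),
            },
        })
    return docs
```

Pass it the rows from iterate_items() and feed the result to your embedding step. Check FIELDS.md for further fields worth carrying as metadata.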

Quick start (API)

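```python
# Requires the Apify client: pip install apify-client
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start the actor and block until the run finishes.
run = client.actor("factden/g2-reviews-scraper").call(run_input={
    "mode": "reviews",
    # Full G2 review URLs and bare product slugs can be mixed.
    "startUrls": ["https://www.g2.com/products/slack/reviews", "notion"],
    "maxReviewsPerProduct": 100,
})

# Each dataset item is one review row.
for row in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(row["reviewTitle"], row["overallRating"])
```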

More: Python · Node · curl
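If your product list comes from a spreadsheet or a config file, a small helper can build the reviews-mode input for you. The sketch below is optional, since the actor accepts bare slugs directly. It reuses the input fields and URL pattern from the snippet above. The helper names, the lowercasing of slugs, and the de-duplication step are assumptions made for the example.

```python
from urllib.parse import urlparse

# URL pattern taken from the Quick start example.
G2_REVIEWS_URL = "https://www.g2.com/products/{slug}/reviews"


def to_reviews_url(product: str) -> str:
    """Return a G2 reviews URL for a full URL or a bare slug such as 'slack'."""
    product = product.strip()
    if urlparse(product).scheme in ("http", "https"):
        return product  # already a URL, pass through unchanged
    # Assumption: G2 slugs are lowercase.
    return G2_REVIEWS_URL.format(slug=product.strip("/").lower())


def build_reviews_input(products, max_reviews_per_product=100):
    """Build a reviews-mode run_input; duplicates are dropped, order is kept."""
    urls = list(dict.fromkeys(to_reviews_url(p) for p in products))
    return {
        "mode": "reviews",
        "startUrls": urls,
        "maxReviewsPerProduct": max_reviews_per_product,
    }
```

The returned dict can be passed as run_input to client.actor(...).call().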


Output

Real sample output lives in examples/.

📊 Full 500-review sample dataset (5 products, download-ready CSV / JSON / JSONL): HuggingFace · Kaggle.

Every field is documented in FIELDS.md. From Apify you can download results as JSON, CSV, Excel, or HTML.


Use cases

  • Competitive battlecards — who switched away from a rival, and why (previousCompetitors, whySwitched).
  • Voice‑of‑customer / product research — structured pros, cons, problemsSolved across hundreds of reviews.
  • AI / RAG pipelines — drop markdownContent straight into a vector DB.
  • Market mapping — Products mode for category discovery and competitor sets.
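The actor already returns ranked competitors, but for a battlecard you may want to re-cut the switching data yourself, for example over a filtered subset of rows. The sketch below tallies previousCompetitors across rows. It assumes that field is a list of product-name strings, which this README does not confirm (check FIELDS.md), and the function name is hypothetical.

```python
from collections import Counter


def top_switched_from(rows, limit=10):
    """Count how many reviews mention each competitor in previousCompetitors."""
    counts = Counter()
    for row in rows:
        # Assumed shape: a list of product-name strings (see FIELDS.md).
        names = row.get("previousCompetitors") or []
        # A set counts each competitor at most once per review.
        counts.update({
            name.strip()
            for name in names
            if isinstance(name, str) and name.strip()
        })
    return counts.most_common(limit)
```

Filter rows first (for example by company size or industry) to see which rivals a given segment switched away from.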

How much does it cost?

Pay‑per‑event on Apify: $0.01 per run + $0.004 per row. New Apify accounts get $5 in free credit (~1,250 rows). See the actor page for current pricing.
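To budget a job, the arithmetic can be sketched as below, using the prices quoted above. These constants are a snapshot and may change, so treat the actor page as the source of truth. The function names are made up for the example.

```python
from decimal import Decimal

# Prices quoted in this README; check the actor page for current values.
PER_RUN = Decimal("0.01")
PER_ROW = Decimal("0.004")


def estimate_cost(rows, runs=1):
    """Estimated cost in USD for a number of rows spread over some runs."""
    return runs * PER_RUN + rows * PER_ROW


def rows_within_budget(budget, runs=1):
    """Largest whole number of rows that fits in the budget (USD)."""
    remaining = Decimal(str(budget)) - runs * PER_RUN
    return max(0, int(remaining // PER_ROW))
```

For example, estimate_cost(500) comes to $2.01, and rows_within_budget(5) gives 1,247 rows for a single run, in line with the ~1,250 figure once the per-run fee is counted.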


FAQ

Is scraping G2 reviews legal? The actor collects only publicly available review data. As with any scraping, review G2's Terms of Service and your local regulations, and use the data responsibly.

Do I need a G2 account or proxies? No. Everything runs inside the actor on Apify's infrastructure.

Found a bug or want a field added? Open an issue here, or use the Issues tab on the Apify actor page.


The sample data in this repo is real public G2 review data, collected with the actor and provided for documentation/evaluation. Run the actor on Apify to pull data for any product, at any scale.