A production-ready tool for extracting detailed customer reviews from Mercadolivre product pages at scale. It helps teams turn raw Mercadolivre reviews into structured insights for sentiment analysis, benchmarking, and decision-making.
Created by Bitbash to showcase our approach to scraping and automation.
This project extracts structured customer review data from Mercadolivre product listings, transforming unstructured feedback into clean, analyzable datasets. It solves the challenge of manually collecting and normalizing large volumes of product reviews. It is built for e-commerce analysts, marketers, data teams, and researchers.
- Collects full review details including ratings, text, dates, and images
- Handles multiple product URLs in a single run
- Produces consistent, analytics-ready structured output
- Designed for large-scale review analysis and reporting
| Feature | Description |
|---|---|
| Comprehensive Review Extraction | Captures ratings, titles, bodies, dates, images, and source URLs per review. |
| Scalable Crawling | Processes multiple product pages efficiently with parallel execution. |
| Structured Output | Outputs clean, normalized JSON ready for storage or analytics pipelines. |
| Proxy Support | Supports configurable proxy usage to improve access reliability. |
| Error Recovery | Retries failed requests and logs issues for stable long-running jobs. |
| Custom Inputs | Allows precise targeting through user-defined product URLs. |
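
The parallel execution and error recovery rows above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `request_manager.py`: the function names `fetch_with_retry` and `crawl_all`, the backoff schedule, and the `fetch` callable (any function that takes a URL and returns page content) are assumptions made for the example. Proxy configuration is left out.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_with_retry(fetch: Callable[[str], str], url: str,
                     max_attempts: int = 3, base_delay: float = 0.5) -> str:
    """Call fetch(url), retrying with exponential backoff on any exception."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def crawl_all(fetch: Callable[[str], str], urls: Iterable[str],
              max_workers: int = 4, **retry_kwargs) -> Dict[str, object]:
    """Fetch every URL in parallel; map each URL to its page or its final error."""
    url_list = list(urls)

    def safe(url: str) -> object:
        # Capture the failure instead of raising so one bad URL
        # does not abort the whole batch.
        try:
            return fetch_with_retry(fetch, url, **retry_kwargs)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(url_list, pool.map(safe, url_list)))
```

Passing the fetch function in as a parameter keeps the retry and concurrency logic independent of any particular HTTP client, which also makes it easy to exercise with a stub.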
| Field Name | Field Description |
|---|---|
| Review_Id | Unique identifier assigned to each customer review. |
| Product_Id | Identifier of the product associated with the review. |
| Rating | Numerical rating given by the customer. |
| Title | Short headline of the review. |
| Body | Full review text written by the customer. |
| Date | Date when the review was published. |
| Full_Review | Title and body joined into one string for convenience (`Title: Body` in the sample output). |
| Image_URLs | List of image URLs attached to the review. |
| URL | Source URL where the review was collected. |
| Crawled_Date | Date on which the data was extracted (`MM-DD-YYYY` in the sample output, e.g. `11-18-2025`). |
[
  {
    "Review_Id": "1830050664",
    "Product_Id": "MLM2031633061",
    "Rating": 5,
    "Title": "excelente",
    "Body": "Esta robusta y tiene buenas funciones junto con la app, la recomiendo...",
    "Date": "03-02-2025",
    "Full_Review": "excelente: Esta robusta y tiene buenas funciones junto con la app...",
    "Image_URLs": [
      "https://http2.mlstatic.com/D_NQ_NP_982383-MLA82227086969_022025-F.jpg",
      "https://http2.mlstatic.com/D_NQ_NP_786388-MLA81946094768_022025-F.jpg"
    ],
    "URL": "https://articulo.mercadolibre.com.mx/noindex/catalog/reviews/MLM2031633061",
    "Crawled_Date": "11-18-2025"
  }
]
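
For downstream code, a record shaped like the sample above could be loaded into a typed object along these lines. This is a sketch under assumptions: the `Review` class and the `parse_review` and `load_reviews` helpers are hypothetical names, not part of the project, and the `MM-DD-YYYY` format for `Crawled_Date` is inferred from the sample value `11-18-2025`. The `Date` field is kept as a plain string because the sample alone does not establish whether it is day-first or month-first.

```python
import json
from dataclasses import dataclass
from datetime import date, datetime
from typing import List


@dataclass
class Review:
    review_id: str
    product_id: str
    rating: int
    title: str
    body: str
    review_date: str      # left unparsed: day/month order is ambiguous in the sample
    full_review: str
    image_urls: List[str]
    url: str
    crawled_date: date


def parse_review(record: dict) -> Review:
    """Convert one output record (a dict with the documented keys) to a Review."""
    return Review(
        review_id=record["Review_Id"],
        product_id=record["Product_Id"],
        rating=int(record["Rating"]),
        title=record["Title"],
        body=record["Body"],
        review_date=record["Date"],
        full_review=record["Full_Review"],
        image_urls=list(record.get("Image_URLs", [])),
        url=record["URL"],
        # Assumed MM-DD-YYYY, based on the sample value "11-18-2025".
        crawled_date=datetime.strptime(record["Crawled_Date"], "%m-%d-%Y").date(),
    )


def load_reviews(path: str) -> List[Review]:
    """Read a JSON file containing a list of records, as in the sample output."""
    with open(path, encoding="utf-8") as fh:
        return [parse_review(r) for r in json.load(fh)]
```

A call such as `load_reviews("data/sample_output.json")` would then return a list of `Review` objects, assuming that file has the same shape as the sample.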
Mercadolivre Reviews Spider/
├── src/
│   ├── main.py
│   ├── crawler/
│   │   ├── reviews_collector.py
│   │   └── pagination_handler.py
│   ├── parsers/
│   │   ├── review_parser.py
│   │   └── text_cleaner.py
│   ├── utils/
│   │   ├── request_manager.py
│   │   └── logger.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── requirements.txt
└── README.md
- E-commerce analysts use it to analyze Mercadolivre reviews, so they can identify product strengths and weaknesses.
- Marketing teams use it to monitor customer sentiment, so they can optimize messaging and positioning.
- Competitive researchers use it to compare similar products, so they can benchmark performance.
- Content teams use it to aggregate real user feedback, so they can create authentic review-based content.
- Academic researchers use it to study consumer behavior trends, so they can support data-driven publications.
Can I scrape reviews from multiple products at once? Yes, the tool supports multiple product URLs in a single run, allowing batch collection at scale.
Does it include review images and ratings? Yes, ratings, text content, and all available image URLs are extracted per review.
Is the output easy to integrate with analytics tools? The output is structured JSON, making it straightforward to load into databases, dashboards, or BI tools.
How does it handle failed requests? Built-in retry logic and logging help maintain stability and data completeness during long runs.
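
As one illustration of the database integration mentioned above, the JSON records could be loaded into SQLite using only the Python standard library. This is a sketch, not a feature of the tool: the `reviews` table layout and the helper names `load_into_sqlite` and `average_rating_by_product` are assumptions chosen for the example.

```python
import json
import sqlite3
from typing import Iterable, List, Tuple


def load_into_sqlite(conn: sqlite3.Connection, records: Iterable[dict]) -> int:
    """Insert review records into a 'reviews' table; return the number of rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               review_id    TEXT PRIMARY KEY,
               product_id   TEXT,
               rating       INTEGER,
               title        TEXT,
               body         TEXT,
               review_date  TEXT,
               url          TEXT,
               crawled_date TEXT,
               image_urls   TEXT
           )"""
    )
    rows = [
        (r["Review_Id"], r["Product_Id"], r["Rating"], r.get("Title"), r.get("Body"),
         r.get("Date"), r.get("URL"), r.get("Crawled_Date"),
         json.dumps(r.get("Image_URLs", [])))  # image list stored as JSON text
        for r in records
    ]
    # INSERT OR REPLACE keeps one row per Review_Id if the same data is loaded twice.
    conn.executemany(
        "INSERT OR REPLACE INTO reviews VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def average_rating_by_product(conn: sqlite3.Connection) -> List[Tuple[str, float, int]]:
    """Return (product_id, average rating, review count) for each product."""
    return conn.execute(
        "SELECT product_id, AVG(rating), COUNT(*) FROM reviews "
        "GROUP BY product_id ORDER BY product_id"
    ).fetchall()
```

Using `Review_Id` as the primary key makes repeated loads idempotent, which is convenient when the same products are crawled more than once.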
Primary Metric: Processes hundreds of reviews per minute depending on page size and network conditions.
Reliability Metric: Maintains a high successful extraction rate with automatic retries for transient failures.
Efficiency Metric: Optimized request handling minimizes redundant loads and reduces execution time.
Quality Metric: Delivers high data completeness with consistent field coverage across reviews.
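
One way to put a number on the field coverage mentioned in the quality metric is to compute, for each documented field, the share of records in which it is populated. The sketch below is an assumption about how such a check could look, not how the project measures it; `EXPECTED_FIELDS` simply mirrors the field table, and `field_coverage` is a hypothetical helper.

```python
from typing import Dict, Iterable, Sequence

# Field names taken from the output field table.
EXPECTED_FIELDS = [
    "Review_Id", "Product_Id", "Rating", "Title", "Body", "Date",
    "Full_Review", "Image_URLs", "URL", "Crawled_Date",
]


def field_coverage(records: Iterable[dict],
                   fields: Sequence[str] = EXPECTED_FIELDS) -> Dict[str, float]:
    """Return, per field, the fraction of records where the field is non-empty."""
    record_list = list(records)
    if not record_list:
        return {f: 0.0 for f in fields}
    return {
        # A field counts as missing when absent, None, an empty string,
        # or an empty list; a numeric 0 still counts as present.
        f: sum(1 for r in record_list if r.get(f) not in (None, "", [])) / len(record_list)
        for f in fields
    }
```

Note that a review can legitimately have no images, so a low value for `Image_URLs` does not by itself indicate an extraction problem.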
