Workopolis Job Scraper collects structured job listings from Workopolis.com so you can track hiring activity, analyze demand, and build reliable job datasets. It solves the pain of manual copy-paste by extracting consistent fields like title, company, location, and descriptions at scale. Built for analysts, recruiters, and developers who need clean Workopolis jobs data for search, alerts, or reporting.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project scrapes job postings from Workopolis search results and (optionally) enriches each item by visiting the job detail page to capture full descriptions. It helps you automate job market monitoring, reduce time spent on repetitive research, and produce export-ready datasets for dashboards or pipelines. It’s designed for anyone who needs fast, repeatable access to Workopolis jobs data—without manual browsing.
- Supports keyword + location searches or direct search URLs for precise targeting
- Handles multi-page results with safety limits to prevent runaway pagination
- Optional detail collection to capture complete job descriptions in text and HTML
- Produces consistent, schema-friendly records suitable for databases and BI tools
- Includes scrape metadata fields for auditing, freshness, and pipeline traceability
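The pagination safety limits listed above can be pictured as a loop that stops on whichever limit is hit first. The following is a minimal sketch, not the project's actual crawler: `collectJobs` and `fetchPage` are hypothetical names, the default values are assumptions, and only `max_pages` and `results_wanted` appear elsewhere in this README.

```javascript
// Minimal sketch of guarded pagination (hypothetical helper, not the real crawler).
// fetchPage(pageNumber) is an assumed callback that resolves to an array of job
// records for that page, or to an empty array when no results remain.
async function collectJobs(fetchPage, { max_pages = 5, results_wanted = 100 } = {}) {
  const jobs = [];
  for (let page = 1; page <= max_pages; page++) {
    const items = await fetchPage(page);
    if (items.length === 0) break; // results exhausted: stop early
    for (const item of items) {
      jobs.push(item);
      if (jobs.length >= results_wanted) return jobs; // result cap reached
    }
  }
  return jobs; // page cap reached or results exhausted
}
```

Because both limits are checked inside the loop, a search with thousands of matches never requests more than `max_pages` pages, and a small `results_wanted` ends the run before later pages are fetched.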
| Feature | Description |
|---|---|
| Flexible search | Search by keyword and location, or scrape directly from custom search URLs. |
| Posted-date filtering | Focus on recent roles using posted-date filters (e.g., 24h, 7d, 30d). |
| Detail enrichment | Optionally open each job page to extract full descriptions and richer context. |
| Pagination handling | Automatically collects jobs across multiple result pages with guardrails. |
| Structured dataset output | Produces consistent fields for easy JSON/CSV export and analysis. |
| Proxy-ready reliability | Works with proxy settings to reduce blocking and improve stability. |
| Performance-oriented crawling | Optimized for fast collection with concurrent requests and configurable limits. |
| Cookie support | Accepts cookies as a header string or JSON for sessions and edge cases. |
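As an illustration of the cookie support row above, here is one way the two input forms could be reduced to a single `Cookie` header value. This is a sketch under assumptions: `toCookieHeader` is a hypothetical helper, and the JSON shapes it accepts (a name-to-value object, or an array of `{name, value}` objects) are guesses rather than the scraper's documented format.

```javascript
// Sketch: normalize cookie input into a single Cookie header string.
// Assumed shapes (not confirmed by the project):
//   cookies     -> "a=1; b=2" (raw header string)
//   cookiesJson -> JSON text of either {"a":"1"} or [{"name":"a","value":"1"}]
function toCookieHeader({ cookies, cookiesJson } = {}) {
  if (typeof cookies === "string" && cookies.trim() !== "") {
    return cookies.trim(); // already a header string: use as-is
  }
  if (typeof cookiesJson === "string" && cookiesJson.trim() !== "") {
    const parsed = JSON.parse(cookiesJson); // throws on malformed JSON
    const pairs = Array.isArray(parsed)
      ? parsed.map((c) => [c.name, c.value])
      : Object.entries(parsed);
    return pairs.map(([name, value]) => `${name}=${value}`).join("; ");
  }
  return ""; // no cookies supplied
}
```

In this sketch the raw header string takes precedence when both inputs are present, which keeps the behavior predictable if a configuration sets both by accident.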

| Field Name | Field Description |
|---|---|
| url | Direct link to the job posting. |
| title | Job title as shown on the listing or detail page. |
| company | Hiring company name associated with the role. |
| location | Job location (city/region/province) from the posting. |
| date_posted | Human-readable posting recency (e.g., "2 days ago"). |
| description_html | Full job description captured as HTML when detail scraping is enabled. |
| description_text | Full job description captured as plain text when detail scraping is enabled. |
| _source | Source identifier for the dataset (static value for this scraper). |
| _fetchedAt | ISO timestamp indicating when the record was collected. |
| _from | Indicates whether the record came from a list page or a detail page. |
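Because `date_posted` is a human-readable string, any downstream recency filter has to interpret it first. The sketch below shows one possible approach for windows such as 24h, 7d, or 30d. The function names are hypothetical, the accepted phrasings are assumptions about how listings word their recency, and a month is approximated as 30 days.

```javascript
// Sketch: decide whether a recency string such as "2 days ago" falls inside
// a posted-date window such as "24h", "7d", or "30d".
// The recognized phrasings are assumptions; real listings may differ.
const HOURS_PER_UNIT = { hour: 1, day: 24, week: 24 * 7, month: 24 * 30 };

function recencyToHours(datePosted) {
  const text = String(datePosted).trim().toLowerCase();
  if (text === "today" || text === "just posted") return 0;
  const match = text.match(/^(\d+)\s*(hour|day|week|month)s?\s+ago$/);
  if (!match) return null; // unrecognized phrasing
  return Number(match[1]) * HOURS_PER_UNIT[match[2]];
}

function windowToHours(postedWithin) {
  const match = String(postedWithin).match(/^(\d+)([hd])$/);
  if (!match) throw new Error(`Unsupported window: ${postedWithin}`);
  return Number(match[1]) * (match[2] === "h" ? 1 : 24);
}

function isWithinWindow(datePosted, postedWithin) {
  const ageHours = recencyToHours(datePosted);
  // Unparseable strings are treated as outside the window.
  return ageHours !== null && ageHours <= windowToHours(postedWithin);
}
```

For example, `isWithinWindow("2 days ago", "7d")` is true while `isWithinWindow("2 days ago", "24h")` is false. Treating unrecognized strings as outside the window is a conservative choice; a pipeline that prefers recall could keep them instead.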
[
  {
    "url": "https://www.workopolis.com/job/abc123",
    "title": "Senior Software Engineer",
    "company": "Tech Company Inc.",
    "location": "Toronto, ON",
    "date_posted": "2 days ago",
    "description_html": "<div><h2>About the role</h2><p>Build and scale backend services...</p></div>",
    "description_text": "About the role\nBuild and scale backend services...",
    "_source": "workopolis.com",
    "_fetchedAt": "2025-10-22T10:30:00.000Z",
    "_from": "detail"
  }
]
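Records shaped like the sample above can be flattened to CSV for spreadsheets or BI tools. The following is a small sketch of that conversion, not part of the scraper itself: `toCsv` and `csvCell` are hypothetical helpers, and the default column list (which leaves out `description_html` to keep files compact) is an assumption.

```javascript
// Sketch: flatten job records into CSV text with standard quoting rules.
const COLUMNS = [
  "url", "title", "company", "location", "date_posted",
  "description_text", "_source", "_fetchedAt", "_from",
];

function csvCell(value) {
  const text = value == null ? "" : String(value);
  // Quote cells containing commas, quotes, or line breaks; double embedded quotes.
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(records, columns = COLUMNS) {
  const header = columns.join(",");
  const rows = records.map((record) =>
    columns.map((column) => csvCell(record[column])).join(",")
  );
  return [header, ...rows].join("\r\n");
}
```

Quoting matters here because `description_text` usually contains line breaks and commas, which would otherwise split one record across several rows or columns.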
Workopolis Job Scraper/
├── src/
│   ├── main.js
│   ├── config/
│   │   ├── input.schema.json
│   │   └── defaults.json
│   ├── crawlers/
│   │   ├── searchCrawler.js
│   │   └── detailCrawler.js
│   ├── extractors/
│   │   ├── extractSearchCards.js
│   │   ├── extractJobDetail.js
│   │   └── normalizeFields.js
│   ├── utils/
│   │   ├── buildSearchUrl.js
│   │   ├── dateFilters.js
│   │   ├── dedupe.js
│   │   ├── logger.js
│   │   └── validation.js
│   └── storage/
│       ├── pushDataset.js
│       └── stateStore.js
├── data/
│   ├── input.examples.json
│   └── output.sample.json
├── .env.example
├── .gitignore
├── package.json
├── package-lock.json
└── README.md
- Recruiters use it to monitor new Workopolis postings for target roles, so they can respond faster to hiring demand and candidate sourcing opportunities.
- Market analysts use it to track job volume and location trends across Canada, so they can generate labor market insights and forecasts.
- Sales teams use it to build prospect lists from companies actively hiring, so they can prioritize outreach to high-intent organizations.
- Career platforms use it to populate job feeds with normalized fields, so they can deliver better search and alert experiences to users.
- Content creators use it to extract hiring signals by category and region, so they can publish data-backed career and salary insights.
How do I choose between keyword/location search and custom URLs?

Use keyword + location when you want a simple, repeatable configuration and quick iteration. Use custom URLs when you need exact filters already supported by the site (specific query parameters, sorting, or niche combinations). Custom URLs are also helpful if you’ve already validated a search in the browser and want the scraper to reproduce it precisely.
What happens if I disable detail collection?

With detail collection off, the scraper focuses on speed and collects what’s available on result cards (typically URL, title, company, location, and posting recency). This is ideal for large-scale monitoring and alerting where full descriptions are not required. If you need description_text/description_html, enable detail collection.
How do I avoid duplicates across pages or repeated runs?

The scraper deduplicates by job URL during a run to prevent repeated items when pagination overlaps. For recurring runs, store the last-seen URLs (or a URL hash) and filter them in your pipeline to keep only new postings.
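The pipeline-side filtering described in this answer can be sketched with a set of previously seen URLs. This is an illustrative helper for your own pipeline, not the scraper's internal deduplication; `filterNewJobs` is a hypothetical name, and how the set is persisted between runs is left to you.

```javascript
// Sketch: keep only jobs whose URL has not been seen before.
// Handles duplicates within one batch and URLs remembered from earlier runs.
function filterNewJobs(jobs, seenUrls = new Set()) {
  const fresh = [];
  for (const job of jobs) {
    if (!job.url || seenUrls.has(job.url)) continue; // skip missing or known URLs
    seenUrls.add(job.url); // mutates the set so the caller can persist it
    fresh.push(job);
  }
  return fresh;
}
```

A recurring job could load the set from storage at startup (for example `new Set(JSON.parse(savedText))`), call `filterNewJobs` on each run's output, then save `[...seenUrls]` back so the next run skips everything already reported.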
Why am I getting blocked or seeing empty results sometimes?

Blocks can happen due to rate limits or aggressive concurrency. Reduce the number of pages, lower concurrency, and enable proxy settings for higher success rates. If you rely on session-based content, provide cookies via the cookies or cookiesJson option.
Primary Metric: With detail collection enabled, typical throughput is ~50–100 job records per minute depending on query complexity and network conditions.
Reliability Metric: With proxy settings enabled and conservative concurrency, runs commonly achieve 95%+ successful fetch rate across paginated searches, with most failures attributable to transient network timeouts.
Efficiency Metric: Detail scraping generally uses moderate resources (about 1–2 GB memory) and scales predictably with results_wanted and max_pages, making it suitable for scheduled monitoring.
Quality Metric: When detail collection is enabled, description completeness is high (full HTML + normalized text), and field consistency remains stable due to normalization and validation on each record.
