gurnebwaissneq/workopolis-job-scraper

Workopolis Job Scraper

Workopolis Job Scraper collects structured job listings from Workopolis.com so you can track hiring activity, analyze demand, and build reliable job datasets. It solves the pain of manual copy-paste by extracting consistent fields like title, company, location, and descriptions at scale. Built for analysts, recruiters, and developers who need clean Workopolis jobs data for search, alerts, or reporting.

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for workopolis-job-scraper you've just found your team — Let’s Chat. 👆👆

Introduction

This project scrapes job postings from Workopolis search results and (optionally) enriches each item by visiting the job detail page to capture full descriptions. It helps you automate job market monitoring, reduce time spent on repetitive research, and produce export-ready datasets for dashboards or pipelines. It’s designed for anyone who needs fast, repeatable access to Workopolis jobs data without manual browsing.

Job Market Monitoring Workflow

  • Supports keyword + location searches or direct search URLs for precise targeting
  • Handles multi-page results with safety limits to prevent runaway pagination
  • Optional detail collection to capture complete job descriptions in text and HTML
  • Produces consistent, schema-friendly records suitable for databases and BI tools
  • Includes scrape metadata fields for auditing, freshness, and pipeline traceability
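The pagination guardrails mentioned above can be pictured roughly as follows. This is a minimal sketch, not the project's actual crawler: `collectJobs` and the `fetchPage` callback are hypothetical names, and the defaults for `results_wanted` and `max_pages` are assumed values chosen only for illustration.

```javascript
// Hypothetical helper: walk result pages until one of the safety limits is hit.
// `fetchPage(pageNumber)` is an assumed callback that resolves to an array of job records.
async function collectJobs(fetchPage, { results_wanted: resultsWanted = 100, max_pages: maxPages = 20 } = {}) {
  const jobs = [];
  for (let page = 1; page <= maxPages && jobs.length < resultsWanted; page += 1) {
    const items = await fetchPage(page);
    // An empty page is treated as the end of the result set.
    if (!items || items.length === 0) break;
    jobs.push(...items);
  }
  // The last page may overshoot the requested count, so trim the tail.
  return jobs.slice(0, resultsWanted);
}
```

Both limits are checked before every request, so a search with an unexpectedly deep result set stops at `max_pages` even if `results_wanted` has not been reached.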

Features

| Feature | Description |
| --- | --- |
| Flexible search | Search by keyword and location, or scrape directly from custom search URLs. |
| Posted-date filtering | Focus on recent roles using posted-date filters (e.g., 24h, 7d, 30d). |
| Detail enrichment | Optionally open each job page to extract full descriptions and richer context. |
| Pagination handling | Automatically collects jobs across multiple result pages with guardrails. |
| Structured dataset output | Produces consistent fields for easy JSON/CSV export and analysis. |
| Proxy-ready reliability | Works with proxy settings to reduce blocking and improve stability. |
| Performance-oriented crawling | Optimized for fast collection with concurrent requests and configurable limits. |
| Cookie support | Accepts cookies as a header string or JSON for sessions and edge cases. |
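To illustrate the cookie support row, here is one way the two cookie inputs could be reduced to a single `Cookie` header value. It is a sketch under assumptions: the option names `cookies` and `cookiesJson` come from the FAQ, but the accepted JSON shapes (a name-to-value object, or an array of `{ name, value }` entries) and the helper name `toCookieHeader` are guesses rather than the scraper's documented behavior.

```javascript
// Hypothetical helper: turn either cookie input into one Cookie header string.
// Assumed JSON shapes: { "name": "value", ... } or [{ "name": "...", "value": "..." }, ...].
function toCookieHeader({ cookies, cookiesJson } = {}) {
  // A non-empty raw header string wins and is passed through after trimming.
  if (typeof cookies === 'string' && cookies.trim() !== '') {
    return cookies.trim();
  }
  if (!cookiesJson) return '';
  const parsed = typeof cookiesJson === 'string' ? JSON.parse(cookiesJson) : cookiesJson;
  const pairs = Array.isArray(parsed)
    ? parsed.map((c) => [c.name, c.value])
    : Object.entries(parsed);
  return pairs.map(([name, value]) => `${name}=${value}`).join('; ');
}
```

For example, `toCookieHeader({ cookiesJson: '[{"name":"sid","value":"x"}]' })` yields `sid=x`, which can then be sent as the `Cookie` request header.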

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| url | Direct link to the job posting. |
| title | Job title as shown on the listing or detail page. |
| company | Hiring company name associated with the role. |
| location | Job location (city/region/province) from the posting. |
| date_posted | Human-readable posting recency (e.g., "2 days ago"). |
| description_html | Full job description captured as HTML when detail scraping is enabled. |
| description_text | Full job description captured as plain text when detail scraping is enabled. |
| _source | Source identifier for the dataset (static value for this scraper). |
| _fetchedAt | ISO timestamp indicating when the record was collected. |
| _from | Indicates whether the record came from a list page or a detail page. |

Example Output

[
      {
            "url": "https://www.workopolis.com/job/abc123",
            "title": "Senior Software Engineer",
            "company": "Tech Company Inc.",
            "location": "Toronto, ON",
            "date_posted": "2 days ago",
            "description_html": "<div><h2>About the role</h2><p>Build and scale backend services...</p></div>",
            "description_text": "About the role\nBuild and scale backend services...",
            "_source": "workopolis.com",
            "_fetchedAt": "2025-10-22T10:30:00.000Z",
            "_from": "detail"
      }
]
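A record like the one above could be assembled along these lines. This is only a sketch: `normalizeRecord` is a hypothetical name, and the choices to collapse whitespace and to use `null` for missing fields are assumptions, not confirmed behavior of the scraper's normalization step.

```javascript
// Collapse runs of whitespace; non-string input becomes null (an assumed convention).
const clean = (v) => (typeof v === 'string' ? v.replace(/\s+/g, ' ').trim() : null);

// Hypothetical helper: map a raw extracted item onto the output schema.
// `from` is 'list' or 'detail'; `now` is injectable so the timestamp can be tested.
function normalizeRecord(raw, from = 'list', now = new Date()) {
  return {
    url: raw.url,
    title: clean(raw.title),
    company: clean(raw.company),
    location: clean(raw.location),
    date_posted: clean(raw.date_posted),
    // Descriptions exist only when the detail page was visited.
    description_html: raw.description_html ?? null,
    description_text: raw.description_text ?? null,
    _source: 'workopolis.com',
    _fetchedAt: now.toISOString(),
    _from: from,
  };
}
```

Emitting every key on every record, even when the value is `null`, keeps the column set stable for CSV export and database loads.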

Directory Structure Tree

Workopolis Job Scraper/
├── src/
│   ├── main.js
│   ├── config/
│   │   ├── input.schema.json
│   │   └── defaults.json
│   ├── crawlers/
│   │   ├── searchCrawler.js
│   │   └── detailCrawler.js
│   ├── extractors/
│   │   ├── extractSearchCards.js
│   │   ├── extractJobDetail.js
│   │   └── normalizeFields.js
│   ├── utils/
│   │   ├── buildSearchUrl.js
│   │   ├── dateFilters.js
│   │   ├── dedupe.js
│   │   ├── logger.js
│   │   └── validation.js
│   └── storage/
│       ├── pushDataset.js
│       └── stateStore.js
├── data/
│   ├── input.examples.json
│   └── output.sample.json
├── .env.example
├── .gitignore
├── package.json
├── package-lock.json
└── README.md

Use Cases

  • Recruiters use it to monitor new Workopolis postings for target roles, so they can respond faster to hiring demand and candidate sourcing opportunities.
  • Market analysts use it to track job volume and location trends across Canada, so they can generate labor market insights and forecasts.
  • Sales teams use it to build prospect lists from companies actively hiring, so they can prioritize outreach to high-intent organizations.
  • Career platforms use it to populate job feeds with normalized fields, so they can deliver better search and alert experiences to users.
  • Content creators use it to extract hiring signals by category and region, so they can publish data-backed career and salary insights.

FAQs

How do I choose between keyword/location search and custom URLs?
Use keyword + location when you want a simple, repeatable configuration and quick iteration. Use custom URLs when you need exact filters already supported by the site (specific query parameters, sorting, or niche combinations). Custom URLs are also helpful if you’ve already validated a search in the browser and want the scraper to reproduce it precisely.
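The precedence between the two modes might look like the sketch below. Everything specific here is an assumption made for illustration: the input names `startUrls`, `keyword`, and `location`, the `SEARCH_BASE` path, and the `q` / `l` query parameters are placeholders and have not been checked against the site's real URL format or the scraper's input schema.

```javascript
// Placeholder search endpoint; the real path and parameter names may differ.
const SEARCH_BASE = 'https://www.workopolis.com/search';

// Hypothetical helper: custom URLs take precedence; otherwise build one search URL.
function resolveStartUrls({ startUrls = [], keyword = '', location = '' } = {}) {
  if (startUrls.length > 0) return [...startUrls];
  const url = new URL(SEARCH_BASE);
  // URLSearchParams handles the percent-encoding of spaces and commas.
  if (keyword) url.searchParams.set('q', keyword);
  if (location) url.searchParams.set('l', location);
  return [url.toString()];
}
```

Giving custom URLs priority means a search validated in the browser is reproduced exactly, with the keyword and location fields ignored for that run.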

What happens if I disable detail collection?
With detail collection off, the scraper focuses on speed and collects what’s available on result cards (typically URL, title, company, location, and posting recency). This is ideal for large-scale monitoring and alerting where full descriptions are not required. If you need description_text/description_html, enable detail collection.

How do I avoid duplicates across pages or repeated runs?
The scraper deduplicates by job URL during a run to prevent repeated items when pagination overlaps. For recurring runs, store the last-seen URLs (or a URL hash) and filter them in your pipeline to keep only new postings.
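For the cross-run filtering described here, a small pipeline step along these lines is usually enough. It is a sketch of the idea, not code from this repository: `filterNewJobs` is a hypothetical name, and how the set of seen URLs is persisted between runs (a file, a key-value store, a database table) is left to your pipeline.

```javascript
// Hypothetical helper: keep only jobs whose URL has not been seen before.
// `seenUrls` is a Set that the caller loads before a run and saves afterwards.
function filterNewJobs(jobs, seenUrls = new Set()) {
  const fresh = [];
  for (const job of jobs) {
    // Skip records without a URL as well as anything already recorded.
    if (!job.url || seenUrls.has(job.url)) continue;
    seenUrls.add(job.url);
    fresh.push(job);
  }
  return fresh;
}
```

Because the set is updated as items pass through, the same function also removes duplicates inside a single batch. To persist it, serialize with `JSON.stringify([...seenUrls])` and rebuild with `new Set(JSON.parse(saved))`.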

Why am I getting blocked or seeing empty results sometimes?
Blocks can happen due to rate limits or aggressive concurrency. Reduce the number of pages, lower concurrency, and enable proxy settings for higher success rates. If you rely on session-based content, provide cookies via the cookies or cookiesJson option.
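To make "lower concurrency" concrete, the sketch below caps how many requests are in flight at once. It shows the general technique only and is not the scraper's internal scheduler: `mapWithConcurrency` and its `worker` callback are hypothetical names.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in input order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function run() {
    while (next < items.length) {
      const i = next;
      next += 1;
      results[i] = await worker(items[i], i);
    }
  }
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

A call such as `mapWithConcurrency(jobUrls, 2, fetchDetail)`, where `fetchDetail` is whatever function loads one detail page, keeps at most two detail requests open at a time. That trades throughput for a lower chance of triggering rate limits.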


Performance Benchmarks and Results

Primary Metric: With detail collection enabled, typical throughput is ~50–100 job records per minute depending on query complexity and network conditions.

Reliability Metric: With proxy settings enabled and conservative concurrency, runs commonly achieve 95%+ successful fetch rate across paginated searches, with most failures attributable to transient network timeouts.

Efficiency Metric: Detail scraping generally uses moderate resources (about 1–2 GB memory) and scales predictably with results_wanted and max_pages, making it suitable for scheduled monitoring.

Quality Metric: When detail collection is enabled, description completeness is high (full HTML + normalized text), and field consistency remains stable due to normalization and validation on each record.


Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it."

Syed
Digital Strategist
★★★★★
