Skip to content

tuansondoan-mark/Cyclistic-bike-share-EDA-with-SQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Cyclistic Bike-Share Data Analysis

This project is a case study from the Google Data Analytics Professional Certificate Capstone course. The main objective is to address a key business question to guide the company's marketing strategy: How do annual members (Subscribers) and casual riders (Customers) use Cyclistic bikes differently?


📌 Project Overview & Background

Cyclistic is a fictional bike-share company based in Chicago. Historically, Cyclistic's marketing strategy has focused on building general brand awareness. However, the Director of Marketing believes that the company's future success depends on maximizing the number of annual memberships.

To design effective marketing campaigns aimed at converting casual riders, we need to analyze historical bike trip data to understand how these two user segments behave differently.

👥 User Types

  1. Casual Riders (Customers): Users who purchase single-ride or full-day passes.
  2. Annual Members (Subscribers): Users who purchase annual memberships.

🛠️ Tools and Technologies Used

  • BigQuery SQL (Python Client API): Used for data exploration, integrity checks, data cleaning, and aggregation.
  • Tableau: The primary tool used to design interactive dashboards and visualize data trends.
  • Jupyter Notebook: Python environment utilized to connect with the BigQuery Client API, run SQL queries, and document the analysis process.

📊 Data Source

The analysis uses trip data from Q1 2019 and Q1 2020:

  • Q1 2019 Dataset: trips_2019_Q1 (365,069 rows)
  • Q1 2020 Dataset: trips_2020_Q1 (426,887 rows)

Technical Note: Column names differed between the two datasets (e.g., trip_id corresponds to ride_id, and usertype corresponds to member_casual). The preprocessing step standardized these columns before merging.


🧼 Data Cleaning & Standardization

  1. Duplicate Check: Verified that there are no duplicate entries in either dataset by comparing COUNT(id) and COUNT(DISTINCT id).
  2. Time Range Verification: Confirmed that all trip start times fell precisely within Q1 2019 and Q1 2020.
  3. Logic Errors & Data Correction:
    • Detected 24 records in the Q1 2019 dataset where the recorded tripduration did not match the calculated duration (end_time - start_time).
    • Corrected these records by recalculating the duration using the TIMESTAMP_DIFF function.
  4. Data Aggregation: Merged the datasets using a UNION ALL query, standardized user categories to Subscriber and Customer, and filtered out trips with durations less than or equal to 0.
    • Total clean dataset size: 791,746 rows (saved as CleanedTripData).

💻 SQL Query for Merging and Cleaning Data:

SELECT
  trip_id, start_time, end_time, start_station_id, start_station_name, end_station_id, end_station_name,
  CASE
    WHEN usertype = 'member' THEN 'Subscriber'
    WHEN usertype = 'casual' THEN 'Customer'
    ELSE usertype
  END AS cleaned_usertype,
  ROUND(TIMESTAMP_DIFF(end_time, start_time, SECOND), 1) AS trip_duration,
  FORMAT_DATE('%A', start_time) AS trip_start_weekday
FROM (
  SELECT
    CAST(trip_id AS STRING) AS trip_id, start_time, end_time, 
    from_station_id AS start_station_id, from_station_name AS start_station_name, 
    to_station_id AS end_station_id, to_station_name AS end_station_name, usertype
  FROM `bike-usage-differences.bike_usage_2019.trips_2019_Q1`
  UNION ALL
  SELECT
    ride_id, started_at, ended_at, 
    start_station_id, start_station_name, 
    end_station_id, end_station_name, member_casual
  FROM `bike-usage-differences.bike_usage_2019.trips_2020_Q1`
) AS union_table
WHERE TIMESTAMP_DIFF(end_time, start_time, SECOND) > 0;

📈 User Behavior Analysis (Insights)

1. General Overview of Trips and Durations

  • Average trip duration across all users: 1,184.13 seconds (~19.7 minutes).
  • Most active day of the week (Mode of Weekday): Tuesday.
User Type Number of Trips Percentage Average Trip Duration (Seconds) Average Trip Duration (Minutes)
Subscriber (Members) 720,313 90.98% 795.25 ~13.25 mins
Customer (Casual Riders) 71,433 9.02% 5,105.49 ~85.09 mins

Key Takeaway: Subscribers represent the vast majority of trips (~91%), but Customers have average trip durations that are nearly 6.5 times longer than Subscribers.

2. Weekday vs. Weekend Usage Comparison

User Type Day of the Week Number of Trips Average Trip Duration (Seconds)
Customer Monday 6,694 3,969.67
Customer Tuesday 7,972 4,183.81
Customer Wednesday 8,363 4,120.09
Customer Thursday 7,771 7,773.23
Customer Friday 8,508 5,736.57
Customer Saturday 13,473 4,950.77
Customer Sunday 18,652 5,061.30
Subscriber Monday 110,430 822.31
Subscriber Tuesday 127,974 769.44
Subscriber Wednesday 121,903 711.98
Subscriber Thursday 125,228 707.21
Subscriber Friday 115,168 796.73
Subscriber Saturday 59,413 974.07
Subscriber Sunday 60,197 972.94

Key Behavioral Insights:

  • Subscribers (Annual Members): Primarily use bikes for daily commuting (to work or school) from Monday to Friday, resulting in high weekday trip volumes and short, consistent trip times.
  • Customers (Casual Riders): Primarily use bikes for leisure and exercise, with trip volumes peaking significantly on weekends (Saturday and Sunday) and featuring much longer durations.

🎨 Data Visualization (Tableau)

Tableau dashboards highlighted two main trends:

  1. Trip Distribution: A clear breakdown showing the high volume of weekday trips by Subscribers compared to the leisure-oriented rides by Customers.
  2. Weekly Usage Patterns: Graphical trends showing Subscribers' activity dipping on weekends while Customers' activity surges.

🎯 Conclusion & Marketing Recommendations

Based on the insights derived from the data, we recommend the following marketing strategies to convert casual riders into annual members:

  1. "Commute Benefits" Marketing Campaign:
    • Design marketing materials highlighting the financial savings, health benefits, environmental impact, and convenience of using Cyclistic bikes for daily commuting.
    • Target casual riders who frequently rent bikes during typical weekday commute hours.
  2. Flexible/Weekend Membership Packages:
    • Introduce a new membership tier tailored for weekend riders (e.g., Weekend-only Annual Membership) or a loyalty program that rewards week-day commuting.
    • Send promotional upgrade offers to casual riders who show high weekend usage patterns.

About

Google Data Analytics Capstone project analyzing Cyclistic bike-share data using BigQuery SQL and Python. Explores usage patterns of casual riders vs. annual members to guide marketing strategy. Includes SQL cleaning scripts and EDA.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors