This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
A CLI tool that fetches race statistics from TypeRacer and generates visualizations. When credentials are provided it uses TypeRacer's official export endpoint (reliable, complete); otherwise it falls back to HTML scraping.
# Install dependencies
pip install -r requirements.txt
# Copy and fill in credentials (optional but recommended)
cp .env.example .env
# Fetch stats for a username and display plots
python get_stats.py -u <username>
# With date range (DD-MM-YYYY format)
python get_stats.py -u <username> --start-date "01-01-2024" --end-date "31-12-2024"
# Save to a specific CSV file
python get_stats.py -u <username> -f output.csv
# Visualize from an existing CSV (no network requests)
python get_stats.py -n -f output.csv
# Fetch without showing plots
python get_stats.py -u <username> --hide-plots
There is no test suite or linting configuration in this project.
Two pipelines share the same canonical DataFrame schema defined in constants.SCHEMA_COLUMNS: race, speed, accuracy, points, place, date, datetime, mode, text_id, skill_level, universe. The last five (EXTENDED_COLUMNS) are only populated by the export pipeline; the HTML pipeline fills them with pd.NA.
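A minimal sketch of how a pipeline without extended data could be brought onto the shared schema. The column lists are copied from the description above; to_canonical is a hypothetical helper for illustration, not a function from this repository.

```python
import pandas as pd

# Column names as listed above; assumed to mirror constants.SCHEMA_COLUMNS.
SCHEMA_COLUMNS = [
    "race", "speed", "accuracy", "points", "place", "date",
    "datetime", "mode", "text_id", "skill_level", "universe",
]
# The last five columns are only filled by the export pipeline.
EXTENDED_COLUMNS = SCHEMA_COLUMNS[-5:]


def to_canonical(rows):
    """Hypothetical helper: return a DataFrame with every schema column present."""
    df = pd.DataFrame(rows)
    for column in SCHEMA_COLUMNS:
        if column not in df.columns:
            df[column] = pd.NA  # marks values this pipeline cannot supply
    return df[SCHEMA_COLUMNS]


# Illustrative row shaped like HTML-pipeline output (values are made up).
html_rows = [{"race": 1, "speed": 85, "accuracy": "97.5", "points": 60,
              "place": "2/5", "date": "2024-01-15"}]
df = to_canonical(html_rows)
print(df[EXTENDED_COLUMNS].isna().all().all())
```

Keeping both pipelines on one column order means downstream code can select columns by name without checking which pipeline produced the data.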
Authenticated (preferred): ExportFetcher downloads per-bucket ZIPs from data.typeracer.com/pit/export_data?universe=play&bucket=N (1000 races each); ExportParser unzips each one and maps all 11 columns.
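The unzip-and-map step might look roughly like the sketch below. The export CSV headers are the ones named in this file; the timestamp format, the stored date format, and the sample row are assumptions. The points column is left out because its export header is not documented here.

```python
import io
import zipfile

import pandas as pd


def parse_zip(zip_bytes: bytes) -> pd.DataFrame:
    """Sketch of an export parser: read the one CSV in a bucket ZIP and rename columns."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_name = archive.namelist()[0]  # each bucket ZIP contains a single CSV
        with archive.open(csv_name) as handle:
            raw = pd.read_csv(handle)

    # Assumption: the timestamp column is in a format pd.to_datetime can parse.
    stamp = pd.to_datetime(raw["Date/Time (UTC)"])
    return pd.DataFrame({
        "race": raw["Race #"],
        "speed": raw["WPM"],
        # Accuracy arrives as a 0-1 float and is stored as a percentage string.
        "accuracy": (raw["Accuracy"] * 100).round(2).astype(str),
        "place": raw["Rank"].astype(str) + "/" + raw["# Racers"].astype(str),
        "date": stamp.dt.strftime("%Y-%m-%d"),  # assumed storage format
        "datetime": stamp,
        "mode": raw["Mode"],
        "text_id": raw["Text ID"],
        "skill_level": raw["Skill Level"],
        "universe": raw["Universe"],
    })


# Made-up sample bucket, built in memory so the sketch runs offline.
header = "Race #,WPM,Accuracy,Rank,# Racers,Date/Time (UTC),Mode,Text ID,Skill Level,Universe"
row = "1,85,0.975,2,5,2024-01-15 18:30:00,race,100,Pro,play"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("races.csv", header + "\n" + row + "\n")

df = parse_zip(buffer.getvalue())
print(df[["race", "speed", "accuracy", "place", "date"]])
```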
Unauthenticated (fallback): HtmlFetcher paginates data.typeracer.com/pit/race_history via cursor-based links; HtmlParser extracts rows from Scores__Table__Row divs (extended columns set to pd.NA).
src/actions.py is the orchestration layer: calls create_session(), checks has_credentials(), picks the right fetcher+parser pair, applies date filtering, then hands the DataFrame to StatisticsVisualizer.
- get_stats.py: CLI entry point (argparse), date validation, calls actions.parse() or actions.plot_stats_from_file()
- src/actions.py: picks pipeline, filters by date, saves CSV via _save_csv, launches Streamlit report via _launch_report (subprocess)
- src/auth.py: create_session() and has_credentials(); reads TYPERACER_USERNAME + TYPERACER_PASSWORD from .env, POSTs to /pit/login
- src/fetchers/html_fetcher.py: HtmlFetcher.fetch_pages(username) generator; handles cursor pagination, progress output, @retry for 429s
- src/fetchers/export_fetcher.py: ExportFetcher.fetch_all_buckets(username) generator; scrapes bucket hrefs from export page, yields raw ZIP bytes
- src/parsers/html_parser.py: HtmlParser.parse_page(html) → List[dict] with all 11 schema columns
- src/parsers/export_parser.py: ExportParser.parse_zip(bytes) → pd.DataFrame; maps Race # → race, WPM → speed, Accuracy×100 → accuracy, Rank/# Racers → place, Date/Time (UTC) → date + datetime, plus Mode, Text ID, Skill Level, Universe
- src/retry_decorator.py: @retry() with exponential backoff (2^n + random(0,2) s), default 4 retries
- src/constants.py: URL (HTML scraper template), EXPORT_URL, TYPERACER_BASE_URL, SCHEMA_COLUMNS, EXTENDED_COLUMNS
- src/exceptions.py: ServerTooManyRequestsError (raised on HTTP 429)
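For orientation, a retry decorator with the backoff described for src/retry_decorator.py could be sketched as follows. This is an illustration, not the repository's code. It assumes the exponent starts at 0 and that "4 retries" means four further attempts after the first call. The sleep parameter is a hypothetical addition so the delay can be stubbed out.

```python
import functools
import random
import time


class ServerTooManyRequestsError(Exception):
    """Stand-in for the exception in src/exceptions.py (raised on HTTP 429)."""


def retry(max_retries=4, sleep=time.sleep):
    """Retry on 429 after waiting 2**attempt + random(0, 2) seconds.

    `sleep` is a hypothetical parameter; the real decorator's signature may differ.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except ServerTooManyRequestsError:
                    # Exponential backoff plus jitter before the next attempt.
                    sleep(2 ** attempt + random.uniform(0, 2))
            # Last attempt: any exception now propagates to the caller.
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Demo: fail twice with a simulated 429, then succeed. Delays are recorded, not slept.
delays = []
calls = {"count": 0}


@retry(sleep=delays.append)
def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ServerTooManyRequestsError("429 Too Many Requests")
    return "ok"


result = flaky_fetch()
print(result, calls["count"], len(delays))
```

The random jitter spreads out retries so that repeated requests do not hit the server at fixed intervals.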
A Streamlit + Plotly interactive dashboard (not yet wired to a CLI entry point; run separately with streamlit run). Requires streamlit and plotly, which are not yet listed in requirements.txt.
- src/report/data.py: load_csv(path) (@st.cache_data) reads CSV, coerces types, splits place into place_rank/place_total, adds is_win/is_podium booleans. filter_df() applies date, WPM, accuracy, race-size, and mode filters.
- src/report/metrics.py: pure-pandas analytics: compute_kpis, compare_periods (last 30 days vs previous 30), compute_streaks, compute_milestones, compute_personal_bests, compute_monthly_summary
- src/report/charts.py: Plotly figure factories (all accept a preprocessed DataFrame): WPM scatter + rolling avg, histogram, PB progression, box by month, trend regression, animated monthly histogram, calendar heatmap, hour polar, day-of-week bar, monthly volume, hour×day heatmap, placement distribution, accuracy vs WPM scatter, cumulative points, mode performance, text repeat analysis, rolling win rate
- src/report/components.py: Streamlit HTML helper components: kpi_card, render_kpi_row, section_header, empty_state, hero
- src/report/theme.py: COLORS dict, HEATMAP_SCALE, SEQUENCE, register_template() (registers a dark Plotly template), CSS string injected via st.markdown
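The place handling attributed to load_csv can be illustrated with a small pandas sketch. add_place_columns is a hypothetical name, and treating a rank of 3 or better as a podium finish is an assumption; the repository may define is_podium differently.

```python
import pandas as pd


def add_place_columns(df, podium_cutoff=3):
    """Hypothetical helper: derive rank, field size, and win/podium flags from 'place'."""
    out = df.copy()
    # "2/5" -> rank 2 in a 5-player race; rows that do not match become NaN.
    parts = out["place"].str.extract(r"^(\d+)/(\d+)$")
    out["place_rank"] = pd.to_numeric(parts[0], errors="coerce")
    out["place_total"] = pd.to_numeric(parts[1], errors="coerce")
    out["is_win"] = out["place_rank"] == 1
    out["is_podium"] = out["place_rank"] <= podium_cutoff  # assumed definition
    return out


races = pd.DataFrame({"place": ["1/5", "3/4", "4/5"]})
enriched = add_place_columns(races)
print(enriched)
```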
Write comments or docstrings only when the code is genuinely complex and a reader would be confused without them. Default to clean, comment-free code.
- Export buckets: each bucket URL (?bucket=N) returns a ZIP containing one CSV. Bucket links are scraped from the export page HTML, not hardcoded.
- Column mapping (export CSV → DataFrame): Accuracy is a 0–1 float → stored as str(round(val*100, 2)). place combines Rank and # Racers as "Rank/# Racers" to stay compatible with plot_places, which filters for /5. Extended columns (datetime, mode, text_id, skill_level, universe) are only populated by the export pipeline; charts/metrics that use them guard with has_extended_data().
- Date filtering: start_date is exclusive, end_date is inclusive (both applied in actions._filter_by_dates after the full DataFrame is built; HTML pipeline also has an early-exit optimization).
- HTML scraping: targets CSS classes Scores__Table__Row, profileTableHeaderDate, profileTableHeaderRaces. This is fragile if TypeRacer changes their frontend.
- Plot x-axis: all plots use actual race numbers except plot_mean_tens, which uses sequential index positions (1→N). Both correctly show oldest on the left because the DataFrame is sorted ascending by race before plotting.
- plot_places: only counts 5-player races (str.contains('/5')), an intentional filter in the original code.
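The date bounds and the 5-player filter can be sketched as below. The function names are illustrative and the sample data is made up. The sketch assumes the date column holds strings that pd.to_datetime can parse, which may not match the stored format.

```python
import pandas as pd


def filter_by_dates(df, start_date=None, end_date=None):
    """Sketch: keep rows with start_date < date <= end_date."""
    dates = pd.to_datetime(df["date"])
    mask = pd.Series(True, index=df.index)
    if start_date is not None:
        mask &= dates > pd.Timestamp(start_date)   # start is exclusive
    if end_date is not None:
        mask &= dates <= pd.Timestamp(end_date)    # end is inclusive
    return df[mask]


def five_player_races(df):
    """Sketch of the plot_places filter: keep rows whose place contains '/5'."""
    return df[df["place"].str.contains("/5", regex=False)]


races = pd.DataFrame({
    "race": [1, 2, 3, 4],
    "date": ["2024-01-01", "2024-01-02", "2024-01-31", "2024-02-01"],
    "place": ["1/5", "2/3", "4/5", "1/5"],
})

in_range = filter_by_dates(races, "2024-01-01", "2024-01-31")
print(in_range["race"].tolist())
print(five_player_races(in_range)["race"].tolist())
```

With an exclusive start, a race dated exactly on start_date is dropped, so callers who want that day included need to pass the previous day.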