Earnings Signal Lab

Test whether granular NLP features from earnings call transcripts predict forward stock returns.

This is not a sentiment analysis tool. Instead of computing a single "positive/negative" score, it extracts 16 specific behavioral and linguistic features from earnings call transcripts, covering both prepared remarks and the Q&A session, then backtests each one against historical price data.

The 16 Features

Category	Feature	Signal Direction
Management Behavior	Hedging Language	Bearish when high
	Q&A Deflection Rate	Bearish when high
	Guidance Specificity	Bullish when high
	Confidence Shift (Prepared→Q&A)	Bearish when high
Analyst Behavior	Analyst Skepticism	Bearish when high
	Surprise Indicators	Context-dependent
	Question Clustering	Context-dependent
Forward Guidance	Guidance Revision Direction	Bullish when raised
	Qualifier Density	Bearish when high
Risk Signals	New Risk Factor Mentions	Bearish when high
	External Blame Attribution	Bearish when high
Strategic Signals	CapEx/Investment Tone	Bullish when aggressive
	Hiring & Headcount	Bullish when growing
	Competitive Positioning	Bearish when concerned
Demand Signals	Customer/Demand Descriptors	Bullish when strong
	Pricing Power Indicators	Bullish when strong
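The "Signal Direction" column can be encoded as a sign per feature, so that any score can be flipped into a common "positive means bullish" orientation before backtesting. This is a minimal sketch, assuming each 0-1 score measures the condition named in that column (for example, how concerned management sounds about competition, or how strongly guidance was raised). The snake_case feature keys and the `oriented_score` helper are hypothetical, not the pipeline's actual names.

```python
from typing import Optional

# Hypothetical feature keys; the sign encodes the "Signal Direction" column:
# +1 = bullish when the score is high, -1 = bearish when high, 0 = context-dependent.
FEATURE_DIRECTION = {
    "hedging_language": -1,
    "qa_deflection_rate": -1,
    "guidance_specificity": 1,
    "confidence_shift": -1,
    "analyst_skepticism": -1,
    "surprise_indicators": 0,
    "question_clustering": 0,
    "guidance_revision_direction": 1,
    "qualifier_density": -1,
    "new_risk_factor_mentions": -1,
    "external_blame_attribution": -1,
    "capex_investment_tone": 1,
    "hiring_headcount": 1,
    "competitive_concern": -1,
    "customer_demand_strength": 1,
    "pricing_power": 1,
}


def oriented_score(feature: str, score: float) -> Optional[float]:
    """Map a 0-1 score onto [-1, 1] so that positive always means bullish."""
    sign = FEATURE_DIRECTION[feature]
    if sign == 0:
        return None  # context-dependent: no fixed direction to apply
    centered = 2.0 * score - 1.0  # 0 -> -1, 0.5 -> 0, 1 -> +1
    return sign * centered
```

Context-dependent features return `None` rather than a guessed direction, so they have to be handled separately (for example, in the multi-feature combination tests).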

How It Works

HuggingFace Dataset → Claude NLP Extraction → Yahoo Finance Prices → Statistical Backtest

Load transcripts from the public glopardo/sp500-earnings-transcripts HuggingFace dataset (~20.7k transcripts, ~496 S&P 500 tickers, 2014–Nov 2025)
Extract features using the Claude API: each transcript is scored on all 16 features (0-1), with supporting evidence quotes and section attribution (prepared remarks vs. Q&A)
Fetch price data from Yahoo Finance and compute forward returns 1D, 5D, 10D, and 21D after each earnings call
Backtest: Information Coefficient, directional accuracy, Sharpe ratios, p-values, feature correlations, and multi-feature combination tests
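Because the extraction step asks a language model for structured output, it helps to validate each feature entry before it reaches the backtest. The sketch below assumes the model returns a JSON array of objects with `name`, `score`, `section`, and `evidence` keys; that schema, the section labels, and the helper names are all hypothetical and may differ from what `EXTRACTION_PROMPT` actually requests.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Hypothetical section labels for "prepared remarks vs Q&A" attribution.
VALID_SECTIONS = {"prepared_remarks", "qa", "both"}


@dataclass
class FeatureResult:
    name: str
    score: float  # 0-1 intensity of the feature
    section: str  # where the supporting evidence appeared
    evidence: List[str] = field(default_factory=list)  # verbatim quotes


def parse_feature(raw: dict) -> FeatureResult:
    """Validate one feature entry from the model's JSON output."""
    score = float(raw["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"{raw['name']}: score {score} outside [0, 1]")
    section = raw["section"]
    if section not in VALID_SECTIONS:
        raise ValueError(f"{raw['name']}: unknown section {section!r}")
    quotes = [str(q) for q in raw.get("evidence", [])]
    return FeatureResult(raw["name"], score, section, quotes)


def parse_response(text: str) -> List[FeatureResult]:
    """Parse a JSON array of feature entries returned by the model."""
    return [parse_feature(item) for item in json.loads(text)]
```

Rejecting out-of-range scores early keeps a single malformed response from silently skewing the IC for that feature.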

Quick Start

Prerequisites

pip install datasets yfinance pandas numpy anthropic scipy

API Keys

Transcripts come from a public HuggingFace dataset — no API key needed for that.

You only need a Claude API key for the NLP analysis step:

Anthropic: https://console.anthropic.com

export ANTHROPIC_API_KEY="your_claude_key"

Run the Pipeline

# Full pipeline (first run)
python earnings_signal_pipeline.py

# Check what's cached
python earnings_signal_pipeline.py --status

The pipeline will:

Load ~20.7k transcripts across ~496 S&P 500 tickers from HuggingFace
Analyze each with Claude (~$0.02/transcript)
Fetch price data from Yahoo Finance
Output earnings_signal_data/backtest_results.json

All data is cached, so re-runs skip already-processed transcripts.
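The caching behavior and the per-transcript price above combine into a simple cost check before a run. This sketch assumes one cached analysis file per transcript, named `<id>.json` inside `earnings_signal_data/analysis/`; that file layout and both helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

COST_PER_TRANSCRIPT_USD = 0.02  # approximate per-transcript figure quoted above


def pending_transcripts(ids: Iterable[str], analysis_dir: Path) -> List[str]:
    """Return IDs with no cached analysis file (hypothetical layout: <id>.json)."""
    return [tid for tid in ids if not (analysis_dir / f"{tid}.json").exists()]


def estimated_cost(n_transcripts: int, per_transcript: float = COST_PER_TRANSCRIPT_USD) -> float:
    """Rough Claude spend in USD for analyzing n transcripts."""
    return round(n_transcripts * per_transcript, 2)
```

At the quoted rate, a from-scratch pass over all ~20.7k transcripts works out to about 20,700 × $0.02 ≈ $414; on re-runs, only the pending IDs incur cost.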

Refresh Mode

After the initial run, use refresh mode to update price data as holding periods mature:

# Update only events with incomplete returns (e.g., 21D hadn't elapsed yet)
# and recent earnings (within last 35 days), then re-run backtest
python earnings_signal_pipeline.py --refresh

# Re-fetch ALL price data from scratch (backs up old data first)
python earnings_signal_pipeline.py --refresh-all

Refresh mode only touches events that need updating, and it reports exactly what changed for each one:

↻ NVDA_Q4_2024 [incomplete returns]: 21D: None→+8.3%
↻ AAPL_Q1_2025 [recent (12d ago)]: 10D: +2.1%→+2.4%, 21D: None→-1.2%
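The two refresh criteria (a missing horizon, or earnings within the last 35 days) and the change log above can be sketched as two small functions. This is an illustration under assumptions: returns are stored as a dict from horizon label to a fractional return, with `None` for horizons that have not elapsed, and the function names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional

HORIZONS = ["1D", "5D", "10D", "21D"]
RECENT_WINDOW_DAYS = 35  # "recent earnings (within last 35 days)"

Returns = Dict[str, Optional[float]]  # horizon label -> forward return, None if not elapsed


def refresh_reasons(returns: Returns, earnings_date: date, today: date) -> List[str]:
    """Why an event should be re-fetched; an empty list means skip it."""
    reasons = []
    if any(returns.get(h) is None for h in HORIZONS):
        reasons.append("incomplete returns")
    age = (today - earnings_date).days
    if age <= RECENT_WINDOW_DAYS:
        reasons.append(f"recent ({age}d ago)")
    return reasons


def _fmt(value: Optional[float]) -> str:
    return "None" if value is None else f"{value:+.1%}"


def describe_changes(old: Returns, new: Returns) -> str:
    """Render only the horizons whose value changed, in the log style above."""
    return ", ".join(
        f"{h}: {_fmt(old.get(h))}→{_fmt(new.get(h))}"
        for h in HORIZONS
        if old.get(h) != new.get(h)
    )
```

Recent events are re-fetched even when all horizons are filled because prices near the end of the window can still be revised or backfilled by the data source.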

Run Individual Steps

python earnings_signal_pipeline.py --step pull       # Only pull new transcripts
python earnings_signal_pipeline.py --step analyze    # Only run Claude on unanalyzed transcripts
python earnings_signal_pipeline.py --step prices     # Only fetch missing price data
python earnings_signal_pipeline.py --step backtest   # Re-run backtest on existing data
python earnings_signal_pipeline.py --no-confirm      # Skip prompts (for cron/automation)
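The flags above map naturally onto a standard-library `argparse` parser. The sketch below mirrors only the flag names documented here; the `stages_to_run` dispatch rules are a hypothetical reading of this README (refresh updates prices and then re-runs the backtest), not the pipeline's actual control flow.

```python
import argparse
from typing import List

STEPS = ["pull", "analyze", "prices", "backtest"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Earnings Signal Lab pipeline (sketch)")
    parser.add_argument("--step", choices=STEPS, help="run a single stage only")
    parser.add_argument("--status", action="store_true", help="show what is cached, then exit")
    parser.add_argument("--refresh", action="store_true", help="update incomplete/recent price data")
    parser.add_argument("--refresh-all", action="store_true", help="re-fetch all price data")
    parser.add_argument("--no-confirm", action="store_true", help="skip interactive prompts")
    return parser


def stages_to_run(args: argparse.Namespace) -> List[str]:
    """Hypothetical dispatch: which stages a given flag combination triggers."""
    if args.status:
        return []
    if args.step:
        return [args.step]
    if args.refresh or args.refresh_all:
        return ["prices", "backtest"]  # refresh updates prices, then re-runs the backtest
    return list(STEPS)
```

`--no-confirm` only changes how prompts behave, so it composes with any of the other flags, which is what makes it suitable for cron.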

View Results

Load backtest_results.json into the React dashboard (dashboard/earnings-signal-analyzer.jsx), which can also be opened in Claude.ai as an artifact.

Key Metrics

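Information Coefficient (IC): Spearman rank correlation between a feature's score and forward returns. An absolute IC above 0.10 is considered meaningful in quant finance; features marked "Bearish when high" should show a negative IC.
Directional Accuracy: How often the direction implied by the signal matches the sign of the forward return.
Sharpe Ratio: Risk-adjusted return (mean divided by standard deviation) of positions taken when the signal is triggered.
p-value: Probability of observing an IC at least this extreme if the true correlation were zero.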
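The four metrics can be computed for one feature and one holding period with NumPy and `scipy.stats.spearmanr`. This is a minimal sketch: the 0.5 trigger threshold, the `direction` convention (+1 bullish when high, -1 bearish when high), and the per-event (non-annualized) Sharpe are assumptions, not the pipeline's documented definitions.

```python
import numpy as np
from scipy.stats import spearmanr


def signal_metrics(scores, fwd_returns, direction=1, threshold=0.5):
    """Per-feature metrics for one holding period.

    direction: +1 for "bullish when high" features, -1 for "bearish when high".
    threshold: score above which the signal counts as triggered (an assumption).
    """
    s = np.asarray(scores, dtype=float)
    r = np.asarray(fwd_returns, dtype=float)

    # IC: Spearman rank correlation, with its two-sided p-value.
    ic, p_value = spearmanr(s, r)

    # Directional accuracy: triggered events predict `direction`, others the opposite.
    # A return of exactly zero never matches and counts as a miss.
    predicted = np.where(s > threshold, direction, -direction)
    accuracy = float(np.mean(np.sign(r) == predicted))

    # Sharpe (per event, not annualized) of positions taken in the signal's direction.
    pnl = direction * r[s > threshold]
    sharpe = float(pnl.mean() / pnl.std(ddof=1)) if pnl.size > 1 else float("nan")

    return {"ic": float(ic), "p_value": float(p_value), "accuracy": accuracy, "sharpe": sharpe}
```

For a bearish feature, pass `direction=-1`: the IC should come out negative, while accuracy and Sharpe are already sign-adjusted so that higher is better for every feature.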

Customization

Adjust holding periods

HOLDING_PERIODS = {
    "1D": 1,
    "5D": 5,
    "10D": 10,
    "21D": 21,
}
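Each entry in `HOLDING_PERIODS` turns into a forward return measured from the earnings date. This sketch assumes the periods count trading sessions (rows in a daily close series) and that the base price is the first close on or after the earnings date; for calls held after the close, you may prefer to start from the next session instead. The `forward_returns` helper is hypothetical.

```python
from typing import Dict, Optional

import pandas as pd

HOLDING_PERIODS = {"1D": 1, "5D": 5, "10D": 10, "21D": 21}


def forward_returns(
    close: pd.Series, event_date, periods: Dict[str, int] = HOLDING_PERIODS
) -> Dict[str, Optional[float]]:
    """Simple returns n trading sessions after the first close on/after event_date."""
    close = close.sort_index()
    base = close.index.searchsorted(pd.Timestamp(event_date))  # first session >= event date
    out: Dict[str, Optional[float]] = {}
    for label, n in periods.items():
        if base + n < len(close):
            out[label] = float(close.iloc[base + n] / close.iloc[base] - 1.0)
        else:
            out[label] = None  # holding period has not elapsed yet
    return out
```

Horizons that have not elapsed come back as `None`, which is the "incomplete returns" state that refresh mode later fills in.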

Modify the feature extraction prompt

The EXTRACTION_PROMPT variable contains the full Claude prompt for feature extraction. You can add, remove, or modify features here.

Project Structure

earnings-signal-lab/
├── earnings_signal_pipeline.py    # Main pipeline (pull, analyze, prices, backtest)
├── server/
│   └── app.py                     # FastAPI web server + public API
├── static/
│   └── index.html                 # Public dashboard (no raw transcripts)
├── dashboard/
│   └── earnings-signal-analyzer.jsx  # React dashboard (for Claude.ai artifact)
├── cron_runner.py                 # Automated daily updates
├── first_run.py                   # One-time initial data population
├── Dockerfile                     # Railway/Docker deployment
├── railway.toml                   # Railway configuration
├── requirements.txt
├── .gitignore
└── README.md

Output Structure

earnings_signal_data/
├── transcripts/          # Cached raw transcripts per company
├── analysis/             # Claude feature extractions per earnings call
├── price_data.json       # Yahoo Finance price data
├── backtest_results.json # Final results (served via API)
├── backtest_dataset.csv  # Raw dataset for custom analysis
└── SUMMARY.md            # Claude-generated analysis summary
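backtest_dataset.csv is meant for custom analysis. One common check that complements the IC is a quantile spread: the mean forward return of the highest-scoring bucket minus that of the lowest. This sketch assumes the CSV has one column per feature score and one per holding-period return; the column names in the example comment are hypothetical, so check the CSV header first.

```python
import pandas as pd


def quintile_spread(df: pd.DataFrame, feature: str, ret_col: str, q: int = 5) -> float:
    """Mean forward return of the top score bucket minus the bottom bucket."""
    data = df[[feature, ret_col]].dropna()
    # rank(method="first") breaks ties so qcut always gets unique bin edges
    buckets = pd.qcut(data[feature].rank(method="first"), q, labels=False)
    bucket_means = data.groupby(buckets)[ret_col].mean()
    return float(bucket_means.iloc[-1] - bucket_means.iloc[0])


# Example (column names are hypothetical):
# df = pd.read_csv("earnings_signal_data/backtest_dataset.csv")
# print(quintile_spread(df, "hedging_language", "ret_21D"))
```

Unlike the IC, the spread is expressed in return units, which makes it easier to compare against trading costs.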

Deploy to Railway

The app runs as a full-stack Python service on Railway: FastAPI serves the public dashboard and API, and a separate cron service runs daily updates.

Step 1: Initial data population (local)

export ANTHROPIC_API_KEY="your_key"
python first_run.py

This takes ~30 min and costs ~$5 in Claude API calls. All data is saved to earnings_signal_data/.

Step 2: Push to GitHub

git add -A
git commit -m "Initial commit with data"
git push origin main

Note: The .gitignore excludes earnings_signal_data/ by default. For Railway, you have two options:

Option A: Remove earnings_signal_data/ from .gitignore and commit the data (simpler, ~50MB)
Option B: Use a Railway persistent volume (better for ongoing updates)

Step 3: Create Railway project

Go to railway.app and create a new project
Connect your GitHub repo
Railway will auto-detect the Dockerfile and deploy

Step 4: Set environment variables

In Railway dashboard → your service → Variables:

ANTHROPIC_API_KEY=your_anthropic_key
PORT=8000

Step 5: Add persistent volume (Option B)

If using a persistent volume for data:

In Railway dashboard → your service → Settings → Volumes
Add a volume mounted at /app/earnings_signal_data
Upload your local earnings_signal_data/ contents to the volume

Step 6: Set up cron job

In your Railway project, add a second service
Set the source to the same GitHub repo
Set the start command to: python cron_runner.py
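In the service settings, set the cron schedule. Railway evaluates cron expressions in UTC, so 0 18 * * 1-5 runs at 18:00 UTC on weekdays (2pm ET in summer, 1pm ET in winter). For weekdays at 6pm ET, use 0 22 * * 1-5 during daylight saving time or 0 23 * * 1-5 otherwise.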
Add the same environment variables

Public API Endpoints

Once deployed, your Railway URL exposes:

Endpoint	Description
`GET /`	Public dashboard
`GET /api/results`	Full results (features, regression, combos)
`GET /api/predictions`	Just prediction weights and expected performance
`GET /api/features`	Individual feature signal strength
`GET /api/summary`	Markdown analysis summary
`GET /api/status`	Pipeline health and data freshness
`GET /docs`	Interactive API documentation

No raw transcripts or Claude analysis text is exposed through any endpoint.
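A field whitelist is one way to enforce that guarantee: only explicitly approved aggregate statistics reach a response, so any new field (such as evidence quotes) stays private by default. This sketch is hypothetical; the field names are assumptions, not the actual schema of backtest_results.json or the code in server/app.py.

```python
from typing import Any, Dict, List

# Hypothetical whitelist: aggregate statistics that are safe to publish.
PUBLIC_FEATURE_FIELDS = {"name", "ic", "p_value", "accuracy", "sharpe", "n_events"}


def public_features(features: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only whitelisted keys so quotes or transcript text cannot leak."""
    return [{k: v for k, v in f.items() if k in PUBLIC_FEATURE_FIELDS} for f in features]
```

A whitelist fails closed: if the extraction step later adds a field, it is dropped until someone deliberately adds it to the public set.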

Legal Notes

Transcripts: Sourced from the public glopardo/sp500-earnings-transcripts HuggingFace dataset. Not redistributed.
Price Data: Yahoo Finance data is for personal use per their terms.
Predictions: Your derived analysis/predictions are your own work.

License

MIT
