NB Korea Demand Diagnosis · Competitive Benchmarking · BDC Operational Pipeline
A weekly performance monitoring system for New Balance Korea that triangulates search trends, social signals, macro indicators, and financial data against three global competitors (Nike, Adidas, Puma) to deliver actionable KPIs for Business Data Coordinator workflows. The pipeline tests the hypothesized Macro → Korea → Global demand chain via 5-dimension orthogonal causality framework, surfaces three critical analytical findings through self-diagnostic protocols, and serves a 6-tab Streamlit dashboard with 13 production-ready KPIs.
Live Dashboard: https://sportswear-brand-monitor-newbalance.streamlit.app/
Independent Project
- Motivation
- Data Sources
- Pipeline
- Methodology
- Key Findings — 24 Data Points
- Dashboard
- 13 Production KPIs
- Project Structure
- Analytical Governance
- Data Policy
- How to Run
- Dependencies
This project addresses four questions that a New Balance Korea Business Data Coordinator would need to answer weekly:
Is it Korean macro consumer sentiment, global brand momentum, or autonomous brand inertia? Bidirectional Granger causality testing across 8 brand×region pairs (NB + 3 competitors × 2 regions, max lag 4) reveals four distinct patterns:
| Brand | Global | Korea |
|---|---|---|
| New Balance | CSI → Search | CSI → Search |
| Adidas | Search → CSI (robust) | Feedback loop |
| Nike | Independent | Search → CSI |
| Puma | CSI → Search | Independent |
NB is the only brand showing CSI → Search single direction in both regions (Korea lag 3 p=0.0229, Global lag 2 p=0.0321). This indicates NB Korea is structurally macro-reactive demand — search rises after consumer sentiment improves, unlike Nike/Adidas which retain brand inertia that sustains search regardless of macro conditions.
Initial Stage 4 DTW analysis suggested Korea leads Global by ~10 weeks. Stage 7 self-diagnostic with three independent verification tests revealed a sign convention error in the original implementation:
- Test 1 — Synthetic data probe: Generated time series where Korea leads Global by exactly 5 weeks. Stage 4 code returned lag = -5 (sign inverted relative to its labeling convention).
- Test 2 — Leakage-free replication: Re-computed CC and DTW on actual NB Korea/Global series using one-sided trend estimation. 8/8 series classified as direction_flipped relative to Stage 4 labels.
- Test 3 — Mediation re-run signature: Sign-corrected mediation reduced inconsistent mediation from 120.35% indirect effect to 10.82%, and BCa CI width narrowed 47%.
Corrected statement: Global trend leads Korea by ~10 weeks (DTW ~10w, CC ~9w). Korea is a follower market, but the Global trend serves as a 10-week monitoring indicator for Korean demand — a dashboard reference value, not a forecast model input.
Stage 2's rolling z-score baseline (8-week window, |z| > 2.0) produced 69 anomalies (5.0% rate). MSTL residual-based detection with the same threshold produced 70 anomalies but only 6 overlapped between the two sets. The remaining 63 z-score anomalies were quarterly seasonal artifacts — a 91.3% false positive rate that motivated the methodology shift.
This single number empirically justifies the time series decomposition step: rolling windows cannot distinguish quarterly patterns from genuine anomalies on dual-season (annual + quarterly) series, while MSTL decomposition removes both periodicities before flagging residuals.
A 4-way comparison reveals Prophet wins both regions:
| Model | Korea RMSE | Korea MAPE | Global RMSE | Global MAPE |
|---|---|---|---|---|
| SARIMAX (Fourier K=2 + CSI) | 6.45 | 22.8% | 3.23 | 12.2% |
| Prophet (changepoint + CSI) | 5.57 | 18.5% | 2.54 | 10.6% |
| LSTM (2-layer stacked, 13w lookback) | 8.11 | 29.4% | 2.71 | 10.7% |
| Chronos-small (zero-shot) | 7.20 | 25.0% | 3.65 | 13.4% |
| Chronos-base (zero-shot) | 7.80 | 27.4% | 3.99 | 14.1% |
Ablation analysis isolates automatic changepoint detection as the dominant factor — not the model class itself, but the algorithm's ability to handle non-stationary trend changes in event-driven series. LSTM's 467:1 parameter-to-sample ratio causes structural overfitting, and Chronos shows anti-scaling (small > base) on short context windows, an empirically verified case of foundation model scaling failure on 174-week series.
| Source | Period | Volume | Role |
|---|---|---|---|
| Naver DataLab | 2 years rolling | 1,201 rows | Korean search trends in 4 keyword groups (brand / nb_product / nb_social / nb_channel) |
| Google Trends | 3.5 years | 8,698 rows | Brand-level + product-level search index, 8 CSVs stitched (Korea/Global × Web/YouTube/Shopping) |
| YouTube Data API | snapshot | 90 videos + 3,235 comments | Cross-sectional social volume (4 brands × 2-3 queries × top results) |
| Naver Blog/Cafe | snapshot | 10,000 posts | Sentiment corpus (post_date for time series, query_keyword for product-line attribution) |
| ECOS (한국은행) | 4 years monthly | 52 monthly | Consumer Sentiment Index (CSI), stat_code 511Y002 |
| Financials | FY2023-2024 | 16 rows | T4 CEO statement (NB Global), T2 sector estimates (NB Korea), T1 SEC 10-Q (Nike), T1 IR (Adidas, Puma) |
Sample period: 174 weeks for primary search analysis.
Data governance: Financial data tagged with source tier (T1=primary regulatory filing, T2=industry estimates, T3=secondary sources, T4=executive statement) for credibility weighting. T4 entries are not used in causal inference; only T1-T3 enter Granger inputs.
Sparse data exclusions: Two Naver DataLab keyword groups (nb_social: 61 rows, nb_channel: 268 rows) showed insufficient observation density for time-series decomposition. These are preserved in raw schema for future expansion but excluded from staging/mart layers per data governance protocol documented in methodology.md §4.
The pipeline runs across 8 sequential stages, each producing a checkpoint document with quantified findings, decisions made, and stage-to-stage influence tracking.
Two-phase preliminary work: (1) data availability verification across 6 sources with viability thresholds, (2) 6 hypothesis-driven exploratory analyses to confirm which directions are worth pursuing in main pipeline.
Hypothesis verdicts:
| Hypothesis | Result | Evidence |
|---|---|---|
| H1: Korea-Global cycle alignment | Method limitation (CC -167w artifact) | Deferred to Stage 4 DTW |
| H2: 530 SS vs 992 FW separation | Weak signal (530 SS/FW=1.18, 992 SS/FW=0.93) | Deferred to Stage 3 STL/MSTL |
| H3: 530 dependency intensifying | CONFIRMED — slope +2.36pp/Q, p=0.0213 | Promoted to KPI 1 |
| H4: Social → Search lead | REVERSED — NB search leads Instagram by 2 weeks (r=0.504) | Stage 4 Granger re-test |
| H5: D2C decline | CONFIRMED — slope -0.179/week, p=0.0001 | Promoted to KPI 2 |
| H6: NB padding timing | CONFIRMED with twist — NB starts 2.6w earlier (week 42.6 vs 45.2), but absolute scale 1.2 vs 13.9 | "Timing leads, momentum lags" |
6 collectors covering 6 data sources, raw schema 23,292 rows total. Per-source UPSERT vs --clear strategy chosen based on data semantics:
| Collector | Strategy | Rationale |
|---|---|---|
| Naver DataLab | UPSERT on (source_type, keyword_group, keyword, period_start) | Time series with relative-scale normalization per group, append-only |
| Google Trends | UPSERT on (geo, search_type, week_start, keyword) | Time series with absolute weekly index |
| YouTube | --clear + INSERT per brand | Snapshot semantics, full refresh on each run |
| Naver Blog/Cafe | --clear + INSERT per brand | Same; post_date used for time series reconstruction |
| ECOS | UPSERT on (stat_code, item_code, time) | Append-only macro time series |
| Financials | UPSERT on (brand, fiscal_period, metric_name, source_type) | Multi-source with tier governance |
8 SQL files implementing the 3-tier transformation:
raw schema (23,292 rows)
↓ 01_staging_search_weekly.sql — UNION ALL across 4 sources, weekly bucketing
↓ 02_staging_social_weekly.sql — JSON extraction, comment aggregation
↓ 03_mart_brand_kpi.sql — LAG window functions, WoW/MoM/YoY
↓ 04_mart_sov_analysis.sql — SUM OVER PARTITION BY for SoV percentage
↓ 05_mart_seasonal_classifier.sql — CASE WHEN for SS/FW classification
↓ 06_mart_anomaly_residuals.sql — moving avg, std deviation rolling windows
↓ 07_mart_top_bottom_ranking.sql — RANK, DENSE_RANK
↓ 08_mart_korea_global_join.sql — multi-key time-aligned joins
staging (11,037 rows) → mart (5 tables + 5 views)
Quantified findings from mart layer:
- 530 Korea avg share 54.98% (Stage 0 H3 confirmed and intensified from 51.2%)
- Korea = 530 / Global = 9060 dominant model structure (52-55% share in each region)
- 574 = cross-region anchor (Korea +82.28% divergence vs Global, both regions rank #2)
- NB Korea over-index +34.94% search, +16.43% SoV vs Global (new finding from mart)
Track A — Seasonal Decomposition: STL initially adopted, MSTL comparison run as ablation. Result: MSTL reduces residual variance by 48-75% across all 8 brand×region series, indicating dual-season structure (annual 52w + quarterly 13w). MSTL becomes primary; STL retained for documentation only.
SS/FW Ratio Reversal: Stage 0's H2 measurement re-interpreted. STL's single-period assumption absorbed quarterly patterns into the annual seasonal component, biasing the SS/FW ratio. With MSTL's clean separation, 530 Korea ratio inverts from 1.18 to 0.86 (FW-dominant), and all 5 Korean NB products show FW-dominance (0.71-0.99) while 4/5 Global products show SS-dominance (1.07-1.44). This Korea-FW vs Global-SS structural asymmetry is a Stage 7 Bridge analysis input.
Track B — Hybrid Sentiment Analysis: 3-way keyword dictionary (Product 96 / Channel 14 / Resale 2, total 112 keywords), sponsorship filter (11 patterns, 1.3% flagged, t=4.48 p=0.0005 between sponsored/organic groups), threshold-based routing.
| Stage | Method | Resolution | Cost |
|---|---|---|---|
| 1st pass | Keyword dictionary | 97.4% (9,742 of 10,003 texts) | free |
| 2nd pass | Claude Batch API | 2.6% (261 ambiguous cases, | score |
Sentiment results by brand (organic only, sponsored filtered):
| Brand | avg_sentiment | neutral % |
|---|---|---|
| New Balance | +0.289 (lowest) | 45.6% (highest) |
| Adidas | +0.436 | 17.0% |
| Nike | +0.474 | 13.7% |
| Puma | +0.367 | 27.4% |
NB has lowest average sentiment despite highest search over-index (+34.94%), with neutral ratio 45.6% vs others 13-27%. Interpretation: NB blog posts skew toward informational/news content rather than user reviews, suggesting search demand does not convert to deep engagement.
Track A — Bidirectional Granger Causality:
- 64 tests total (8 pairs × 2 directions × 4 lags)
- 4-pattern classification with NB as outlier (CSI→Search both regions)
- Stationarity: ADF + KPSS, asymmetric preprocessing (CSI 1st-differenced, search MSTL residual)
- Robustness: adidas/global lag 4 p=0.0140 after differencing (robust); nike/global Independent confirmed
Track B — DTW Korea-Global Lead-Lag (3-way comparison):
| Brand | Raw DTW | Trend DTW | Residual DTW | Verdict |
|---|---|---|---|---|
| New Balance | +48.0w | +10.4w | +42.2w | seasonal artifact stripped |
| Adidas | -14.0w | +17.3w | +12.8w | direction flip, magnitude stable |
| Nike | -70.4w | +99.1w | +26.8w | trend shapes too dissimilar |
| Puma | +57.0w | +26.2w | +25.1w | partial reduction |
Raw DTW lags cluster near 52w (annual cycle), capturing Korea-FW vs Global-SS phase difference rather than true lead-lag. Trend DTW removes seasonal contamination. Residual DTW excluded from primary interpretation (white noise alignment produces arbitrary lags).
NB result (sign-corrected in Stage 7): Trend DTW +10.4w with cross-correlation +9w convergence. Originally interpreted as "Korea leads Global"; corrected interpretation: Global leads Korea.
Cointegration finding: Engle-Granger test on 8 pairs, 5/8 cointegrated (NB Korea strongest p=0.0008). Long-run equilibrium between search and CSI confirms macro-reactive demand structure.
Three-method comparison:
| Method | Anomalies | Rate | Detection Type |
|---|---|---|---|
| Stage 2 rolling z-score | 69 | 5.0% | direction-aware, raw series |
| MSTL residual z-score (|z|>2.0) | 70 | 5.0% | direction-aware, deseasonalized |
| Isolation Forest (contamination=0.05) | 72 | 5.2% | direction-agnostic, deseasonalized |
Confusion matrix (z-score vs MSTL):
- Both anomaly: 6 (true cross-method agreement)
- Z-score only: 63 (91.3% false positive rate in Stage 2 baseline)
- MSTL only: 64 (new discoveries after deseasonalizing)
- Both normal: 1,259
Tier structure for investigation prioritization:
| Tier | Criteria | Anomalies | Rationale |
|---|---|---|---|
| 1 | Z-score included in 3-way agreement | 7 | Independent cross-validation (z-score uses raw series, MSTL/IF use residual) |
| 2 | M+IF only, in macro_event or multi-brand co-occurring weeks | 31 | Same MSTL input dependency; co-occurrence pattern adds value |
M+IF agreement alone is not independent cross-validation because both operate on MSTL residual. Only Z-score inclusion constitutes true cross-method validation. This methodological distinction motivates Tier 1 prioritization.
events_calendar (25 entries): Manually curated via anomaly-driven retroactive investigation. Categories: product_launch (4), campaign (10), collab (7), macro_event (4). Origin tagged: scheduled (7, independent ground truth like Olympics, Air Max Day, Black Friday) vs investigated (18, anomaly-driven discoveries).
Evaluation:
| Scope | Matched | Total | Precision | Note |
|---|---|---|---|---|
| Tier 1 | 5 | 7 | 71.4% | Independent cross-validation precision |
| Tier 1 + 2 | 17 | 38 | 44.7% | Includes same-input M+IF cases |
| Scheduled events | 6 | 7 | 85.7% | Event Detection Rate (independent ground truth) |
Narrative findings:
- Celebrity endorsement creates search spikes exceeding typical product launches (NB Korea 2024-09-15: 송혜교 + 아이유 + Stone Island simultaneous, z=+2.15)
- CSI macro events explain multi-brand dips (2024-12 CSI -12.7pt drop matched co-occurring NB+Puma Korea anomalies)
- Air Max Day creates competitor spillover anomalies (Nike's seasonal model absorbs the predictable annual event, but adidas/NB anomalies appear in the same week)
- 2024 spring 4-week global spike was Olympics D-100 campaign convergence (NB "We Got Now", adidas Olympic kit, Nike On Air, Puma FOREVER.FASTER)
Train/test split: 148/26 weeks (test = 2 quarterly cycles).
SARIMAX design decisions:
- Seasonal order s=52 infeasible (148-week train insufficient for seasonal AR/MA estimation)
- Quarterly dummy rejected (search seasonality is smooth sinusoidal, not abrupt level shift)
- Fourier K=2 for 52w + K=2 for 13w = 8 exogenous columns + 1 CSI = 9 total
- K=2 ceiling enforced (K≥3 pushes parameter/observation ratio into danger zone)
- auto_arima selection: Korea (0,1,1), Global (0,1,0) — Global is pure random walk
Exogenous coefficient findings:
| Variable | Korea | Global | Interpretation |
|---|---|---|---|
| CSI | 3.98 (p<0.001) | 2.01 (p<0.001) | Korea is 2x more sensitive to consumer sentiment |
| sin_52, cos_52 | sig (p=0.010, 0.024) | non-sig | Korea has annual seasonality, Global does not |
| cos_13_2 | sig | sig | Sub-quarterly pattern in both markets |
Global ARIMA(0,1,0) finding: No autoregressive structure. All predictive power comes from exogenous variables. This justifies adding Korea trend as exogenous in future Global forecasting (Stage 7 hypothesis input).
LSTM: Stacked 2-layer, hidden 64, dropout 0.3, 13-week lookback, no attention (174w insufficient). Param/sample ratio 467:1 = structural overfitting. Recursive forecasting accumulates error.
Chronos: Univariate zero-shot, small (46M) and base (200M) compared. Anti-scaling result: small outperforms base in both regions. 174 weeks insufficient context for base's complex prior; pre-trained longer-range patterns become noise on short context.
Prophet (final winner): RMSE 5.57 (Korea), 2.54 (Global). Both regions wins. Ablation isolates automatic changepoint detection as the dominant factor.
The originally planned analytical objective was quantitative validation of the CSI → NB Korea → NB Global 3-stage chain hypothesis using path-analytic mediation. Stage 7 design adopted a 5-dimension orthogonal null framework — testing the chain hypothesis across 5 independent statistical perspectives that combine to 11 specific tests:
- Mediation — Korea → Global direction (joint block bootstrap, BCa CI)
- Mediation — Global → Korea direction (joint block bootstrap, BCa CI)
- VARX bidirectional — joint impulse response with CSI exogenous
- Monthly Granger bidirectional — re-test at monthly resolution to control for weekly noise
- Lagged cointegration bidirectional — Engle-Granger long-run equilibrium test
Null hypothesis result: chain rejected across all 5 dimensions, all 11 tests.
However, the verification process surfaced three critical findings without external review:
The 5-dimension orthogonal null framework was designed before any data was processed, preventing false positive chain confirmation that single-method analysis (e.g., naive mediation only) might have produced. Pre-commitment to multi-dimensional rejection criteria is a structural defense against confirmation bias.
The two-sided MSTL filter used in Stage 3 introduces look-ahead bias when applied to lead-lag analysis: the filter at time t uses information from t+k (k > 0). Detected via residual autocorrelation at non-zero lags. One-sided trend estimation (recursive filter with no future information) replacement applied in Stage 7 re-runs.
Stage 4's headline finding "NB Korea leads Global by ~10 weeks" was direction-inverted. Detected via three independent verification tests:
| Test | Method | Result |
|---|---|---|
| 1 — Synthetic probe | Generate series with known +5w lead, run through Stage 4 code | Returns -5w (sign inverted) |
| 2 — Leakage-free replication | Re-compute with one-sided trend, all 4 brands × 2 transformations | 8/8 direction_flipped |
| 3 — Mediation signature | Re-run with corrected direction | %indirect 120.35% → 10.82%, BCa CI width -47% |
Magnitude robust within ±2w of original; direction reversed.
Cascade corrections applied:
| Cascade Source | Dependent Finding | Action |
|---|---|---|
| DP24 | DP9 (Stage 4 lead direction) | Sign-flipped in exploratory_findings.md |
| DP24 | DP21 (3D separation labels) | Direction labels corrected |
| DP24 | DP22 (Sentinel framing) | Deprecated, replaced by DP24 |
| DP24 | KPI 7 | Redefined: Korea trend MA → Global trend MA, with operational scope tooltip |
| DP24 | Migration 011 | Database retrofit applied |
| DP24 | Stage 8 dashboard Tab 5 | Chain diagram direction corrected, Global MA panel added |
Stage 7 climax narrative:
"Stage 7's value lies less in the hypothesis test result (chain rejected) than in the verification process. Three critical findings were surfaced without external reviewer, indicating analytical discipline transferable to enterprise contexts."
This is a methodology validation stage — its primary deliverable is the analytical governance pattern (DP20 → DP23 → DP24 cascade) itself, registered as KPI 13 (governance asset).
6-tab production dashboard with 13 KPIs, dual data source architecture (PostgreSQL local development / CSV fallback for Streamlit Cloud), automatic fallback on DB import failure.
Sub-stage breakdown:
| Sub-stage | Scope |
|---|---|
| 8.0 | Directory structure + 5-tab skeleton + KPI mapping |
| 8.1 | Tab 1 Weekly KPI — cards + search trend + SoV + 530 dependency |
| 8.2 | Tab 2 Season — YoY overlay + seasonal pattern + product mix |
| 8.3 | Tab 3 Channel — SoV Korea vs Global + divergence + product gap |
| 8.4 | Tab 4 Anomaly — 3-way timeline + method agreement + event stacking |
| 8.5 | Tab 5 Forecast & Bridge — Prophet 26w + 4-way RMSE + chain diagram + Global MA |
| 8.6 | CSI data connected (raw.ecos_raw) + deprecation fixes |
| 8.7 | CSV export pipeline + CSV fallback verified |
| 8.8 | Streamlit Cloud deploy — live URL active |
| 8.9 | Methodology Doc visual redesign — 5-dim heatmap + DP20→23→24 cascade + DP24 signatures + 4-Layer cards |
Each analytical task is solved with a Traditional baseline + Advanced/Foundation alternative for empirical comparison:
| Task | Traditional | Advanced | SOTA / Foundation | Winning Approach |
|---|---|---|---|---|
| Seasonal Decomposition | STL (period=52) | MSTL (52w + 13w) | — | MSTL (48-75% residual variance reduction across all 8 series) |
| Causal Inference | Linear Granger | VARX + Mediation (block bootstrap) | 5-dim Orthogonal Null | 5-dim (chain hypothesis rejected with 11 tests across 5 dimensions) |
| Anomaly Detection | Rolling z-score | MSTL residual z-score | Isolation Forest + Tier structure | Tier 1 (z-score independent validation only, 71.4% precision) |
| Demand Forecasting | SARIMAX (Fourier K=2 + CSI) | Prophet (changepoint detection) | LSTM Enc-Dec / Chronos foundation model | Prophet (4-way ablation isolates changepoint as primary factor) |
| Sentiment Analysis | Keyword dictionary | Hybrid (keyword + Claude API routing) | KcBERT fine-tuning (deferred) | Hybrid (35.8 KRW per 10K texts, 95% of LLM-only accuracy) |
| Lead-Lag Analysis | Cross-correlation | DTW (3-way: raw/trend/residual) | — | Trend DTW (raw DTW reveals seasonal artifact, +48w → +10w correction) |
Full LLM API for 10,000 texts would cost ~$10. The hybrid approach uses keyword dictionary for 97.4% of clear cases (|score| ≥ 0.3) and routes only the 2.6% ambiguous cases to Claude API via Batch API (50% discount). Total cost: 35.8 KRW, maintaining 95% of LLM-only accuracy on validation samples. This demonstrates case-by-case tool selection + cost governance — core BDC operational skills, transferable to any cost-sensitive analytical workflow.
Prophet's automatic changepoint detection captures structural breaks that SARIMAX (parametric, fixed AR order) and LSTM (severe overfitting at 467:1) cannot. The 4-way ablation isolates this as the dominant factor — not the model class itself, but the algorithm's ability to handle non-stationary trend changes in event-driven series. This finding generalizes: for short, event-driven business time series, models with explicit changepoint mechanisms outperform both classical statistical and deep learning approaches.
A research-grade methodology asset developed in Stage 7 to guard against false positive causal chain confirmation. Five orthogonal dimensions of testing combine to 11 specific statistical tests:
- Mediation — Korea → Global direction (joint block bootstrap, BCa CI)
- Mediation — Global → Korea direction (joint block bootstrap, BCa CI)
- VARX bidirectional — Direct impulse response with macro exogenous control
- Monthly Granger bidirectional — Resolution sensitivity check (weekly noise control)
- Lagged cointegration bidirectional — Engle-Granger long-run equilibrium test
If the chain hypothesis holds, all 5 dimensions should produce convergent evidence. If even one rejects, the hypothesis is provisionally rejected pending additional investigation. This framework is registered as KPI 12 (methodology asset) for transferable use in future causal analyses.
Across 8 stages, 24 quantified data points form a cumulative diagnosis of NB Korea's market structure and the project's methodology validation findings.
| # | Finding | Quantified Evidence | Source |
|---|---|---|---|
| 1 | Dual-season structure (52w + 13w) | MSTL residual variance reduction 48-75% across all 8 series | Stage 3 |
| 2 | SS/FW Ratio reversal | 530 Korea STL ratio 1.18 → MSTL 0.86 (FW-dominant) | Stage 3 |
| 3 | Korea FW vs Global SS asymmetry | All 5 Korea NB products FW (0.71-0.99), 4/5 Global products SS (1.07-1.44) | Stage 3 |
| 4 | 574 cross-region anchor | Korea +82.28% divergence vs Global, both regions rank #2 | Stage 2-3 |
| 5 | NB Shopping Search 4.5% (channel disconnect) | NB 4.5%, Adidas 24.5%, Nike 32.1% — app direct entry | Stage 3 |
| 6 | D2C decline | slope -0.191/week, p=0.000005 (Stage 0 -0.179 confirmed and intensified) | Stage 0 H5 → Stage 3 |
| 7 | NB sentiment lowest | +0.289 vs Nike +0.474, neutral 45.6% (highest) | Stage 3 |
| 8 | Search-sentiment asymmetry | Search +34.94% over-index but sentiment lowest of 4 brands | Stage 2 + 3 |
Stage 3 Integrated Diagnosis: "Demand exists but conversion fails." Three dimensions confirm:
- Season: FW demand concentrates but momentum doesn't expand (Stage 0 H6: padding 1.2 vs North Face 13.9)
- Channel: Search high but D2C/Shopping channels weak (D2C -0.191/week, Shopping 4.5%)
- Sentiment: Search demand far exceeds product engagement signal
The "demand-conversion gap" is hypothesized to drive 530 dependency intensification (+2.36pp/Q): consumers cannot find conversion satisfaction in non-530 models, defaulting back to 530 as the only product line with consistent positive engagement.
| # | Finding | Quantified Evidence | Source |
|---|---|---|---|
| 9 | Global trend leads Korea ~10w (sign-corrected) | DTW +10.4w, CC +9w convergence, validated via 3 independent tests | Stage 4 → Stage 7 correction |
| 10 | NB CSI→Search single direction | Korea lag 3 p=0.0229, Global lag 2 p=0.0321, both robust | Stage 4 |
| 11 | Celebrity-driven search dominance | Single weeks combine 3+ celebrity endorsements, exceed typical launch impact | Stage 5 |
| 12 | CSI dips → multi-brand search dips | 2024-12 CSI -12.7pt matched co-occurring NB+Puma anomalies | Stage 5 |
| 13 | Air Max Day competitor spillover | Nike absorbs predictable annual event; adidas/NB anomalies appear same week | Stage 5 |
| 14 | Olympics-driven supra-seasonal demand | 2024 spring 4-week global spike across all brands | Stage 5 |
| 15 | Global ARIMA(0,1,0) — no autoregressive structure | Pure random walk, all predictive power from exogenous | Stage 6 |
| 16 | CSI elasticity asymmetry | Korea coefficient 3.98 vs Global 2.01 (~2x more sensitive) | Stage 6 SARIMAX |
| 17 | Foundation model anti-scaling | Chronos-base (200M) underperforms small (46M) on 174-week context | Stage 6 |
| 18 | Prophet changepoint dominance | 4-way ablation isolates changepoint detection as primary driver of Prophet's win | Stage 6 |
Stage 4 Integrated Diagnosis (sign-corrected): NB Korea is a follower market driven by external signals. The Global trend serves as a 10-week monitoring indicator for Korean demand — a dashboard reference value, not a forecast model input. This distinction is critical for KPI 7's operational scope.
Stage 6 mechanism finding: Prophet 4-way ablation reveals automatic changepoint detection as dominant. Removing changepoint capability (Prophet without changepoints = essentially Fourier+CSI baseline) collapses Prophet's win, confirming that the win is method-driven, not data-luck.
| # | Finding | Type | Source |
|---|---|---|---|
| 19 | 5-dim orthogonal null × 11 tests reject chain hypothesis | Statistical | Stage 7 Track A1 |
| 20 | Mediation spurious correlation pre-emption pattern | Methodology | Stage 7 Track A0 |
| 21 | Three-dimensional separation robust (sign-corrected) | Statistical | Stage 7 Track A1' |
| 22 | Sentinel framing — DEPRECATED | Methodology | Stage 7 (replaced by DP24) |
| 23 | MSTL forward-looking leakage | Methodology | Stage 7 self-diagnostic |
| 24 | Stage 4 sign convention inversion | Methodology | Stage 7 cascade detection |
The project's analytical governance produces transferable methodology assets beyond the immediate business diagnosis:
| Finding | Stage | Quantified Evidence |
|---|---|---|
| z-score 91.3% false positive | 5 | 63/69 Stage 2 anomalies were quarterly seasonal artifacts |
| MSTL dual-season structure | 3 | 48-75% residual variance reduction across all 8 series |
| Stage 4 sign convention error caught | 7 | Synthetic probe + leakage-free replication + mediation signature (3 independent tests) |
| Prophet changepoint dominance | 6 | 4-way ablation isolates changepoint detection as primary driver |
| Foundation model anti-scaling | 6 | Chronos-base underperforms small on 174-week context, empirically verified |
| Inconsistent mediation signature | 7 | %indirect 120.35% (sign error) → 10.82% (corrected) |
| MSTL forward-looking leakage | 7 | Two-sided filter introduces look-ahead bias, one-sided replacement applied |
The original 3-stage chain hypothesis (CSI → NB Korea → NB Global) was rejected through 5-dimension orthogonal null testing. However, the 4-layer narrative restructures Stage 7's deliverable around its actual contribution:
Layer 1: Original Hypothesis (CSI → Korea → Global)
↓ Tested via 5-dim × 11 statistical tests
Layer 2: Hypothesis Rejection (statistical result)
↓ Reframed as: "What did the verification process surface?"
Layer 3: Three Self-Diagnostic Findings (DP20, DP23, DP24)
↓ Cascaded into: Sign correction + leakage fix + retroactive validation framework
Layer 4: Methodology Validation Stage Pattern (KPI 13 — governance asset)
↓ Final deliverable: Analytical governance discipline transferable to enterprise contexts
"Stage 7's value lies less in the hypothesis test result (chain rejected) than in the verification process. Three critical findings were surfaced without external reviewer, indicating analytical discipline transferable to enterprise contexts."
Live URL: https://sportswear-brand-monitor-newbalance.streamlit.app/
| Tab | Content | KPIs |
|---|---|---|
| Weekly KPI | KPI cards (search index, SoV, 530 dependency, CSI snapshot) + 4-brand search trend (174w) + SoV stacked area + 530 ratio over time with 60% threshold line | 1, 2, 3, 5, 6, 10 |
| Season | Season position indicator + YoY overlay (this year vs last year) + seasonal decomposition pattern + NB product mix breakdown | 4 |
| Channel | SoV Korea vs Global side-by-side + divergence bar chart (574/2002R cross-region anchors) + product gap analysis | 2 |
| Anomaly | 222 anomaly records across 4 detection methods (z-score 69 + MSTL z2.0 70 + MSTL z3.0 11 + Isolation Forest 72) + interactive timeline + method agreement matrix + event stacking density visualization | 4, 8, 9 |
| Forecast & Bridge | Prophet 26-week forecast + 4-way RMSE comparison bar chart + chain diagram (sign-corrected) + Global trend 4-week MA monitoring panel with operational scope tooltip | 6, 7, 10, 11 |
| Methodology Doc | 5-dim orthogonal null heatmap + DP20→DP23→DP24 cascade flow + DP24 verification signatures (3 tests) + 4-Layer Narrative cards (color-coded borders) | 12, 13 |
KPI 7 in Tab 5 explicitly distinguishes monitoring use (dashboard visualization, ✓) from predictive use (forecast model input, ✗) per Stage 7 sign-correction findings — a Migration 011 retrofit applied directly to the dashboard layer.
Architecture:
- Dual data source: PostgreSQL (local dev) / CSV fallback (Streamlit Cloud)
- CSV export pipeline:
dashboard/export_csv.pyruns 7 mart queries + 4 forecast CSVs - Toggle:
USE_CSV_FALLBACKenv var or auto-fallback on DB import failure - DB connection: shared
database.connection.get_conn()context manager - Conditional import:
queries.pyskipspsycopg2when CSV mode active
Each KPI has been promoted from a confirmed finding through the pipeline. Operational use (alert / monitoring / methodology) is explicitly tagged.
| # | KPI | Operational Use | Source |
|---|---|---|---|
| 1 | 530 quarterly dependency | Alert when > 60% | Stage 0 H3 → Stage 2 |
| 2 | D2C search share weekly trend | Alert on 4-week consecutive decline | Stage 0 H5 → Stage 3 |
| 3 | Category-level NB share gap | Auto-detect under-indexed categories (e.g., padding 1.2 vs North Face 13.9) | Stage 0 H6 |
| 4 | NB product sentiment ratio | Quarterly tracking of positive/neutral/negative shifts | Stage 3 |
| 5 | Search-sentiment gap | Monitor over-index vs sentiment ranking divergence | Stage 2 + 3 |
| 6 | CSI 3-month moving average | Trigger NB search defense campaign on downturn | Stage 4 |
| 7 | NB Global trend 4-week MA | Korea demand monitoring indicator (✓ dashboard, ✗ forecast input) — sign-corrected | Stage 7 DP24 |
| 8 | Event Stacking Density | Same-week overlapping events vs search spike magnitude | Stage 5 |
| 9 | Anomaly detection method agreement | Monthly tracking of 3-way agreement count | Stage 5 |
| 10 | CSI elasticity monitoring | CSI variation × Korea elasticity (3.98) → estimated search impact | Stage 6 |
| 11 | 26-week search forecast (Prophet baseline) | Quarterly forecast with CSI + Fourier K=2 + changepoints | Stage 6 |
| 12 | 5-Dimension Orthogonal Null Verification Pattern | Methodology asset — VARX × monthly Granger × mediation × cointegration 11-test framework | Stage 7 |
| 13 | Methodology Validation Stage Pattern | Governance asset — DP20 (pre-emptive) → DP23 (self-diagnostic) → DP24 (verification byproduct) cascade | Stage 7 |
KPIs 12 and 13 are not weekly monitoring metrics but transferable analytical assets — they represent the project's contribution to methodology and governance practice itself.
sportswear-brand-monitor/
├── README.md # this file
├── docs/
│ ├── sportswear_brand_monitor_project_plan_v3.md
│ ├── methodology.md # §1-12 analytical decisions narrative
│ ├── exploratory_findings.md # 24 data points + diagnoses
│ ├── data_feasibility_report.md # Stage 0 spike report
│ ├── stage1_checkpoint.md ~ stage8_checkpoint.md
│ └── coding_conventions.md
│
├── database/
│ ├── schema_init.sql # 4 schemas: raw, staging, mart, public
│ ├── connection.py # ThreadedConnectionPool context manager
│ └── migrations/
│ ├── 008_korea_global_lag.sql
│ ├── 009_granger_results.sql
│ ├── 010_events_calendar_origin.sql
│ └── 011_kpi7_sign_correction.sql # Stage 7 retrofit
│
├── collectors/ # Stage 1
│ ├── collector_naver_datalab.py # 4 keyword groups, UPSERT pattern
│ ├── collector_google_trends.py # 8 CSV stitched
│ ├── collector_youtube.py # API + --clear pattern
│ ├── collector_naver_blog.py # blog + cafe, post_date for time series
│ ├── collector_ecos.py # CSI macro
│ └── collector_financials.py # T1-T4 source tier governance
│
├── notebooks/sql/ # Stage 2 (8 SQL files)
│ ├── 01_staging_search_weekly.sql # UNION ALL across 4 sources
│ ├── 02_staging_social_weekly.sql # JSON extraction, aggregation
│ ├── 03_mart_brand_kpi.sql # LAG window functions, WoW/MoM/YoY
│ ├── 04_mart_sov_analysis.sql # SUM OVER PARTITION BY for SoV
│ ├── 05_mart_seasonal_classifier.sql # CASE WHEN for SS/FW classification
│ ├── 06_mart_anomaly_residuals.sql # rolling moving avg, std deviation
│ ├── 07_mart_top_bottom_ranking.sql # RANK, DENSE_RANK
│ └── 08_mart_korea_global_join.sql # multi-key time-aligned joins
│
├── analysis/ # Stage 3-7
│ ├── seasonal_decomposer.py # MSTL primary, STL ablation
│ ├── sentiment_dictionary.py # 3-way keyword (Product/Channel/Resale)
│ ├── sentiment_hybrid.py # Keyword + Claude Batch API routing
│ ├── granger_search_sales.py # 8 brand × region × 4 lags × 2 directions
│ ├── granger_robustness.py # non-stationary differencing check
│ ├── dtw_korea_global.py # 3-way DTW (raw/trend/residual)
│ ├── anomaly_mstl_residual.py # |z| > 2.0 on MSTL residual
│ ├── anomaly_isolation_forest.py # contamination=0.05
│ ├── anomaly_comparison.py # 3-way confusion matrix + Tier structure
│ ├── anomaly_event_matching.py # strict window matching + evaluation
│ ├── load_events_calendar.py # CSV → staging.events_calendar
│ ├── update_matched_event_id.py # mart.anomaly_log FK update
│ ├── forecast_data_prep.py # CSI forward-fill, train/test split
│ ├── forecast_sarimax.py # Fourier K=2 + CSI exogenous
│ ├── forecast_prophet.py # changepoint + CSI regressor
│ ├── forecast_lstm.py # 2-layer stacked, 13w lookback
│ ├── forecast_chronos.py # zero-shot small + base
│ ├── forecast_comparison.py # 4-way RMSE/MAPE + ablation
│ ├── bridge_orthogonal_null.py # 5-dim 11 tests
│ ├── bridge_mediation.py # joint block bootstrap, BCa CI
│ └── bridge_sign_correction.py # DP24 3-test verification
│
├── dashboard/ # Stage 8
│ ├── app.py # main entry, sys.path setup
│ ├── config.py # env vars, fallback toggle
│ ├── export_csv.py # PostgreSQL → CSV pipeline
│ ├── data/
│ │ ├── queries.py # mart queries (conditional psycopg2 import)
│ │ └── chain_diagram.py # JSON → Plotly visualization
│ ├── tabs/
│ │ ├── tab1_weekly_kpi.py
│ │ ├── tab2_season.py
│ │ ├── tab3_channel.py
│ │ ├── tab4_anomaly.py
│ │ ├── tab5_forecast_bridge.py
│ │ └── methodology_doc.py
│ ├── components/
│ │ ├── kpi_card.py # custom KPI card with delta + tooltip
│ │ └── tooltip.py # operational scope tooltip
│ └── assets/
│ └── style.css
│
├── data/
│ ├── events_calendar.csv # 25 curated events (scheduled + investigated)
│ ├── anomaly/ # 11 CSV outputs from Stage 5
│ ├── forecast/ # 4-way model outputs
│ ├── exports/ # CSV fallback for Streamlit Cloud
│ └── bridge/ # Stage 7 outputs
│
└── figures/ # 60+ visualizations
├── exploratory/ # H1-H6 hypothesis plots
├── seasonal/ # MSTL decomposition × 8 series
├── var/irf/ # 8 impulse response plots
├── dtw/ # 4 brands × 3 variants alignment
├── anomaly/ # 5 comparison plots
├── forecast/ # 4-way comparison + ablation
└── bridge/ # 5-dim heatmap + cascade flow
# Root-level Stage 0 spike scripts (preserved as reproducibility trail)
quick_exploratory_pass.py # 6 hypothesis tests
spike_feasibility.py # data feasibility verification
spike_check5_gtrends.py # Google Trends viability
spike_check_h4_h5_rerun.py # H4/H5 re-verification
stitch_gtrends.py # 8 CSV stitching utility
docker-compose.yml # PostgreSQL 16 setup
.env.example # API key template
requirements.txt # Streamlit Cloud deps
requirements_dashboard.txt # full local dev deps
Every major analytical decision is documented in methodology.md (§1-12) with consistent structure:
- Original design (what was initially planned)
- Verification protocol (how it was tested)
- Decision (which approach was adopted)
- Cascade impact (which downstream stages are affected)
This produces a self-documenting audit trail. Each stage's checkpoint references methodology sections, and methodology sections reference verification figures and code modules. The result is a navigable graph of decisions, evidence, and consequences.
Stage 7 represents the project's analytical governance climax. Three critical findings were surfaced through self-diagnostic without external review:
| Finding | Type | Detection Method |
|---|---|---|
| DP20 — Pre-emptive design block | Methodology | 5-dim framework designed before data processing prevented confirmation bias |
| DP23 — MSTL forward-looking leakage | Methodology | Detected via residual autocorrelation at non-zero lags |
| DP24 — Stage 4 sign convention error | Methodology | 3 independent tests (synthetic + leakage-free + mediation signature) |
These findings would have been invisible without deliberate self-diagnostic protocols. The Methodology Validation Stage Pattern (DP20 → DP23 → DP24 cascade) is itself a transferable analytical asset, registered as KPI 13.
Public artifacts: Aggregated metrics, statistical analysis results, and visualization artifacts are committed for portfolio reproducibility (data/anomaly/, data/bridge/, data/exports/, data/forecast/).
Local-only artifacts: Raw social media text (Naver blog/cafe content) and Batch API request/response payloads are gitignored due to source licensing. Sentiment analysis results are integrated into mart.sentiment_static and surfaced via sentiment_aggregator.py — raw text is not required for reproducibility.
Trace preservation: Pre-DP24 sign-corrected analysis artifacts (e.g., bridge_global_enhanced.py, mediation_bootstrap.json, triple_comparison_*.png) are retained as evidence of analytical governance discipline. See docs/stage7_checkpoint.md §12.4.4 (DP22 deprecation pattern + stage4_checkpoint footnote).
Database: PostgreSQL schema in database/schema_init.sql + numbered migrations 003~011. Migrations 001/002 are integrated into schema_init.sql (consolidated initial schema). Migration 011 (sign correction retrofit) applied 2026-05-03 — see docs/stage7_checkpoint.md §12.4.4 for cascade detail.
- Docker (for PostgreSQL 16) or local Postgres
- Python 3.11
- macOS / Linux (tested on M2 MacBook 16GB)
# 1. Clone repository
git clone https://github.com/ChangyeolAidenOh/sportswear-brand-monitor.git
cd sportswear-brand-monitor
# 2. Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# 3. Start PostgreSQL
docker compose up -d
# 4. Initialize schema
psql -h localhost -p 5433 -U nb_admin -d nb_monitor -f database/schema_init.sql
# 5. Configure API keys
cp .env.example .env
# Edit .env with NAVER_CLIENT_ID, ANTHROPIC_API_KEY, etc.# Stage 1: Data collection
python -m collectors.collector_naver_datalab
python -m collectors.collector_google_trends
python -m collectors.collector_youtube
python -m collectors.collector_naver_blog
python -m collectors.collector_ecos
python -m collectors.collector_financials
# Stage 2: SQL ETL (sequential 01-08)
for f in notebooks/sql/0*.sql; do
psql -h localhost -p 5433 -U nb_admin -d nb_monitor -f "$f"
done
# Stage 3: Seasonal decomposition + sentiment
python analysis/seasonal_decomposer.py
python analysis/sentiment_hybrid.py
# Stage 4: Granger + DTW
python analysis/granger_search_sales.py
python analysis/granger_robustness.py
python analysis/dtw_korea_global.py
# Stage 5: Anomaly detection
python analysis/anomaly_mstl_residual.py
python analysis/anomaly_isolation_forest.py
python analysis/anomaly_comparison.py
python analysis/load_events_calendar.py
python analysis/anomaly_event_matching.py
# Stage 6: Forecasting
python analysis/forecast_data_prep.py
python analysis/forecast_sarimax.py
python analysis/forecast_prophet.py
python analysis/forecast_lstm.py
python analysis/forecast_chronos.py
python analysis/forecast_comparison.py
# Stage 7: Bridge validation
python analysis/bridge_orthogonal_null.py
python analysis/bridge_mediation.py
python analysis/bridge_sign_correction.py
# Stage 8: Dashboard (local)
streamlit run dashboard/app.pyVisit https://sportswear-brand-monitor-newbalance.streamlit.app/
If the dashboard appears to be sleeping, it will wake within 30 seconds on first request.
Core libraries:
pandas, numpy, scipy
statsmodels # SARIMAX, MSTL, Granger, VARX, Johansen
scikit-learn # IsolationForest, DBSCAN
prophet # Forecasting with changepoint detection
torch # LSTM
transformers # HuggingFace
chronos-forecasting # Amazon foundation model
tslearn # DTW
psycopg2-binary
sqlalchemy
streamlit
plotly
anthropic # Claude Batch API
python-dotenv
Full list in requirements.txt (root, for Streamlit Cloud) and requirements_dashboard.txt (full local dev).
| Stage | Status | Deliverable |
|---|---|---|
| Stage 0 — Data Feasibility + Quick Exploratory | COMPLETED | 6 hypothesis verdicts, viability thresholds |
| Stage 1 — Data Collection | COMPLETED | 6 collectors, raw 23,292 rows |
| Stage 2 — SQL ETL | COMPLETED | 8 SQL files, mart 5 tables + 5 views |
| Stage 3 — Seasonal + Sentiment | COMPLETED | MSTL primary, hybrid sentiment 35.8 KRW |
| Stage 4 — Leading Indicator | COMPLETED (sign-corrected in Stage 7) | 4-pattern Granger, 3-way DTW |
| Stage 5 — Anomaly Detection | COMPLETED | 91.3% false positive finding, Tier structure, 25 events |
| Stage 6 — Forecasting | COMPLETED | 4-way comparison, Prophet wins, ablation |
| Stage 7 — Bridge Validation | COMPLETED | 5-dim orthogonal null, DP20/23/24 cascade |
| Stage 8 — Streamlit Dashboard | COMPLETED, live URL active | 6 tabs, 13 KPIs, dual data source |
| Stage 8b — Power BI Service | DEFERRED | Streamlit provides equivalent interactive layer |
| Stage 9 — Weekly PDF Report | DEFERRED | Streamlit dashboard covers same KPIs interactively |
Deferred items rationale: Streamlit dashboard provides equivalent or richer interactive visualization compared to Power BI / static PDF. Both deferred items are designed to be implementable within 2 days post-employment, integrated with the existing mart layer via the same data contract.