Weak-to-Strong Generalization (W2SG) studies the following phenomenon: a strong model trained on weak supervision can match, or even surpass, the performance of its weak supervisor.
This paradigm appears across:
- Large Language Models (LLMs)
- Multimodal Models (VLMs, Video Models)
- Alignment & Preference Learning
- Agent-based Systems
Why this matters:
- Real-world supervision is often noisy
- High-quality labels are expensive
- Weak signals are cheap and scalable
W2SG therefore offers a new scaling axis: improving performance without improving supervision quality.
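The core claim can be illustrated with a deliberately simple toy. This is a minimal sketch under stated assumptions, not a method from any listed paper: the weak supervisor's errors are assumed to be purely random label flips, and the "strong" student is assumed to be a threshold rule whose hypothesis class already contains the true rule (a stand-in for a good pretrained prior). All function names are hypothetical.

```python
import random

def make_data(n, seed):
    """Toy 1-D task (assumption): the true label is 1 iff x > 0.5."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return xs, [int(x > 0.5) for x in xs]

def weak_supervisor(ys, flip_rate=0.3, seed=1):
    """Hypothetical weak supervisor: the true label, flipped at random with prob flip_rate."""
    rng = random.Random(seed)
    return [1 - y if rng.random() < flip_rate else y for y in ys]

def fit_threshold(xs, labels, grid=101):
    """Hypothetical 'strong' student: the threshold rule that best fits the weak labels."""
    best_t, best_err = 0.0, float("inf")
    for i in range(grid):
        t = i / (grid - 1)
        err = sum(int(x > t) != y for x, y in zip(xs, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def accuracy(preds, ys):
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

# Train the student on weak labels only.
train_x, train_y = make_data(2000, seed=0)
t = fit_threshold(train_x, weak_supervisor(train_y, seed=1))

# Compare student and weak supervisor against ground truth on fresh data.
test_x, test_y = make_data(1000, seed=2)
weak_acc = accuracy(weak_supervisor(test_y, seed=3), test_y)
strong_acc = accuracy([int(x > t) for x in test_x], test_y)
print(f"weak supervisor: {weak_acc:.3f}  strong student: {strong_acc:.3f}")
```

The student beats its supervisor here only because random flips cannot be fitted consistently by any threshold. If the weak errors were systematic (say, every x in [0.5, 0.6] mislabeled), the best-fitting threshold would reproduce them, which is why imitating the weak supervisor's mistakes is a central risk in practice.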
| Paper | Venue | Setting | Weak Source | Strong Gain |
|---|---|---|---|---|
| Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision | ICML 2024 | Alignment | Weak model labels | ✅ |
| Self-Improving Language Models | ICML 2023 | Self-training | Model itself | ✅ |
| RLAIF: Learning from AI Feedback | NeurIPS 2023 | Alignment | AI feedback | ✅ |
| Direct Preference Optimization (DPO) | NeurIPS 2023 | Alignment | Weak preferences | ✅ |
| Constitutional AI | arXiv 2022 | Alignment | Principle-guided AI feedback | ✅ |
| Distilling Step-by-Step Reasoning | ACL 2023 (Findings) | Reasoning | Weak CoT | ✅ |
| Teaching Small Models to Reason | arXiv | Reasoning | Large model CoT | ✅ |
| STaR: Bootstrapping Reasoning | NeurIPS 2022 | Self-training | Self-generated CoT | ✅ |
| Self-Consistency Improves CoT | ICLR 2023 | Reasoning | Multiple weak paths | ✅ |
| Noisy Student Training | CVPR 2020 | Vision | Noisy pseudo-labels | ✅ |
| Pseudo-Labeling for Semi-Supervised Learning | ICML 2013 Workshop | SSL | Weak labels | ✅ |
| FixMatch | NeurIPS 2020 | SSL | Augmented weak labels | ✅ |
| Knowledge Distillation | NeurIPS 2014 Workshop | Distillation | Teacher logits | ✅ |
| Born-Again Neural Networks | ICML 2018 | Distillation | Same-architecture teacher | ✅ |
| When Does Student Surpass Teacher? | ICML | Distillation | Weak teacher | ✅ |
| VideoCoCa / Flamingo-style works | NeurIPS | Multimodal | Weak alignment | ✅ |
| BLIP / BLIP-2 | ICML 2022 / ICML 2023 | Multimodal | Noisy captions | ✅ |
| LLaVA | NeurIPS 2023 | Multimodal | GPT-generated data | ✅ |
| Voyager (Minecraft Agent) | NeurIPS 2023 | Agent | Weak exploration | ✅ |
| ReAct | ICLR 2023 | Agent | Prompted reasoning | ✅ |
| Reflexion | NeurIPS 2023 | Agent | Self-feedback | ✅ |
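The distillation rows above (Knowledge Distillation, Born-Again Neural Networks) train a student on teacher logits. Below is a minimal single-example sketch of the soft-target loss in the style of Hinton et al., assuming logits arrive as plain Python lists; `temperature=4.0` and `alpha=0.5` are illustrative defaults, not values taken from any listed paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Soft-target distillation loss for one example.

    alpha weights KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so its gradient size stays comparable across temperatures;
    (1 - alpha) weights ordinary cross-entropy on the hard label.
    The default temperature and alpha are illustrative assumptions.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard

teacher = [3.0, 1.0, -2.0]
print(kd_loss([2.0, 0.5, -1.0], teacher, label=0))
```

Raising the temperature flattens the teacher's distribution, exposing how it ranks the wrong classes; that "dark knowledge" is part of what a student can exploit beyond the hard labels.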
We organize the literature into the following categories:
- Alignment & Preference Learning
  - Learning from weak preference signals
  - Reward modeling with imperfect annotators
  - DPO / RLHF under weak supervision
- Self-Training
  - Pseudo-labeling
  - Iterative self-improvement
  - Teacher-student refinement loops
- Distillation
  - When the student surpasses the teacher
  - Capacity vs. supervision mismatch
  - Knowledge reconstruction
- Reasoning
  - Learning reasoning from weak CoT
  - Implicit reasoning recovery
  - Latent structure induction
- Multimodal
  - Weak labels for video understanding
  - Noisy grounding signals
  - Cross-modal alignment
- Agents
  - Weak planners → strong policies
  - Learning from suboptimal trajectories
  - Preference learning in interactive systems
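The pseudo-labeling and iterative self-improvement categories share one core step: keep only the model's confident predictions as new labels. A minimal sketch follows, assuming a classifier that outputs class probabilities. The confidence cutoff mirrors the thresholding idea used in FixMatch, but `0.95` here is an illustrative value, and `predict_proba` / `toy_model` are hypothetical names.

```python
def select_pseudo_labels(probs, threshold=0.95):
    """Return (index, class) pairs for unlabeled examples whose top class
    probability is at least `threshold`; everything else stays unlabeled."""
    selected = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            selected.append((i, p.index(conf)))
    return selected

def self_training_round(predict_proba, labeled, unlabeled, threshold=0.95):
    """One self-training round: pseudo-label confident examples, move them into
    the labeled pool, and return the updated (labeled, unlabeled) pools.
    `predict_proba` is a hypothetical callable: input -> list of class probabilities."""
    picked = select_pseudo_labels([predict_proba(x) for x in unlabeled], threshold)
    picked_idx = {i for i, _ in picked}
    new_labeled = labeled + [(unlabeled[i], y) for i, y in picked]
    remaining = [x for i, x in enumerate(unlabeled) if i not in picked_idx]
    return new_labeled, remaining

# Hypothetical stand-in for a trained model: confident only far from the boundary at 0.5.
def toy_model(x):
    p1 = min(max((x - 0.5) * 5 + 0.5, 0.0), 1.0)
    return [1 - p1, p1]

labeled, unlabeled = self_training_round(toy_model, [(0.9, 1)], [0.0, 0.45, 0.55, 1.0])
print(labeled, unlabeled)
```

Here the model stays fixed between calls to keep the example short; an iterative pipeline would retrain on `labeled` before the next round, which is where teacher-student refinement loops come in.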
Key papers to start with:
- Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision
- Self-Improving Language Models
- Distilling Step-by-Step Reasoning
- RLAIF / DPO related works
(Continuously updated; PRs welcome!)
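The RLAIF / DPO works listed above learn from preference pairs, which in the weak-to-strong setting may come from a weak annotator or from AI feedback. A minimal sketch of the per-pair DPO objective (Rafailov et al., 2023), assuming the four sequence log-probabilities are already computed; `beta=0.1` is an illustrative value.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a complete response under the
    policy being trained or under the frozen reference model. beta=0.1 is an
    illustrative assumption. The margin is the policy's log-ratio gain on the
    chosen response minus its gain on the rejected one.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = -beta * margin
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin), computed stably
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

When the policy equals the reference, the margin is zero and the loss is log 2; it falls as the policy shifts probability toward the chosen response relative to the reference. No separate reward model is trained, so any noise in the preference labels flows directly into the policy update.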
Key insights:
- Strong models rely on their pretrained priors
- Weak supervision contains partial structure
- Iterative refinement is critical
- Overfitting to weak signals (imitating the supervisor's errors) is a major failure mode
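The overfitting failure mode is what the auxiliary confidence loss of Burns et al. (2023) targets: besides fitting the weak labels, the student is also trained toward its own hardened predictions, so it can confidently disagree with the supervisor. Below is a simplified binary sketch; the fixed `threshold` and constant `alpha` are simplifications (the original formulation differs in details such as how these are chosen), and all names are hypothetical.

```python
import math

def bce(p, target):
    """Binary cross-entropy of predicted P(y=1) = p against a soft or hard target in [0, 1]."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def aux_confidence_loss(strong_probs, weak_probs, alpha=0.5, threshold=0.5):
    """Simplified auxiliary confidence loss (binary case), averaged over examples.

    (1 - alpha) * CE(strong, weak soft label) pulls the student toward the weak labels;
    alpha * CE(strong, hardened strong prediction) rewards the student for staying
    confident in its own answer. In a real training loop the hardened target is
    treated as a constant label. Fixed alpha/threshold are simplifying assumptions.
    """
    total = 0.0
    for p, w in zip(strong_probs, weak_probs):
        hardened = 1.0 if p > threshold else 0.0
        total += (1 - alpha) * bce(p, w) + alpha * bce(p, hardened)
    return total / len(strong_probs)

# Student confidently predicts 1 where the weak label leans toward 0.
print(aux_confidence_loss([0.9], [0.2], alpha=0.0))   # pure imitation of the weak label
print(aux_confidence_loss([0.9], [0.2], alpha=0.75))  # disagreement is penalized less
```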
Open questions:
- When does weak-to-strong generalization fail?
- How should beyond-teacher generalization be measured?
- Robustness under biased weak signals
- Scaling laws for weak supervision
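For the measurement question, one existing yardstick is Performance Gap Recovered (PGR) from Burns et al. (2023): the fraction of the gap between the weak supervisor and a strong model trained on ground truth that the weakly supervised strong model closes. PGR > 0 means the student beats its supervisor; PGR = 1 means it matches the ground-truth ceiling. The accuracies below are made up for illustration.

```python
def performance_gap_recovered(weak, weak_to_strong, strong_ceiling):
    """Fraction of the weak-to-ceiling gap closed by the weakly supervised student.

    weak:           performance of the weak supervisor
    weak_to_strong: performance of the strong model trained on weak labels
    strong_ceiling: performance of the strong model trained on ground truth
    """
    gap = strong_ceiling - weak
    if gap == 0:
        raise ValueError("PGR is undefined when the ceiling equals weak performance")
    return (weak_to_strong - weak) / gap

# Hypothetical accuracies, for illustration only.
pgr = performance_gap_recovered(weak=0.60, weak_to_strong=0.70, strong_ceiling=0.80)
print(round(pgr, 3))  # 0.5
```

PGR makes "beyond-teacher" quantitative, but it does not address the other open questions by itself, such as robustness when the weak errors are biased rather than random.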
Resources:
- Benchmarks
- Codebases
- Datasets
We welcome:
- New papers
- Taxonomy improvements
- Benchmarks / repos
Please submit a PR!
If you find this repo useful, consider giving it a star ⭐
License: MIT
- Haonan Zhang
Weak supervision is not just noisy; it is compressed knowledge.
