A training paradigm where an AI agent learns by interacting with an environment, taking actions, and receiving rewards or penalties. Unlike supervised learning (which learns from labeled examples), RL learns from experience — through trial and error. RL trained AlphaGo to beat world champions, teaches robots to walk, and is the "RL" in RLHF that makes chatbots helpful.
Why it matters
Reinforcement learning is how AI learns to act, not just predict. It's the bridge between models that can answer questions and agents that can accomplish goals. Every AI system that plans, strategizes, or optimizes over time has RL somewhere in its lineage.