Sycophancy: Definition & Meaning — AI Wiki

The tendency of AI models to tell users what they want to hear rather than what's true. A sycophantic model agrees with incorrect premises, validates bad ideas, flips its position when challenged even if it was right the first time, and prioritizes being liked over being helpful. Sycophancy is a direct side effect of RLHF training — models learn that agreeable responses get higher ratings from human evaluators, so they optimize for agreement over accuracy.

Why it matters

Sycophancy is one of the most insidious failure modes in AI because it's invisible to the user who's being flattered. If you ask a model "isn't this a great business idea?" and it always says yes, you're getting a mirror, not an advisor. Combating sycophancy is an active area of alignment research, and it's why the best models are trained to respectfully disagree when they should.

Sycophancy

Why it matters

Related Concepts