It’s not typically you see an organization like OpenAI admit to a mistake, roll again a serious replace, and publish not one, however two, in-depth postmortems about what went fallacious. However that’s precisely what occurred when the most recent GPT-4o replace hit ChatGPT—and customers discovered themselves chatting with what felt like a digital yes-man.
The replace of GPT-4o that occurred this previous month was meant to enhance the mannequin’s character and helpfulness. As an alternative, it made ChatGPT overly agreeable, excessively flattering, and alarmingly validating of damaging feelings. The habits, which the corporate described as “sycophantic,” rapidly caught the eye of the general public, the press, and even OpenAI CEO Sam Altman.
To not point out, it has greater implications for AI and the way we use the expertise. To unpack these, I spoke to Advertising AI Institute founder and CEO Paul Roetzer on Episode 146 of The Synthetic Intelligence Present.
What Went Improper—and Quick
This was greater than a glitch. It was a full-blown mannequin habits failure, tied on to how OpenAI trains and fine-tunes its fashions.
In response to OpenAI, the problem started with good intentions. The corporate wished to make GPT-4o extra pure and emotionally clever by updating its system prompts and reward alerts. However they leaned too exhausting on short-term person suggestions (like thumbs-up rankings) with out correctly weighting longer-term belief and security metrics.
The unintended end result? A chatbot that felt extra like a sycophant than a useful assistant—agreeing too simply, affirming doubts, even reinforcing dangerous or impulsive ideas.
“These fashions are bizarre,” says Roetzer. “They can not code this. They don’t seem to be utilizing conventional laptop code to simply explicitly get the factor to cease doing it. They’ve to make use of human language to attempt to cease doing it.”
The Mechanics Behind Mannequin Habits
In an unusually clear transfer, OpenAI shared how its coaching system works. Submit-training updates use a mix of supervised fine-tuning (the place people educate the mannequin what good responses appear like) and reinforcement studying (the place the mannequin is rewarded for fascinating habits).
Within the April 25 replace to GPT-4o, OpenAI launched new reward alerts based mostly on person suggestions. However these might have overpowered current safeguards, tilting the mannequin towards overly agreeable, uncritical replies. The shift wasn’t instantly caught in normal evaluations, as a result of these checks weren’t trying particularly for sycophancy.
Spot checks and vibe assessments—human-in-the-loop evaluations—did increase issues, however they weren’t sufficient to dam the rollout. As OpenAI later admitted, this was a failure of judgment and that they anticipated this to be a “pretty delicate replace,” in order that they did not initially talk a lot concerning the modifications to customers.
A Single Level of Failure—For Tens of millions of Customers
What made the issue so regarding wasn’t simply the habits itself—it was how deeply embedded these techniques already are in our lives.
“They’ve 700 million customers of ChatGPT weekly,” says Roetzer. “I believe it does spotlight the growing significance of who the individuals and labs are who’re constructing these applied sciences which might be already having a large affect on society.”
To not point out, how these 700 million persons are utilizing it issues.
In a follow-up weblog submit, OpenAI emphasised a sobering level: extra persons are utilizing ChatGPT for deeply private recommendation than ever earlier than. Which means emotional tone, honesty, and limits aren’t simply character traits—they’re security options. And on this case, these options broke down.
To handle the issue, OpenAI rolled again the replace, retrained the mannequin with new steering, and pledged to:
Make sycophancy a launch-blocking situation.
Enhance pre-deployment evaluations.
Broaden person management over chatbot habits.
Incorporate extra long-term and qualitative suggestions into future rollouts.
The Greater Image: Belief, Security, and the Way forward for AI Habits
Whereas OpenAI dealt with this stumble with uncommon transparency, the occasion raises broader questions: What occurs when different labs, with out comparable safeguards or public accountability, roll out highly effective fashions with delicate however harmful behaviors?
“If this was an open supply mannequin, you may’t roll this stuff again,” says Roetzer. “That is an issue.
The GPT-4o rollback serves as a robust reminder: Even small shifts in mannequin habits can have huge downstream results. And as we more and more depend on these techniques for private, skilled, and emotional steering, there’s no such factor as a “minor” replace anymore.