Alex Albert of Anthropic didn’t mince words: “Claude 4 is the best programming model in the world”. A bold statement, but one backed by the numbers. The tests are merciless: 72.5% on SWE-bench Verified, seven hours of uninterrupted autonomous work, and the ability to manage thousands of steps in sequence. Claude 4 isn't just a technological evolution, it's a paradigm shift that redefines what AI can do. And the best part is, you can try it today.
When AI Decides to Work Overtime
Today Anthropic made one of those moves that makes you say “well, now we're in trouble”. It released Claude Opus 4 and Claude Sonnet 4, marking the company's return to large models after months spent perfecting the Sonnet variants. The real surprise? This system can work for 24 hours straight without losing its mind.
Yes sir: while your fellow developer starts mumbling incomprehensibly after the first eight hours of debugging, Claude Opus 4 has proven capable of playing Pokémon for a full day or refactoring code for seven hours straight. Previous models had the endurance of a novice marathon runner: after two hours they would start producing a flurry of errors. As Albert himself admits:
“There is a huge demand for agentic applications, and Claude 4 fits perfectly into this scenario.”

The numbers that make the competition tremble
Let's put it this way: if benchmarks were a game of poker, Claude 4 would have just laid down four aces. It scores 72.5% on SWE-bench Verified, a result that makes the previous models look like beginners. To give you an idea, exceeding 50% on this benchmark was already considered a miracle. Official data also show an impressive 43.2% on Terminal-bench.
GitHub immediately smelled an opportunity and decided to use Claude Sonnet 4 as the basis for the new coding agent in GitHub Copilot. When GitHub changes horses, there's always a good reason. Sourcegraph talks about “a substantial leap in software development,” while Augment Code reports “higher success rates and more surgical code changes.” In short, everyone wants to get on the Claude 4 train.
Claude 4: safety first (but without paranoia)
Anthropic has activated its AI Safety Level 3 standard for the first time, a tier normally reserved for “potentially dangerous” models. The reason? Claude Opus 4 could theoretically help someone with scientific knowledge develop chemical, biological or nuclear weapons. It's basically so smart it has to be kept under control.
But it's not all doom and gloom: the new models are also 65% less likely to cheat or cut corners than their predecessors. It seems they've learned not only to be smarter, but also more honest. A bit like growing up, in short.
The “deep thinking” mode that was missing
Claude 4 introduces something genuinely innovative: a hybrid system that can switch between lightning-fast responses and in-depth reflections. When you activate the extended reasoning mode, the model literally takes time to think, showing you a summary of what it is processing in its “digital mind”. It's like having a colleague finally explain their thought process to you instead of just throwing the solution out there.
Claude Code is now available to everyone, with support for GitHub Actions and direct integrations with VS Code and JetBrains. The changes it proposes appear directly in your files. No more wild copy-pasting: Claude does everything directly in your work environment.
The business of billions (literally)
The business numbers speak for themselves: Anthropic has reached annualized revenue of $2 billion in the first quarter, more than doubling previous performance. Mike Krieger, chief product officer, candidly admits: “I used to use Claude as a thinking partner, writing most of the texts myself. Now Claude 4 does most of my writing.”
This is the same Krieger who co-founded Instagram, so if he says AI saves him time, maybe we should listen. And he's not the only one singing its praises: Cursor calls Claude 4 “cutting edge for coding”, while Replit talks about “dramatic progress for complex multi-file edits.” When even the most demanding tools compliment you, it means you've hit the mark.

Claude 4, the moment of truth
As we have already seen with Claude 2.0, the battle between generative AIs is increasingly fierce. But this time I decided to do something different: an experiment that makes this article one of a kind.
I've spent the last few hours testing Claude 4 in every way possible. Online research, source analysis, content structuring, creative writing, even irony and jokes. And I have to admit, the results surprised me. The 700 words you just read? They are the result of this intensive test.
The question I leave you with is simple: can you distinguish between what I wrote and what Claude 4 wrote? Because frankly, after this test, I'm not even sure anymore.
The future of AI is no longer a distant promise. It is here, and perhaps it has just told you its story without you realizing it.