But can you imagine a Wall Street hedge fund deciding to burn billions to build a general artificial intelligence? That's exactly what High-Flyer did: in 2023, the Chinese quantitative fund transformed its entire R&D department into DeepSeek. With a mountain of GPUs accumulated before the US sanctions, founder Liang Wenfeng bet on under-30 researchers and extreme optimization.
“We are not looking for immediate profits, but answers to the world's most difficult questions,” said Liang.
The result of this philosophy is DeepSeek-R1, an open-source model that outperforms OpenAI's o1 in mathematics and logic while using a tenth of the resources of Llama 3.1. The secret? “Making a virtue of necessity,” explains Marina Zhang of the University of Sydney. Without access to the most advanced Nvidia chips, DeepSeek revolutionized the model architecture, creating algorithms that communicate like a jazz ensemble: few instruments, maximum harmony. And now they are making the rich (and costly, in energy terms too) Western AI world tremble.
Young Geniuses and Patriotism: The Secret (and Slightly Anarchic) Recipe
While Google and Meta hire veterans (and talent from abroad), DeepSeek aims for graduates of Peking and Tsinghua universities: brains hungry for academic glory, not golden salaries. “We hire those who have won international awards, even with zero industrial experience,” Liang admits. An approach that pays off: the team developed Multi-Head Latent Attention (MLA), a technique that reduces memory consumption by 40%.
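To give a sense of the trick, here is a minimal sketch of the latent-attention idea, not DeepSeek's actual implementation; the class name and all dimensions are invented for illustration. Instead of caching full-size keys and values for every attention head, the layer compresses each token into one small latent vector and reconstructs keys and values from it on the fly:

```python
import torch
import torch.nn as nn

class SimplifiedLatentAttention(nn.Module):
    """Toy sketch of the MLA idea: cache one small latent per token
    instead of full per-head keys/values (no causal mask, for brevity)."""

    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.to_q = nn.Linear(d_model, d_model)
        self.down = nn.Linear(d_model, d_latent)  # this small vector is what a KV cache would store
        self.up_k = nn.Linear(d_latent, d_model)  # reconstruct keys from the latent
        self.up_v = nn.Linear(d_latent, d_model)  # reconstruct values from the latent
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, t, _ = x.shape
        latent = self.down(x)                     # (b, t, d_latent), tiny compared to full K/V
        q = self.to_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.up_k(latent).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.up_v(latent).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head**0.5, dim=-1)
        return self.out((attn @ v).transpose(1, 2).reshape(b, t, -1))
```

In this toy version the cache holds 128 numbers per token instead of 2,048 (keys plus values), roughly a 16× reduction; the real MLA adds further refinements, such as decoupled positional encodings.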
“They are like startups from the '70s: little money, lots of creativity,” says Wendy Chang, an analyst at the Mercator Institute. “They combined engineering tricks: custom communication schemes, data compression… Stuff that is known, but never used like this.”
And there's an extra ingredient: technological patriotism. “This generation wants to prove that China can innovate despite sanctions”, adds Zhang. A mindset (more or less spontaneous) that transforms limits into springboards.
MLA and Mixture-of-Experts: DeepSeek's Secret Weapons to Beat OpenAI
What makes DeepSeek-R1 so efficient? Three main factors:
- Multi-Head Latent Attention (MLA): compresses attention keys and values into a compact latent representation, cutting redundant computation and memory (see the sketch above).
- Mixture-of-Experts (MoE): activates only the specific parts of the neural network needed for each task, like a mechanic who picks up only the necessary tools (see the sketch after this list).
- It's open source, at least for now. “It's the only way to regain ground on the West,” explains Chang. “You attract global contributors, you improve the model, you create an ecosystem.” A winning strategy: in two months, 20,000 developers contributed to the code.
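Here is what that selective activation can look like in practice: a minimal sketch of a top-1 MoE router, not DeepSeek's code, with layer sizes and names invented for illustration.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts layer: a router picks one expert per token,
    so only a fraction of the layer's weights do work on each input."""

    def __init__(self, d_model=512, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                            # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)      # routing probabilities
        weight, choice = scores.max(dim=-1)          # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i                       # tokens routed to expert i
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        return out
```

Each token runs through just one of the four expert networks, so most of the layer's parameters sit idle on any given input; scaled up, this is how a very large model can stay cheap to run.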
It's like having a Ferrari engine that sips fuel like a Fiat Panda. Training DeepSeek-R1 cost $15 million against Meta's $150 million. A gap that makes Silicon Valley tremble.
US Sanctions? A Boomerang (Maybe)
When the US blocked the export of advanced chips in 2022, many predicted the collapse of Chinese AI. For now, DeepSeek shows that ingenuity beats hardware. “Estimates of what China can do with its resources need to be revised”, warns Chang.
The Chinese model? Extreme optimization + open source + technological nationalism. “If others follow, sanctions will lose their meaning,” concludes Zhang. Meanwhile, the code may be open source, but DeepSeek doesn't respond to Wired's emails (let alone ours).
We will definitely be hearing more about them.