DeepSeek's Surprisingly Affordable (Yet Massively Funded) AI Competitor
DeepSeek, a Chinese AI startup, has made waves with its new chatbot, which offers impressive capabilities at a purportedly low cost. The chatbot introduces itself with the line "Hi, I was created so you can ask anything and get an answer that might even surprise you," which reflects its ambition. Its impact is already evident: the launch contributed to a significant drop in Nvidia's stock price.
Image: ensigame.com
DeepSeek V3's success stems from its innovative architecture and training methods, incorporating:
- Multi-token Prediction (MTP): Predicts several upcoming tokens at once rather than one at a time, boosting accuracy and efficiency.
- Mixture of Experts (MoE): Uses 256 expert sub-networks and activates only eight of them for each token, accelerating training and improving performance.
- Multi-head Latent Attention (MLA): Compresses the attention keys and values into a compact latent representation, reducing memory use during inference while minimizing information loss and preserving nuanced understanding.
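To make the Mixture of Experts idea concrete, here is a minimal sketch of top-k routing in plain Python. It uses the 256 experts and eight active experts per token quoted above; everything else (the function names `route_token` and `moe_forward`, the softmax weighting, and the toy experts that merely scale their input) is an illustrative assumption, not DeepSeek's actual router.

```python
import math
import random

NUM_EXPERTS = 256  # expert sub-networks, as quoted in the article
TOP_K = 8          # experts activated per token, as quoted in the article


def route_token(scores, k=TOP_K):
    """Pick the k highest-scoring experts and normalise their weights.

    `scores` holds one router logit per expert. Returns a list of
    (expert_index, weight) pairs whose weights sum to 1.
    Softmax weighting is a generic choice made for this sketch.
    """
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected logits only (subtract the max for stability).
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, scores, experts, k=TOP_K):
    """Weighted sum of the outputs of the selected experts for one token."""
    return sum(w * experts[i](x) for i, w in route_token(scores, k))


# Toy experts (hypothetical): each one scales its input by a fixed factor.
rng = random.Random(0)
experts = [lambda x, s=rng.uniform(0.5, 1.5): s * x for _ in range(NUM_EXPERTS)]
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

chosen = route_token(scores)
output = moe_forward(1.0, scores, experts)
print(len(chosen), "experts used out of", NUM_EXPERTS)
```

The point of the design is that only eight of the 256 experts run for any given token, so the compute per token stays small even though the total parameter count is large.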
DeepSeek initially claimed a training cost of just $6 million using 2,048 GPUs. However, SemiAnalysis reported a far larger infrastructure: approximately 50,000 Nvidia Hopper GPUs (including H800, H100, and H20 units) spread across multiple data centers, representing around $1.6 billion in server investment and $944 million in operational expenses.
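For a sense of scale, the figures above can be put side by side with a few lines of arithmetic. This is a rough sketch only: it assumes the server investment and operational expenses can simply be added together, and that the server investment is spread evenly over the roughly 50,000 GPUs, neither of which the reported figures confirm. The function name `summarize` is hypothetical.

```python
# Figures as quoted in the article, in US dollars.
CLAIMED_TRAINING_COST = 6_000_000    # DeepSeek's stated figure
SERVER_INVESTMENT = 1_600_000_000    # SemiAnalysis estimate
OPERATING_EXPENSES = 944_000_000     # SemiAnalysis estimate
GPU_COUNT = 50_000                   # approximate Hopper GPU fleet


def summarize():
    """Compare the claimed training cost with the estimated infrastructure spend."""
    # Assumption: the two estimates are additive.
    total = SERVER_INVESTMENT + OPERATING_EXPENSES
    return {
        "total_usd": total,
        "multiple_of_claim": total / CLAIMED_TRAINING_COST,
        # Assumption: server spend divided evenly across the GPU fleet.
        "server_usd_per_gpu": SERVER_INVESTMENT / GPU_COUNT,
    }


summary = summarize()
print(f"Estimated total: ${summary['total_usd']:,}")
print(f"Multiple of the claimed figure: {summary['multiple_of_claim']:.0f}x")
print(f"Server spend per GPU: ${summary['server_usd_per_gpu']:,.0f}")
```

Under these simplifying assumptions the estimated spend comes to about $2.5 billion, several hundred times the headline training figure.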
A subsidiary of the High-Flyer hedge fund, DeepSeek owns its data centers rather than relying on cloud providers as its competitors do, which allows faster innovation and optimization. Its self-funded nature contributes to agility and rapid decision-making. The company attracts top talent, primarily from Chinese universities, with some researchers earning over $1.3 million annually.
The $6 million figure represents only pre-training GPU costs and significantly understates the overall investment, which has exceeded $500 million since the company's inception. DeepSeek's lean structure, however, allows it to innovate more efficiently than larger, more bureaucratic competitors.
DeepSeek's success highlights the potential of well-funded independent AI companies. While the "revolutionary budget" claim is arguably inflated, the company's cost-effectiveness relative to competitors (e.g., $5 million for R1 vs. $100 million for ChatGPT-4o) is hard to dispute. Its achievement is attributable to substantial investment, technological advancements, and a highly skilled team.