Baidu Dives into AI Chip Development to Meet Super Model Demand
Baidu is moving into AI chip development as large models drive ever-growing demand for supercomputing. The company aims to build advanced chips that can keep pace with rising compute requirements and support the continued development of its AI technology, a move it expects to broaden the adoption of AI and advance related industries.
Baidu Inc (BIDU.O) is intensifying its investment in in-house chip development as the artificial intelligence industry grapples with an uneven value chain that heavily favors hardware over applications.
Speaking at the Baidu World Conference, founder Robin Li described the AI industry’s structure as “extremely unhealthy and unsustainable,” noting that while chips capture the bulk of revenue, it is applications that generate actual value.
“To capture ten or even a hundred times more value at the model or application layer, companies must regain control over the chip layer,” Li said.
Baidu is not alone in this approach. Global tech giants such as Amazon, Microsoft, Google, and OpenAI, as well as domestic firms including Alibaba, Huawei, and Tencent, have all embarked on in-house chip strategies to counter restrictions imposed by Nvidia Corp (NVDA.O) and other suppliers.
Baidu’s Kunlun Chip team, founded in 2011, initially focused on computational acceleration using FPGAs for early AI applications like AlexNet and speech recognition models. With the rise of large-scale recommendation systems, Baidu began developing its own custom chips through the Kunlun project.
In 2021, Kunlun Chip was spun off from Baidu Group to focus on next-generation AI hardware optimized for large models. Products such as the P800 have become central to Baidu’s large model training and inference operations.
At this year’s conference, Shen Dou, president of Baidu Intelligent Cloud Business Group, introduced two new AI chips — the Kunlun M100 and M300 — alongside plans for “supernodes” designed to connect hundreds or thousands of GPU cards into high-performance clusters.
The advent of Transformer-based models has standardized AI architectures, creating clearer targets for chip developers. Standardization allows the entire supply chain to optimize costs and performance, creating a virtuous cycle where better chips drive more advanced applications, which in turn increase demand for compute power.
However, the rapid expansion of model sizes, sometimes reaching trillions of parameters, has dramatically increased the demand for computing resources, energy, and infrastructure. This has created unprecedented challenges for chip design, particularly around efficiency and scale.
Reducing computational precision, from BF16 down to FP8 or FP4, lets chipmakers raise throughput substantially by trading away accuracy that models rarely need. Meanwhile, chip architecture must evolve in tandem with changes in model structures to sustain efficiency.
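The memory-and-bandwidth side of that trade-off is easy to see in a toy sketch. NumPy has no native FP8 or FP4 types, so float16 stands in here for a reduced-precision format; the numbers are illustrative, not Kunlunxin specifics.

```python
import numpy as np

# Simulate casting model weights to a lower-precision format.
rng = np.random.default_rng(0)
weights = rng.standard_normal(1_000_000).astype(np.float32)

low = weights.astype(np.float16)          # halve the bits per value
mem_ratio = low.nbytes / weights.nbytes   # bytes moved per weight, relative to FP32
max_err = float(np.max(np.abs(weights - low.astype(np.float32))))

print(f"memory ratio: {mem_ratio}")       # 0.5, so half the memory traffic
print(f"max abs rounding error: {max_err:.4f}")
```

Halving the bits halves the memory footprint and the bandwidth needed to stream weights, which is usually the binding constraint in inference; the rounding error stays small relative to typical weight magnitudes.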
Baidu is now focused on integrating individual chips into large-scale systems known as supernodes. These configurations link dozens or hundreds of GPU cards within a single server, dramatically reducing costs compared with standalone deployments.
Scaling these systems introduces new engineering challenges. For instance, a cluster of a few thousand GPUs may run acceptably at 98% stability, but as deployments grow to tens of thousands of cards, even minor disruptions can trigger system-wide failures. Verifying accuracy at such scale often requires months of costly testing.
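Why failures compound at scale follows from basic probability: if each card independently stays healthy over a training window, the whole cluster survives only if every card does. The figures below are illustrative assumptions, not Baidu's published numbers.

```python
# Back-of-the-envelope cluster reliability: with per-GPU survival
# probability p over a window, the chance that all n GPUs stay up is p**n.
def cluster_uptime(p: float, n: int) -> float:
    return p ** n

for n in (1_000, 10_000, 100_000):
    print(n, round(cluster_uptime(0.9999, n), 4))
```

Even at 99.99% per-card reliability, a 10,000-card job completes its window without any failure barely a third of the time, which is why checkpointing, redundancy, and fast fault recovery dominate large-cluster engineering.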
“AI computing is no longer just stacking GPUs,” Shen said. “It has entered a new era of engineering and scientific exploration.”
Kunlunxin has now produced three generations of chips. The first focused on internal Baidu data centers, the second targeted enterprise customers, and the third generation is optimized for the demands of large AI models.
Most of Baidu’s inference tasks for large models are now handled by Kunlunxin P800 clusters. With over 10,000 GPUs deployed across multiple clusters, the company says it can train increasingly complex multimodal models efficiently.
The newly announced M100 is designed for large-scale inference scenarios and optimized for MoE (Mixture of Experts) models. It is expected to launch in early 2026. The M300, slated for 2027, will support both inference and ultra-large-scale training, targeting multimodal AI workloads.
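Why MoE models reward inference-focused silicon comes down to sparse activation: each token consults only a few of the many expert networks, so compute per token scales with the active experts rather than the total. The sketch below is a generic top-k router in NumPy, an illustration of the technique, not the M100's actual design.

```python
import numpy as np

# Minimal top-k MoE routing: each token activates only k of E experts.
rng = np.random.default_rng(0)
E, k, d = 8, 2, 16                  # experts, active experts per token, hidden dim
token = rng.standard_normal(d)
router = rng.standard_normal((d, E))
experts = rng.standard_normal((E, d, d))

logits = token @ router
top = np.argsort(logits)[-k:]       # indices of the k highest-scoring experts
gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts

# Only k expert matmuls run, not E, which is the property inference chips exploit.
out = sum(g * (token @ experts[i]) for g, i in zip(gates, top))
print(out.shape)
```

Here 2 of 8 experts fire per token, so dense FLOPs drop roughly fourfold while total parameters, and hence memory capacity and routing bandwidth, stay high, a profile that favors chips with large, fast memory systems.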
The Kunlunxin software stack is compatible with mainstream deep learning frameworks and with the CUDA ecosystem, allowing customers in the telecom, energy, finance, and internet sectors to integrate the chips into their operations. Reported clients include China Merchants Bank, China Southern Power Grid, China Iron & Steel Research Institute, China Oil & Gas Pipeline Network, Geely Auto, and leading Chinese internet firms. Deployment scales range from dozens to tens of thousands of GPUs.
Baidu first launched 32-card and 64-card P800 supernodes in April 2025. The Tianchi 256 integrates 256 P800 cards into a single node, quadrupling interconnect bandwidth and improving performance by more than 50%. Tianchi 512 doubles this card count and bandwidth, enabling training of trillion-parameter models.
Future supernodes, including 1,000-card and 4,000-card configurations, will leverage the newly launched M-series chips, starting in the second half of 2027. Shen said Kunlunxin plans to release new products annually over the next five years.
“While the power of a single chip is the foundation, large model training and inference require multiple chips working in close coordination,” Shen said. “Supernodes enable dozens or even hundreds of chips to operate like a single superchip, maximizing communication efficiency.”
Baidu’s efforts reflect a broader trend of AI companies moving to control the hardware that underpins next-generation models. By combining chip development, system engineering, and software optimization, firms hope to reduce dependency on foreign suppliers, increase efficiency, and capture more value from AI applications.
As AI models grow in size and complexity, companies that can integrate hardware, software, and large-scale systems are likely to maintain a competitive advantage.
Author: Visitor. Original post: https://www.nbdnews.com/post/5492.html, published 2025-11-17 16:10:43.
Reprints or copies must credit the source, NBD财经网 (NBD Finance), with a hyperlink.