MooreThreads' First Thousand-Card Intelligent Computing Center Launched, Tailored for Large Models
On December 19th, MooreThreads unveiled the MooreThreads KUAE Intelligent Computing Center in Beijing, its first domestic platform for thousand-card, hundred-billion-parameter model training. This marks the official launch of China's first large-scale computing cluster based on a domestically produced full-function GPU. Concurrently, MooreThreads and numerous partners jointly established the MooreThreads PES-KUAE Intelligent Computing Alliance and the MooreThreads PES-Large Model Ecosystem Alliance, aiming to strengthen the integrated ecosystem spanning intelligent computing infrastructure through large model training and inference, and to accelerate the development of China's large model industry.
During the keynote speech, CEO Zhang Jianzhong made significant announcements, including the introduction of the large model intelligent computing acceleration card MTT S4000 and the MooreThreads KUAE platform, which provides robust support for the training and inference of hundred-billion-parameter large models. He stated, "The official operation of the MooreThreads KUAE Intelligent Computing Center is an important milestone in the company's development. MooreThreads has built an intelligent computing product line ranging from chips to graphics cards to clusters. Relying on the diverse computing advantages of a full-function GPU, our goal is to meet the growing demand for large model training and inference with green, secure intelligent computing power, and to vigorously promote AIGC, digital twins, physical simulation, the metaverse, and other multimodal applications across various industries."
MTT S4000: Designed for Large Models with Dual Focus on Training and Inference
The MooreThreads large model intelligent computing acceleration card MTT S4000 is equipped with the third-generation MUSA core, supporting 48GB of memory and 768GB/s of memory bandwidth per card. Based on MooreThreads' proprietary MTLink1.0 technology, the MTT S4000 supports multi-card interconnectivity, facilitating distributed computing acceleration for hundred-billion-parameter large models. Additionally, the MTT S4000 delivers advanced graphics rendering, video encoding/decoding, and ultra-high-definition 8K HDR display capabilities, supporting comprehensive application scenarios spanning AI computing, graphics rendering, and multimedia. Importantly, with MooreThreads' proprietary MUSIFY development tools, the MTT S4000 can fully leverage the existing CUDA software ecosystem, enabling zero-cost migration of CUDA code to the MUSA platform.
KUAE Intelligent Computing Center Solution: Turnkey Integration of Hardware and Software
The MooreThreads KUAE Intelligent Computing Center solution is based on a full-function GPU and represents a fully integrated hardware and software full-stack solution. It includes the KUAE computing cluster infrastructure, the KUAE Platform cluster management platform, and the KUAE ModelStudio model services, aiming to solve the construction and operational management issues of large-scale GPU computing power with an integrated delivery approach. The solution is designed to be ready to use out of the box, significantly reducing the time and cost associated with traditional computing power construction, application development, and operation and maintenance platform setup, thereby enabling quick market entry and commercial operations.
Infrastructure includes the KUAE computing cluster, RDMA network, and distributed storage. The unveiled MooreThreads KUAE thousand-card model training platform, which can be built in just 30 days, supports pre-training, fine-tuning, and inference for hundred-billion-parameter models and can achieve a thousand-card cluster performance scaling factor of up to 91%. Based on the MTT S4000 and the dual-socket eight-GPU server MCCX D800, the MooreThreads KUAE cluster supports seamless expansion from single-machine multi-card to multi-machine multi-card setups, and from a single card to a thousand-card cluster, with plans to introduce larger clusters in the future to meet growing demands for large model training.
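The quoted 91% scaling factor can be read as the fraction of ideal linear speedup the cluster actually achieves: observed cluster throughput divided by N times single-card throughput. A minimal sketch of that bookkeeping (the throughput numbers below are illustrative placeholders, not MTT S4000 measurements):

```python
def scaling_efficiency(cluster_throughput: float,
                       single_card_throughput: float,
                       num_cards: int) -> float:
    """Fraction of ideal linear speedup actually achieved by a cluster."""
    ideal = single_card_throughput * num_cards
    return cluster_throughput / ideal

# Illustrative numbers only: if one card sustains 100 samples/s and a
# 1000-card cluster sustains 91,000 samples/s, efficiency is 91%.
eff = scaling_efficiency(91_000.0, 100.0, 1000)
print(f"{eff:.0%}")  # → 91%
```

The useful property of this metric is that it isolates communication and synchronization overhead: a perfectly linear cluster would score 100% regardless of per-card speed.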
KUAE Platform cluster management platform: An integrated software and hardware platform for AI large model training, distributed graphics rendering, streaming media processing, and scientific computing, deeply integrating full-function GPU computing, networking, and storage to provide highly reliable, high-performance services. Through this platform, users can flexibly manage computing resources across multiple data centers and clusters, with integrated multidimensional operations monitoring, alerting, and logging, helping intelligent computing centers achieve operational automation.
KUAE ModelStudio model services: Covering the entire process of pre-training, fine-tuning, and inference for large models, supporting all mainstream open-source large models. With MooreThreads' MUSIFY development tools, CUDA application ecosystems can be easily reused, and the built-in containerized solution enables one-click deployment via API. The platform aims to provide large model lifecycle management, and through a simple and easy-to-operate interactive interface, users can organize workflows as needed, significantly lowering the barrier to using large models.
MooreThreads KUAE Thousand-Card Cluster: Multiple Advantages Powering Efficient Large Model Training
Distributed parallel computing is a key means of achieving AI large model training. MooreThreads' KUAE supports industry-leading distributed frameworks including DeepSpeed, Megatron-DeepSpeed, Colossal-AI, and FlagScale, integrating various parallel computing strategies such as data parallelism, tensor parallelism, pipeline parallelism, and ZeRO, with additional optimizations such as overlapping communication with computation and Flash Attention support.
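In practice, the parallel strategies listed above are combined: the cluster is factored into data-, tensor-, and pipeline-parallel groups whose sizes multiply to the total card count. A hedged sketch of that decomposition (the group sizes are illustrative, not MooreThreads' actual configuration):

```python
def parallel_layout(total_cards: int, tensor: int, pipeline: int) -> dict:
    """Factor a cluster into data/tensor/pipeline parallel group sizes.

    The data-parallel degree is whatever remains after the tensor and
    pipeline degrees are fixed; the three degrees must multiply exactly
    to the total number of cards.
    """
    if total_cards % (tensor * pipeline) != 0:
        raise ValueError("tensor * pipeline must divide total_cards")
    return {
        "tensor_parallel": tensor,      # splits each layer's matrices across cards
        "pipeline_parallel": pipeline,  # splits the model into layer stages
        "data_parallel": total_cards // (tensor * pipeline),  # model replicas
    }

# Illustrative: a 1024-card cluster with 8-way tensor and 8-way pipeline
# parallelism leaves 16 data-parallel replicas (8 * 8 * 16 = 1024).
layout = parallel_layout(1024, tensor=8, pipeline=8)
print(layout)
```

Tensor parallelism is usually kept within a node (where interconnect bandwidth is highest), while pipeline and data parallelism span nodes; ZeRO then shards optimizer state across the data-parallel replicas.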
Currently, MooreThreads supports the training and fine-tuning of various mainstream large models, including LLaMA, GLM, Aquila, Baichuan, GPT, Bloom, and YuYan. On the MooreThreads KUAE thousand-card cluster, large models with 70B to 130B parameters achieve a linear speedup ratio of up to 91%, with compute utilization remaining essentially constant as the cluster scales. For instance, with a training dataset of 200 billion tokens, the Beijing Academy of Artificial Intelligence's 70-billion-parameter Aquila2 model can be trained in 33 days, and a 130-billion-parameter model in 56 days. Additionally, the MooreThreads KUAE thousand-card cluster supports long-term continuous stable operation, resumption of training from checkpoints, and asynchronous checkpointing that completes in under two minutes.
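These training-time figures can be cross-checked with the common back-of-envelope estimate of roughly 6·N·D FLOPs to train an N-parameter model on D tokens. The sketch below inverts the reported numbers into the implied sustained per-card throughput (the 6·N·D rule is an approximation, and the resulting figure is an inference, not an official MTT S4000 specification):

```python
def implied_throughput(params: float, tokens: float, days: float,
                       num_cards: int = 1000) -> float:
    """Sustained per-card FLOP/s implied by a reported training run,
    using the ~6 * N * D approximation for total training compute."""
    total_flops = 6.0 * params * tokens
    seconds = days * 86_400
    return total_flops / seconds / num_cards

# Reported: 70B params on 200B tokens in 33 days on a thousand-card cluster.
per_card = implied_throughput(70e9, 200e9, 33)
print(f"~{per_card / 1e12:.0f} TFLOP/s sustained per card")
```

The 130B-parameter run in 56 days implies a similar per-card figure, which is what the "essentially constant utilization" claim would predict.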
With its comprehensive advantages including high compatibility, stability, scalability, and efficient utilization of computing power, the MooreThreads KUAE thousand-card computing cluster will become a solid and reliable infrastructure for large model training.
Intelligent Computing and Large Model Ecosystem Alliance: Collaboration Promotes Ecosystem Integration
In the era of large models, intelligent computing power represented by GPUs is the cornerstone of the generative AI world. MooreThreads, together with more than a dozen companies including China Mobile Beijing, China Telecom Beijing, Lenovo, Century Internet, GDS Services, ChinaLink Data, Shudao Intelligent Computing, Zhifa Zhiyuan, Qishang Online, BIDR Beijing Digital Economy Computing Power Center, Unigroup Hengyue, Rayhoo Industry Holding (Shandong), CERNET, Zhongjin Financial, Zhongyun Intelligent Computing, and Jinzhou Yuanhang (listed in no particular order), has announced the establishment of the "MooreThreads PES-KUAE Intelligent Computing Alliance." The alliance will focus on building and promoting a fully domestically produced intelligent computing platform, from the underlying hardware to software, tools, and applications, aiming to achieve high cluster utilization and to make its user-friendly, full-stack intelligent computing solutions the preferred choice for large model training.
At the event, MooreThreads signed contracts with ChinaLink Data and Shudao Intelligent Computing on-site and jointly unveiled the MooreThreads KUAE Intelligent Computing Center. More than 200 attendees witnessed this significant moment.
A thriving ecosystem is key to breakthroughs in artificial intelligence applications. To this end, MooreThreads, in collaboration with partners including 360, PaddlePaddle, JD Speech, Zhipu AI, Cosine, Wuwen XinQiong, Dipu Technology, NetEase, Tsinghua University, Fudan University, Zhejiang University, Beijing Institute of Technology, Lingyun Optoelectronics, RealAI, and Nanwei Software (listed in no particular order), has initiated and established the "MooreThreads PES-Large Model Ecosystem Alliance." Centered around MUSA-based integrated hardware and software large model solutions, MooreThreads will actively work with a broad range of ecosystem partners on compatibility adaptation and technical tuning, collectively promoting the comprehensive prosperity of the domestic large model ecosystem.
In the final roundtable discussion, MooreThreads Vice President Dong Longfei and heavyweight guests including Wall Hu, Chairman of CNCEC Green Digital Technology (Zhongwei) Co., Ltd., Zhang Peng, CEO of Zhipu AI, Pei Jiquan, Chief AI Scientist of JD Cloud, Zhai Ying, Managing Director of China Renaissance Capital, Wu Hengkui, Founder of Cosine, and Zhen Jian, Chairman of Shudao Intelligent Computing, engaged in an in-depth discussion on topics such as current large model computing power needs and the construction and operation of intelligent computing centers. The guests unanimously agreed that an intelligent computing center should not be merely a collection of hardware but a test of the integration capability of a GPU-based intelligent computing system. Issues such as GPU distributed computing system adaptation, computing cluster management, and the application of efficient inference engines are crucial factors in improving a computing center's usability. The development of domestic intelligent computing centers also depends on fully integrating the needs and strengths of all parties; industry-wide focus is required to achieve synergy across the entire ecosystem and drive the domestic intelligent computing industry forward.