ChatGPT ignites the large AI model industry
ChatGPT is a chatbot released by OpenAI at the end of November 2022 and an example of AIGC (AI-generated content, i.e. generative artificial intelligence). ChatGPT converses by learning and understanding human language, uses the context of the conversation to interpret and respond to requests, and can complete tasks such as writing emails, copywriting, translation, and code generation.
Unlike earlier decision-making AI, generative AI is not limited to analyzing existing data in order to make decisions. After learning from existing data, it generates new content modeled on what it has learned.
ChatGPT's breakout success has also set off a new wave of startups building large AI models. Alongside the major Internet giants and university teams, well-known entrepreneurs have entered the field to develop ChatGPT-like products.
However, the technology underlying large AI models is far from simple. It depends on massive data, complex algorithms, and powerful computing power. Of these, computing power is the biggest bottleneck in AI development and a key factor in the competitiveness of today's large models. At present, large AI models rely mainly on GPUs, or on CPUs paired with FPGAs, ASICs, and other compute chips, to run efficiently. These chips are designed specifically to accelerate AI algorithms; they are also known as AI accelerators or compute cards, and they form the computing foundation of AI.
As a result, the current boom in large AI models has driven a surge in demand for computing power and, with it, a sharp increase in demand for the related chips. Some organizations predict that the computing power required for AI training will double every 3.5 months.
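To put the quoted doubling period in perspective, the short Python sketch below converts it into a growth factor over a longer horizon. It takes the 3.5-month figure at face value and assumes smooth exponential growth, which is a simplification; the function name growth_factor is illustrative and does not come from any cited forecast.

```python
def growth_factor(months_elapsed: float, doubling_months: float = 3.5) -> float:
    """Return how many times demand has multiplied after `months_elapsed`,
    assuming it doubles every `doubling_months` (an idealized exponential)."""
    return 2 ** (months_elapsed / doubling_months)


if __name__ == "__main__":
    # One doubling period gives exactly 2x by construction.
    print(f"After 3.5 months: {growth_factor(3.5):.1f}x")
    # Twelve months spans about 3.4 doubling periods.
    print(f"After 12 months: {growth_factor(12):.1f}x")
```

Under these assumptions, a 3.5-month doubling period corresponds to roughly a tenfold increase in training compute demand per year.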
The era of large models demands more chip computing power
At present, the main pre-trained large AIGC models on the market include OpenAI’s GPT series, Google’s PaLM, Meta’s LLaMA, and Baidu’s Wenxin series.
As large AI models, including ChatGPT, move into more industrial applications, market demand will continue to expand and the AI server market will have considerable room to grow. Downstream applications will bring a new surge in demand for computing hardware.
According to IDC, the global AI server market was worth US$15.6 billion in 2021, a year-on-year increase of 39.1%, and is expected to reach US$31.79 billion by 2025, a compound annual growth rate (CAGR) of 19%. According to TrendForce, AI servers equipped with GPGPUs accounted for an estimated share of nearly 1% of total annual server shipments as of 2022. In 2023, boosted by ChatGPT-related applications, shipments are estimated to grow 8% year-on-year, with a CAGR of 10.8% from 2022 to 2026.
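The IDC figures can be sanity-checked with the standard compound annual growth rate formula. The sketch below is a minimal check in Python, assuming four compounding years between 2021 and 2025; the helper names cagr and project are illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    # Figures quoted from IDC, in billions of US dollars.
    start_2021, end_2025, years = 15.6, 31.79, 4  # 4 years is an assumption
    print(f"Implied CAGR: {cagr(start_2021, end_2025, years):.1%}")
    print(f"2025 value at 19% CAGR: {project(start_2021, 0.19, years):.2f}")
```

Under that assumption the implied rate is about 19.5%, and compounding at exactly 19% gives about US$31.3 billion, so the quoted figures agree with each other up to rounding.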
Which chips are used behind the AI model?
Working with large AI models involves two main steps: training and inference. Training means using a large amount of labeled data to produce a model that can perform a specific function, while inference means using the trained model to draw conclusions from new input data.
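The split between the two steps can be illustrated with a deliberately tiny Python example. This is a toy under stated assumptions, not how large models are built: it fits a straight line by ordinary least squares instead of training a neural network on accelerators, and the function names train and infer and the sample data are made up for illustration. The point is only that training consumes labeled data once to produce parameters, and inference then reuses those parameters on new inputs.

```python
def train(samples):
    """Training: fit y = w * x + b to labeled (x, y) pairs by least squares."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    variance = sum((x - mean_x) ** 2 for x, _ in samples)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b  # the "trained model" is just these two parameters


def infer(model, x):
    """Inference: apply the already-trained parameters to a new input."""
    w, b = model
    return w * x + b


if __name__ == "__main__":
    # Hypothetical labeled data generated from y = 2x + 1.
    labeled_data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
    model = train(labeled_data)   # done once, over the whole dataset
    print(infer(model, 10.0))     # done per request, on unseen input
```

In this sketch training touches every sample while inference is a single multiply-add, which loosely mirrors why the two steps place different demands on hardware.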
Servers used for AI currently pair CPUs with GPUs, FPGAs, or AI ASICs as accelerator chips, choosing different combinations for different computing tasks. For large-model training, traditional CPUs are constrained by largely serial execution: their strengths lie in logic control and serial operations, and they are poorly suited to heavily parallel, repetitive computation. In deep learning, CPUs are therefore used mainly for tasks such as inference or prediction.
Within the server, the CPU manages and controls the entire system and is the server's basic hardware, while accelerator chips such as GPUs, FPGAs, and ASICs speed up AI computing tasks. The two work together to improve overall system performance. According to IDC, the CPU accounts for 32%, 23.3%, 25%, and 9.8% of the cost of basic, high-performance, inference, and training servers, respectively.
The GPU is currently the most common chip in AI servers. Its many-core parallel architecture supports large computing workloads and delivers high floating-point performance. Compared with CPUs, GPUs have a clear advantage in processing graphics data and complex algorithms, which suits the massive data-processing needs of deep learning. However, GPUs are weak at management and control, must be used together with a CPU, and consume a lot of power.
An FPGA (Field Programmable Gate Array) can be reprogrammed repeatedly and offers high flexibility, low latency, and strong real-time performance. It can perform data-parallel and task-parallel computing at the same time and has a clear efficiency advantage in specific applications. Its reprogrammability also leaves ample room to modify and optimize how an algorithm is implemented. However, FPGA development is difficult and expensive, which limits its application scenarios.
As the name suggests, an AI ASIC (Application Specific Integrated Circuit) is a fully customized chip built for a specific function, and it offers the best energy consumption and efficiency on the tasks it was designed for. However, ASICs have high up-front R&D costs and long development cycles, and because the design is customized, their application scenarios are extremely limited. Once a deep learning algorithm has stabilized, an ASIC can be designed around its requirements to provide the most efficient computing hardware, which can greatly reduce overall system cost in large-scale deployments.
In short, CPUs are used mainly for logic, task scheduling, and control; GPUs are often used for model training; FPGAs are used mostly in R&D, data centers, and AI inference; and AI ASICs target scenarios built around specific AI algorithms, which need relatively mature applications to justify mass production.
According to IDC, GPUs currently hold the largest share of the Chinese AI chip market at 89%, followed by NPUs at 9.6%, with ASICs and FPGAs at only 1% and 0.4%, respectively. NPUs are used mainly on the edge side. As for how AI servers are used, the current trend is that, as the market grows, the share of server workload devoted to inference will gradually increase. In 2021, inference and training accounted for 40.9% and 59.1% of workload, respectively; IDC predicts that by 2025 the split will shift to 60.8% and 39.2%.
However, as large AI models mature, algorithms continue to be optimized, and chip performance improves, the computing power and number of servers that models actually consume may turn out to be lower than these forecasts.