2023 Is the First Year of AI LLMs
Last updated
Artificial intelligence has entered the era of "data mining". As machine learning algorithms have been proposed and put into practice, and as deep learning in particular has matured, people hope that machines can learn knowledge automatically and reach a degree of intelligence by analyzing large amounts of data. Better computer hardware and advances in big data analysis have greatly improved the ability of machines to collect, store, and process data.
With the continuous advancement of computer technology and the expansion of application fields, the global artificial intelligence market is developing rapidly. It is estimated that by 2026, the global market size will reach US$484 billion.
2023 is the year of the large language model (LLM). OpenAI's GPT-4 shocked the world. Unlike the text-only GPT-3 and its successors, GPT-4 is multimodal: it is trained on both text and images and, among other capabilities, can generate text from image input. It launched with a context window of 8,192 tokens, already exceeding GPT-3.5, the previous best, in maximum input size. It was also trained with reinforcement learning from human feedback (RLHF), which is central to the success of state-of-the-art LLMs.
OpenAI evaluated GPT-4 comprehensively, not only on classic NLP benchmarks but also on exams designed for humans (e.g. the bar exam, the GRE, and LeetCode problems).
GPT-4 handles some tasks that GPT-3.5 cannot. On the Uniform Bar Examination, for example, GPT-4 scored at around the 90th percentile of test takers, while GPT-3.5 scored at around the 10th percentile. In most tasks the added visual component has only a small impact, but in others it is a big help. Factual accuracy on the adversarial truthfulness dataset is about 40% higher than that of the previous best ChatGPT model.
On February 16, 2024, OpenAI launched Sora, a large text-to-video model that can generate images directly from text and turn those images into vivid, lifelike video. The most striking thing about Sora is that it produces realistic content consistent with people's common sense, which suggests that it has learned and can model the interactions between many elements of the world.
Some professional organizations have pointed out that if Sora is judged by how well it "understands the world", then the image quality and composition of any single frame are by no means the criterion for judging the model. Even the 60-second one-shot video released on the official website is not the core point. What matters is that the video contains different camera positions: whether the shot is long, medium, close, close-up, or wide, the relationship between the characters and the background remains quite consistent.
The technical report released by OpenAI attributes Sora's capabilities to a diffusion model built on the Transformer architecture and to a technique for converting visual data into a unified, usable format. The emergence of Sora lays a foundation for models that can understand and simulate the real world, a capability that is expected to be an important milestone on the way to artificial general intelligence.