Autonomous Driving Algorithm Research: BEV Drives Algorithm Revolution, AI Large Model Promotes Algorithm Iteration
The technical framework of autonomous driving algorithms has three core parts: environment perception, decision planning, and control execution.
Environment perception: converts sensor data into a machine-readable description of the scene around the vehicle. It can include object detection, recognition and tracking, environment modeling, motion estimation, etc.;
Decision planning: based on the output of the perception algorithms, issues the final behavioral instructions, including behavioral decisions (following a vehicle, stopping, overtaking), action decisions (steering, speed, etc.), path planning, etc.;
Control execution: based on the output of the decision-making layer, sends commands through the underlying modules to core control components such as the accelerator and brake, so that the vehicle follows the planned route.
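The three stages above can be sketched as a chain of functions. This is a minimal illustration and not any vendor's implementation: the names (`perceive`, `plan`, `control`, `drive_step`), the 2-second time gap, the 5 m minimum gap, and the proportional gain are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Obstacle:
    distance_m: float  # longitudinal gap to the object ahead
    speed_mps: float   # speed of the object along the lane


@dataclass
class Command:
    throttle: float  # 0..1
    brake: float     # 0..1


def perceive(raw_ranges_m, raw_speeds_mps):
    """Environment perception: turn raw readings into a scene description."""
    return [Obstacle(d, v) for d, v in zip(raw_ranges_m, raw_speeds_mps)]


def plan(obstacles, ego_speed_mps, cruise_speed_mps=15.0, time_gap_s=2.0):
    """Decision planning: pick a target speed (cruise, follow, or stop)."""
    target = cruise_speed_mps
    safe_gap_m = max(ego_speed_mps * time_gap_s, 5.0)  # assumed safety rule
    for ob in obstacles:
        if ob.distance_m < safe_gap_m:
            # Too close: do not drive faster than the object ahead.
            target = min(target, ob.speed_mps)
    return target


def control(target_speed_mps, ego_speed_mps, gain=0.1):
    """Control execution: proportional controller on the speed error."""
    effort = max(-1.0, min(1.0, gain * (target_speed_mps - ego_speed_mps)))
    if effort >= 0:
        return Command(throttle=effort, brake=0.0)
    return Command(throttle=0.0, brake=-effort)


def drive_step(raw_ranges_m, raw_speeds_mps, ego_speed_mps):
    """One pass through perception, decision planning, and control."""
    obstacles = perceive(raw_ranges_m, raw_speeds_mps)
    target = plan(obstacles, ego_speed_mps)
    return control(target, ego_speed_mps)
```

Calling `drive_step([10.0], [0.0], 10.0)` models a stationary object 10 m ahead of a vehicle travelling at 10 m/s; under the assumed rules the planner asks for a stop and the controller returns a full-brake command.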
BEV drives algorithm revolution
In recent years, BEV (bird's-eye view) perception has received extensive attention. A BEV model mainly provides a unified space that makes it easier to fuse multiple tasks and sensors. It has the following advantages:
BEV unifies the multimodal data processing dimension and makes multimodal fusion easier
A BEV perception system converts the information obtained from multiple cameras or radars into a bird's-eye view and then performs tasks such as object detection and instance segmentation there, so the size and orientation of objects are displayed more intuitively in BEV space.
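As a rough illustration of what converting to a bird's-eye view means, the sketch below drops points with known 3D positions onto a top-down grid and counts them per cell. It covers only the simplest case: camera images need a depth estimate or a learned view transform before this step, which is not shown. The function name `points_to_bev`, the axis convention (x forward, y left, z up), and the grid extents are assumptions for the example.

```python
def points_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell_m=1.0):
    """Count 3D points (x, y, z in metres) per cell of a top-down grid.

    Rows index forward distance (x), columns index lateral position (y).
    Height (z) is discarded, which is what makes the view "bird's-eye".
    """
    rows = int((x_range[1] - x_range[0]) / cell_m)
    cols = int((y_range[1] - y_range[0]) / cell_m)
    grid = [[0] * cols for _ in range(rows)]
    for x, y, _z in points:
        inside_x = x_range[0] <= x < x_range[1]
        inside_y = y_range[0] <= y < y_range[1]
        if not (inside_x and inside_y):
            continue  # point lies outside the area covered by the grid
        r = int((x - x_range[0]) / cell_m)
        c = int((y - y_range[0]) / cell_m)
        grid[r][c] += 1
    return grid
```

Because every sensor's output ends up in the same grid, downstream tasks can read object extent and heading directly from cell coordinates, whichever sensor produced the points.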
In 2022, Peking University and Alibaba proposed BEVFusion, a fusion framework for LiDAR and vision. LiDAR point clouds and camera images are processed independently: each is encoded by a neural network and projected into a unified BEV space, where the two are then merged.
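The merge step can be pictured with a small sketch. It assumes that both branches have already produced BEV feature maps on the same grid and that merging means concatenating the per-cell feature vectors; the real BEVFusion uses learned encoders and a learned fusion module, so `fuse_bev` and the nested-list layout (rows, columns, channels) are purely illustrative.

```python
def fuse_bev(camera_bev, lidar_bev):
    """Concatenate per-cell feature vectors of two BEV maps on the same grid.

    Each map is a list of rows, each row a list of cells, and each cell a
    list of floats (the feature channels for that cell).
    """
    same_rows = len(camera_bev) == len(lidar_bev)
    same_cols = all(len(a) == len(b) for a, b in zip(camera_bev, lidar_bev))
    if not (same_rows and same_cols):
        raise ValueError("BEV grids must share the same shape")
    return [
        [cam_cell + lidar_cell for cam_cell, lidar_cell in zip(cam_row, lidar_row)]
        for cam_row, lidar_row in zip(camera_bev, lidar_bev)
    ]
```

Because the two branches only meet in BEV space, either one can in principle be swapped or retrained without changing the other, as long as it keeps producing a map on the shared grid.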
Temporal information fusion builds a 4D space
In the 4D space (BEV space plus time), the perception algorithm can better complete tasks such as speed measurement and can pass motion prediction results to the decision and control modules.
PhiGent Robotics proposed BEVDet4D in 2022, which extends BEVDet with temporal fusion. BEVDet4D retains the intermediate BEV features of past frames, aligns them with the current frame, and concatenates the two, so that temporal cues can be obtained by querying the two candidate features.
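The align-then-concatenate idea can be sketched as follows. This is a simplification and not BEVDet4D's actual code: ego motion is reduced to a translation of whole grid cells (rotation is ignored), and the names `shift_bev` and `temporal_fuse` are hypothetical. Rows index forward distance, so when the ego vehicle moves forward by `d_rows` cells, a static object moves that many rows closer.

```python
def shift_bev(prev_bev, d_rows, d_cols, fill):
    """Re-express a past BEV map in the current ego frame.

    d_rows / d_cols are the whole cells the ego vehicle moved forward / left
    since the past frame. Cells with no past data are set to `fill`.
    """
    rows, cols = len(prev_bev), len(prev_bev[0])
    out = [[list(fill) for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            src_r, src_c = r + d_rows, c + d_cols
            if 0 <= src_r < rows and 0 <= src_c < cols:
                out[r][c] = list(prev_bev[src_r][src_c])
    return out


def temporal_fuse(curr_bev, prev_bev, ego_shift_cells):
    """Align the past BEV features, then concatenate them per cell."""
    d_rows, d_cols = ego_shift_cells
    channels = len(prev_bev[0][0])
    aligned = shift_bev(prev_bev, d_rows, d_cols, fill=[0.0] * channels)
    return [
        [curr_cell + prev_cell for curr_cell, prev_cell in zip(curr_row, prev_row)]
        for curr_row, prev_row in zip(curr_bev, aligned)
    ]
```

After alignment, a static object occupies the same cell in both halves of the fused feature vector, while a moving object does not; that difference is the temporal cue a downstream network can use for speed estimation.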
Inferring occluded objects to enable object prediction
In BEV space, the algorithm can use prior knowledge to reason about occluded areas and infer whether objects are present in them.
FIERY, proposed in 2021 by Wayve in cooperation with the University of Cambridge, is an end-to-end algorithm for predicting instances of dynamic road objects. It does not rely on high-precision maps and works in bird's-eye view using only monocular camera input.
Promoting development of an end-to-end autonomous driving framework
In BEV space, perception and prediction can be optimized directly end-to-end by neural networks in a unified space, with both results obtained at the same time. Beyond the perception module, BEV-based planning and decision-making modules are also a direction of academic research.
In 2022, the autonomous driving team of Shanghai Artificial Intelligence Laboratory and the team of Associate Professor Yan Junchi of Shanghai Jiao Tong University collaborated on the paper ST-P3, which proposes a spatiotemporal feature learning scheme that provides a single set of more representative features for perception, prediction, and planning tasks simultaneously.
AI large model drives algorithm iteration
Since 2012, deep learning algorithms have been widely applied in the autonomous driving field. To support larger and more complex AI computing needs, AI large models characterized by "huge data, huge computing power, and huge algorithms" emerged, which has accelerated algorithm iteration.
Large Model and Intelligent Computing Center
In 2021, HAOMO.AI began researching and deploying large-scale Transformer models, and then gradually applied them at scale in projects including multi-modal perception data fusion and cognitive model training. In December 2021, HAOMO.AI released MANA (Chinese name "Snow Lake"), an autonomous driving data intelligence system that integrates perception, cognition, labeling, simulation, computing, and other aspects. In January 2023, HAOMO.AI and Volcano Engine unveiled MANA OASIS, a supercomputing center with a total computing power of 670 PFLOPS. With HAOMO.AI's training platform deployed, OASIS can run a range of workloads, including cloud-side large-scale model training, vehicle-side model training, annotation, and simulation. With the help of MANA OASIS, HAOMO.AI's five major models have been upgraded.
In August 2022, Xpeng Motors built "Fuyao", an intelligent computing center on Alibaba Cloud's intelligent computing platform that is dedicated to training autonomous driving models. In October 2022, Xpeng also announced the introduction of a Transformer large model.
In November 2022, Baidu released the Wenxin Big Model. With more than 1 billion parameters, it recognizes thousands of objects, which broadens the scope of semantic recognition. At present, it is mainly used in three areas: long-distance vision, multimodality, and data mining.