Table of Contents
Definitions
1 Overview of AI Foundation Models
1.1 Introduction to AI Models
Definition and Features of AI Models
Classification of AI Models by Architecture
Classification of AI Models by Task Type/Training Method
Classification of AI Models by Supervision Mode
Classification of AI Models by Modality
Application Process of AI Models
1.2 Introduction to Foundation Models
Classification of Foundation Models
Current Development of Foundation Models in the Automotive Industry
Application Scenarios of Foundation Models in the Automotive Industry
Application Case 1: Application of LLM in Autonomous Driving
Application Case 2: Application of VFM in Autonomous Driving
Application Case 3: Application of MFM in Autonomous Driving
2 Analysis of AI Foundation Models of Different Types
2.1 Large Language Models (LLM)
Development History of LLM
Key Capabilities of LLM
Cases of Integration with Other Models
2.2 Multimodal Large Language Models (MLLM)
Development and Overview of Large Multimodal Models
Large Multimodal Models vs. Large Single-modal Models (1)
Large Multimodal Models vs. Large Single-modal Models (2)
Technology Panorama of Large Multimodal Models
Multimodal Information Representation
Multimodal Large Language Models (MLLM)
Architecture and Core Components of MLLM
Status Quo of MLLM
Dataset Evaluation of Representative MLLMs
Reasoning Capabilities of MLLM
Synergy between MLLM and Agent
Application Case 1: Application of MLLM in VQA
Application Case 2: Application of MLLM in Autonomous Driving
2.3 Vision-Language Models (VLM) and Vision-Language-Action (VLA) Models
Development History of VLM
Application of VLM
Architecture of VLM
Evolution of VLM in Intelligent Driving
Application Scenarios of VLM: End-to-end Autonomous Driving
Application Scenarios of VLM: Combination with Gaussian Framework
VLM → VLA
VLA Models
Principles of VLA
Classification of VLA Models
Application Cases of VLA (1)
Application Cases of VLA (2)
Application Cases of VLA (3)
Application Cases of VLA (4)
Case 1: Core Functions of End-to-End Multimodal Model for Autonomous Driving (EMMA)
Case 2: World Model Construction
Case 3: Improving Vision-Language Navigation Capabilities
Case 4: VLA Generalization Enhancement
Case 5: Computing Overhead of VLA
2.4 World Models
Key Definitions of World Models and Application Development
Basic Architecture of World Models
Framework Setup and Implementation Challenges of World Models
Video Generation Methods Based on Transformer and Diffusion Models
Technical Principle and Path of WorldDreamer
World Models and End-to-end Intelligent Driving
World Models and End-to-end Intelligent Driving: Data Generation
Case 1: Tesla World Model
Case 2: NVIDIA
Case 3: InfinityDrive
Case 4: World Labs Spatial Intelligence
Case 5: NIO
Case 6: 1X’s "World Model"
3 Common Technologies in AI Foundation Models
Common Foundation Model Algorithms and Architectures
Comparison of Features and Application Scenarios between Foundation Model Algorithms
3.1 Foundation Model Architectures and Related Algorithms
Transformer: Architecture and Features
Transformer: Algorithm Mechanisms
Transformer: Multi-head Attention Mechanisms and Their Variants
KAN: Potential to Replace MLP
KAN: Cases of Integration with Transformer Architecture
MAMBA: Introduction
MAMBA: Architectural Foundations
MAMBA: Latest Developments
MAMBA: Application Scenarios
MAMBA: Cases of Integration with Transformer Architecture
Applicability of CNN in the Era of Foundation Models
Applicability of RNN Variants in the Era of Foundation Models
3.2 Visual Processing Algorithms
Common Vision Algorithms
ViT
CLIP Scenarios and Features
CLIP Workflow
LLaVA Model
3.3 Training and Fine-Tuning Technologies
Foundation Model Training Process
Training Case: Geely’s CPT Enhancement Solution
Instruction Fine-tuning
Training Case: Geely’s Fine-tuning Framework for Multi-round Dialogues
3.4 Reinforcement Learning
Introduction to Reinforcement Learning
Reinforcement Learning Process
Comparison between Some Reinforcement Learning Technology Routes
Cases of Reinforcement Learning (1)-(3)
3.5 Knowledge Graphs
Optimization Directions for Retrieval-Augmented Generation (RAG)
Evolution Directions of RAG (1): KAG
Evolution Directions of RAG (2): CAG
Evolution Directions of RAG (3): GraphRAG
RAG Application Case 1:
RAG Application Case 2:
RAG Application Case 3: Li Auto
RAG Application Case 4: Geely
Comparison between RAG Routes
Function Call
3.6 Reasoning Technologies
Reasoning Process of Transformer Models
Evaluation of Reasoning Capabilities
Three Optimization Directions for Foundation Model Reasoning
Reasoning Task Types (1)
Reasoning Task Types (2)
Reasoning Task Types (3)
Common Reasoning Algorithm 1: CoT
Common Reasoning Algorithm 2: GoT/ToT
Comparison between Common Reasoning Algorithms
Common Reasoning Algorithm 3: PagedAttention
Reasoning Case 1: Geely
Reasoning Case 2: NVIDIA
3.7 Sparsification
Characteristics of MoE Architecture
Principles of MoE Architecture
MoE Training Strategies
Advantages and Challenges of MoE
MoE Models from Different Foundation Model Companies
Evolution Direction of MoE
3.8 Generation Technologies
Introduction to Generative Models
Comparison between Generation Technologies
Case 1: Li Auto
Case 2: XPeng
Case 3: SAIC
4 AI Foundation Model Companies
Development History of Mainstream Foundation Models
Mainstream Foundation Models and Their Companies (Foreign)
Mainstream Foundation Models and Their Companies (Chinese)
Rankings of Evaluated Foundation Models
4.1 OpenAI
Product Layout
Product Iteration History
GPT Series: Features
GPT Series: Architecture
From GPT-4V to 4o
Reasoning Model OpenAI o1
SORA: Features
SORA: Performance Evaluation
SORA: Advantages and Limitations
4.2 Google
Development History of Foundation Models
Typical Model BERT: Architecture
Typical Model BERT: Variants
Gemini Model
Cases of Foundation Models in the Automotive Industry
4.3 Meta
LLAMA3.3
LLAMA Series: Evolution
LLAMA Series: Features
LLAMA Series: Training Methods
LLAMA Series: Alpaca
LLAMA Series: Vicuna
4.4 Anthropic
Claude Performance Evaluation
Claude-based PC-side Agent
4.5 Mistral AI
Expert Model: Architecture
Expert Model: Algorithm Features (1)
Expert Model: Algorithm Features (2)
Large Language Model: Mistral Large 2
4.6 Amazon
Nova Product System
Application Cases of Amazon AI Cloud in the Automotive Industry (1)-(3)
4.7 Stability AI
Product System
Stable Diffusion Architecture Based on Diffusion Models
Comparison between Stable Diffusion Video Generation Technology and Competitors
4.8 xAI
Product System
Capabilities of xAI Models
Capabilities of Grok-2
Capabilities of Grok-0/1
4.9 Abu Dhabi Technology Innovation Institute
Iteration History of Falcon Model Series
Parameters of Falcon 3 Series
Evaluation of Falcon 3 Series
4.10 SenseTime
Major Foundation Model Product Systems
Foundation Model Training Facilities
Functional Scenarios of Foundation Models
Foundation Model Technologies
4.11 Alibaba Cloud
Foundation Model Product System
End-cloud Integration Solutions of Foundation Models
4.12 Baidu AI Cloud
Foundation Model Product System
4.13 Tencent Cloud
Foundation Model Product System
Reasoning Service Solutions (1)-(3)
Generation Scenario Solutions for Foundation Models
Q&A Scenario Solutions for Foundation Models
4.14 ByteDance & Volcano Engine
Doubao Model System
Functional Highlights of Volcano Engine’s Cockpit
4.15 Huawei
Pangu Model Product System
Application Cases of Pangu Models in Data Synthesis
LLM Architecture of Pangu Models
Capabilities of Pangu Models: Multimodal Technology
Capabilities of Pangu Models: Thinking & Reasoning Technology
AI Cloud Services of Pangu Models
4.16 Zhipu AI
Product System
Foundation Model Base in the Automotive Industry
Technical Features
4.17 iFLYTEK
Product System
Functional and Technical Highlights
Cockpit AI System
4.18 DeepSeek
Product System
Technical Inspiration from DeepSeek V3
Technical Highlights of DeepSeek R1
Application Cases of DeepSeek (1)-(3)
5 Application Cases of AI Foundation Models in Automotive
5.1 Cockpit Cases
Lenovo’s AI Vehicle Computing Framework Used in Cockpits
In-cabin Functions of Thundersoft’s Rubik Foundation Model
LLM Empowers Smart Eye’s DMS/OMS Assistance System
Application of DIT in Voice Processing Scenarios
Application of Unisound’s Shanhai Model in Cockpits
Phoenix Auto Intelligence’s Cockpit Smart Brain
5.2 Intelligent Driving Cases
Li Auto: Multimodal Technology in Autonomous Driving (1)
Li Auto: Multimodal Technology in Autonomous Driving (2)
Li Auto: Multimodal Technology in Autonomous Driving (3): Overcoming 2D Limitations
Li Auto: Data Generation Technology (1)
Li Auto: Data Generation Technology (2)
Li Auto: CoT Technology in DriveVLM
Li Auto: Application of Visual Processing
Li Auto: Data Selection
Geely: Application of Visual Processing
Geely: Multimodal Learning Framework
Wayve: Generative World Model GAIA-1
Tesla: Algorithm Architecture (Including NeRF)
Tesla: Skeleton, Neck, and Head of Vision Algorithms
Tesla: Core of Visual System - HydraNet
Giga’s World Model
6 Application Trends of AI Foundation Models
6.1 Data
Trend 1:
Trend 2:
6.2 Algorithm
Trend 1:
Trend 2:
Trend 3
Trend 4:
6.3 Computing Power
Trend 1:
Trend 2:
6.4 Engineering
Trend 1
Trend 2