With rapid advances in the development of autonomous driving systems, hardly anyone today questions the practicality of driverless vehicles. However, these vehicles have reached the deployment stage only in restricted operational design domains (ODDs). The launch of Audi’s A8 (equipped with level 3 functions), which reached the deployment stage in early 2019, increased confidence among the majority of OEMs and Tier-1s; however, level 4 and 5 vehicles still need more time and testing before they get on public roads.

Perception is the basis for a vehicle to be able to drive itself (without a driver). A highly automated vehicle must be trained well enough to track, classify, and differentiate the objects in its vicinity in order to decide its course of action. Predicting the paths of moving entities is the next most important ability such vehicles must acquire. This can be attained only through rigorous testing and validation on enormous datasets covering multiple scenarios. Data annotation, or the labeling of objects, plays a vital role in automating and fast-tracking this process.

Annotation is the process of labeling the objects of interest in an image or video, typically by using bounding boxes, to help AI or machine learning models understand and recognize the objects detected by sensors. In the ADAS development process, a high volume of data is acquired from the test fleet through cameras, ultrasonic sensors, radar, LiDAR, and GPS, and is then ingested from the vehicle into a data lake. The ingested data is labeled and processed to build a test suite for the simulation, validation, and verification of ADAS models. Getting autonomous vehicles onto public roads quickly requires a huge amount of training data, and the current shortage of it is the biggest challenge.
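To make the idea of a bounding-box label concrete, here is a minimal sketch of what a single annotation record might look like. The field names (`frame_id`, `sensor`, `label`, and the pixel-coordinate box edges) are illustrative assumptions, not the schema of any particular tool or vendor named in this article.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class BoxAnnotation:
    """One labeled object in one camera frame (hypothetical schema)."""
    frame_id: str   # identifier of the image or video frame
    sensor: str     # sensor that produced the frame, e.g. "front_camera"
    label: str      # object class assigned by the annotator
    x_min: float    # left edge of the box, in pixels
    y_min: float    # top edge of the box, in pixels
    x_max: float    # right edge of the box, in pixels
    y_max: float    # bottom edge of the box, in pixels

    def area(self) -> float:
        """Box area in square pixels; zero for a degenerate box."""
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def to_json(annotations):
    """Serialize a list of annotations so they can be stored with the raw data."""
    return json.dumps([asdict(a) for a in annotations], indent=2)


# A single pedestrian label on one frame (made-up values for illustration).
example = BoxAnnotation("frame_000123", "front_camera", "pedestrian",
                        410.0, 220.0, 470.0, 380.0)
```

Calling `to_json([example])` yields a JSON list with one object per label. Real pipelines typically carry extra fields, such as timestamps, occlusion flags, or 3D cuboid parameters for LiDAR data.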

A large amount of rich and diverse labeled data is the most precious asset for the training and validation of autonomous vehicles. Ground-truth annotation involves collecting information on location, which allows the image data to be related to the reality on the ground. This annotated data helps in training and validating the perception and prediction models with high precision.
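One common way to validate a perception model against ground-truth labels is to measure how closely each predicted box overlaps the annotated one, using intersection over union (IoU). The text above does not name a specific metric, so the following is only a sketch under that assumption. The `(x_min, y_min, x_max, y_max)` box layout and the 0.5 match threshold are likewise illustrative choices.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is a tuple (x_min, y_min, x_max, y_max); this layout is an
    assumption made for the example.
    """
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes contribute no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(predicted, ground_truth, threshold=0.5):
    """Treat a prediction as correct if it overlaps the label enough."""
    return iou(predicted, ground_truth) >= threshold
```

Two identical boxes give an IoU of 1.0 and two disjoint boxes give 0.0. The threshold that counts as a correct detection depends on the benchmark and the object class.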

For autonomous vehicles, ground-truth labeling covers the annotation of urban scenarios, highway environments, road markings and signboards, and different weather conditions, so that models can be trained efficiently to detect moving objects. Manual labeling of such a huge dataset requires significant resources, time, and money. Several automation software tools and labeling apps have evolved recently to provide frameworks for creating algorithms that automate this labeling process while ensuring precision and safety. Available annotation tools, some of them open source, include Amazon SageMaker Ground Truth, the MathWorks Ground Truth Labeler app, Intel’s Computer Vision Annotation Tool (CVAT), Microsoft’s Visual Object Tagging Tool (VoTT), DataTurks, LabelMe, Fast Image Data Annotation Tool (FIAT), COCO Annotator, Scalabel by DeepDrive, RectLabel, and Cloud-LSVA.

The autonomous driving race among the players in the ecosystem is becoming aggressive, with each aiming to showcase the most precise and fluent system capable of operating in any weather conditions. The majority of the players are adopting Artificial Intelligence (AI) and Machine Learning (ML) to train their AVs. The large volume of data captured from sensors needs to be labeled to train these machine learning models accurately. This market holds billion-dollar business potential behind the actual AV industry. The majority of automotive OEMs and Tier-1s have started outsourcing data labeling, while a few of them are reluctant to pay third parties and are therefore switching to in-house data annotation.

For example, Waymo, which has the highest number of autonomous test miles traveled, has an in-house annotated dataset of approximately 25 million 3D bounding boxes and 22 million 2D bounding boxes. Tesla, likewise, has gathered 1.3 million miles of data from its Autopilot-equipped vehicles. As companies move toward the production stage of AVs, the data annotation requirement is scaling up exponentially. It is becoming challenging for companies to meet this mounting demand for training datasets internally, and hence they are moving toward outsourcing data annotation.

Specialized annotation companies serving the self-driving industry include CMORE Automotive, Understand.ai, and FEV Group from Germany; the United States-based Cogito Tech, Scale AI, Anolytics, Basic AI, Deepen.ai, Samasource, Inc., Appen, and Lionbridge Technologies Inc.; and Playment, mCYCLOID, GTS Ltd, Infolks Group, and Oclavi, a few of the well-known companies headquartered in India. The data labeling industry in India has developed massively over the past two to three years. Several startups have emerged in the region, making it a hub for ML datasets with their high-quality and innovative solutions.

CMORE Automotive, a well-known German provider of software tools and measurement systems, has formed a joint venture with Expert Global Solutions (EGS), based in Aurangabad, India, called ‘EC. Mobility’, which focuses on autonomous driving data annotation. Other companies with high growth potential in this field include Egypt-based Avidbeam, Israel’s Dataloop, and Canada’s Awakening Vector. Among these, Avidbeam has comparatively more years of experience, with 30+ experts and engineers working on the annotation database and serving not only the automotive industry but also the smart city, retail, industrial, and consumer spaces.

This study on Autonomous Driving Data Annotation/Labeling includes:

  • Analysis of AI and machine learning trends and penetration rates in automotive applications.
  • Analysis of sensor data annotation for ADAS and autonomous applications - radar, camera, LiDAR.
  • Analysis of the techniques and tools for data annotation in the deep learning models of AVs.
  • Analysis of the partnership ecosystem of OEMs with technology players.
  • Analysis of recent M&As in the annotation ecosystem and their impact on the market share of the leading players across the supply chain.
  • Data annotation types and trends - manual ground truth and software automation.
  • Data annotation classification - semantic annotation, 2D/3D cuboid bounding boxes, polylines and polygons, text and linguistic.
  • Market share analysis, market size in terms of revenue for the period 2020 to 2026, and pricing analysis of annotation/labeling data, along with how the cost structure varies across companies.
  • Competition assessment of major players - years of experience in the industry, products/techniques, solutions offered, pricing model, funding/investment, major customers, partners, suppliers, industry ranking.

Key Questions Answered

  • How is data annotation impacting autonomous and connected mobility?
  • Which are the major techniques and tools for data annotation?
  • What does the AV industry prefer - manual ground truth or automation software tools?
  • Which are the major tools/software currently being used for sensor data annotation?
  • Which OEMs/shuttle providers are leading the race for the maximum number of miles traveled? And who is following?
  • Data annotation and labeling solutions: Who supplies whom?
  • How intense is the competition among the different annotation players? Which new entrants are acquiring market share? And what challenges are they facing?
  • What strategies are the annotation data providers adopting to stay in this race?