
10 Popular Public LiDAR Datasets for Autonomous Vehicles

Best 10 open-source LiDAR datasets for self-driving vehicles: essentials, features, and uses.


Autonomous driving systems, such as those used by Tesla or Waymo, require high-quality spatial data for training and testing. LiDAR (Light Detection and Ranging) technology provides precise 3D measurements of the environment by emitting laser beams, calculating the time of flight of laser pulses, and determining their location to construct a 3D point cloud consisting of billions of points. These point clouds undergo several processing steps to be usable for self-driving vehicles.
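The distance behind each of those points comes from a simple time-of-flight relation: the pulse travels to the target and back, so the range is half the round-trip time multiplied by the speed of light. A minimal sketch in Python (the function name is ours, not from any LiDAR SDK):

```python
# Basic LiDAR time-of-flight range equation: the laser pulse travels
# to the target and back, so distance = c * t / 2.
C = 299_792_458.0  # speed of light, m/s

def tof_distance(round_trip_time_s: float) -> float:
    """Distance to a target given the pulse's round-trip time."""
    return C * round_trip_time_s / 2.0

# A return arriving ~667 ns after emission puts the target ~100 m away.
print(tof_distance(667e-9))
```

Real sensors repeat this measurement millions of times per second across many laser channels, which is how the billions of points in a drive accumulate.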


Constructing a high-quality, consistent LiDAR dataset for autonomous cars is a difficult engineering problem that also requires significant time and money. Fortunately, a number of public LiDAR datasets exist specifically designed for this use case. They eliminate the high cost of data collection, data annotation, and hardware maintenance, enabling researchers to focus on core engineering problems, such as improving vehicle safety and accuracy.

In this article, we are going to discuss 10 popular LiDAR datasets. Let's dive in.

Top 10 Open-Source LiDAR Datasets

How does one construct a public, real-world LiDAR dataset for autonomous vehicles across urban environments and varied weather conditions? The intuition is simple, even if the implementation is hard: you equip a car with LiDAR, stereo cameras, 360° RGB cameras, and GPS systems, and drive it through busy streets. That's it.

Waymo Car for Collecting LiDAR Data

But how you actually design and run this car, how you annotate the raw 3D point clouds, and which annotation and dataset formats you choose are what differentiate one LiDAR dataset from another. The datasets below differ not only in format and size, but also in how they were collected.

Here is a summary table:

| Name | Location | Year | Annotation Format | Size | License |
|------|----------|------|-------------------|------|---------|
| KITTI | Germany | 2012 | 3D/2D boxes, semantic seg. | 15K frames, 80K objects | CC BY-NC-SA 4.0 |
| nuScenes | USA, Singapore | 2019 | 3D boxes, attribute labels | 1,000 scenes, 1.4M boxes | CC BY-NC-SA 4.0 |
| Waymo Open | 6 US cities | 2019 | 3D/2D boxes, key-points | 12M 3D labels | Custom non-commercial |
| Argoverse 2 | 6 US cities | 2021 | 3D cuboids, HD maps | 1,000 labeled scenarios | CC BY-NC-SA 4.0 |
| Toronto-3D | Canada | 2020 | Semantic segmentation | 78.3M points | CC BY-NC-SA 4.0 |
| PandaSet | USA | 2021 | 3D cuboids, semantic seg. | 103 scenes, 28 classes | Apache 2.0 |
| ApolloScape | China | 2018 | 3D cuboids, disparity maps | 140K frames, 1K km road | Custom non-commercial |
| Oxford RobotCar | UK | 2014-15 | 3D cuboids, semantic seg. | 1,000 km recorded driving | CC BY-NC-SA 4.0 |
| A2D2 | Germany | 2020 | 3D boxes, semantic seg. | 41K frames, 38 categories | CC BY-ND 4.0 |
| ONCE | China | 2021 | 3D object detection | 1M frames (15K annotated) | CC BY-NC-SA 4.0 |

1. KITTI

  • Year: 2012, 2015
  • Geography: Karlsruhe, Germany
  • Sensors: 2 grayscale cameras, 2 color cameras, Velodyne LiDAR, GPS/IMU
  • Data Format: BIN point clouds, PNG images, and TXT label files
  • Annotations: 3D/2D object detection, tracking, semantic segmentation
  • Size: 7,481 training frames, 7,518 testing frames, and ~80K labeled objects
  • License: CC BY-NC-SA 4.0
Kitti-360 LiDAR Dataset Frame

The KITTI Vision Benchmark Suite is a pioneering dataset (2012, extended in 2015) developed by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago. It is widely used for benchmarking autonomous driving algorithms and ML models, and it gave rise to modern successors such as KITTI-360.

KITTI is an open-source dataset, but you need to create an account and wait for approval before you can gain access. Note that the 3D cuboid annotations are labeled only from a top-down view, so they are not accurate along the Z-axis; moreover, only objects visible in the camera images are labeled.
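KITTI's Velodyne scans are raw little-endian float32 streams with four values per point (x, y, z, reflectance), so they can be parsed with NumPy alone. A minimal sketch using a synthetic in-memory scan instead of a real 000000.bin file:

```python
import io
import numpy as np

def load_kitti_scan(buf) -> np.ndarray:
    """Parse a KITTI Velodyne stream into an (N, 4) x, y, z, reflectance array."""
    return np.frombuffer(buf.read(), dtype=np.float32).reshape(-1, 4)

# Two fake points standing in for a real Velodyne .bin file.
fake = np.array([[1.0, 2.0, 0.5, 0.9],
                 [4.0, 0.0, -1.0, 0.2]], dtype=np.float32)
scan = load_kitti_scan(io.BytesIO(fake.tobytes()))
print(scan.shape)  # (2, 4)
```

On a real file you would pass `open("000000.bin", "rb")` instead of the `BytesIO` stand-in.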


2. nuScenes

  • Year: 2019
  • Geography: Boston (USA), Singapore
  • Sensors: 6 cameras, 5 RADARs, 1 LiDAR, GPS/IMU
  • Data Format: BIN, PNG and JSON files
  • Annotations: 3D/2D object detection, tracking, semantic segmentation
  • Size: 1,000 scenes of 20 s each, 1.4M camera images, 390K LiDAR sweeps, 1.4M 3D bounding boxes manually annotated for 23 object classes, and 1.1B LiDAR points manually annotated for 32 classes
  • License: CC BY-NC-SA 4.0
nuScenes LiDAR Sample

nuScenes is a high-resolution, multimodal dataset collected in diverse weather and lighting conditions with 360-degree sensor coverage. It was the first dataset to include both RADAR and LiDAR data for sensor fusion.

It covers a total of 35K labeled keyframes across 850 annotated scenes from Boston and Singapore, and consequently both right-hand and left-hand traffic. You need to create an account before you can access the dataset.
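nuScenes stores each 3D box orientation as a quaternion in its JSON annotation files. For many tasks you only need the heading (yaw) angle; a small helper to extract it (our own sketch, not part of the nuscenes-devkit):

```python
import math

def yaw_from_quaternion(w: float, x: float, y: float, z: float) -> float:
    """Heading (rotation about the z-axis) of a unit quaternion, in radians."""
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))

# A box rotated 90 degrees about z: q = (cos(theta/2), 0, 0, sin(theta/2)).
theta = math.pi / 2
q = (math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))
print(round(yaw_from_quaternion(*q), 4))  # 1.5708
```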

3. Waymo Open Dataset

  • Year: 2019
  • Geography: 6 US cities
  • Sensors: 5 LiDARs and 5 cameras
  • Data Format: Sharded TFRecord format files containing protocol buffer data
  • Annotations: Wide variety covering tracked 2D/3D objects, key-points & segmentation for 2D/3D
  • Size: 3 datasets, 12 million 3D labels for vehicles and pedestrians
  • License: Custom non-commercial license
Waymo LiDAR Dataset

The Waymo Open Dataset covers a wide variety of annotations over a large number of frames in different weather conditions and locations: downtown, suburban, daylight, night, rain.

It comprises three datasets: the Perception Dataset, with high-resolution sensor data and labels for 2,030 segments; the Motion Dataset, with object trajectories and corresponding 3D maps for 103,354 segments; and the End-to-End Driving Dataset, with 360-degree camera coverage and routing instructions for 5,000 segments.
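Waymo's LiDAR data is distributed as range images rather than raw point lists, so each return must be projected from spherical coordinates (range, inclination, azimuth) back to Cartesian points. A simplified sketch of that projection (the real devkit also applies per-beam calibration, which we omit):

```python
import numpy as np

def spherical_to_cartesian(rng, inclination, azimuth):
    """Project range-image cells (meters, radians) to x, y, z points."""
    xy = rng * np.cos(inclination)  # horizontal distance from the sensor
    return np.stack([xy * np.cos(azimuth),
                     xy * np.sin(azimuth),
                     rng * np.sin(inclination)], axis=-1)

# A 10 m return straight ahead lands at x=10, y=0, z=0.
pts = spherical_to_cartesian(np.array([10.0]), np.array([0.0]), np.array([0.0]))
print(pts)
```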

4. Argoverse 2

  • Year: 2021
  • Geography: 6 US cities
  • Sensors: 2 LiDAR sensors, 7 ring cameras, and 2 stereo cameras
  • Data Format: Point clouds and annotations as Apache Feather files, raw images as PNG files
  • Annotations: Annotated HD maps, ground points, tracked 3D cuboids, motion forecasting and map change detection data
  • Size: 4 open-source datasets
  • License: CC BY-NC-SA 4.0
Argoverse 2 LiDAR Dataset: Semantics + 3D Cuboids

Argoverse 2 is a comprehensive dataset for a number of tasks, including 3D object detection, tracking, HD-map change detection, and motion forecasting. The Argoverse 2 Sensor Dataset contains 1,000 3D-labeled scenarios of 15 seconds each.

The Argoverse 2 Motion Forecasting Dataset has 250,000 scenarios with trajectory data for many object types, while the LiDAR Dataset contains 20,000 unannotated LiDAR sequences. Finally, the Map Change Dataset provides 1,000 scenarios, 200 of which depict real-world HD map changes.
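Argoverse 2's cuboid annotations are parameterized by a center, dimensions, and a rotation. A common operation is expanding such a cuboid into its eight corner points; a NumPy sketch assuming a yaw-only rotation (real Argoverse cuboids carry a full 3D rotation):

```python
import numpy as np

def cuboid_corners(center, length, width, height, yaw):
    """Eight corners of a box centered at `center`, rotated by yaw about z."""
    x = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * length / 2.0
    y = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * width / 2.0
    z = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * height / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (rot @ np.stack([x, y, z])).T + np.asarray(center, dtype=float)

# A 4 m x 2 m x 1.5 m vehicle cuboid at the origin, unrotated.
corners = cuboid_corners([0.0, 0.0, 0.0], 4.0, 2.0, 1.5, 0.0)
print(corners.shape)  # (8, 3)
```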

💡
Looking for full dataset management? Unitlab has you covered.

5. Toronto-3D

  • Year: 2020
  • Geography: Toronto, Canada
  • Sensors: 1 LiDAR
  • Data Format: Point cloud files as BIN files, annotations in PLY format
  • Annotations: Semantic segmentation of LiDAR data
  • Size: 1 km of road + 78.3 million labeled points
  • License: CC BY-NC-SA 4.0
Toronto-3D LiDAR Semantic Annotation

Toronto-3D is a large-scale dataset for semantic segmentation of urban outdoor scenes. It was collected using a Teledyne Optech Maverick mobile mapping system and covers approximately 1 km of Avenue Road in Toronto, Canada, containing 78.3 million points labeled into eight categories.
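Semantic segmentation datasets like Toronto-3D are typically evaluated with per-class intersection-over-union (IoU) between predicted and ground-truth point labels. A minimal sketch of that metric:

```python
import numpy as np

def per_class_iou(gt, pred, num_classes):
    """IoU per label id; NaN for classes absent from both arrays."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((gt == c) & (pred == c))
        union = np.sum((gt == c) | (pred == c))
        ious.append(float(inter) / float(union) if union else float("nan"))
    return ious

gt = np.array([0, 0, 1, 1, 2])     # ground-truth point labels
pred = np.array([0, 1, 1, 1, 2])   # predicted point labels
print(per_class_iou(gt, pred, 3))  # [0.5, 0.6666666666666666, 1.0]
```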

6. PandaSet

  • Year: 2021
  • Geography: San Francisco, California
  • Sensors: 1 mechanical spinning LiDAR (360°), 1 forward-facing solid-state LiDAR, 6 cameras, on-board GPS/IMU
  • Data Format: Pickle format for LiDAR and JPG for images.
  • Annotations: 3D cuboid and semantic segmentation data
  • Size: 48,000 camera images, 16,000 LiDAR sweeps, 103 scenes of 8s each, 28 annotation classes, 37 semantic segmentation labels.
  • License: Apache License, Version 2.0
PandaSet LiDAR Semantic Annotation

PandaSet includes 103 scenes of 8 seconds each. The dataset provides complex labels like smoke, vegetation, and construction zones, making it useful for studying sensor-specific noise patterns.

Created by Hesai and Scale AI, PandaSet is unique because it features two different types of LiDAR: a mechanical spinning LiDAR and a solid-state LiDAR. This allows researchers to study how different sensor architectures affect perception.
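Since PandaSet ships LiDAR sweeps in pickle format (commonly as pickled pandas DataFrames via its devkit), loading and filtering a sweep by sensor is straightforward. The column names below (x, y, z, i, d) are assumptions modeled on the devkit convention — inspect the real files before relying on them:

```python
import pickle
import pandas as pd

# A tiny synthetic sweep; real PandaSet frames hold ~100K rows.
sweep = pd.DataFrame({
    "x": [1.0, 2.0], "y": [0.5, -0.5], "z": [0.1, 0.2],
    "i": [0.9, 0.3],  # intensity (assumed column name)
    "d": [0, 1],      # sensor id: 0 = mechanical, 1 = solid-state (assumed)
})
blob = pickle.dumps(sweep)  # stands in for a .pkl file on disk
loaded = pickle.loads(blob)
mechanical_only = loaded[loaded["d"] == 0]
print(len(mechanical_only))  # 1
```

Splitting a sweep this way is exactly how you would compare the two sensor architectures point-for-point.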

7. ApolloScape

  • Year: 2018
  • Geography: Beijing, China
  • Sensors: 2 LiDARs
  • Data Format: Point cloud files as PCD files, images in PNG format and annotations as JSON files
  • Annotations: 3D cuboid annotations for LiDAR, disparity maps for stereo, and lane detection
  • Size: 100K image frames, 80K LiDAR point clouds, and 1,000 km of trajectories
  • License: Custom non-commercial license
ApolloScape LiDAR Annotation

Baidu’s ApolloScape consists of datasets for 3D object detection, lane detection, disparity maps for stereo images, scene inpainting, and trajectory prediction.

ApolloScape offers high-resolution 3D point cloud segmentation across more than 140,000 frames of sensor data. The dataset is known for its high density of non-motorized vehicles and provides sub-centimeter accuracy for static environmental maps.
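ApolloScape's point clouds come as PCD files. A toy reader for the ASCII variant of the format (real files may be binary-encoded, which this sketch does not handle):

```python
def parse_ascii_pcd(text):
    """Return (field names, point rows) from an ASCII-encoded PCD string."""
    fields, points, in_data = [], [], False
    for line in text.strip().splitlines():
        if in_data:
            points.append([float(v) for v in line.split()])
        elif line.startswith("FIELDS"):
            fields = line.split()[1:]
        elif line.startswith("DATA"):
            in_data = True  # everything after the DATA line is point data
    return fields, points

pcd = """FIELDS x y z
WIDTH 2
HEIGHT 1
POINTS 2
DATA ascii
1.0 2.0 3.0
4.0 5.0 6.0"""
fields, pts = parse_ascii_pcd(pcd)
print(fields, len(pts))  # ['x', 'y', 'z'] 2
```

In practice you would reach for a library like Open3D rather than hand-parsing, but the header-then-data layout above is the essence of the format.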

💡
Looking for LiDAR annotation tools? Join our beta to access the latest features.

8. Oxford RobotCar Dataset

  • Year: 2014-2015
  • Geography: Oxford, UK
  • Sensors: 6 cameras, LiDAR, GPS, INS
  • Data Format: Images and Radar scans as PNG, LiDAR as BIN
  • Annotations: 3D cuboid annotations and semantic segmentation
  • Size: 1000km of recorded driving with over 20 million images
  • License: CC BY-NC-SA 4.0
Oxford RobotCar LiDAR

This dataset captures a single 10 km route traversed over 100 times across a year, recording variations in lighting, weather, and traffic. The data is essential for testing localization and mapping (SLAM) algorithms. The platform carries two LMS-151 2D LiDARs and one HDL-32E 3D LiDAR.
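Repeated traversals of the same route are useful precisely because scans taken at different times can be aligned into a common map frame. A minimal 2D sketch of that step: transforming local scan points into the world frame given a vehicle pose (x, y, heading):

```python
import numpy as np

def scan_to_world(scan_xy, pose):
    """Rotate and translate local 2D scan points into the world frame.

    pose = (x, y, heading): vehicle position in meters, heading in radians.
    """
    px, py, th = pose
    c, s = np.cos(th), np.sin(th)
    rot = np.array([[c, -s], [s, c]])
    return scan_xy @ rot.T + np.array([px, py])

scan = np.array([[1.0, 0.0]])  # one point 1 m ahead of the sensor
world = scan_to_world(scan, (10.0, 5.0, np.pi / 2))
print(world)  # the point lands at roughly (10, 6) in the world frame
```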

9. A2D2 Dataset

  • Year: 2020
  • Geography: Germany (3 cities)
  • Sensors: 5 LiDARs and 6 cameras
  • Data Format: Point cloud files as NPZ (NumPy zip) files, images in PNG format and annotations as JSON files
  • Annotations: Semantic segmentation for LiDAR and images, 3D cuboid annotations, and vehicle bus sensor data
  • Size: 40K images and point cloud files for semantic segmentation; 12K point cloud frames for 3D object detection.
  • License: CC BY-ND 4.0
A2D2 LiDAR Dataset

The Audi Autonomous Driving Dataset (A2D2) provides 41,277 frames with 3D bounding boxes, captured with five Velodyne VLP-16 LiDAR sensors. Labels are provided for 38 categories, and the dataset features semantic segmentation for both 2D and 3D data.
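Because A2D2 frames are plain NPZ archives, they can be opened with NumPy directly. The key names below are illustrative assumptions — call `.files` on a real frame to see the actual keys:

```python
import io
import numpy as np

# Write a synthetic frame; on disk this would be np.savez("frame.npz", ...).
buf = io.BytesIO()
np.savez(buf, points=np.zeros((5, 3)), reflectance=np.ones(5))
buf.seek(0)

frame = np.load(buf)
print(sorted(frame.files))    # ['points', 'reflectance']
print(frame["points"].shape)  # (5, 3)
```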

10. ONCE

  • Year: 2021
  • Geography: China (urban areas)
  • Sensors: LiDAR and 7 RGB cameras
  • Data Format: PCD, PNG and JSON files
  • Annotations: Partially annotated for 3D object detection
  • Size: 1 Million LiDAR frames, 7 Million camera images, 15k fully annotated scenes with 5 classes
  • License: CC BY-NC-SA 4.0
ONCE LiDAR Dataset

ONCE (One Million Scenes) is one of the largest autonomous driving datasets, focusing on real-world urban scenes and targeting the development of self-supervised learning. It provides 1 million LiDAR frames from various Chinese cities, of which only a subset of 15,000 is fully annotated. This structure pushes models to learn from unlabeled spatial features.
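ONCE's design — a small annotated subset inside a much larger unlabeled pool — maps naturally onto a pretrain/fine-tune split. A sketch of building such a split deterministically (frame ids and counts here are illustrative, not the official ONCE splits):

```python
import random

def split_frames(frame_ids, num_labeled, seed=0):
    """Deterministically pick a labeled subset; the rest stays unlabeled."""
    rng = random.Random(seed)
    ids = sorted(frame_ids)
    labeled = set(rng.sample(ids, num_labeled))
    unlabeled = [f for f in ids if f not in labeled]
    return sorted(labeled), unlabeled

# 1,000 toy frames with 15 labeled mirrors ONCE's ~1.5% annotation ratio.
labeled, unlabeled = split_frames(range(1000), 15)
print(len(labeled), len(unlabeled))  # 15 985
```

The unlabeled pool would feed a self-supervised pretraining stage; the labeled subset is reserved for supervised fine-tuning and evaluation.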

Conclusion

Open-source LiDAR datasets have significantly transformed AV research, allowing startups and universities to develop novel ideas and put them into practice. By providing diverse geographic, weather, and traffic conditions, these datasets help ensure that the next generation of autonomous vehicles is safe, reliable, and capable of navigating the complexities of the real world.

