Autonomous driving systems, such as those used by Tesla or Waymo, require high-quality spatial data for training and testing. LiDAR (Light Detection and Ranging) provides precise 3D measurements of the environment by emitting laser pulses, measuring their time of flight, and converting each return into a 3D coordinate, building up a point cloud that can contain billions of points. These point clouds then undergo several processing steps before they are usable by self-driving vehicles.
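As a back-of-the-envelope illustration of the time-of-flight principle, here is a short Python sketch (the pulse timing and beam angles are made-up values) that turns one laser return into a 3D point:

```python
import math

C = 299_792_458.0  # speed of light in m/s

def tof_to_point(round_trip_s, azimuth_deg, elevation_deg):
    """Convert one laser return into a Cartesian point.

    range = c * t / 2, since the pulse travels to the target and back.
    """
    r = C * round_trip_s / 2.0
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = r * math.cos(el) * math.cos(az)
    y = r * math.cos(el) * math.sin(az)
    z = r * math.sin(el)
    return x, y, z

# A pulse returning after ~200 ns corresponds to a target ~30 m away.
print(tof_to_point(2.0e-7, azimuth_deg=45.0, elevation_deg=-2.0))
```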

Why Public LiDAR Datasets Matter
Constructing a high-quality, consistent LiDAR dataset for autonomous cars is a difficult engineering problem that also demands significant time and money. Fortunately, a number of public LiDAR datasets exist that are designed specifically for this use case. They eliminate the high cost of data collection, data annotation, and hardware maintenance, enabling researchers to focus on core engineering problems such as improving vehicle safety and accuracy.
In this article, we are going to discuss 10 popular LiDAR datasets. Let's dive in.
Top 10 Open-Source LiDAR Datasets
How does one construct a public, real-world LiDAR dataset for autonomous vehicles covering urban environments and different weather conditions? The intuition is simple, even if the implementation is hard: you equip a car with LiDAR, stereo, and 360-degree RGB cameras, along with a GPS/IMU system, and drive it through busy streets. That's it.

But how you actually design and run this car, how you annotate the raw 3D point clouds, and which annotation and dataset formats you choose are what differentiate one LiDAR dataset from another. The datasets below differ not only in format and size, but also in how they were collected.
Here is a summary table:
| Name | Location | Year | Annotation Format | Size | License |
|---|---|---|---|---|---|
| KITTI | Germany | 2012 | 3D/2D Boxes, Semantic Seg. | 15k frames, 80k objects | CC BY-NC-SA 4.0 |
| nuScenes | USA, Singapore | 2019 | 3D Boxes, Attribute labels | 1,000 scenes, 1.4M boxes | CC BY-NC-SA 4.0 |
| Waymo Open | 6 US Cities | 2019 | 3D/2D Boxes, Key-points | 12 million 3D labels | Custom Non-Commercial |
| Argoverse 2 | 6 US Cities | 2021 | 3D Cuboids, HD Maps | 1,000 labeled scenarios | CC BY-NC-SA 4.0 |
| Toronto-3D | Canada | 2020 | Semantic Segmentation | 78.3 million points | CC BY-NC-SA 4.0 |
| PandaSet | USA | 2021 | 3D Cuboids, Semantic Seg. | 103 scenes, 28 classes | Apache 2.0 |
| ApolloScape | China | 2018 | 3D Cuboids, Disparity Maps | 140k frames, 1k km road | Custom Non-Commercial |
| Oxford RobotCar | UK | 2014-2015 | 3D Cuboids, Semantic Seg. | 1,000 km recorded driving | CC BY-NC-SA 4.0 |
| A2D2 | Germany | 2020 | 3D Boxes, Semantic Seg. | 41k frames, 38 categories | CC BY-ND 4.0 |
| ONCE | China | 2021 | 3D Object Detection | 1 million frames (15k annotated) | CC BY-NC-SA 4.0 |
1. KITTI
- Year: 2012, 2015
- Geography: Karlsruhe, Germany
- Sensors: 2 grayscale cameras, 2 color cameras, Velodyne LiDAR, GPS/IMU
- Data Format: BIN, PNG, and TXT files
- Annotations: 3D/2D object detection, tracking, semantic segmentation
- Size: 7,481 training frames, 7,518 testing frames, and ~80K labeled objects
- License: CC BY-NC-SA 4.0

The KITTI Vision Benchmark Suite is a pioneering dataset (2012, 2015) developed by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago. It is widely used for benchmarking autonomous driving algorithms and ML models, and it gave rise to modern successor datasets such as KITTI-360.
KITTI is an open-source dataset, but you need to create an account and wait for approval before you can gain access. Note that the 3D cuboid annotations were labeled from a top-down view, so they are less accurate along the Z-axis. Moreover, only objects visible in the camera images are labeled.
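The Velodyne scans are flat binaries of float32 values in (x, y, z, reflectance) order, so plain NumPy is enough to load them; a minimal sketch with an illustrative file path:

```python
import numpy as np

# Each KITTI Velodyne scan is a flat float32 array: x, y, z, reflectance.
scan = np.fromfile("velodyne/000000.bin", dtype=np.float32).reshape(-1, 4)
points, reflectance = scan[:, :3], scan[:, 3]
print(points.shape)  # (N, 3)
```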
2. nuScenes
- Year: 2019
- Geography: Boston (USA), Singapore
- Sensors: 6 cameras, 5 RADARs, 1 LiDAR, GPS/IMU
- Data Format: BIN, PNG and JSON files
- Annotations: 3D/2D object detection, tracking, semantic segmentation
- Size: 1,000 scenes of 20 s each; 1,400,000 camera images; 390,000 LiDAR sweeps; 1.4M 3D bounding boxes manually annotated for 23 object classes; 1.1B LiDAR points manually annotated for 32 classes
- License: CC BY-NC-SA 4.0

nuScenes is a high-resolution, multimodal dataset collected in diverse weather and lighting conditions with 360-degree sensor coverage. It was the first dataset to include both radar and LiDAR data for sensor fusion.
It covers roughly 35K labeled keyframes across 850 annotated scenes from Boston and Singapore, and therefore spans both left-hand and right-hand traffic. You need to create an account before you can access the dataset.
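The dataset ships with the `nuscenes-devkit` Python package; a minimal sketch, assuming the v1.0-mini split has been downloaded to `/data/sets/nuscenes`:

```python
from nuscenes.nuscenes import NuScenes  # pip install nuscenes-devkit

# Load the mini split and look up the first keyframe of the first scene.
nusc = NuScenes(version="v1.0-mini", dataroot="/data/sets/nuscenes", verbose=True)
scene = nusc.scene[0]
sample = nusc.get("sample", scene["first_sample_token"])
print(scene["name"], "-", len(sample["anns"]), "annotations in its first keyframe")
```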
3. Waymo Open Dataset
- Year: 2019
- Geography: 6 US cities
- Sensors: 5 LiDARs and 5 cameras
- Data Format: Sharded TFRecord format files containing protocol buffer data
- Annotations: Wide variety covering tracked 2D/3D objects, key-points & segmentation for 2D/3D
- Size: 3 datasets, 12 million 3D labels for vehicles and pedestrians
- License: Custom non-commercial license

The Waymo Open Dataset covers a wide variety of annotations over a large number of frames captured in different locations and conditions: downtown and suburban areas, in daylight, at night, and in rain.
It is composed of three datasets: the Perception Dataset, with high-resolution sensor data and labels for 2,030 segments; the Motion Dataset, with object trajectories and corresponding 3D maps for 103,354 segments; and the End-to-End Driving Dataset, with 360-degree camera imagery and routing instructions for 5,000 segments.
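To give a feel for the TFRecord/protobuf format, here is a minimal parsing sketch; it assumes a version-matched `waymo-open-dataset` pip package and uses a placeholder shard filename:

```python
import tensorflow as tf
from waymo_open_dataset import dataset_pb2  # install the waymo-open-dataset package matching your TF version

# Read one Perception Dataset shard (filename is illustrative) and parse a Frame proto.
dataset = tf.data.TFRecordDataset("segment-XXXX.tfrecord")
for record in dataset.take(1):
    frame = dataset_pb2.Frame()
    frame.ParseFromString(record.numpy())
    print(frame.context.name, "-", len(frame.laser_labels), "3D labels in this frame")
```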
4. Argoverse 2
- Year: 2021
- Geography: 6 US cities
- Sensors: 2 Lidar sensors, 7 ring cameras and 2 stereo cameras
- Data Format: Point cloud files and annotations as Apache Feather files, raw images as PNG files
- Annotations: Annotated HD maps, ground points, tracked 3D cuboids, motion forecasting and map change detection data
- Size: 4 open-source datasets
- License: CC BY-NC-SA 4.0

Argoverse 2 is a comprehensive dataset for a number of tasks, including 3D object detection, tracking, HD-map change detection, and motion forecasting. The Argoverse 2 Sensor Dataset contains 1,000 3D-labeled scenarios of 15 seconds each.
The Argoverse 2 Motion Forecasting Dataset has 250,000 scenarios with trajectory data for many object types, while the Lidar Dataset contains 20,000 unannotated LiDAR sequences. Finally, the Map Change Dataset provides 1,000 scenarios, 200 of which depict real-world HD map changes.
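Because annotations are Apache Feather tables, they open directly in pandas (with pyarrow installed). A minimal sketch with a placeholder log path, assuming the `category` column of the AV2 cuboid schema:

```python
import pandas as pd  # Feather I/O requires pyarrow

# Each AV2 sensor log ships its cuboid labels as one Feather table ("<log_id>" is a placeholder).
df = pd.read_feather("av2/sensor/train/<log_id>/annotations.feather")
print(df["category"].value_counts())  # per-class object counts, assuming the AV2 'category' column
```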
5. Toronto-3D
- Year: 2020
- Geography: Toronto, Canada
- Sensors: 1 LiDAR
- Data Format: Point cloud files as BIN files, annotations in PLY format
- Annotations: Semantic segmentation of Lidar data
- Size: 1 km of road, 78.3 million labeled points
- License: CC BY-NC-SA 4.0

Toronto-3D is a large-scale dataset for semantic segmentation of urban outdoor scenes. It was collected with a Teledyne Optech Maverick mobile mapping system and covers approximately 1 km of Avenue Road in Toronto, Canada, containing 78.3 million points labeled into eight categories.
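The tiles are standard PLY files with the semantic label stored as a per-vertex scalar field, so a generic PLY reader works; a minimal sketch using the `plyfile` package (the tile name is illustrative):

```python
from plyfile import PlyData  # pip install plyfile

# Read one tile and list its per-vertex fields; semantic labels ride along as scalar properties.
ply = PlyData.read("Toronto_3D/L001.ply")
vertex = ply["vertex"]
print(vertex.count, "points; fields:", [p.name for p in vertex.properties])
```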
6. PandaSet
- Year: 2021
- Geography: San Francisco, California
- Sensors: 1 mechanical spinning 360-degree LiDAR, 1 solid-state LiDAR, 6 cameras, on-board GPS/IMU
- Data Format: Pickle format for LiDAR and JPG for images.
- Annotations: 3D cuboid and semantic segmentation data
- Size: 48,000 camera images, 16,000 LiDAR sweeps, 103 scenes of 8 s each, 28 annotation classes, 37 semantic segmentation labels
- License: Apache License, Version 2.0

PandaSet includes 103 scenes of 8 seconds each. The dataset provides complex labels like smoke, vegetation, and construction zones, and it is useful for studying sensor-specific noise patterns.
Created by Hesai and Scale AI, PandaSet is unique because it features two different types of LiDAR: a mechanical spinning LiDAR and a solid-state LiDAR. This allows researchers to study how different sensor architectures affect perception.
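Since the LiDAR frames are plain pickled pandas DataFrames, they can be inspected without the official devkit. A minimal sketch, assuming the release's directory layout and the per-point sensor-id column (`d`) that the PandaSet devkit documents for telling the two LiDARs apart:

```python
import pandas as pd

# Each PandaSet LiDAR frame is a pickled (gzip-compressed) pandas DataFrame.
frame = pd.read_pickle("pandaset/002/lidar/00.pkl.gz")
print(frame.columns.tolist())  # per-point fields, e.g. x, y, z, intensity, timestamp
# Per the devkit docs, the 'd' column identifies which of the two LiDARs produced each point.
print(frame["d"].value_counts())
```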
7. ApolloScape
- Year: 2018
- Geography: Beijing, China
- Sensors: 2 LiDARs
- Data Format: Point cloud files as PCD files, images in PNG format, and annotations as JSON files
- Annotations: 3D cuboid annotations for LiDAR, disparity maps for stereo images, and lane detection
- Size: 100K image frames, 80K LiDAR point clouds, and 1,000 km of trajectories
- License: Custom non-commercial license

Baidu’s ApolloScape consists of datasets for 3D object detection, lane detection, disparity maps for stereo images, scene inpainting, and trajectory prediction.
ApolloScape offers high-resolution 3D point cloud segmentation across more than 140,000 frames of sensor data. The dataset is known for its high density of non-motorized vehicles and provides sub-centimeter accuracy for static environmental maps.
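Since the point clouds ship as standard PCD files, any PCD reader works; a minimal sketch using Open3D, with an illustrative file path:

```python
import numpy as np
import open3d as o3d  # pip install open3d

# Standard PCD files load directly with Open3D; the path below is illustrative.
pcd = o3d.io.read_point_cloud("apolloscape/pointcloud/frame_0001.pcd")
points = np.asarray(pcd.points)  # (N, 3) float array
print(points.shape)
```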
8. Oxford RobotCar Dataset
- Year: 2014-2015
- Geography: Oxford, UK
- Sensors: 6 cameras, LiDAR, GPS, INS
- Data Format: Images and Radar scans as PNG, LiDAR as BIN
- Annotations: 3D cuboid annotations and semantic segmentation
- Size: 1,000 km of recorded driving with over 20 million images
- License: CC BY-NC-SA 4.0

This dataset captures a single 10 km route traversed more than 100 times over the course of a year, recording variations in lighting, weather, and traffic, which makes it especially valuable for testing localization and mapping (SLAM) algorithms. The platform carries two SICK LMS-151 2D LiDARs and one SICK LD-MRS 3D LiDAR.
9. A2D2 Dataset
- Year: 2020
- Geography: Germany (3 cities)
- Sensors: 5 LiDARs and 6 cameras
- Data Format: Point cloud files as NPZ (NumPy zip) files, images in PNG format, and annotations as JSON files
- Annotations: LiDAR and image semantic segmentation, 3D cuboid annotations, and sensor data
- Size: 40K images and point cloud files for semantic segmentation; 12K point cloud frames for 3D object detection
- License: CC BY-ND 4.0

The Audi Autonomous Driving Dataset (A2D2) provides 41,277 frames with 3D bounding boxes, captured with five Velodyne VLP-16 sensors. Labels are provided for 38 categories, and the dataset uniquely offers semantic segmentation for both 2D and 3D data.
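The NPZ container is convenient because the archive is self-describing: you can list the stored arrays before relying on any particular key. A minimal sketch with an illustrative filename:

```python
import numpy as np

# NPZ archives are self-describing: list the stored arrays before assuming key names.
data = np.load("a2d2_lidar_frame.npz")
for name in data.files:
    print(name, data[name].shape)
```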
10. ONCE
- Year: 2021
- Geography: China (urban areas)
- Sensors: LiDAR and 7 RGB cameras
- Data Format: PCD, PNG and JSON files
- Annotations: Partially annotated for 3D object detection
- Size: 1 million LiDAR frames, 7 million camera images, 15K fully annotated scenes with 5 classes
- License: CC BY-NC-SA 4.0

ONCE (One Million Scenes) is one of the largest autonomous driving datasets, focusing on real-world urban scenes. It targets the development of self-supervised learning: of the 1 million LiDAR frames collected across various Chinese cities, only 15,000 are fully annotated, which pushes models to learn from the unlabeled data.
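The JSON annotation files can be inspected with the standard library before committing to a schema; the path below is illustrative, and the exact layout is documented in the official ONCE devkit:

```python
import json

# Inspect the top-level structure of one annotation file (path is illustrative).
with open("once/data/000076/000076.json") as f:
    anno = json.load(f)
print(list(anno.keys()))
```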
Conclusion
Open-source LiDAR datasets have significantly transformed AV research, allowing startups and universities to turn novel ideas into working systems. By providing diverse geographic, weather, and traffic conditions, these datasets help ensure that the next generation of autonomous vehicles is safe, reliable, and capable of navigating the complexities of the real world.
Explore More
- What is LiDAR: Essentials & Applications
- LiDAR Annotation and Dataset
- LiDAR Dataset and Annotation Formats
