Folder Structure
{seq:0>4} denotes the sequence ID using 4 digits and {frame:0>10} denotes the frame ID using 10 digits.
KITTI-360/
|-- calibration/
|   |-- calib_cam_to_pose.txt
|   |-- calib_cam_to_velo.txt
|   |-- calib_sick_to_velo.txt
|   |-- perspective.txt
|   |-- image_02.yaml
|   `-- image_03.yaml
|-- data_2d_raw/
|   `-- 2013_05_28_drive_{seq:0>4}_sync/
|       |-- image_{00|01}/
|       |   `-- data_rect/
|       |       `-- {frame:0>10}.png
|       `-- image_{02|03}/
|           `-- data_rgb/
|               `-- {frame:0>10}.png
|-- data_2d_semantics/
|   |-- train
|   |   `-- 2013_05_28_drive_{seq:0>4}_sync/
|   |       |-- image_{00|01}/
|   |       |   |-- semantic/
|   |       |   |   `-- {frame:0>10}.png
|   |       |   |-- semantic_rgb/
|   |       |   |   `-- {frame:0>10}.png
|   |       |   |-- instance/
|   |       |   |   `-- {frame:0>10}.png
|   |       |   `-- confidence/
|   |       |       `-- {frame:0>10}.png
|   |       `-- instanceDict.json
|   ...
|   ...
|-- data_3d_raw/
|   `-- 2013_05_28_drive_{seq:0>4}_sync/
|       |-- velodyne_points/
|       |   |-- data/
|       |   |   `-- {frame:0>10}.bin
|       |   `-- timestamps.txt
|       `-- sick_points/
|           |-- data/
|           |   `-- {frame:0>10}.bin
|           `-- timestamps.txt
|-- data_3d_semantics/
|   |-- train
|   |   `-- 2013_05_28_drive_{seq:0>4}_sync/
|   |       |-- static/
|   |       |   `-- {start_frame:0>10}_{end_frame:0>10}.ply
|   |       `-- dynamic/
|   |           `-- {start_frame:0>10}_{end_frame:0>10}.ply
|   `-- test
|       `-- 2013_05_28_drive_{seq:0>4}_sync/
|           `-- static/
|               `-- {start_frame:0>10}_{end_frame:0>10}.ply
|-- data_3d_bboxes/
|   `-- train
|       `-- 2013_05_28_drive_{seq:0>4}_sync.xml
`-- data_poses/
    `-- 2013_05_28_drive_{seq:0>4}_sync/
        |-- poses.txt
        `-- cam0_to_world.txt
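The placeholders in the tree use the same syntax as Python's format specification, so paths can be built with `str.format`. The following is a minimal sketch; the helper names `sequence_name` and `raw_image_path` are hypothetical and not part of any toolkit.

```python
from pathlib import PurePosixPath


def sequence_name(seq: int) -> str:
    # {seq:0>4} right-aligns the sequence ID and pads it with zeros to 4 digits.
    return "2013_05_28_drive_{seq:0>4}_sync".format(seq=seq)


def raw_image_path(root: str, seq: int, camera: int, frame: int) -> str:
    # Per the tree above, image_00/image_01 store data_rect
    # and image_02/image_03 store data_rgb.
    subdir = "data_rect" if camera in (0, 1) else "data_rgb"
    rel = "data_2d_raw/{}/image_{:0>2}/{}/{:0>10}.png".format(
        sequence_name(seq), camera, subdir, frame
    )
    return str(PurePosixPath(root) / rel)


print(raw_image_path("KITTI-360", seq=3, camera=0, frame=250))
```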
Development Toolkit
2D Data Format
- data_2d_raw/2013_05_28_drive_{seq:0>4}_sync/image_{00|01}/data_rect/{frame:0>10}.png: Stereo pairs in 8-bit PNG format.
- data_2d_raw/2013_05_28_drive_{seq:0>4}_sync/image_{02|03}/data_rgb/{frame:0>10}.png: Fisheye images in 8-bit PNG format.
- data_2d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/image_{00|01}/semantic/{frame:0>10}.png: Semantic label in single-channel 8-bit PNG format. Each pixel value denotes the corresponding semanticID.
- data_2d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/image_{00|01}/semantic_rgb/{frame:0>10}.png: Semantic RGB image in 3-channel 8-bit PNG format. Each pixel value denotes the color-coded semantic label.
- data_2d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/image_{00|01}/instance/{frame:0>10}.png: Instance label in single-channel 16-bit PNG format. Each pixel value denotes the corresponding instanceID. Here, instanceID = semanticID*1000 + classInstanceID, with classInstanceID denoting the instance ID within one class and classInstanceID = 0 for classes without instance labels. Note that instanceID is unique across the full sequence; for example, a building appearing in different frames has the same instanceID in all these frames.
- data_2d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/image_{00|01}/confidence/{frame:0>10}.png: Confidence map in single-channel 8-bit PNG format. Each pixel value corresponds to a confidence score ranging from 0 to 255. Lower values indicate lower confidence.
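The instanceID encoding above can be inverted with integer division. This is a small sketch of that arithmetic; the function names are illustrative only, and the example IDs are arbitrary numbers rather than specific dataset classes.

```python
def make_instance_id(semantic_id: int, class_instance_id: int) -> int:
    # instanceID = semanticID*1000 + classInstanceID
    return semantic_id * 1000 + class_instance_id


def split_instance_id(instance_id: int):
    # divmod returns (quotient, remainder), i.e. (semanticID, classInstanceID).
    semantic_id, class_instance_id = divmod(instance_id, 1000)
    return semantic_id, class_instance_id


semantic_id, class_instance_id = split_instance_id(26005)
has_instance_label = class_instance_id != 0  # 0 means "no instance label"
```

Because `divmod` also works element-wise on NumPy integer arrays, the same expression could be applied to a whole instance image once it has been loaded as a 16-bit array.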
3D Data Format
- data_3d_raw/2013_05_28_drive_{seq:0>4}_sync/velodyne_points/data/{frame:0>10}.bin: Velodyne scans in binary format.
- data_3d_raw/2013_05_28_drive_{seq:0>4}_sync/velodyne_points/timestamps.txt: Timestamps of Velodyne scans. Each line contains the timestamp of one scan.
- data_3d_raw/2013_05_28_drive_{seq:0>4}_sync/sick_points/data/{frame:0>10}.bin: SICK scans in binary format. Note that the SICK laser scanner has a higher frame rate, so the frame indices of SICK scans do not align with those of the images or the Velodyne scans.
- data_3d_raw/2013_05_28_drive_{seq:0>4}_sync/sick_points/timestamps.txt: Timestamps of SICK scans. Each line contains the timestamp of one scan.
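The binary layout of the scans is not specified in this section. The sketch below assumes the layout used by the original KITTI Velodyne files, four little-endian float32 values per point (x, y, z, reflectance); treat that as an assumption to verify against the data, and note that it says nothing about the SICK files. The function name is hypothetical.

```python
import struct


def parse_velodyne_scan(raw: bytes):
    """Decode a scan into a list of (x, y, z, reflectance) tuples.

    ASSUMPTION: each point is 4 little-endian float32 values (16 bytes).
    """
    if len(raw) % 16 != 0:
        raise ValueError("buffer length is not a multiple of 16 bytes")
    return list(struct.iter_unpack("<4f", raw))


# Usage with a real file (path shown for illustration):
# with open(".../velodyne_points/data/0000000000.bin", "rb") as f:
#     points = parse_velodyne_scan(f.read())
```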
Each fused point cloud file is named after its start_frame and end_frame (both in 10 digits):
- data_3d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/static/{start_frame:0>10}_{end_frame:0>10}.ply: Fused static point clouds in PLY format for training. The PLY file contains only vertices. Each vertex of the PLY contains the following information: x y z red green blue semanticID instanceID isVisible. Here, x y z (32-bit float) is the location of a 3D point in world coordinates, red green blue (8-bit uchar) is the color of a 3D point obtained by projecting it to adjacent 2D images, and semanticID instanceID (32-bit int) describe the label of a 3D point, where instanceID is consistent with the 2D label. The last value, isVisible (8-bit uchar), is a binary variable which is 0 when a 3D point is not visible in any of the perspective images. For these occluded points we keep a 3D point only if it is uniquely labeled by a 3D bounding box and assign the label according to the annotation. Unlabeled points or ambiguously labeled points are ignored.
- data_3d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/dynamic/{start_frame:0>10}_{end_frame:0>10}.ply: Fused dynamic point clouds in PLY format. The PLY file contains only vertices. Each vertex has an additional timestamp (32-bit int) value compared to the static points: x y z red green blue semantic instance isVisible timestamp.
- data_3d_semantics/test/2013_05_28_drive_{seq:0>4}_sync/static/{start_frame:0>10}_{end_frame:0>10}.ply: Fused static point clouds in PLY format for testing. The test point clouds share the same format as the training point clouds except that labels are omitted: x y z red green blue isVisible.
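Since the three PLY variants differ only in their vertex properties, a reader that takes the layout from the file header handles all of them. The following is a minimal sketch using only the standard library. It assumes a vertex-only file in `binary_little_endian` encoding with `\n` line endings in the header, neither of which is stated above, and `read_ply_vertices` is a hypothetical name.

```python
import struct

# PLY scalar type names mapped to struct format characters.
PLY_TYPES = {
    "char": "b", "int8": "b", "uchar": "B", "uint8": "B",
    "short": "h", "int16": "h", "ushort": "H", "uint16": "H",
    "int": "i", "int32": "i", "uint": "I", "uint32": "I",
    "float": "f", "float32": "f", "double": "d", "float64": "d",
}


def read_ply_vertices(raw: bytes):
    """Return (property_names, rows) for a vertex-only binary PLY in memory."""
    header, sep, body = raw.partition(b"end_header\n")
    if not sep:
        raise ValueError("missing end_header")
    names, fmt, count = [], "<", 0
    for line in header.decode("ascii").splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "format" and parts[1] != "binary_little_endian":
            raise ValueError("this sketch only handles binary_little_endian")
        if parts[:2] == ["element", "vertex"]:
            count = int(parts[2])
        elif parts[0] == "property":
            # Scalar properties only; list properties are not expected here.
            fmt += PLY_TYPES[parts[1]]
            names.append(parts[2])
    size = struct.calcsize(fmt)
    rows = [struct.unpack_from(fmt, body, i * size) for i in range(count)]
    return names, rows
```

Each row is a tuple in header order, so `dict(zip(names, row))` gives named access to one vertex. For large files a NumPy structured dtype would be faster; plain `struct` is used here to keep the sketch dependency-free.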
- data_3d_bboxes/train/2013_05_28_drive_{seq:0>4}_sync.xml: Each element object{d} denotes a bounding box whose semanticId and instanceId are consistent with the 2D labels. The vertices and faces matrices form the mesh of the bounding box in a local coordinate system. The transform matrix transforms this mesh to the world coordinate system. The timestamp denotes the frame ID for dynamic objects and is -1 for static objects.
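Moving a bounding box mesh from its local frame to the world frame amounts to applying the transform to every vertex. The sketch below assumes the transform has already been read from the XML as a row-major 3x4 or 4x4 matrix of the form [R | t]; the XML parsing itself is not shown, and the function name is hypothetical.

```python
def transform_vertices(vertices, transform):
    """Map local-frame vertices (x, y, z) to world coordinates.

    ASSUMPTION: `transform` is a row-major 3x4 or 4x4 matrix [R | t];
    only its first three rows are used.
    """
    world = []
    for x, y, z in vertices:
        world.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + row[3]
            for row in transform[:3]
        ))
    return world


# Example: rotate 90 degrees about z, then translate by (10, 20, 30).
T = [[0, -1, 0, 10],
     [1,  0, 0, 20],
     [0,  0, 1, 30],
     [0,  0, 0,  1]]
corners_world = transform_vertices([(1, 0, 0), (0, 2, 1)], T)
```

The faces matrix indexes into the vertex list and is unaffected by the transform, so it can be reused unchanged with the world-frame vertices.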
Calibrations
- calibration/calib_cam_to_pose.txt: Each line contains a 3x4 matrix denoting the transformation from a camera to the system pose. There are 4 lines, one each for the two perspective cameras (image_00, image_01) and the two fisheye cameras (image_02, image_03).
- calibration/calib_cam_to_velo.txt: A 3x4 matrix denoting the rigid transformation from the first camera (image_00) to the Velodyne.
- calibration/calib_sick_to_velo.txt: A 3x4 matrix denoting the rigid transformation from the SICK laser scanner to the Velodyne.
- calibration/perspective.txt: Intrinsics of the perspective cameras. The lines starting with P_rect_00 and P_rect_01 provide 3x4 perspective intrinsics. R_rect_00 and R_rect_01 correspond to 3x3 rectification matrices.
- calibration/image_{02|03}.yaml: Intrinsics of the fisheye cameras.
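As an illustration of how a 3x4 perspective matrix such as P_rect_00 is typically used, the sketch below applies the standard pinhole projection to one 3D point. It assumes the point is already expressed in the rectified camera frame; getting there from world or Velodyne coordinates requires the extrinsics and rectification matrices and is not shown. The matrix values in the example are made up, and `project_point` is a hypothetical name.

```python
def project_point(P, point):
    """Project a 3D point with a 3x4 matrix P; returns (u, v) or None.

    ASSUMPTION: `point` is in the rectified camera frame of the camera
    that P belongs to.
    """
    x, y, z = point
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    if w <= 0:
        return None  # point is behind the camera or on the image plane
    return u / w, v / w


# Made-up intrinsics for illustration: focal length 500, principal point (700, 100).
P_example = [[500.0, 0.0, 700.0, 0.0],
             [0.0, 500.0, 100.0, 0.0],
             [0.0, 0.0, 1.0, 0.0]]
pixel = project_point(P_example, (1.0, 0.5, 10.0))
```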
Poses
- data_poses/2013_05_28_drive_{seq:0>4}_sync/poses.txt: Each line has 13 numbers. The first number is an integer denoting the frame index, and the remaining 12 form a 3x4 matrix denoting the system pose in a global Euclidean space.
- data_poses/2013_05_28_drive_{seq:0>4}_sync/cam0_to_world.txt: Each line has 17 numbers. The first number is an integer denoting the frame index, and the remaining 16 form a 4x4 matrix denoting the pose of camera 0 in a global Euclidean space.
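Both pose files share the "frame index followed by a flattened matrix" layout, so one parser can cover them. This is a minimal sketch that assumes whitespace-separated values stored in row-major order (the ordering is not stated above); `parse_pose_line` is a hypothetical name.

```python
def parse_pose_line(line: str):
    """Parse one line of poses.txt (13 numbers) or cam0_to_world.txt (17 numbers).

    Returns (frame_index, 4x4 matrix as nested lists).
    ASSUMPTION: matrix entries are stored row by row.
    """
    values = line.split()
    frame = int(values[0])
    numbers = [float(v) for v in values[1:]]
    if len(numbers) not in (12, 16):
        raise ValueError("expected 12 or 16 matrix entries, got %d" % len(numbers))
    matrix = [numbers[i:i + 4] for i in range(0, len(numbers), 4)]
    if len(matrix) == 3:
        # Pad a 3x4 pose to a homogeneous 4x4 matrix.
        matrix.append([0.0, 0.0, 0.0, 1.0])
    return frame, matrix


# Usage with a real file (path shown for illustration):
# with open(".../data_poses/2013_05_28_drive_0000_sync/poses.txt") as f:
#     poses = dict(parse_pose_line(l) for l in f if l.strip())
```

Keying the result by frame index avoids assuming that every frame has an entry in the file.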
Sensor Locations
