Folder Structure

Our dataset is divided into 11 individual sequences, each corresponding to a continuous driving trajectory. As different sequences rarely overlap, we split training and test data according to the sequence ID. The full dataset, including raw data and semantic and instance labels in both 2D & 3D, is structured as follows, where {seq:0>4} denotes the sequence ID using 4 digits and {frame:0>10} denotes the frame ID using 10 digits.

  KITTI-360/
  |-- calibration/
  |   |-- calib_cam_to_pose.txt
  |   |-- calib_cam_to_velo.txt
  |   |-- calib_sick_to_velo.txt
  |   `-- perspective.txt
  |-- data_2d_raw/
  |   `-- 2013_05_28_drive_{seq:0>4}_sync/
  |       |-- image_{00|01}/
  |       |   `-- data_rect/
  |       |       `-- {frame:0>10}.png
  |       `-- image_{02|03}/
  |           `-- data_rgb/
  |               `-- {frame:0>10}.png
  |-- data_2d_semantics/
  |   |-- train/
  |   |   `-- 2013_05_28_drive_{seq:0>4}_sync/
  |   |      |-- image_{00|01}/
  |   |      |    |-- semantic/
  |   |      |    |   `-- {frame:0>10}.png
  |   |      |    |-- semantic_rgb/
  |   |      |    |   `-- {frame:0>10}.png
  |   |      |    |-- instance/
  |   |      |    |   `-- {frame:0>10}.png
  |   |      |    `-- confidence/
  |   |      |        `-- {frame:0>10}.png
  |   |      `-- instanceDict.json
  |   ...
  |-- data_3d_raw/
  |   `-- 2013_05_28_drive_{seq:0>4}_sync/
  |       `-- velodyne_points/
  |           `-- data/
  |               `-- {frame:0>10}.bin 
  |       `-- sick_points/
  |           `-- data/
  |               `-- {frame:0>10}.bin 
  |-- data_3d_semantics/
  |   |-- train/
  |   |   `-- 2013_05_28_drive_{seq:0>4}_sync/
  |   |      |-- static/
  |   |      |    `-- {start_frame:0>10}_{end_frame:0>10}.ply
  |   |      `-- dynamic/
  |   |           `-- {start_frame:0>10}_{end_frame:0>10}.ply  
  |   `-- test/
  |       `-- 2013_05_28_drive_{seq:0>4}_sync/
  |          `-- static/
  |               `-- {start_frame:0>10}_{end_frame:0>10}.ply
  |-- data_3d_bboxes/
  |   `-- train/
  |       `-- 2013_05_28_drive_{seq:0>4}_sync.xml
  `-- data_poses/
      `-- 2013_05_28_drive_{seq:0>4}_sync/
          |-- poses.txt
          `-- cam0_to_world.txt
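
The placeholders {seq:0>4} and {frame:0>10} follow Python's format-spec syntax (zero fill, right aligned, fixed width), so paths can be built directly with f-strings. The following is a minimal sketch under that reading; the helper names sequence_name and raw_image_path and the root directory argument are hypothetical and not part of the dataset or its toolkit.

```python
from pathlib import PurePosixPath


def sequence_name(seq: int) -> str:
    """Return the sequence folder name, e.g. 2013_05_28_drive_0003_sync."""
    return f"2013_05_28_drive_{seq:0>4}_sync"


def raw_image_path(root: str, seq: int, camera: int, frame: int) -> PurePosixPath:
    """Build the path of a raw 2D image following the folder structure above.

    Cameras 0 and 1 are the perspective pair (data_rect),
    cameras 2 and 3 are the fisheye pair (data_rgb).
    """
    if camera not in (0, 1, 2, 3):
        raise ValueError(f"camera must be 0, 1, 2 or 3, got {camera}")
    subdir = "data_rect" if camera in (0, 1) else "data_rgb"
    return (
        PurePosixPath(root)
        / "data_2d_raw"
        / sequence_name(seq)
        / f"image_{camera:0>2}"
        / subdir
        / f"{frame:0>10}.png"
    )


if __name__ == "__main__":
    # Hypothetical sequence and frame numbers, chosen only for illustration.
    print(raw_image_path("KITTI-360", 3, 0, 250))
    print(raw_image_path("KITTI-360", 3, 2, 250))
```

The same pattern extends to the label and 3D folders by swapping the top-level directory and the file extension.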

  

2D Data Format

Our 2D raw data include images collected by a pair of perspective cameras and a pair of fisheye cameras:
  • data_2d_raw/2013_05_28_drive_{seq:0>4}_sync/image_{00|01}/data_rect/{frame:0>10}.png:
    Stereo pairs in 8-bit PNG format.
  • data_2d_raw/2013_05_28_drive_{seq:0>4}_sync/image_{02|03}/data_rgb/{frame:0>10}.png:
    Fisheye images in 8-bit PNG format.
For each frame, we provide semantic & instance labels as well as confidence maps:
  • data_2d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/image_{00|01}/semantic/{frame:0>10}.png:
    Semantic label in single-channel 8-bit PNG format. Each pixel value denotes the corresponding semanticID.
  • data_2d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/image_{00|01}/semantic_rgb/{frame:0>10}.png:
    Semantic RGB image in 3-channel 8-bit PNG format. Each pixel value denotes the color-coded semantic label.
  • data_2d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/image_{00|01}/instance/{frame:0>10}.png:
    Instance label in single-channel 16-bit PNG format. Each pixel value denotes the corresponding instanceID. Here, instanceID = semanticID*1000 + classInstanceID, where classInstanceID denotes the instance ID within one class and classInstanceID = 0 for classes without instance labels. Note that instanceID is unique across the full sequence; for example, a building appearing in different frames has the same instanceID in all of these frames.
  • data_2d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/image_{00|01}/confidence/{frame:0>10}.png:
    Confidence map in single-channel 8-bit PNG format. Each pixel value corresponds to a confidence score ranging from 0 to 255. Lower values suggest lower confidence.
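
The instanceID encoding above can be inverted with integer division and remainder. The sketch below illustrates this arithmetic on plain integers; the function names are hypothetical, and the example IDs are arbitrary values rather than entries from the official label table. The same operations apply elementwise to a 16-bit instance image loaded as an integer array.

```python
from typing import Tuple


def split_instance_id(instance_id: int) -> Tuple[int, int]:
    """Split instanceID into (semanticID, classInstanceID).

    Inverts instanceID = semanticID * 1000 + classInstanceID.
    """
    if instance_id < 0:
        raise ValueError("instanceID must be non-negative")
    semantic_id, class_instance_id = divmod(instance_id, 1000)
    return semantic_id, class_instance_id


def make_instance_id(semantic_id: int, class_instance_id: int = 0) -> int:
    """Compose an instanceID; classInstanceID = 0 means no instance label."""
    if not 0 <= class_instance_id < 1000:
        raise ValueError("classInstanceID must be in [0, 999]")
    return semantic_id * 1000 + class_instance_id


if __name__ == "__main__":
    # Arbitrary illustrative values.
    print(split_instance_id(26005))
    print(make_instance_id(7))
```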

3D Data Format

We release the 3D raw scans as well as fused point clouds. The format of the 3D raw data is:
  • data_3d_raw/2013_05_28_drive_{seq:0>4}_sync/velodyne_points/data/{frame:0>10}.bin:
    Velodyne scans in BINARY format.
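
The text above does not spell out the binary layout. The sketch below assumes the convention of the original KITTI dataset, in which each point is stored as four consecutive little-endian 32-bit floats (x, y, z, reflectance); this layout is an assumption and should be checked against the development toolkit before use. The function names are hypothetical.

```python
import struct
from typing import List, Tuple

# ASSUMPTION: one point = 4 little-endian float32 values (x, y, z, reflectance).
POINT = struct.Struct("<4f")


def parse_velodyne(buffer: bytes) -> List[Tuple[float, float, float, float]]:
    """Decode a raw scan buffer into a list of (x, y, z, reflectance) tuples."""
    if len(buffer) % POINT.size != 0:
        raise ValueError("buffer length is not a multiple of the point size")
    return list(POINT.iter_unpack(buffer))


def read_velodyne_scan(path: str) -> List[Tuple[float, float, float, float]]:
    """Read one {frame:0>10}.bin file from velodyne_points/data/."""
    with open(path, "rb") as f:
        return parse_velodyne(f.read())


if __name__ == "__main__":
    # Synthetic two-point buffer, used instead of a real scan file.
    demo = POINT.pack(1.0, 2.0, 3.0, 0.5) + POINT.pack(-4.0, 0.0, 1.5, 0.25)
    for point in parse_velodyne(demo):
        print(point)
```
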
We divide the fused point clouds into windows to reduce the size of individual files, where each window is defined by the start_frame and the end_frame (both in 10 digits):
  • data_3d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/static/{start_frame:0>10}_{end_frame:0>10}.ply:
    Fused static point clouds in PLY format for training. The PLY file contains only vertices. Each vertex of the PLY contains the following information: x y z red green blue semanticID instanceID isVisible. Here, x y z (32-bit float) is the location of a 3D point in world coordinates, red green blue (8-bit uchar) is the color of a 3D point obtained by projecting it to adjacent 2D images, and semanticID instanceID (32-bit int) describe the label of a 3D point, where instanceID is consistent with the 2D label. The last value, isVisible (8-bit uchar), is a binary variable which is 0 when a 3D point is not visible in any of the perspective images. For these occluded points, we keep a 3D point only if it is uniquely labeled by a 3D bounding box and assign the label according to the annotation. Unlabeled points or ambiguously labeled points are ignored.
  • data_3d_semantics/train/2013_05_28_drive_{seq:0>4}_sync/dynamic/{start_frame:0>10}_{end_frame:0>10}.ply:
    Fused dynamic point clouds in PLY format. The PLY file contains only vertices. Each vertex has an additional timestamp (32-bit int) value compared to the static points: x y z red green blue semanticID instanceID isVisible timestamp.
  • data_3d_semantics/test/2013_05_28_drive_{seq:0>4}_sync/static/{start_frame:0>10}_{end_frame:0>10}.ply:
    Fused static point clouds in PLY format for testing. The test point clouds share the same format as the training point clouds except that labels are omitted: x y z red green blue isVisible.
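
The window naming and the per-vertex layouts described above can be sketched with the standard library. The field order and types follow the text; that the PLY body is stored as packed binary little-endian records is an assumption, and the PLY header must be skipped before these records begin. All names below are hypothetical.

```python
import re
import struct
from typing import Dict, Tuple

# ASSUMPTION: packed, little-endian vertex records (no padding).
STATIC_TRAIN = struct.Struct("<3f3B2iB")    # x y z, r g b, semanticID instanceID, isVisible
DYNAMIC_TRAIN = struct.Struct("<3f3B2iBi")  # static fields plus timestamp
STATIC_TEST = struct.Struct("<3f3BB")       # x y z, r g b, isVisible (labels omitted)

WINDOW_RE = re.compile(r"^(\d{10})_(\d{10})\.ply$")


def parse_window(filename: str) -> Tuple[int, int]:
    """Return (start_frame, end_frame) from a window file name."""
    match = WINDOW_RE.match(filename)
    if match is None:
        raise ValueError(f"not a window file name: {filename!r}")
    return int(match.group(1)), int(match.group(2))


def unpack_static_vertex(record: bytes) -> Dict[str, object]:
    """Decode one training vertex of a static point cloud."""
    x, y, z, r, g, b, semantic_id, instance_id, visible = STATIC_TRAIN.unpack(record)
    return {
        "xyz": (x, y, z),
        "rgb": (r, g, b),
        "semanticID": semantic_id,
        "instanceID": instance_id,
        "isVisible": bool(visible),
    }


if __name__ == "__main__":
    # Synthetic file name and record with arbitrary values.
    print(parse_window("0000000002_0000000385.ply"))
    record = STATIC_TRAIN.pack(1.0, 2.0, 3.0, 10, 20, 30, 26, 26005, 1)
    print(unpack_static_vertex(record))
    print(STATIC_TRAIN.size, DYNAMIC_TRAIN.size, STATIC_TEST.size)
```

Under these assumptions a static training vertex occupies 24 bytes, a dynamic vertex 28 bytes, and a test vertex 16 bytes.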

Sensor Locations

Development Toolkit

We provide a development toolkit for loading and inspecting the 2D and 3D labels. Please find more details here: https://github.com/autonomousvision/kitti360Scripts



