API Documentation

mmdet3d.core

anchor

class mmdet3d.core.anchor.AlignedAnchor3DRangeGenerator(align_corner=False, **kwargs)[source]

Aligned 3D Anchor Generator by range.

This anchor generator generates the positions of anchors’ centers in a different manner from Anchor3DRangeGenerator.

Note

The align means that the anchor’s center is aligned with the voxel grid, which is also the feature grid. The previous implementation of Anchor3DRangeGenerator does not generate the anchors’ centers according to the voxel grid. Rather, it generates the centers by uniformly distributing the anchors inside the minimum and maximum anchor ranges according to the feature map sizes. However, this makes the anchor centers mismatch the feature grid. The AlignedAnchor3DRangeGenerator adds + 1 when using the feature map sizes to obtain the corners of the voxel grid. Then it shifts the coordinates to the center of the voxel grid and uses the left-up corner to distribute anchors.

Parameters

align_corner (bool) – Whether to align with the corner of the voxel grid. By default it is False and the anchor’s center will be the same as the corresponding voxel’s center, which is also the center of the corresponding feature grid.

anchors_single_range(feature_size, anchor_range, scale, sizes=[[1.6, 3.9, 1.56]], rotations=[0, 1.5707963], device='cuda')[source]

Generate anchors in a single range.

Parameters
  • feature_size (list[float] | tuple[float]) – Feature map size. It is either a list or a tuple of [D, H, W] (in order of z, y, and x).

  • anchor_range (torch.Tensor | list[float]) – Range of anchors with shape [6]. The order is consistent with that of anchors, i.e., (x_min, y_min, z_min, x_max, y_max, z_max).

  • scale (float | int, optional) – The scale factor of anchors.

  • sizes (list[list] | np.ndarray | torch.Tensor) – Anchor size with shape [N, 3], in order of x, y, z.

  • rotations (list[float] | np.ndarray | torch.Tensor) – Rotations of anchors in a single feature grid.

  • device (str) – Devices that the anchors will be put on.

Returns

Anchors with shape [*feature_size, num_sizes, num_rots, 7].

Return type

torch.Tensor

class mmdet3d.core.anchor.Anchor3DRangeGenerator(ranges, sizes=[[1.6, 3.9, 1.56]], scales=[1], rotations=[0, 1.5707963], custom_values=(), reshape_out=True, size_per_range=True)[source]

3D Anchor Generator by range.

This anchor generator generates anchors by the given range in different feature levels. Due to the convention in 3D detection, different anchor sizes are related to different ranges for different categories. However, we find this setting does not affect the performance much in some datasets, e.g., nuScenes.

Parameters
  • ranges (list[list[float]]) – Ranges of different anchors. The ranges are the same across different feature levels. But may vary for different anchor sizes if size_per_range is True.

  • sizes (list[list[float]]) – 3D sizes of anchors.

  • scales (list[int]) – Scales of anchors in different feature levels.

  • rotations (list[float]) – Rotations of anchors in a feature grid.

  • custom_values (tuple[float]) – Customized values of that anchor. For example, in nuScenes the anchors have velocities.

  • reshape_out (bool) – Whether to reshape the output into (N x 4).

  • size_per_range – Whether to use separate ranges for different sizes. If size_per_range is True, the ranges should have the same length as the sizes; if not, they will be duplicated.

anchors_single_range(feature_size, anchor_range, scale=1, sizes=[[1.6, 3.9, 1.56]], rotations=[0, 1.5707963], device='cuda')[source]

Generate anchors in a single range.

Parameters
  • feature_size (list[float] | tuple[float]) – Feature map size. It is either a list or a tuple of [D, H, W] (in order of z, y, and x).

  • anchor_range (torch.Tensor | list[float]) – Range of anchors with shape [6]. The order is consistent with that of anchors, i.e., (x_min, y_min, z_min, x_max, y_max, z_max).

  • scale (float | int, optional) – The scale factor of anchors.

  • sizes (list[list] | np.ndarray | torch.Tensor) – Anchor size with shape [N, 3], in order of x, y, z.

  • rotations (list[float] | np.ndarray | torch.Tensor) – Rotations of anchors in a single feature grid.

  • device (str) – Devices that the anchors will be put on.

Returns

Anchors with shape [*feature_size, num_sizes, num_rots, 7].

Return type

torch.Tensor

grid_anchors(featmap_sizes, device='cuda')[source]

Generate grid anchors in multiple feature levels.

Parameters
  • featmap_sizes (list[tuple]) – List of feature map sizes in multiple feature levels.

  • device (str) – Device where the anchors will be put on.

Returns

Anchors in multiple feature levels. The sizes of each tensor should be [N, 4], where N = width * height * num_base_anchors, width and height are the sizes of the corresponding feature level, and num_base_anchors is the number of anchors for that level.

Return type

list[torch.Tensor]

property num_base_anchors

Total number of base anchors in a feature grid.

Type

list[int]

property num_levels

Number of feature levels that the generator is applied to.

Type

int

single_level_grid_anchors(featmap_size, scale, device='cuda')[source]

Generate grid anchors of a single level feature map.

This function is usually called by method self.grid_anchors.

Parameters
  • featmap_size (tuple[int]) – Size of the feature map.

  • scale (float) – Scale factor of the anchors in the current level.

  • device (str, optional) – Device the tensor will be put on. Defaults to ‘cuda’.

Returns

Anchors in the overall feature map.

Return type

torch.Tensor
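A minimal usage sketch (the range, size, and feature-map values below are illustrative KITTI-like numbers, not class defaults):

from mmdet3d.core.anchor import Anchor3DRangeGenerator

# One anchor range and one (x, y, z) anchor size, with two yaw
# rotations per location; values are examples, not library defaults.
generator = Anchor3DRangeGenerator(
    ranges=[[0, -39.68, -1.78, 69.12, 39.68, -1.78]],
    sizes=[[1.6, 3.9, 1.56]],
    rotations=[0, 1.5707963])

# A single feature level of size (D, H, W) = (1, 248, 216).
anchors = generator.grid_anchors([(1, 248, 216)], device='cpu')
print(anchors[0].shape)  # one tensor of anchors per feature level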

bbox

class mmdet3d.core.bbox.BaseInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]

Base class for 3D Boxes.

Note

The box is bottom centered, i.e. the relative position of origin in the box is (0.5, 0.5, 0).

Parameters
  • tensor (torch.Tensor | np.ndarray | list) – An N x box_dim matrix.

  • box_dim (int) – Number of dimensions of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw). Defaults to 7.

  • with_yaw (bool) – Whether the box has yaw rotation. If False, the value of yaw will be set to 0 as minmax boxes. Defaults to True.

  • origin (tuple[float]) – The relative position of the origin in the box. Defaults to (0.5, 0.5, 0). This will guide the box to be converted to (0.5, 0.5, 0) mode.

tensor

Float matrix of N x box_dim.

Type

torch.Tensor

box_dim

Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).

Type

int

with_yaw

If True, the box has yaw rotation; if False, the value of yaw is set to 0 as minmax boxes.

Type

bool

property bottom_center

A tensor with the bottom center of each box.

Type

torch.Tensor

property bottom_height

A vector with the bottom height of each box.

Type

torch.Tensor

classmethod cat(boxes_list)[source]

Concatenate a list of Boxes into a single Boxes.

Parameters

boxes_list (list[BaseInstance3DBoxes]) – List of boxes.

Returns

The concatenated Boxes.

Return type

BaseInstance3DBoxes

property center

Calculate the center of all the boxes.

Note

In the MMDetection3D’s convention, the bottom center is usually taken as the default center.

The relative position of the centers in different kinds of boxes is different, e.g., the relative center of a box is (0.5, 1.0, 0.5) in camera coordinates and (0.5, 0.5, 0) in LiDAR coordinates. It is recommended to use bottom_center or gravity_center for clearer usage.

Returns

A tensor with center of each box.

Return type

torch.Tensor

clone()[source]

Clone the Boxes.

Returns

Box object with the same properties as self.

Return type

BaseInstance3DBoxes

abstract convert_to(dst, rt_mat=None)[source]

Convert self to dst mode.

Parameters
  • dst (BoxMode) – The target Box mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type in the dst mode.

Return type

BaseInstance3DBoxes

property corners

A tensor with 8 corners of each box.

Type

torch.Tensor

property device

The device the boxes are on.

Type

str

property dims

Size dimensions (x_size, y_size, z_size) of each box with shape (N, 3).

Type

torch.Tensor

abstract flip(bev_direction='horizontal')[source]

Flip the boxes in BEV along the given BEV direction.

property gravity_center

A tensor with the gravity center of each box.

Type

torch.Tensor

property height

A vector with the height of each box.

Type

torch.Tensor

classmethod height_overlaps(boxes1, boxes2, mode='iou')[source]

Calculate height overlaps of two boxes.

Note

This function calculates the height overlaps between boxes1 and boxes2; boxes1 and boxes2 should be of the same type.

Parameters
  • boxes1 (BaseInstance3DBoxes) – Boxes 1 containing N boxes.

  • boxes2 (BaseInstance3DBoxes) – Boxes 2 containing M boxes.

  • mode (str, optional) – Mode of IoU calculation. Defaults to ‘iou’.

Returns

Calculated IoU of the boxes’ heights.

Return type

torch.Tensor

in_range_3d(box_range)[source]

Check whether the boxes are in the given range.

Parameters

box_range (list | torch.Tensor) – The range of box (x_min, y_min, z_min, x_max, y_max, z_max).

Note

In the original implementation of SECOND, checking whether a box is in the range is done by checking whether the points are in a convex polygon; we try to reduce the burden for simpler cases.

Returns

A binary vector indicating whether each box is inside the reference range.

Return type

torch.Tensor

abstract in_range_bev(box_range)[source]

Check whether the boxes are in the given range.

Parameters

box_range (list | torch.Tensor) – The range of box in order of (x_min, y_min, x_max, y_max).

Returns

Indicating whether each box is inside the reference range.

Return type

torch.Tensor

limit_yaw(offset=0.5, period=3.141592653589793)[source]

Limit the yaw to a given period and offset.

Parameters
  • offset (float) – The offset of the yaw.

  • period (float) – The expected period.

new_box(data)[source]

Create a new box object with data.

The new box and its tensor have similar properties to self and self.tensor, respectively.

Parameters

data (torch.Tensor | np.ndarray | list) – Data to be copied.

Returns

A new bbox object with data; the object’s other properties are similar to those of self.

Return type

BaseInstance3DBoxes

nonempty(threshold: float = 0.0)[source]

Find boxes that are non-empty.

A box is considered empty if any of its sides is no larger than the threshold.

Parameters

threshold (float) – The threshold of minimal sizes.

Returns

A binary vector which represents whether each box is empty (False) or non-empty (True).

Return type

torch.Tensor

classmethod overlaps(boxes1, boxes2, mode='iou')[source]

Calculate 3D overlaps of two boxes.

Note

This function calculates the overlaps between boxes1 and boxes2; boxes1 and boxes2 should be of the same type.

Parameters
  • boxes1 (BaseInstance3DBoxes) – Boxes 1 containing N boxes.

  • boxes2 (BaseInstance3DBoxes) – Boxes 2 containing M boxes.

  • mode (str, optional) – Mode of IoU calculation. Defaults to ‘iou’.

Returns

Calculated 3D IoU of the boxes.

Return type

torch.Tensor
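A hedged sketch of both class methods; overlaps relies on a CUDA op, so the tensors are placed on GPU (assumed available):

import torch
from mmdet3d.core.bbox import LiDARInstance3DBoxes

boxes1 = LiDARInstance3DBoxes(
    torch.tensor([[10.0, 2.0, -1.6, 1.6, 3.9, 1.56, 0.0]]).cuda())
boxes2 = LiDARInstance3DBoxes(
    torch.tensor([[10.5, 2.0, -1.6, 1.6, 3.9, 1.56, 0.0]]).cuda())

ious_3d = LiDARInstance3DBoxes.overlaps(boxes1, boxes2)        # (1, 1) 3D IoU
ious_h = LiDARInstance3DBoxes.height_overlaps(boxes1, boxes2)  # (1, 1) height IoU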

abstract rotate(angles, axis=0)[source]

Rotate boxes with the given angles.

Parameters
  • angles (float) – Rotation angles.

  • axis (int) – The axis to rotate the boxes.

scale(scale_factor)[source]

Scale the box with horizontal and vertical scaling factors.

Parameters

scale_factor (float) – Scale factor to scale the boxes.

to(device)[source]

Convert current boxes to a specific device.

Parameters

device (str | torch.device) – The name of the device.

Returns

A new boxes object on the specific device.

Return type

BaseInstance3DBoxes

property top_height

A vector with the top height of each box.

Type

torch.Tensor

translate(trans_vector)[source]

Translate boxes with the given translation vector.

Parameters

trans_vector (torch.Tensor) – Translation vector of size 1x3.

property volume

A vector with the volume of each box.

Type

torch.Tensor

property yaw

A vector with the yaw of each box.

Type

torch.Tensor

class mmdet3d.core.bbox.BboxOverlaps3D(coordinate)[source]

3D IoU Calculator.

Parameters

coordinate (str) – The coordinate system, valid options are ‘camera’, ‘lidar’, and ‘depth’.

class mmdet3d.core.bbox.BboxOverlapsNearest3D(coordinate='lidar')[source]

Nearest 3D IoU Calculator.

Note

This IoU calculator first finds the nearest 2D boxes in bird’s eye view (BEV), and then calculates the 2D IoU using bbox_overlaps().

Parameters

coordinate (str) – ‘camera’, ‘lidar’, or ‘depth’ coordinate system.

class mmdet3d.core.bbox.Box3DMode(value)[source]

Enum of different ways to represent a box.

Coordinates in LiDAR:

            up z
               ^   x front
               |  /
               | /
left y <------ 0

The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.

Coordinates in camera:

        z front
       /
      /
     0 ------> x right
     |
     |
     v
down y

The relative coordinate of bottom center in a CAM box is (0.5, 1.0, 0.5), and the yaw is around the y axis, thus the rotation axis=1.

Coordinates in Depth mode:

up z
   ^   y front
   |  /
   | /
   0 ------> x right

The relative coordinate of bottom center in a DEPTH box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2.

static convert(box, src, dst, rt_mat=None)[source]

Convert boxes from src mode to dst mode.

Parameters
  • box (tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes) – Can be a k-tuple, k-list or an N x k array/tensor, where k = 7.

  • src (BoxMode) – The src Box mode.

  • dst (BoxMode) – The target Box mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type.

Return type

(tuple | list | np.ndarray | torch.Tensor | BaseInstance3DBoxes)
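A short sketch of converting a camera-coordinate box to LiDAR coordinates; with rt_mat=None the default axis permutation between the two conventions is assumed to apply:

import torch
from mmdet3d.core.bbox import Box3DMode

# One (x, y, z, x_size, y_size, z_size, yaw) box in camera coordinates.
cam_box = torch.tensor([[1.0, 1.5, 10.0, 1.6, 1.56, 3.9, 0.0]])
lidar_box = Box3DMode.convert(cam_box, Box3DMode.CAM, Box3DMode.LIDAR)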

class mmdet3d.core.bbox.CameraInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]

3D boxes of instances in CAM coordinates.

Coordinates in camera:

        z front (yaw=0.5*pi)
       /
      /
     0 ------> x right (yaw=0)
     |
     |
     v
down y

The relative coordinate of bottom center in a CAM box is (0.5, 1.0, 0.5), and the yaw is around the y axis, thus the rotation axis=1. The yaw is 0 at the positive direction of x axis, and increases from the positive direction of x to the positive direction of z.

tensor

Float matrix of N x box_dim.

Type

torch.Tensor

box_dim

Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).

Type

int

with_yaw

If True, the box has yaw rotation; if False, the value of yaw is set to 0 as minmax boxes.

Type

bool

property bev

An N x 5 tensor of the 2D BEV box of each box with rotation, in XYWHR format.

Type

torch.Tensor

property bottom_height

A vector with the bottom height of each box.

Type

torch.Tensor

convert_to(dst, rt_mat=None)[source]

Convert self to dst mode.

Parameters
  • dst (BoxMode) – The target Box mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type in the dst mode.

Return type

BaseInstance3DBoxes

property corners

Coordinates of corners of all the boxes in shape (N, 8, 3).

Convert the boxes to corners in clockwise order, in the form of (x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)

             front z
                  /
                 /
   (x0, y0, z1) + -----------  + (x1, y0, z1)
               /|            / |
              / |           /  |
(x0, y0, z0) + ----------- +   + (x1, y1, z1)
             |  /      .   |  /
             | / origin    | /
(x0, y1, z0) + ----------- + -------> x right
             |             (x1, y1, z0)
             |
             v
        down y
Type

torch.Tensor

flip(bev_direction='horizontal', points=None)[source]

Flip the boxes in BEV along the given BEV direction.

In CAM coordinates, it flips the x (horizontal) or z (vertical) axis.

Parameters
  • bev_direction (str) – Flip direction (horizontal or vertical).

  • points (torch.Tensor, numpy.ndarray, None) – Points to flip. Defaults to None.

Returns

Flipped points.

Return type

torch.Tensor, numpy.ndarray or None

property gravity_center

A tensor with the gravity center of each box.

Type

torch.Tensor

property height

A vector with the height of each box.

Type

torch.Tensor

classmethod height_overlaps(boxes1, boxes2, mode='iou')[source]

Calculate height overlaps of two boxes.

This function calculates the height overlaps between boxes1 and boxes2, where boxes1 and boxes2 should be of the same type.

Parameters
  • boxes1 (BaseInstance3DBoxes) – Boxes 1 containing N boxes.

  • boxes2 (BaseInstance3DBoxes) – Boxes 2 containing M boxes.

  • mode (str, optional) – Mode of IoU calculation. Defaults to ‘iou’.

Returns

Calculated IoU of the boxes’ heights.

Return type

torch.Tensor

in_range_bev(box_range)[source]

Check whether the boxes are in the given range.

Parameters

box_range (list | torch.Tensor) – The range of box (x_min, z_min, x_max, z_max).

Note

The original implementation of SECOND checks whether boxes are in a range by checking whether the points are in a convex polygon; we reduce the burden for simpler cases.

Returns

Indicating whether each box is inside the reference range.

Return type

torch.Tensor

property nearest_bev

A tensor of 2D BEV box of each box without rotation.

Type

torch.Tensor

rotate(angle, points=None)[source]

Rotate boxes with points (optional) with the given angle.

Parameters
  • angle (float, torch.Tensor) – Rotation angle.

  • points (torch.Tensor, numpy.ndarray, optional) – Points to rotate. Defaults to None.

Returns

When points is None, the function returns None, otherwise it returns the rotated points and the rotation matrix rot_mat_T.

Return type

tuple or None

property top_height

A vector with the top height of each box.

Type

torch.Tensor

class mmdet3d.core.bbox.DepthInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]

3D boxes of instances in Depth coordinates.

Coordinates in Depth:

up z    y front (yaw=0.5*pi)
   ^   ^
   |  /
   | /
   0 ------> x right (yaw=0)

The relative coordinate of bottom center in a Depth box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2. The yaw is 0 at the positive direction of x axis, and increases from the positive direction of x to the positive direction of y.

tensor

Float matrix of N x box_dim.

Type

torch.Tensor

box_dim

Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).

Type

int

with_yaw

If True, the box has yaw rotation; if False, the value of yaw is set to 0 as minmax boxes.

Type

bool

property bev

An N x 5 tensor of the 2D BEV box of each box, in XYWHR format.

Type

torch.Tensor

convert_to(dst, rt_mat=None)[source]

Convert self to dst mode.

Parameters
  • dst (BoxMode) – The target Box mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type in the dst mode.

Return type

DepthInstance3DBoxes

property corners

Coordinates of corners of all the boxes in shape (N, 8, 3).

Convert the boxes to corners in clockwise order, in the form of (x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)

                            up z
             front y           ^
                  /            |
                 /             |
   (x0, y1, z1) + -----------  + (x1, y1, z1)
               /|            / |
              / |           /  |
(x0, y0, z1) + ----------- +   + (x1, y1, z0)
             |  /      .   |  /
             | / origin    | /
(x0, y0, z0) + ----------- + --------> right x
                           (x1, y0, z0)
Type

torch.Tensor

flip(bev_direction='horizontal', points=None)[source]

Flip the boxes in BEV along the given BEV direction.

In Depth coordinates, it flips x (horizontal) or y (vertical) axis.

Parameters
  • bev_direction (str) – Flip direction (horizontal or vertical).

  • points (torch.Tensor, numpy.ndarray, None) – Points to flip. Defaults to None.

Returns

Flipped points.

Return type

torch.Tensor, numpy.ndarray or None

property gravity_center

A tensor with center of each box.

Type

torch.Tensor

in_range_bev(box_range)[source]

Check whether the boxes are in the given range.

Parameters

box_range (list | torch.Tensor) – The range of box (x_min, y_min, x_max, y_max).

Note

In the original implementation of SECOND, checking whether a box is in the range is done by checking whether the points are in a convex polygon; we try to reduce the burden for simpler cases.

Returns

Indicating whether each box is inside the reference range.

Return type

torch.Tensor

property nearest_bev

A tensor of 2D BEV box of each box without rotation.

Type

torch.Tensor

points_in_boxes(points)[source]

Find points that are in boxes (CUDA).

Parameters

points (torch.Tensor) – Points in shape [1, M, 3] or [M, 3], 3 dimensions are [x, y, z] in LiDAR coordinate.

Returns

The index of boxes each point lies in with shape of (B, M, T).

Return type

torch.Tensor

rotate(angle, points=None)[source]

Rotate boxes with points (optional) with the given angle.

Parameters
  • angle (float, torch.Tensor) – Rotation angle.

  • points (torch.Tensor, numpy.ndarray, optional) – Points to rotate. Defaults to None.

Returns

When points is None, the function returns None, otherwise it returns the rotated points and the rotation matrix rot_mat_T.

Return type

tuple or None

class mmdet3d.core.bbox.LiDARInstance3DBoxes(tensor, box_dim=7, with_yaw=True, origin=(0.5, 0.5, 0))[source]

3D boxes of instances in LIDAR coordinates.

Coordinates in LiDAR:

                     up z    x front (yaw=0.5*pi)
                        ^   ^
                        |  /
                        | /
(yaw=pi) left y <------ 0

The relative coordinate of bottom center in a LiDAR box is (0.5, 0.5, 0), and the yaw is around the z axis, thus the rotation axis=2. The yaw is 0 at the negative direction of y axis, and increases from the negative direction of y to the positive direction of x.
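For example, constructing a small set of LiDAR boxes and reading the properties documented below (a sketch; the values are arbitrary):

import torch
from mmdet3d.core.bbox import LiDARInstance3DBoxes

# Two (x, y, z, x_size, y_size, z_size, yaw) boxes.
boxes = LiDARInstance3DBoxes(torch.tensor(
    [[10.0, 2.0, -1.6, 1.6, 3.9, 1.56, 0.0],
     [20.0, -4.0, -1.6, 1.6, 3.9, 1.56, 1.57]]))

print(boxes.gravity_center)  # (2, 3): bottom centers shifted up by half height
print(boxes.corners.shape)   # torch.Size([2, 8, 3])
print(boxes.bev)             # (2, 5) BEV boxes in XYWHR format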

tensor

Float matrix of N x box_dim.

Type

torch.Tensor

box_dim

Integer indicating the dimension of a box. Each row is (x, y, z, x_size, y_size, z_size, yaw, …).

Type

int

with_yaw

If True, the box has yaw rotation; if False, the value of yaw is set to 0 as minmax boxes.

Type

bool

property bev

2D BEV box of each box with rotation in XYWHR format.

Type

torch.Tensor

convert_to(dst, rt_mat=None)[source]

Convert self to dst mode.

Parameters
  • dst (BoxMode) – The target Box mode.

  • rt_mat (np.ndarray | torch.Tensor) – The rotation and translation matrix between different coordinates. Defaults to None. The conversion from src coordinates to dst coordinates usually comes along the change of sensors, e.g., from camera to LiDAR. This requires a transformation matrix.

Returns

The converted box of the same type in the dst mode.

Return type

BaseInstance3DBoxes

property corners

Coordinates of corners of all the boxes in shape (N, 8, 3).

Convert the boxes to corners in clockwise order, in the form of (x0y0z0, x0y0z1, x0y1z1, x0y1z0, x1y0z0, x1y0z1, x1y1z1, x1y1z0)

                               up z
                front x           ^
                     /            |
                    /             |
      (x1, y0, z1) + -----------  + (x1, y1, z1)
                  /|            / |
                 / |           /  |
   (x0, y0, z1) + ----------- +   + (x1, y1, z0)
                |  /      .   |  /
                 | / origin    | /
left y<-------- + ----------- + (x0, y1, z0)
    (x0, y0, z0)
Type

torch.Tensor

enlarged_box(extra_width)[source]

Enlarge the length, width and height of the boxes.

Parameters

extra_width (float | torch.Tensor) – Extra width to enlarge the box.

Returns

Enlarged boxes.

Return type

LiDARInstance3DBoxes

flip(bev_direction='horizontal', points=None)[source]

Flip the boxes in BEV along the given BEV direction.

In LIDAR coordinates, it flips the y (horizontal) or x (vertical) axis.

Parameters
  • bev_direction (str) – Flip direction (horizontal or vertical).

  • points (torch.Tensor, numpy.ndarray, None) – Points to flip. Defaults to None.

Returns

Flipped points.

Return type

torch.Tensor, numpy.ndarray or None

property gravity_center

A tensor with center of each box.

Type

torch.Tensor

in_range_bev(box_range)[source]

Check whether the boxes are in the given range.

Parameters

box_range (list | torch.Tensor) – The range of box in order of (x_min, y_min, x_max, y_max).

Note

The original implementation of SECOND checks whether boxes are in a range by checking whether the points are in a convex polygon; we reduce the burden for simpler cases.

Returns

Whether each box is inside the reference range.

Return type

torch.Tensor

property nearest_bev

A tensor of 2D BEV box of each box without rotation.

Type

torch.Tensor

points_in_boxes(points)[source]

Find the box each point is in.

Parameters

points (torch.Tensor) – Points in shape (N, 3).

Returns

The index of the box each point is in.

Return type

torch.Tensor

rotate(angle, points=None)[source]

Rotate boxes with points (optional) with the given angle.

Parameters
  • angle (float | torch.Tensor) – Rotation angle.

  • points (torch.Tensor, numpy.ndarray, optional) – Points to rotate. Defaults to None.

Returns

When points is None, the function returns None, otherwise it returns the rotated points and the rotation matrix rot_mat_T.

Return type

tuple or None
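The geometric methods above operate in place; a brief sketch chaining them on a single box (values arbitrary):

import torch
from mmdet3d.core.bbox import LiDARInstance3DBoxes

boxes = LiDARInstance3DBoxes(
    torch.tensor([[10.0, 2.0, -1.6, 1.6, 3.9, 1.56, 0.0]]))

boxes.rotate(0.1)                              # rotate around the z axis
boxes.translate(torch.tensor([1.0, 0.0, 0.0]))
boxes.flip(bev_direction='horizontal')         # flips the y axis in LiDAR
boxes.limit_yaw(offset=0.5, period=3.141592653589793)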

mmdet3d.core.bbox.bbox3d2result(bboxes, scores, labels)[source]

Convert detection results to a list of numpy arrays.

Parameters
  • bboxes (torch.Tensor) – Bounding boxes with shape of (n, 5).

  • labels (torch.Tensor) – Labels with shape of (n, ).

  • scores (torch.Tensor) – Scores with shape of (n, ).

Returns

Bounding box results in cpu mode.

  • boxes_3d (torch.Tensor): 3D boxes.

  • scores (torch.Tensor): Prediction scores.

  • labels_3d (torch.Tensor): Box labels.

Return type

dict[str, torch.Tensor]

mmdet3d.core.bbox.bbox3d2roi(bbox_list)[source]

Convert a list of bounding boxes to roi format.

Parameters

bbox_list (list[torch.Tensor]) – A list of bounding boxes corresponding to a batch of images.

Returns

Region of interests in shape (n, c), where the channels are in order of [batch_ind, x, y …].

Return type

torch.Tensor

mmdet3d.core.bbox.bbox3d_mapping_back(bboxes, scale_factor, flip_horizontal, flip_vertical)[source]

Map bboxes from testing scale to original image scale.

Parameters
  • bboxes (BaseInstance3DBoxes) – Boxes to be mapped back.

  • scale_factor (float) – Scale factor.

  • flip_horizontal (bool) – Whether to flip horizontally.

  • flip_vertical (bool) – Whether to flip vertically.

Returns

Boxes mapped back.

Return type

BaseInstance3DBoxes

mmdet3d.core.bbox.bbox_overlaps_3d(bboxes1, bboxes2, mode='iou', coordinate='camera')[source]

Calculate 3D IoU using cuda implementation.

Note

This function calculates the IoU of 3D boxes based on their volumes. IoU calculator BboxOverlaps3D uses this function to calculate the actual IoUs of boxes.

Parameters
  • bboxes1 (torch.Tensor) – shape (N, 7+C) [x, y, z, h, w, l, ry].

  • bboxes2 (torch.Tensor) – shape (M, 7+C) [x, y, z, h, w, l, ry].

  • mode (str) – “iou” (intersection over union) or “iof” (intersection over foreground).

  • coordinate (str) – ‘camera’ or ‘lidar’ coordinate system.

Returns

Bbox overlaps results of bboxes1 and bboxes2 with shape (M, N) (aligned mode is not supported currently).

Return type

torch.Tensor

mmdet3d.core.bbox.bbox_overlaps_nearest_3d(bboxes1, bboxes2, mode='iou', is_aligned=False, coordinate='lidar')[source]

Calculate nearest 3D IoU.

Note

This function first finds the nearest 2D boxes in bird’s eye view (BEV), and then calculates the 2D IoU using bbox_overlaps(). The IoU calculator BboxOverlapsNearest3D uses this function to calculate IoUs of boxes.

If is_aligned is False, then it calculates the ious between each bbox of bboxes1 and bboxes2, otherwise the ious between each aligned pair of bboxes1 and bboxes2.

Parameters
  • bboxes1 (torch.Tensor) – shape (N, 7+C) [x, y, z, h, w, l, ry, v].

  • bboxes2 (torch.Tensor) – shape (M, 7+C) [x, y, z, h, w, l, ry, v].

  • mode (str) – “iou” (intersection over union) or “iof” (intersection over foreground).

  • is_aligned (bool) – Whether the calculation is aligned.

Returns

If is_aligned is False, return the IoUs between each pair of boxes in bboxes1 and bboxes2 with shape (N, M); if is_aligned is True, return the IoUs between aligned pairs with shape (N, ).

Return type

torch.Tensor
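A minimal sketch of the nearest-BEV IoU (the boxes follow the documented (x, y, z, h, w, l, ry) layout; values arbitrary):

import torch
from mmdet3d.core.bbox import bbox_overlaps_nearest_3d

bboxes1 = torch.tensor([[10.0, 2.0, -1.6, 1.6, 3.9, 1.56, 0.0]])
bboxes2 = torch.tensor([[10.5, 2.0, -1.6, 1.6, 3.9, 1.56, 0.0]])
ious = bbox_overlaps_nearest_3d(bboxes1, bboxes2, coordinate='lidar')
print(ious.shape)  # (1, 1) since is_aligned defaults to False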

mmdet3d.core.bbox.get_box_type(box_type)[source]

Get the type and mode of box structure.

Parameters

box_type (str) – The type of box structure. The valid values are “LiDAR”, “Camera”, or “Depth”.

Returns

Box type and box mode.

Return type

tuple
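For example:

from mmdet3d.core.bbox import get_box_type

box_type_3d, box_mode_3d = get_box_type('LiDAR')
# box_type_3d is the LiDARInstance3DBoxes class and
# box_mode_3d is Box3DMode.LIDAR.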

mmdet3d.core.bbox.limit_period(val, offset=0.5, period=3.141592653589793)[source]

Limit the value into a period for periodic function.

Parameters
  • val (torch.Tensor) – The value to be converted.

  • offset (float, optional) – Offset to set the value range. Defaults to 0.5.

  • period (float, optional) – Period of the value. Defaults to np.pi.

Returns

Value in the range of [-offset * period, (1-offset) * period]

Return type

torch.Tensor
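For instance, wrapping yaw angles into [-pi, pi) with a full 2*pi period:

import numpy as np
import torch
from mmdet3d.core.bbox import limit_period

yaw = torch.tensor([3.5, -3.5])
wrapped = limit_period(yaw, offset=0.5, period=2 * np.pi)
print(wrapped)  # tensor([-2.7832,  2.7832])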

mmdet3d.core.bbox.points_cam2img(points_3d, proj_mat)[source]

Project points from camera coordinates to image coordinates.

Parameters
  • points_3d (torch.Tensor) – Points in shape (N, 3)

  • proj_mat (torch.Tensor) – Transformation matrix between coordinates.

Returns

Points in image coordinates with shape [N, 2].

Return type

torch.Tensor
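A sketch with a hypothetical projection matrix (the 4 x 4 padding and focal length are illustrative assumptions):

import torch
from mmdet3d.core.bbox import points_cam2img

pts = torch.tensor([[1.0, 1.0, 10.0]])    # points in camera coordinates
proj_mat = torch.eye(4)
proj_mat[0, 0] = proj_mat[1, 1] = 721.5   # illustrative focal length
uv = points_cam2img(pts, proj_mat)        # (1, 2) pixel coordinates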

mmdet3d.core.bbox.xywhr2xyxyr(boxes_xywhr)[source]

Convert rotated boxes in XYWHR format to XYXYR format.

Parameters

boxes_xywhr (torch.Tensor) – Rotated boxes in XYWHR format.

Returns

Converted boxes in XYXYR format.

Return type

torch.Tensor
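For example:

import torch
from mmdet3d.core.bbox import xywhr2xyxyr

bev = torch.tensor([[10.0, 2.0, 1.6, 3.9, 0.3]])  # (x, y, w, h, r)
bev_xyxyr = xywhr2xyxyr(bev)                      # (x1, y1, x2, y2, r)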

evaluation

mmdet3d.core.evaluation.indoor_eval(gt_annos, dt_annos, metric, label2cat, logger=None, box_type_3d=None, box_mode_3d=None)[source]

Indoor Evaluation.

Evaluate the result of the detection.

Parameters
  • gt_annos (list[dict]) – Ground truth annotations.

  • dt_annos (list[dict]) –

    Detection annotations. The dict includes the following keys:

    • labels_3d (torch.Tensor): Labels of boxes.

    • boxes_3d (BaseInstance3DBoxes): 3D bounding boxes in Depth coordinate.

    • scores_3d (torch.Tensor): Scores of boxes.

  • metric (list[float]) – IoU thresholds for computing average precisions.

  • label2cat (dict) – Map from label to category.

  • logger (logging.Logger | str | None) – The way to print the mAP summary. See mmdet.utils.print_log() for details. Default: None.

Returns

Dict of results.

Return type

dict[str, float]

mmdet3d.core.evaluation.kitti_eval(gt_annos, dt_annos, current_classes, eval_types=['bbox', 'bev', '3d'])[source]

KITTI evaluation.

Parameters
  • gt_annos (list[dict]) – Contain gt information of each sample.

  • dt_annos (list[dict]) – Contain detected information of each sample.

  • current_classes (list[str]) – Classes to evaluate.

  • eval_types (list[str], optional) – Types to eval. Defaults to [‘bbox’, ‘bev’, ‘3d’].

Returns

String and dict of evaluation results.

Return type

tuple

mmdet3d.core.evaluation.kitti_eval_coco_style(gt_annos, dt_annos, current_classes)[source]

COCO-style evaluation of KITTI.

Parameters
  • gt_annos (list[dict]) – Contain gt information of each sample.

  • dt_annos (list[dict]) – Contain detected information of each sample.

  • current_classes (list[str]) – Classes to evaluate.

Returns

Evaluation results.

Return type

str

mmdet3d.core.evaluation.lyft_eval(lyft, data_root, res_path, eval_set, output_dir, logger=None)[source]

Evaluation API for Lyft dataset.

Parameters
  • lyft (LyftDataset) – Lyft class in the sdk.

  • data_root (str) – Root of data for reading splits.

  • res_path (str) – Path of result json file recording detections.

  • eval_set (str) – Name of the split for evaluation.

  • output_dir (str) – Output directory for output json files.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

Returns

The evaluation results.

Return type

dict[str, float]

visualizer

mmdet3d.core.visualizer.show_result(points, gt_bboxes, pred_bboxes, out_dir, filename)[source]

Convert results into a format that is directly readable for MeshLab.

Parameters
  • points (np.ndarray) – Points.

  • gt_bboxes (np.ndarray) – Ground truth boxes.

  • pred_bboxes (np.ndarray) – Predicted boxes.

  • out_dir (str) – Path of output directory.

  • filename (str) – Filename of the current frame.

voxel

class mmdet3d.core.voxel.VoxelGenerator(voxel_size, point_cloud_range, max_num_points, max_voxels=20000)[source]

Voxel generator in numpy implementation.

Parameters
  • voxel_size (list[float]) – Size of a single voxel.

  • point_cloud_range (list[float]) – Range of points.

  • max_num_points (int) – Maximum number of points in a single voxel.

  • max_voxels (int, optional) – Maximum number of voxels. Defaults to 20000.

generate(points)[source]

Generate voxels given points.

property grid_size

The size of grids.

Type

np.ndarray

property max_num_points_per_voxel

Maximum number of points per voxel.

Type

int

property point_cloud_range

Range of point cloud.

Type

list[float]

property voxel_size

Size of a single voxel.

Type

list[float]
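A minimal sketch, assuming generate() returns the SECOND-style (voxels, coordinates, num_points_per_voxel) triple; the ranges below are illustrative KITTI-like values:

import numpy as np
from mmdet3d.core.voxel import VoxelGenerator

generator = VoxelGenerator(
    voxel_size=[0.05, 0.05, 0.1],
    point_cloud_range=[0, -40, -3, 70.4, 40, 1],
    max_num_points=5)

# Random (x, y, z, intensity) points inside the point cloud range.
points = np.random.rand(1000, 4).astype(np.float32)
points[:, 0] *= 70.4
points[:, 1] = points[:, 1] * 80 - 40
points[:, 2] = points[:, 2] * 4 - 3

voxels, coords, num_points = generator.generate(points)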

mmdet3d.core.voxel.build_voxel_generator(cfg, **kwargs)[source]

Builder of voxel generator.

post_processing

mmdet3d.core.post_processing.aligned_3d_nms(boxes, scores, classes, thresh)[source]

3D NMS for aligned boxes.

Parameters
  • boxes (torch.Tensor) – Aligned box with shape [n, 6].

  • scores (torch.Tensor) – Scores of each box.

  • classes (torch.Tensor) – Class of each box.

  • thresh (float) – IoU threshold for NMS.

Returns

Indices of selected boxes.

Return type

torch.Tensor
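A small sketch; the boxes are assumed to be axis-aligned in (x1, y1, z1, x2, y2, z2) order:

import torch
from mmdet3d.core.post_processing import aligned_3d_nms

boxes = torch.tensor([[0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
                      [0.1, 0.1, 0.1, 1.1, 1.1, 1.1],
                      [5.0, 5.0, 5.0, 6.0, 6.0, 6.0]])
scores = torch.tensor([0.9, 0.8, 0.7])
classes = torch.tensor([0, 0, 1])
keep = aligned_3d_nms(boxes, scores, classes, thresh=0.25)
# keep holds the indices of the surviving boxes.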

mmdet3d.core.post_processing.box3d_multiclass_nms(mlvl_bboxes, mlvl_bboxes_for_nms, mlvl_scores, score_thr, max_num, cfg, mlvl_dir_scores=None)[source]

Multi-class NMS for 3D boxes.

Parameters
  • mlvl_bboxes (torch.Tensor) – Multi-level boxes with shape (N, M). M is the dimensions of boxes.

  • mlvl_bboxes_for_nms (torch.Tensor) – Multi-level BEV boxes for NMS with shape (N, 5) ([x1, y1, x2, y2, ry]). N is the number of boxes.

  • mlvl_scores (torch.Tensor) – Multi-level scores with shape (N, ). N is the number of boxes.

  • score_thr (float) – Score threshold to filter boxes with low confidence.

  • max_num (int) – Maximum number of boxes to be kept.

  • cfg (dict) – Configuration dict of NMS.

  • mlvl_dir_scores (torch.Tensor, optional) – Multi-level scores of direction classifier. Defaults to None.

Returns

Results after NMS, including 3D bounding boxes, scores, labels and direction scores.

Return type

tuple[torch.Tensor]

mmdet3d.core.post_processing.merge_aug_bboxes_3d(aug_results, img_metas, test_cfg)[source]

Merge augmented detection 3D bboxes and scores.

Parameters
  • aug_results (list[dict]) –

    The dict of detection results. The dict contains the following keys

    • boxes_3d (BaseInstance3DBoxes): Detection bbox.

    • scores_3d (torch.Tensor): Detection scores.

    • labels_3d (torch.Tensor): Predicted box labels.

  • img_metas (list[dict]) – Meta information of each sample.

  • test_cfg (dict) – Test config.

Returns

Bounding boxes results in cpu mode, containing merged results.

  • boxes_3d (BaseInstance3DBoxes): Merged detection bbox.

  • scores_3d (torch.Tensor): Merged detection scores.

  • labels_3d (torch.Tensor): Merged predicted box labels.

Return type

dict

mmdet3d.datasets

class mmdet3d.datasets.Custom3DDataset(data_root, ann_file, pipeline=None, classes=None, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False)[source]

Customized 3D dataset.

This is the base dataset of the SUNRGB-D, ScanNet, nuScenes, and KITTI datasets.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘LiDAR’. Available options include

    • ’LiDAR’: Box in LiDAR coordinates.

    • ’Depth’: Box in depth coordinates, usually for indoor dataset.

    • ’Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

evaluate(results, metric=None, iou_thr=(0.25, 0.5), logger=None, show=False, out_dir=None)[source]

Evaluate.

Evaluation in indoor protocol.

Parameters
  • results (list[dict]) – List of results.

  • metric (str | list[str]) – Metrics to be evaluated.

  • iou_thr (list[float]) – AP IoU thresholds.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

Returns

Evaluation results.

Return type

dict

format_results(outputs, pklfile_prefix=None, submission_prefix=None)[source]

Format the results to pkl file.

Parameters
  • outputs (list[dict]) – Testing results of the dataset.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

Returns

(outputs, tmp_dir), outputs is the detection results, tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.

Return type

tuple

classmethod get_classes(classes=None)[source]

Get class names of current dataset.

Parameters

classes (Sequence[str] | str | None) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset.

Returns

A list of class names.

Return type

list[str]

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str): Filename of point clouds.

  • file_name (str): Filename of point clouds.

  • ann_info (dict): Annotation info.

Return type

dict

load_annotations(ann_file)[source]

Load annotations from ann_file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

List of annotations.

Return type

list[dict]

pre_pipeline(results)[source]

Initialization before data preparation.

Parameters

results (dict) –

Dict before data preprocessing.

  • img_fields (list): Image fields.

  • bbox3d_fields (list): 3D bounding boxes fields.

  • pts_mask_fields (list): Mask fields of points.

  • pts_seg_fields (list): Mask fields of point segments.

  • bbox_fields (list): Fields of bounding boxes.

  • mask_fields (list): Fields of masks.

  • seg_fields (list): Segment fields.

  • box_type_3d (str): 3D box type.

  • box_mode_3d (str): 3D box mode.

prepare_test_data(index)[source]

Prepare data for testing.

Parameters

index (int) – Index for accessing the target data.

Returns

Testing data dict of the corresponding index.

Return type

dict

prepare_train_data(index)[source]

Training data preparation.

Parameters

index (int) – Index for accessing the target data.

Returns

Training data dict of the corresponding index.

Return type

dict

class mmdet3d.datasets.GlobalRotScaleTrans(rot_range=[-0.78539816, 0.78539816], scale_ratio_range=[0.95, 1.05], translation_std=[0, 0, 0], shift_height=False)[source]

Apply global rotation, scaling and translation to a 3D scene.

Parameters
  • rot_range (list[float]) – Range of rotation angle. Defaults to [-0.78539816, 0.78539816] (close to [-pi/4, pi/4]).

  • scale_ratio_range (list[float]) – Range of scale ratio. Defaults to [0.95, 1.05].

  • translation_std (list[float]) – The standard deviation of translation noise. This applies random translation to a scene by a noise sampled from a Gaussian distribution whose standard deviation is set by translation_std. Defaults to [0, 0, 0].

  • shift_height (bool) – Whether to shift height (the fourth dimension of indoor points) when scaling. Defaults to False.

class mmdet3d.datasets.IndoorPointSample(num_points)[source]

Indoor point sample.

Sampling data to a certain number.

Parameters

num_points (int) – Number of points to be sampled.

points_random_sampling(points, num_samples, replace=None, return_choices=False)[source]

Points random sampling.

Sample points to a certain number.

Parameters
  • points (np.ndarray) – 3D Points.

  • num_samples (int) – Number of samples to be sampled.

  • replace (bool) – Whether the sampling is with or without replacement. Defaults to None.

  • return_choices (bool) – Whether to return choices. Defaults to False.

Returns

  • points (np.ndarray): 3D Points.

  • choices (np.ndarray, optional): The generated random samples.

Return type

tuple[np.ndarray] | np.ndarray

class mmdet3d.datasets.KittiDataset(data_root, ann_file, split, pts_prefix='velodyne', pipeline=None, classes=None, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False)[source]

KITTI Dataset.

This class serves as the API for experiments on the KITTI Dataset.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • split (str) – Split of input data.

  • pts_prefix (str, optional) – Prefix of points files. Defaults to ‘velodyne’.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include

    • ’LiDAR’: Box in LiDAR coordinates.

    • ’Depth’: Box in depth coordinates, usually for indoor dataset.

    • ’Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.
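A hypothetical instantiation sketch; the paths and info filename below are placeholders and require a prepared KITTI data root:

from mmdet3d.datasets import KittiDataset

dataset = KittiDataset(
    data_root='data/kitti/',
    ann_file='data/kitti/kitti_infos_train.pkl',  # placeholder path
    split='training',
    pipeline=None,
    classes=('Pedestrian', 'Cyclist', 'Car'))

info = dataset.get_data_info(0)  # sample_idx, pts_filename, img_info, ...
ann = dataset.get_ann_info(0)    # gt_bboxes_3d, gt_labels_3d, ...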

bbox2result_kitti(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]

Convert 3D detection results to kitti format for evaluation and test submission.

Parameters
  • net_outputs (list[np.ndarray]) – List of arrays storing the inferred bounding boxes and scores.

  • class_names (list[str]) – A list of class names.

  • pklfile_prefix (str | None) – The prefix of pkl file.

  • submission_prefix (str | None) – The prefix of submission file.

Returns

A list of dictionaries in the KITTI format.

Return type

list[dict]

bbox2result_kitti2d(net_outputs, class_names, pklfile_prefix=None, submission_prefix=None)[source]

Convert 2D detection results to kitti format for evaluation and test submission.

Parameters
  • net_outputs (list[np.ndarray]) – List of arrays storing the inferred bounding boxes and scores.

  • class_names (list[str]) – A list of class names.

  • pklfile_prefix (str | None) – The prefix of pkl file.

  • submission_prefix (str | None) – The prefix of submission file.

Returns

A list of dictionaries in the KITTI format.

Return type

list[dict]

convert_valid_bboxes(box_dict, info)[source]

Convert the predicted boxes into valid ones.

Parameters
  • box_dict (dict) –

    Box dictionaries to be converted.

    • boxes_3d (LiDARInstance3DBoxes): 3D bounding boxes.

    • scores_3d (torch.Tensor): Scores of boxes.

    • labels_3d (torch.Tensor): Class labels of boxes.

  • info (dict) – Data info.

Returns

Valid predicted boxes.

  • bbox (np.ndarray): 2D bounding boxes.

  • box3d_camera (np.ndarray): 3D bounding boxes in camera coordinate.

  • box3d_lidar (np.ndarray): 3D bounding boxes in LiDAR coordinate.

  • scores (np.ndarray): Scores of boxes.

  • label_preds (np.ndarray): Class label predictions.

  • sample_idx (int): Sample index.

Return type

dict

drop_arrays_by_name(gt_names, used_classes)[source]

Drop irrelevant ground truths by name.

Parameters
  • gt_names (list[str]) – Names of ground truths.

  • used_classes (list[str]) – Classes of interest.

Returns

Indices of ground truths that will be dropped.

Return type

np.ndarray

evaluate(results, metric=None, logger=None, pklfile_prefix=None, submission_prefix=None, show=False, out_dir=None)[source]

Evaluation in KITTI protocol.

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • submission_prefix (str | None) – The prefix of submission data. If not specified, the submission data will not be generated.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

Returns

Results of each evaluation metric.

Return type

dict[str, float]

format_results(outputs, pklfile_prefix=None, submission_prefix=None)[source]

Format the results to pkl file.

Parameters
  • outputs (list[dict]) – Testing results of the dataset.

  • pklfile_prefix (str | None) – The prefix of pkl files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • submission_prefix (str | None) – The prefix of submitted files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

Returns

(result_files, tmp_dir), result_files is a dict containing the json filepaths, tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.

Return type

tuple

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (LiDARInstance3DBoxes): 3D ground truth bboxes.

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • gt_bboxes (np.ndarray): 2D ground truth bboxes.

  • gt_labels (np.ndarray): Labels of ground truths.

  • gt_names (list[str]): Class names of ground truths.

Return type

dict

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str): Filename of point clouds.

  • img_prefix (str | None): Prefix of image files.

  • img_info (dict): Image info.

  • lidar2img (list[np.ndarray], optional): Transformations from lidar to different cameras.

  • ann_info (dict): Annotation info.

Return type

dict

keep_arrays_by_name(gt_names, used_classes)[source]

Keep useful ground truths by name.

Parameters
  • gt_names (list[str]) – Names of ground truths.

  • used_classes (list[str]) – Classes of interest.

Returns

Indices of ground truths that will be kept.

Return type

np.ndarray

remove_dontcare(ann_info)[source]

Remove annotations that do not need to be cared about.

Parameters

ann_info (dict) – Dict of annotation infos. The 'DontCare' annotations will be removed according to ann_file[‘name’].

Returns

Annotations after filtering.

Return type

dict

show(results, out_dir)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding boxes results.

  • out_dir (str) – Output directory of visualization result.

class mmdet3d.datasets.LoadPointsFromFile(load_dim=6, use_dim=[0, 1, 2], shift_height=False, file_client_args={'backend': 'disk'})[source]

Load Points From File.

Load SUNRGB-D and ScanNet points from file.

Parameters
  • load_dim (int) – The dimension of the loaded points. Defaults to 6.

  • use_dim (list[int]) – Which dimensions of the points to be used. Defaults to [0, 1, 2]. For KITTI dataset, set use_dim=4 or use_dim=[0, 1, 2, 3] to use the intensity dimension.

  • shift_height (bool) – Whether to use shifted height. Defaults to False.

  • file_client_args (dict) – Config dict of file clients, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py for more details. Defaults to dict(backend=’disk’).
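A sketch of a typical loading/augmentation pipeline config in the usual config-dict style; the values are illustrative (KITTI-style points with intensity), not defaults:

train_pipeline = [
    dict(type='LoadPointsFromFile', load_dim=4, use_dim=4),  # keep intensity
    dict(type='GlobalRotScaleTrans',
         rot_range=[-0.78539816, 0.78539816],
         scale_ratio_range=[0.95, 1.05],
         translation_std=[0, 0, 0]),
]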

class mmdet3d.datasets.LoadPointsFromMultiSweeps(sweeps_num=10, load_dim=5, file_client_args={'backend': 'disk'})[source]

Load points from multiple sweeps.

This is usually used for nuScenes dataset to utilize previous sweeps.

Parameters
  • sweeps_num (int) – Number of sweeps. Defaults to 10.

  • load_dim (int) – Dimension number of the loaded points. Defaults to 5.

  • file_client_args (dict) – Config dict of file clients, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py for more details. Defaults to dict(backend=’disk’).

class mmdet3d.datasets.LyftDataset(ann_file, pipeline=None, data_root=None, classes=None, load_interval=1, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False)[source]

Lyft Dataset.

This class serves as the API for experiments on the Lyft Dataset.

Please refer to https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data for data downloading.

Parameters
  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • data_root (str) – Path of dataset root.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • load_interval (int, optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include

    • ’LiDAR’: Box in LiDAR coordinates.

    • ’Depth’: Box in depth coordinates, usually for indoor dataset.

    • ’Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

evaluate(results, metric='bbox', logger=None, jsonfile_prefix=None, csv_savepath=None, result_names=['pts_bbox'], show=False, out_dir=None)[source]

Evaluation in Lyft protocol.

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • csv_savepath (str | None) – The path for saving csv files. It includes the file path and the csv filename, e.g., “a/b/filename.csv”. If not specified, the result will not be converted to csv file.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

Returns

Evaluation results.

Return type

dict[str, float]

format_results(results, jsonfile_prefix=None, csv_savepath=None)[source]

Format the results to json (standard format for COCO evaluation).

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • csv_savepath (str | None) – The path for saving csv files. It includes the file path and the csv filename, e.g., “a/b/filename.csv”. If not specified, the result will not be converted to csv file.

Returns

Returns (result_files, tmp_dir), where result_files is a dict containing the json filepaths, tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.

Return type

tuple

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (LiDARInstance3DBoxes): 3D ground truth bboxes.

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • gt_names (list[str]): Class names of ground truths.

Return type

dict

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str): Filename of point clouds.

  • sweeps (list[dict]): Infos of sweeps.

  • timestamp (float): Sample timestamp.

  • img_filename (str, optional): Image filename.

  • lidar2img (list[np.ndarray], optional): Transformations from lidar to different cameras.

  • ann_info (dict): Annotation info.

Return type

dict

static json2csv(json_path, csv_savepath)[source]

Convert the json file to csv format for submission.

Parameters
  • json_path (str) – Path of the result json file.

  • csv_savepath (str) – Path to save the csv file.

load_annotations(ann_file)[source]

Load annotations from ann_file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

List of annotations sorted by timestamps.

Return type

list[dict]

show(results, out_dir)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding boxes results.

  • out_dir (str) – Output directory of visualization result.

class mmdet3d.datasets.NormalizePointsColor(color_mean)[source]

Normalize color of points.

Parameters

color_mean (list[float]) – Mean color of the point cloud.

class mmdet3d.datasets.NuScenesDataset(ann_file, pipeline=None, data_root=None, classes=None, load_interval=1, with_velocity=True, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False, eval_version='detection_cvpr_2019')[source]

NuScenes Dataset.

This class serves as the API for experiments on the NuScenes Dataset.

Please refer to NuScenes Dataset for data downloading.

Parameters
  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • data_root (str) – Path of dataset root.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • load_interval (int, optional) – Interval of loading the dataset. It is used to uniformly sample the dataset. Defaults to 1.

  • with_velocity (bool, optional) – Whether include velocity prediction into the experiments. Defaults to True.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘LiDAR’ in this dataset. Available options include

    • ’LiDAR’: Box in LiDAR coordinates.

    • ’Depth’: Box in depth coordinates, usually for indoor dataset.

    • ’Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

  • eval_version (str, optional) – Configuration version of evaluation. Defaults to ‘detection_cvpr_2019’.
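
A minimal construction sketch for this dataset class; the annotation file and data root below are hypothetical placeholders, and a real setup would also pass a processing pipeline:

    from mmdet3d.datasets import NuScenesDataset

    # Hypothetical paths; replace with a generated info file and data root.
    dataset = NuScenesDataset(
        ann_file='data/nuscenes/nuscenes_infos_train.pkl',
        data_root='data/nuscenes/',
        box_type_3d='LiDAR')
    info = dataset.get_data_info(0)  # dict with sample_idx, pts_filename, ...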

evaluate(results, metric='bbox', logger=None, jsonfile_prefix=None, result_names=['pts_bbox'], show=False, out_dir=None)[source]

Evaluation in nuScenes protocol.

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

  • jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

  • show (bool) – Whether to visualize. Default: False.

  • out_dir (str) – Path to save the visualization results. Default: None.

Returns

Results of each evaluation metric.

Return type

dict[str, float]

format_results(results, jsonfile_prefix=None)[source]

Format the results to json (standard format for COCO evaluation).

Parameters
  • results (list[dict]) – Testing results of the dataset.

  • jsonfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Default: None.

Returns

Returns (result_files, tmp_dir), where result_files is a dict containing the json filepaths, and tmp_dir is the temporary directory created for saving json files when jsonfile_prefix is not specified.

Return type

tuple

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (LiDARInstance3DBoxes): 3D ground truth bboxes.

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • gt_names (list[str]): Class names of ground truths.

Return type

dict

get_data_info(index)[source]

Get data info according to the given index.

Parameters

index (int) – Index of the sample data to get.

Returns

Data information that will be passed to the data preprocessing pipelines. It includes the following keys:

  • sample_idx (str): Sample index.

  • pts_filename (str): Filename of point clouds.

  • sweeps (list[dict]): Infos of sweeps.

  • timestamp (float): Sample timestamp.

  • img_filename (str, optional): Image filename.

  • lidar2img (list[np.ndarray], optional): Transformations from lidar to different cameras.

  • ann_info (dict): Annotation info.

Return type

dict

load_annotations(ann_file)[source]

Load annotations from ann_file.

Parameters

ann_file (str) – Path of the annotation file.

Returns

List of annotations sorted by timestamps.

Return type

list[dict]

show(results, out_dir)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding box results.

  • out_dir (str) – Output directory of visualization result.

class mmdet3d.datasets.ObjectNoise(translation_std=[0.25, 0.25, 0.25], global_rot_range=[0.0, 0.0], rot_range=[-0.15707963267, 0.15707963267], num_try=100)[source]

Apply noise to each GT object in the scene.

Parameters
  • translation_std (list[float], optional) – Standard deviation of the distribution that the translation noise is sampled from. Defaults to [0.25, 0.25, 0.25].

  • global_rot_range (list[float], optional) – Global rotation to the scene. Defaults to [0.0, 0.0].

  • rot_range (list[float], optional) – Object rotation range. Defaults to [-0.15707963267, 0.15707963267].

  • num_try (int, optional) – Number of times to try if the noise applied is invalid. Defaults to 100.

class mmdet3d.datasets.ObjectRangeFilter(point_cloud_range)[source]

Filter objects by the range.

Parameters

point_cloud_range (list[float]) – Point cloud range.

class mmdet3d.datasets.ObjectSample(db_sampler, sample_2d=False)[source]

Sample GT objects to the data.

Parameters
  • db_sampler (dict) – Config dict of the database sampler.

  • sample_2d (bool) – Whether to also paste the 2D image patch to the images. This should be True when applying multi-modality cut-and-paste. Defaults to False.

static remove_points_in_boxes(points, boxes)[source]

Remove the points in the sampled bounding boxes.

Parameters
  • points (np.ndarray) – Input point cloud array.

  • boxes (np.ndarray) – Sampled ground truth boxes.

Returns

Points with those in the boxes removed.

Return type

np.ndarray

class mmdet3d.datasets.PointShuffle[source]

Shuffle input points.

class mmdet3d.datasets.PointsRangeFilter(point_cloud_range)[source]

Filter points by the range.

Parameters

point_cloud_range (list[float]) – Point cloud range.
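
The transform classes documented above are typically composed inside a training pipeline config; a minimal sketch with illustrative values (not recommended settings):

    # Illustrative outdoor training pipeline fragment.
    point_cloud_range = [0, -40, -3, 70.4, 40, 1]
    train_pipeline = [
        dict(type='ObjectNoise',
             translation_std=[0.25, 0.25, 0.25],
             rot_range=[-0.15707963267, 0.15707963267]),
        dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
        dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
        dict(type='PointShuffle'),
    ]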

class mmdet3d.datasets.SUNRGBDDataset(data_root, ann_file, pipeline=None, classes=None, modality=None, box_type_3d='Depth', filter_empty_gt=True, test_mode=False)[source]

SUNRGBD Dataset.

This class serves as the API for experiments on the SUNRGBD Dataset.

See the download page for data downloading.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘Depth’ in this dataset. Available options include

    • ’LiDAR’: Box in LiDAR coordinates.

    • ’Depth’: Box in depth coordinates, usually for indoor dataset.

    • ’Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (DepthInstance3DBoxes): 3D ground truth bboxes.

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • pts_instance_mask_path (str): Path of instance masks.

  • pts_semantic_mask_path (str): Path of semantic masks.

Return type

dict

show(results, out_dir)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding box results.

  • out_dir (str) – Output directory of visualization result.

class mmdet3d.datasets.ScanNetDataset(data_root, ann_file, pipeline=None, classes=None, modality=None, box_type_3d='Depth', filter_empty_gt=True, test_mode=False)[source]

ScanNet Dataset.

This class serves as the API for experiments on the ScanNet Dataset.

Please refer to the github repo for data downloading.

Parameters
  • data_root (str) – Path of dataset root.

  • ann_file (str) – Path of annotation file.

  • pipeline (list[dict], optional) – Pipeline used for data processing. Defaults to None.

  • classes (tuple[str], optional) – Classes used in the dataset. Defaults to None.

  • modality (dict, optional) – Modality to specify the sensor data used as input. Defaults to None.

  • box_type_3d (str, optional) –

    Type of 3D box of this dataset. Based on the box_type_3d, the dataset will encapsulate the box in its original format and then convert it to box_type_3d. Defaults to ‘Depth’ in this dataset. Available options include

    • ’LiDAR’: Box in LiDAR coordinates.

    • ’Depth’: Box in depth coordinates, usually for indoor dataset.

    • ’Camera’: Box in camera coordinates.

  • filter_empty_gt (bool, optional) – Whether to filter empty GT. Defaults to True.

  • test_mode (bool, optional) – Whether the dataset is in test mode. Defaults to False.

get_ann_info(index)[source]

Get annotation info according to the given index.

Parameters

index (int) – Index of the annotation data to get.

Returns

Annotation information consists of the following keys:

  • gt_bboxes_3d (DepthInstance3DBoxes): 3D ground truth bboxes.

  • gt_labels_3d (np.ndarray): Labels of ground truths.

  • pts_instance_mask_path (str): Path of instance masks.

  • pts_semantic_mask_path (str): Path of semantic masks.

Return type

dict

show(results, out_dir)[source]

Results visualization.

Parameters
  • results (list[dict]) – List of bounding box results.

  • out_dir (str) – Output directory of visualization result.

mmdet3d.models

detectors

backbones

class mmdet3d.models.backbones.PointNet2SASSG(in_channels, num_points=(2048, 1024, 512, 256), radius=(0.2, 0.4, 0.8, 1.2), num_samples=(64, 32, 16, 16), sa_channels=((64, 64, 128), (128, 128, 256), (128, 128, 256), (128, 128, 256)), fp_channels=((256, 256), (256, 256)), norm_cfg={'type': 'BN2d'}, pool_mod='max', use_xyz=True, normalize_xyz=True)[source]

PointNet2 with Single-scale grouping.

Parameters
  • in_channels (int) – Input channels of point cloud.

  • num_points (tuple[int]) – The number of points which each SA module samples.

  • radius (tuple[float]) – Sampling radii of each SA module.

  • num_samples (tuple[int]) – The number of samples for ball query in each SA module.

  • sa_channels (tuple[tuple[int]]) – Out channels of each mlp in SA module.

  • fp_channels (tuple[tuple[int]]) – Out channels of each mlp in FP module.

  • norm_cfg (dict) – Config of normalization layer.

  • pool_mod (str) – Pool method (‘max’ or ‘avg’) for SA modules.

  • use_xyz (bool) – Whether to use xyz as a part of features.

  • normalize_xyz (bool) – Whether to normalize xyz with radii in each SA module.

forward(points)[source]

Forward pass.

Parameters

points (torch.Tensor) – Point coordinates with features, with shape (B, N, 3 + input_feature_dim).

Returns

Outputs after SA and FP modules.

  • fp_xyz (list[torch.Tensor]): The coordinates of each FP feature.

  • fp_features (list[torch.Tensor]): The features from each feature propagation (FP) layer.

  • fp_indices (list[torch.Tensor]): Indices of the input points.

Return type

dict[str, list[torch.Tensor]]

init_weights(pretrained=None)[source]

Initialize the weights of PointNet backbone.
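
A minimal forward sketch, assuming the default SA/FP settings and a point cloud carrying xyz plus three extra feature channels:

    import torch
    from mmdet3d.models.backbones import PointNet2SASSG

    backbone = PointNet2SASSG(in_channels=6)  # 3 coordinates + 3 features
    points = torch.rand(2, 2048, 6)           # (B, N, 3 + input_feature_dim)
    out = backbone(points)
    # out['fp_xyz'], out['fp_features'] and out['fp_indices'] are lists of
    # tensors, one entry per feature propagation layer.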

class mmdet3d.models.backbones.SECOND(in_channels=128, out_channels=[128, 128, 256], layer_nums=[3, 5, 5], layer_strides=[2, 2, 2], norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, conv_cfg={'bias': False, 'type': 'Conv2d'})[source]

Backbone network for SECOND/PointPillars/PartA2/MVXNet.

Parameters
  • in_channels (int) – Input channels.

  • out_channels (list[int]) – Output channels for multi-scale feature maps.

  • layer_nums (list[int]) – Number of layers in each stage.

  • layer_strides (list[int]) – Strides of each stage.

  • norm_cfg (dict) – Config dict of normalization layers.

  • conv_cfg (dict) – Config dict of convolutional layers.

forward(x)[source]

Forward function.

Parameters

x (torch.Tensor) – Input with shape (N, C, H, W).

Returns

Multi-scale features.

Return type

tuple[torch.Tensor]

init_weights(pretrained=None)[source]

Initialize weights of the 2D backbone.

necks

class mmdet3d.models.necks.SECONDFPN(in_channels=[128, 128, 256], out_channels=[256, 256, 256], upsample_strides=[1, 2, 4], norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, upsample_cfg={'bias': False, 'type': 'deconv'})[source]

FPN used in SECOND/PointPillars/PartA2/MVXNet.

Parameters
  • in_channels (list[int]) – Input channels of multi-scale feature maps.

  • out_channels (list[int]) – Output channels of feature maps.

  • upsample_strides (list[int]) – Strides used to upsample the feature maps.

  • norm_cfg (dict) – Config dict of normalization layers.

  • upsample_cfg (dict) – Config dict of upsample layers.

forward(x)[source]

Forward function.

Parameters

x (torch.Tensor) – 4D Tensor in (N, C, H, W) shape.

Returns

Multi-level feature maps.

Return type

list[torch.Tensor]

init_weights()[source]

Initialize weights of FPN.
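
A minimal sketch chaining the SECOND backbone with this FPN under their default channel settings; the input resolution is illustrative:

    import torch
    from mmdet3d.models.backbones import SECOND
    from mmdet3d.models.necks import SECONDFPN

    backbone = SECOND(in_channels=128)
    neck = SECONDFPN()                 # defaults match the backbone outputs
    x = torch.rand(1, 128, 200, 176)   # (N, C, H, W) BEV pseudo image
    multi_scale = backbone(x)          # tuple of 3 feature maps
    fused = neck(multi_scale)          # list with one concatenated map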

dense_heads

class mmdet3d.models.dense_heads.Anchor3DHead(num_classes, in_channels, train_cfg, test_cfg, feat_channels=256, use_direction_classifier=True, anchor_generator={'custom_values': [], 'range': [0, -39.68, -1.78, 69.12, 39.68, -1.78], 'reshape_out': False, 'rotations': [0, 1.57], 'sizes': [[1.6, 3.9, 1.56]], 'strides': [2], 'type': 'Anchor3DRangeGenerator'}, assigner_per_size=False, assign_per_class=False, diff_rad_by_sin=True, dir_offset=0, dir_limit_offset=1, bbox_coder={'type': 'DeltaXYZWLHRBBoxCoder'}, loss_cls={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 2.0, 'type': 'SmoothL1Loss'}, loss_dir={'loss_weight': 0.2, 'type': 'CrossEntropyLoss'})[source]

Anchor head for SECOND/PointPillars/MVXNet/PartA2.

Parameters
  • num_classes (int) – Number of classes.

  • in_channels (int) – Number of channels in the input feature map.

  • train_cfg (dict) – Train configs.

  • test_cfg (dict) – Test configs.

  • feat_channels (int) – Number of channels of the feature map.

  • use_direction_classifier (bool) – Whether to add a direction classifier.

  • anchor_generator (dict) – Config dict of anchor generator.

  • assigner_per_size (bool) – Whether to do assignment for each separate anchor size.

  • assign_per_class (bool) – Whether to do assignment for each class.

  • diff_rad_by_sin (bool) – Whether to change the difference into sin difference for box regression loss.

  • dir_offset (float | int) – The offset of BEV rotation angles. (TODO: may be moved into box coder)

  • dir_limit_offset (float | int) – The limited range of BEV rotation angles. (TODO: may be moved into box coder)

  • bbox_coder (dict) – Config dict of box coders.

  • loss_cls (dict) – Config of classification loss.

  • loss_bbox (dict) – Config of localization loss.

  • loss_dir (dict) – Config of direction classifier loss.

static add_sin_difference(boxes1, boxes2)[source]

Convert the rotation difference to difference in sine function.

Parameters
  • boxes1 (torch.Tensor) – Original Boxes in shape (NxC), where C>=7 and the 7th dimension is rotation dimension.

  • boxes2 (torch.Tensor) – Target boxes in shape (NxC), where C>=7 and the 7th dimension is rotation dimension.

Returns

boxes1 and boxes2 whose 7th dimensions are changed.

Return type

tuple[torch.Tensor]
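
The conversion relies on the identity sin(r1 - r2) = sin(r1)cos(r2) - cos(r1)sin(r2): the rotation entry of boxes1 becomes sin(r1)cos(r2) and that of boxes2 becomes cos(r1)sin(r2), so a regression loss on their difference effectively penalizes sin(r1 - r2). A small numeric sketch of the identity:

    import torch

    r1, r2 = torch.tensor([0.3]), torch.tensor([1.5])
    enc1 = torch.sin(r1) * torch.cos(r2)  # stands in for boxes1[..., 6]
    enc2 = torch.cos(r1) * torch.sin(r2)  # stands in for boxes2[..., 6]
    assert torch.allclose(enc1 - enc2, torch.sin(r1 - r2))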

forward(feats)[source]

Forward pass.

Parameters

feats (list[torch.Tensor]) – Multi-level features, e.g., features produced by FPN.

Returns

Multi-level class score, bbox and direction predictions.

Return type

tuple[list[torch.Tensor]]

forward_single(x)[source]

Forward function on a single-scale feature map.

Parameters

x (torch.Tensor) – Input features.

Returns

Scores of each class, bbox regression and direction classification predictions.

Return type

tuple[torch.Tensor]

get_anchors(featmap_sizes, input_metas, device='cuda')[source]

Get anchors according to feature map sizes.

Parameters
  • featmap_sizes (list[tuple]) – Multi-level feature map sizes.

  • input_metas (list[dict]) – Contain pcd and img’s meta info.

  • device (str) – Device of current module.

Returns

Anchors of each image, valid flags of each image.

Return type

list[list[torch.Tensor]]

get_bboxes(cls_scores, bbox_preds, dir_cls_preds, input_metas, cfg=None, rescale=False)[source]

Get bboxes of anchor head.

Parameters
  • cls_scores (list[torch.Tensor]) – Multi-level class scores.

  • bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.

  • dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.

  • input_metas (list[dict]) – Contain pcd and img’s meta info.

  • cfg (None | ConfigDict) – Training or testing config.

  • rescale (bool) – Whether to rescale bboxes.

Returns

Prediction results of batches.

Return type

list[tuple]

get_bboxes_single(cls_scores, bbox_preds, dir_cls_preds, mlvl_anchors, input_meta, cfg=None, rescale=False)[source]

Get bboxes of single branch.

Parameters
  • cls_scores (torch.Tensor) – Class score in single batch.

  • bbox_preds (torch.Tensor) – Bbox prediction in single batch.

  • dir_cls_preds (torch.Tensor) – Predictions of direction class in single batch.

  • mlvl_anchors (List[torch.Tensor]) – Multi-level anchors in single batch.

  • input_meta (dict) – Contain pcd and img’s meta info.

  • cfg (None | ConfigDict) – Training or testing config.

  • rescale (bool) – Whether to rescale bboxes.

Returns

Contains predictions of a single batch.

  • bboxes (BaseInstance3DBoxes): Predicted 3d bboxes.

  • scores (torch.Tensor): Class score of each bbox.

  • labels (torch.Tensor): Label of each bbox.

Return type

tuple

init_weights()[source]

Initialize the weights of head.

loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]

Calculate losses.

Parameters
  • cls_scores (list[torch.Tensor]) – Multi-level class scores.

  • bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.

  • dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.

  • gt_bboxes (list[BaseInstance3DBoxes]) – Gt bboxes of each sample.

  • gt_labels (list[torch.Tensor]) – Gt labels of each sample.

  • input_metas (list[dict]) – Contain pcd and img’s meta info.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes can be ignored when computing the loss.

Returns

Classification, bbox, and direction losses of each level.

  • loss_cls (list[torch.Tensor]): Classification losses.

  • loss_bbox (list[torch.Tensor]): Box regression losses.

  • loss_dir (list[torch.Tensor]): Direction classification losses.

Return type

dict[str, list[torch.Tensor]]

loss_single(cls_score, bbox_pred, dir_cls_preds, labels, label_weights, bbox_targets, bbox_weights, dir_targets, dir_weights, num_total_samples)[source]

Calculate loss of Single-level results.

Parameters
  • cls_score (torch.Tensor) – Class score in single-level.

  • bbox_pred (torch.Tensor) – Bbox prediction in single-level.

  • dir_cls_preds (torch.Tensor) – Predictions of direction class in single-level.

  • labels (torch.Tensor) – Labels of class.

  • label_weights (torch.Tensor) – Weights of class loss.

  • bbox_targets (torch.Tensor) – Targets of bbox predictions.

  • bbox_weights (torch.Tensor) – Weights of bbox loss.

  • dir_targets (torch.Tensor) – Targets of direction predictions.

  • dir_weights (torch.Tensor) – Weights of direction loss.

  • num_total_samples (int) – The number of valid samples.

Returns

Losses of class, bbox and direction, respectively.

Return type

tuple[torch.Tensor]

class mmdet3d.models.dense_heads.FreeAnchor3DHead(pre_anchor_topk=50, bbox_thr=0.6, gamma=2.0, alpha=0.5, **kwargs)[source]

FreeAnchor head for 3D detection.

Note

This implementation is directly modified from the mmdet implementation. We find it also works on 3D detection with minor modifications, i.e., different hyper-parameters and an additional direction classifier.

Parameters
  • pre_anchor_topk (int) – Number of boxes to be taken in each bag.

  • bbox_thr (float) – The threshold of the saturated linear function. It is usually the same as the IoU threshold used in NMS.

  • gamma (float) – Gamma parameter in focal loss.

  • alpha (float) – Alpha parameter in focal loss.

  • kwargs (dict) – Other arguments are the same as those in Anchor3DHead.

loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]

Calculate loss of FreeAnchor head.

Parameters
  • cls_scores (list[torch.Tensor]) – Classification scores of different samples.

  • bbox_preds (list[torch.Tensor]) – Box predictions of different samples.

  • dir_cls_preds (list[torch.Tensor]) – Direction predictions of different samples.

  • gt_bboxes (list[BaseInstance3DBoxes]) – Ground truth boxes.

  • gt_labels (list[torch.Tensor]) – Ground truth labels.

  • input_metas (list[dict]) – List of input meta information.

  • gt_bboxes_ignore (list[BaseInstance3DBoxes], optional) – Ground truth boxes that should be ignored. Defaults to None.

Returns

Loss items.

  • positive_bag_loss (torch.Tensor): Loss of positive samples.

  • negative_bag_loss (torch.Tensor): Loss of negative samples.

Return type

dict[str, torch.Tensor]

negative_bag_loss(cls_prob, box_prob)[source]

Generate negative bag loss.

Parameters
  • cls_prob (torch.Tensor) – Classification probability of negative samples.

  • box_prob (torch.Tensor) – Bounding box probability of negative samples.

Returns

Loss of negative samples.

Return type

torch.Tensor

positive_bag_loss(matched_cls_prob, matched_box_prob)[source]

Generate positive bag loss.

Parameters
  • matched_cls_prob (torch.Tensor) – Classification probability of matched positive samples.

  • matched_box_prob (torch.Tensor) – Bounding box probability of matched positive samples.

Returns

Loss of positive samples.

Return type

torch.Tensor

class mmdet3d.models.dense_heads.PartA2RPNHead(num_classes, in_channels, train_cfg, test_cfg, feat_channels=256, use_direction_classifier=True, anchor_generator={'custom_values': [], 'range': [0, -39.68, -1.78, 69.12, 39.68, -1.78], 'reshape_out': False, 'rotations': [0, 1.57], 'sizes': [[1.6, 3.9, 1.56]], 'strides': [2], 'type': 'Anchor3DRangeGenerator'}, assigner_per_size=False, assign_per_class=False, diff_rad_by_sin=True, dir_offset=0, dir_limit_offset=1, bbox_coder={'type': 'DeltaXYZWLHRBBoxCoder'}, loss_cls={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 2.0, 'type': 'SmoothL1Loss'}, loss_dir={'loss_weight': 0.2, 'type': 'CrossEntropyLoss'})[source]

RPN head for PartA2.

Note

The main difference between the PartA2 RPN head and the Anchor3DHead lies in their output during inference. The PartA2 RPN head additionally returns the original classification score for the second stage, since the bbox head in the RoI head does not perform the classification task.

Different from RPN heads in 2D detectors, this RPN head performs a multi-class classification task and uses FocalLoss like SECOND and PointPillars do. However, it uses class-agnostic NMS rather than multi-class NMS.

Parameters
  • num_classes (int) – Number of classes.

  • in_channels (int) – Number of channels in the input feature map.

  • train_cfg (dict) – Train configs.

  • test_cfg (dict) – Test configs.

  • feat_channels (int) – Number of channels of the feature map.

  • use_direction_classifier (bool) – Whether to add a direction classifier.

  • anchor_generator (dict) – Config dict of anchor generator.

  • assigner_per_size (bool) – Whether to do assignment for each separate anchor size.

  • assign_per_class (bool) – Whether to do assignment for each class.

  • diff_rad_by_sin (bool) – Whether to change the difference into sin difference for box regression loss.

  • dir_offset (float | int) – The offset of BEV rotation angles (TODO: may be moved into box coder)

  • dir_limit_offset (float | int) – The limited range of BEV rotation angles. (TODO: may be moved into box coder)

  • bbox_coder (dict) – Config dict of box coders.

  • loss_cls (dict) – Config of classification loss.

  • loss_bbox (dict) – Config of localization loss.

  • loss_dir (dict) – Config of direction classifier loss.

class_agnostic_nms(mlvl_bboxes, mlvl_bboxes_for_nms, mlvl_max_scores, mlvl_label_pred, mlvl_cls_score, mlvl_dir_scores, score_thr, max_num, cfg, input_meta)[source]

Class agnostic nms for single batch.

Parameters
  • mlvl_bboxes (torch.Tensor) – Multi-level bboxes.

  • mlvl_bboxes_for_nms (torch.Tensor) – Multi-level bboxes used for NMS (BEV or minmax boxes).

  • mlvl_max_scores (torch.Tensor) – Max scores of multi-level bboxes.

  • mlvl_label_pred (torch.Tensor) – Class predictions of multi-level bboxes.

  • mlvl_cls_score (torch.Tensor) – Class scores of multi-level bboxes.

  • mlvl_dir_scores (torch.Tensor) – Direction scores of multi-level bboxes.

  • score_thr (int) – Score threshold.

  • max_num (int) – Max number of bboxes after nms.

  • cfg (None | ConfigDict) – Training or testing config.

  • input_meta (dict) – Contain pcd and img’s meta info.

Returns

Predictions of a single batch, containing the following keys:

  • boxes_3d (BaseInstance3DBoxes): Predicted 3d bboxes.

  • scores_3d (torch.Tensor): Score of each bbox.

  • labels_3d (torch.Tensor): Label of each bbox.

  • cls_preds (torch.Tensor): Class score of each bbox.

Return type

dict

get_bboxes_single(cls_scores, bbox_preds, dir_cls_preds, mlvl_anchors, input_meta, cfg, rescale=False)[source]

Get bboxes of single branch.

Parameters
  • cls_scores (torch.Tensor) – Class score in single batch.

  • bbox_preds (torch.Tensor) – Bbox prediction in single batch.

  • dir_cls_preds (torch.Tensor) – Predictions of direction class in single batch.

  • mlvl_anchors (List[torch.Tensor]) – Multi-level anchors in single batch.

  • input_meta (dict) – Contain pcd and img’s meta info.

  • cfg (None | ConfigDict) – Training or testing config.

  • rescale (bool) – Whether to rescale bboxes.

Returns

Predictions of single batch containing the following keys:

  • boxes_3d (BaseInstance3DBoxes): Predicted 3d bboxes.

  • scores_3d (torch.Tensor): Score of each bbox.

  • labels_3d (torch.Tensor): Label of each bbox.

  • cls_preds (torch.Tensor): Class score of each bbox.

Return type

dict

loss(cls_scores, bbox_preds, dir_cls_preds, gt_bboxes, gt_labels, input_metas, gt_bboxes_ignore=None)[source]

Calculate losses.

Parameters
  • cls_scores (list[torch.Tensor]) – Multi-level class scores.

  • bbox_preds (list[torch.Tensor]) – Multi-level bbox predictions.

  • dir_cls_preds (list[torch.Tensor]) – Multi-level direction class predictions.

  • gt_bboxes (list[BaseInstance3DBoxes]) – Ground truth boxes of each sample.

  • gt_labels (list[torch.Tensor]) – Labels of each sample.

  • input_metas (list[dict]) – Point cloud and image’s meta info.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes can be ignored when computing the loss.

Returns

Classification, bbox, and direction losses of each level.

  • loss_rpn_cls (list[torch.Tensor]): Classification losses.

  • loss_rpn_bbox (list[torch.Tensor]): Box regression losses.

  • loss_rpn_dir (list[torch.Tensor]): Direction classification losses.

Return type

dict[str, list[torch.Tensor]]

class mmdet3d.models.dense_heads.VoteHead(num_classes, bbox_coder, train_cfg=None, test_cfg=None, vote_moudule_cfg=None, vote_aggregation_cfg=None, feat_channels=(128, 128), conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, objectness_loss=None, center_loss=None, dir_class_loss=None, dir_res_loss=None, size_class_loss=None, size_res_loss=None, semantic_loss=None)[source]

Bbox head of Votenet.

Parameters
  • num_classes (int) – The number of class.

  • bbox_coder (BaseBBoxCoder) – Bbox coder for encoding and decoding boxes.

  • train_cfg (dict) – Config for training.

  • test_cfg (dict) – Config for testing.

  • vote_moudule_cfg (dict) – Config of VoteModule for point-wise votes.

  • vote_aggregation_cfg (dict) – Config of vote aggregation layer.

  • feat_channels (tuple[int]) – Convolution channels of prediction layer.

  • conv_cfg (dict) – Config of convolution in prediction layer.

  • norm_cfg (dict) – Config of BN in prediction layer.

  • objectness_loss (dict) – Config of objectness loss.

  • center_loss (dict) – Config of center loss.

  • dir_class_loss (dict) – Config of direction classification loss.

  • dir_res_loss (dict) – Config of direction residual regression loss.

  • size_class_loss (dict) – Config of size classification loss.

  • size_res_loss (dict) – Config of size residual regression loss.

  • semantic_loss (dict) – Config of point-wise semantic segmentation loss.

forward(feat_dict, sample_mod)[source]

Forward pass.

Note

The forward of VoteHead is divided into 4 steps:

  1. Generate vote_points from seed_points.

  2. Aggregate vote_points.

  3. Predict bbox and score.

  4. Decode predictions.

Parameters
  • feat_dict (dict) – Feature dict from backbone.

  • sample_mod (str) – Sample mode for the vote aggregation layer. Valid modes are “vote”, “seed” and “random”.

Returns

Predictions of vote head.

Return type

dict

get_bboxes(points, bbox_preds, input_metas, rescale=False)[source]

Generate bboxes from vote head predictions.

Parameters
  • points (torch.Tensor) – Input points.

  • bbox_preds (dict) – Predictions from vote head.

  • input_metas (list[dict]) – Point cloud and image’s meta info.

  • rescale (bool) – Whether to rescale bboxes.

Returns

Bounding boxes, scores and labels.

Return type

list[tuple[torch.Tensor]]

get_targets(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, bbox_preds=None)[source]

Generate targets of vote head.

Parameters
  • points (list[torch.Tensor]) – Points of each batch.

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each batch.

  • gt_labels_3d (list[torch.Tensor]) – Labels of each batch.

  • pts_semantic_mask (None | list[torch.Tensor]) – Point-wise semantic label of each batch.

  • pts_instance_mask (None | list[torch.Tensor]) – Point-wise instance label of each batch.

  • bbox_preds (torch.Tensor) – Bounding box predictions of vote head.

Returns

Targets of vote head.

Return type

tuple[torch.Tensor]

get_targets_single(points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, aggregated_points=None)[source]

Generate targets of vote head for single batch.

Parameters
  • points (torch.Tensor) – Points of each batch.

  • gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes of each batch.

  • gt_labels_3d (torch.Tensor) – Labels of each batch.

  • pts_semantic_mask (None | torch.Tensor) – Point-wise semantic label of each batch.

  • pts_instance_mask (None | torch.Tensor) – Point-wise instance label of each batch.

  • aggregated_points (torch.Tensor) – Aggregated points from vote aggregation layer.

Returns

Targets of vote head.

Return type

tuple[torch.Tensor]

init_weights()[source]

Initialize weights of VoteHead.

loss(bbox_preds, points, gt_bboxes_3d, gt_labels_3d, pts_semantic_mask=None, pts_instance_mask=None, img_metas=None, gt_bboxes_ignore=None)[source]

Compute loss.

Parameters
  • bbox_preds (dict) – Predictions from forward of vote head.

  • points (list[torch.Tensor]) – Input points.

  • gt_bboxes_3d (list[BaseInstance3DBoxes]) – Ground truth bboxes of each sample.

  • gt_labels_3d (list[torch.Tensor]) – Labels of each sample.

  • pts_semantic_mask (None | list[torch.Tensor]) – Point-wise semantic mask.

  • pts_instance_mask (None | list[torch.Tensor]) – Point-wise instance mask.

  • img_metas (list[dict]) – Contain pcd and img’s meta info.

  • gt_bboxes_ignore (None | list[torch.Tensor]) – Specify which bounding boxes can be ignored when computing the loss.

Returns

Losses of Votenet.

Return type

dict

multiclass_nms_single(obj_scores, sem_scores, bbox, points, input_meta)[source]

Multi-class nms in single batch.

Parameters
  • obj_scores (torch.Tensor) – Objectness score of bounding boxes.

  • sem_scores (torch.Tensor) – Semantic class scores of bounding boxes.

  • bbox (torch.Tensor) – Predicted bounding boxes.

  • points (torch.Tensor) – Input points.

  • input_meta (dict) – Point cloud and image’s meta info.

Returns

Bounding boxes, scores and labels.

Return type

tuple[torch.Tensor]

roi_heads

class mmdet3d.models.roi_heads.Base3DRoIHead(bbox_head=None, mask_roi_extractor=None, mask_head=None, train_cfg=None, test_cfg=None)[source]

Base class for 3d RoIHeads.

aug_test(x, proposal_list, img_metas, rescale=False, **kwargs)[source]

Test with augmentations.

If rescale is False, then returned bboxes and masks will fit the scale of imgs[0].

abstract forward_train(x, img_metas, proposal_list, gt_bboxes, gt_labels, gt_bboxes_ignore=None, **kwargs)[source]

Forward function during training.

Parameters
  • x (dict) – Contains features from the first stage.

  • img_metas (list[dict]) – Meta info of each image.

  • proposal_list (list[dict]) – Proposal information from rpn.

  • gt_bboxes (list[BaseInstance3DBoxes]) – GT bboxes of each sample. The bboxes are encapsulated by 3D box structures.

  • gt_labels (list[torch.LongTensor]) – GT labels of each sample.

  • gt_bboxes_ignore (list[torch.Tensor], optional) – Ground truth boxes to be ignored.

Returns

Losses from each head.

Return type

dict[str, torch.Tensor]

abstract init_assigner_sampler()[source]

Initialize assigner and sampler.

abstract init_bbox_head()[source]

Initialize the box head.

abstract init_mask_head()[source]

Initialize mask head.

abstract init_weights(pretrained)[source]

Initialize the module with pre-trained weights.

simple_test(x, proposal_list, img_metas, proposals=None, rescale=False, **kwargs)[source]

Test without augmentation.

property with_bbox

Whether the RoIHead has a box head.

Type

bool

property with_mask

Whether the RoIHead has a mask head.

Type

bool

class mmdet3d.models.roi_heads.PartA2BboxHead(num_classes, seg_in_channels, part_in_channels, seg_conv_channels=None, part_conv_channels=None, merge_conv_channels=None, down_conv_channels=None, shared_fc_channels=None, cls_channels=None, reg_channels=None, dropout_ratio=0.1, roi_feat_size=14, with_corner_loss=True, bbox_coder={'type': 'DeltaXYZWLHRBBoxCoder'}, conv_cfg={'type': 'Conv1d'}, norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN1d'}, loss_bbox={'beta': 0.1111111111111111, 'loss_weight': 2.0, 'type': 'SmoothL1Loss'}, loss_cls={'loss_weight': 1.0, 'reduction': 'none', 'type': 'CrossEntropyLoss', 'use_sigmoid': True})[source]

PartA2 RoI head.

Parameters
  • num_classes (int) – The number of classes to prediction.

  • seg_in_channels (int) – Input channels of segmentation convolution layer.

  • part_in_channels (int) – Input channels of part convolution layer.

  • seg_conv_channels (list(int)) – Out channels of each segmentation convolution layer.

  • part_conv_channels (list(int)) – Out channels of each part convolution layer.

  • merge_conv_channels (list(int)) – Out channels of each feature merged convolution layer.

  • down_conv_channels (list(int)) – Out channels of each downsampled convolution layer.

  • shared_fc_channels (list(int)) – Out channels of each shared fc layer.

  • cls_channels (list(int)) – Out channels of each classification layer.

  • reg_channels (list(int)) – Out channels of each regression layer.

  • dropout_ratio (float) – Dropout ratio of classification and regression layers.

  • roi_feat_size (int) – The size of pooled roi features.

  • with_corner_loss (bool) – Whether to use corner loss or not.

  • bbox_coder (BaseBBoxCoder) – Bbox coder for box head.

  • conv_cfg (dict) – Config dict of convolutional layers.

  • norm_cfg (dict) – Config dict of normalization layers.

  • loss_bbox (dict) – Config dict of box regression loss.

  • loss_cls (dict) – Config dict of classification loss.

forward(seg_feats, part_feats)[source]

Forward pass.

Parameters
  • seg_feats (torch.Tensor) – Point-wise semantic features.

  • part_feats (torch.Tensor) – Point-wise part prediction features.

Returns

Score of class and bbox predictions.

Return type

tuple[torch.Tensor]

get_bboxes(rois, cls_score, bbox_pred, class_labels, class_pred, img_metas, cfg=None)[source]

Generate bboxes from bbox head predictions.

Parameters
  • rois (torch.Tensor) – Roi bounding boxes.

  • cls_score (torch.Tensor) – Scores of bounding boxes.

  • bbox_pred (torch.Tensor) – Bounding box predictions.

  • class_labels (torch.Tensor) – Labels of classes.

  • class_pred (torch.Tensor) – Score for nms.

  • img_metas (list[dict]) – Point cloud and image’s meta info.

  • cfg (ConfigDict) – Testing config.

Returns

Decoded bbox, scores and labels after nms.

Return type

list[tuple]

get_corner_loss_lidar(pred_bbox3d, gt_bbox3d, delta=1)[source]

Calculate corner loss of given boxes.

Parameters
  • pred_bbox3d (torch.FloatTensor) – Predicted boxes in shape (N, 7).

  • gt_bbox3d (torch.FloatTensor) – Ground truth boxes in shape (N, 7).

Returns

Calculated corner loss in shape (N).

Return type

torch.FloatTensor

get_targets(sampling_results, rcnn_train_cfg, concat=True)[source]

Generate targets.

Parameters
  • sampling_results (list[SamplingResult]) – Sampled results from rois.

  • rcnn_train_cfg (ConfigDict) – Training config of rcnn.

  • concat (bool) – Whether to concatenate targets between batches.

Returns

Targets of boxes and class prediction.

Return type

tuple[torch.Tensor]

init_weights()[source]

Initialize weights of the bbox head.

loss(cls_score, bbox_pred, rois, labels, bbox_targets, pos_gt_bboxes, reg_mask, label_weights, bbox_weights)[source]

Compute losses.

Parameters
  • cls_score (torch.Tensor) – Scores of each roi.

  • bbox_pred (torch.Tensor) – Predictions of bboxes.

  • rois (torch.Tensor) – Roi bboxes.

  • labels (torch.Tensor) – Labels of class.

  • bbox_targets (torch.Tensor) – Target of positive bboxes.

  • pos_gt_bboxes (torch.Tensor) – Ground truths of positive bboxes.

  • reg_mask (torch.Tensor) – Mask for positive bboxes.

  • label_weights (torch.Tensor) – Weights of class loss.

  • bbox_weights (torch.Tensor) – Weights of bbox loss.

Returns

Computed losses.

  • loss_cls (torch.Tensor): Loss of classes.

  • loss_bbox (torch.Tensor): Loss of bboxes.

  • loss_corner (torch.Tensor): Loss of corners.

Return type

dict

multi_class_nms(box_probs, box_preds, score_thr, nms_thr, input_meta, use_rotate_nms=True)[source]

Multi-class NMS for box head.

Note

This function has large overlap with the box3d_multiclass_nms implemented in mmdet3d.core.post_processing. We are considering merging these two functions in the future.

Parameters
  • box_probs (torch.Tensor) – Predicted box probabilities in shape (N,).

  • box_preds (torch.Tensor) – Predicted boxes in shape (N, 7+C).

  • score_thr (float) – Threshold of scores.

  • nms_thr (float) – Threshold for NMS.

  • input_meta (dict) – Meta information of the current sample.

  • use_rotate_nms (bool, optional) – Whether to use rotated nms. Defaults to True.

Returns

Selected indices.

Return type

torch.Tensor

class mmdet3d.models.roi_heads.PointwiseSemanticHead(in_channels, num_classes=3, extra_width=0.2, seg_score_thr=0.3, loss_seg={'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'FocalLoss', 'use_sigmoid': True}, loss_part={'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True})[source]

Semantic segmentation head for point-wise segmentation.

Predict point-wise segmentation and part regression results for PartA2. See the paper for more details.

Parameters
  • in_channels (int) – The number of input channel.

  • num_classes (int) – The number of class.

  • extra_width (float) – Boxes enlarge width.

  • loss_seg (dict) – Config of segmentation loss.

  • loss_part (dict) – Config of part prediction loss.

forward(x)[source]

Forward pass.

Parameters

x (torch.Tensor) – Features from the first stage.

Returns

Part features, segmentation and part predictions.

  • seg_preds (torch.Tensor): Segment predictions.

  • part_preds (torch.Tensor): Part predictions.

  • part_feats (torch.Tensor): Feature predictions.

Return type

dict

get_targets(voxels_dict, gt_bboxes_3d, gt_labels_3d)[source]

Generate segmentation and part prediction targets.

Parameters
  • voxels_dict (dict) – Dict of voxel information, containing at least the voxel centers in shape (voxel_num, 3).

  • gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes in shape (box_num, 7).

  • gt_labels_3d (torch.Tensor) – Class labels of ground truths in shape (box_num).

Returns

Prediction targets.

  • seg_targets (torch.Tensor): Segmentation targets with shape [voxel_num].

  • part_targets (torch.Tensor): Part prediction targets with shape [voxel_num, 3].

Return type

dict

get_targets_single(voxel_centers, gt_bboxes_3d, gt_labels_3d)[source]

Generate segmentation and part prediction targets for a single sample.

Parameters
  • voxel_centers (torch.Tensor) – The center of voxels in shape (voxel_num, 3).

  • gt_bboxes_3d (BaseInstance3DBoxes) – Ground truth boxes in shape (box_num, 7).

  • gt_labels_3d (torch.Tensor) – Class labels of ground truths in shape (box_num).

Returns

Segmentation targets with shape [voxel_num] and part prediction targets with shape [voxel_num, 3].

Return type

tuple[torch.Tensor]

loss(semantic_results, semantic_targets)[source]

Calculate point-wise segmentation and part prediction losses.

Parameters
  • semantic_results (dict) –

    Results from semantic head.

    • seg_preds: Segmentation predictions.

    • part_preds: Part predictions.

  • semantic_targets (dict) –

    Targets of semantic results.

    • seg_preds: Segmentation targets.

    • part_preds: Part targets.

Returns

Loss of segmentation and part prediction.

  • loss_seg (torch.Tensor): Segmentation prediction loss.

  • loss_part (torch.Tensor): Part prediction loss.

Return type

dict

class mmdet3d.models.roi_heads.Single3DRoIAwareExtractor(roi_layer=None)[source]

Point-wise roi-aware Extractor.

Extract point-wise roi features.

Parameters

roi_layer (dict) – The config of roi layer.

build_roi_layers(layer_cfg)[source]

Build roi layers using layer_cfg.

forward(feats, coordinate, batch_inds, rois)[source]

Extract point-wise roi features.

Parameters
  • feats (torch.FloatTensor) – Point-wise features with shape (batch, npoints, channels) for pooling.

  • coordinate (torch.FloatTensor) – Coordinate of each point.

  • batch_inds (torch.LongTensor) – Indicates the batch index of each point.

  • rois (torch.FloatTensor) – Roi boxes with batch indices.

Returns

Pooled features.

Return type

torch.FloatTensor

fusion_layers

class mmdet3d.models.fusion_layers.PointFusion(img_channels, pts_channels, mid_channels, out_channels, img_levels=3, conv_cfg=None, norm_cfg=None, act_cfg=None, activate_out=True, fuse_out=False, dropout_ratio=0, aligned=True, align_corners=True, padding_mode='zeros', lateral_conv=True)[source]

Fuse image features from multi-scale features.

Parameters
  • img_channels (list[int] | int) – Channels of image features. It could be a list if the input is multi-scale image features.

  • pts_channels (int) – Channels of point features.

  • mid_channels (int) – Channels of middle layers.

  • out_channels (int) – Channels of output fused features.

  • img_levels (int, optional) – Number of image levels. Defaults to 3.

  • conv_cfg (dict, optional) – Dict config of conv layers of middle layers. Defaults to None.

  • norm_cfg (dict, optional) – Dict config of norm layers of middle layers. Defaults to None.

  • act_cfg (dict, optional) – Dict config of activation layers. Defaults to None.

  • activate_out (bool, optional) – Whether to apply relu activation to output features. Defaults to True.

  • fuse_out (bool, optional) – Whether apply conv layer to the fused features. Defaults to False.

  • dropout_ratio (int, float, optional) – Dropout ratio of image features to prevent overfitting. Defaults to 0.

  • aligned (bool, optional) – Whether apply aligned feature fusion. Defaults to True.

  • align_corners (bool, optional) – Whether to align corner when sampling features according to points. Defaults to True.

  • padding_mode (str, optional) – Mode used to pad the features of points that do not have corresponding image features. Defaults to ‘zeros’.

  • lateral_conv (bool, optional) – Whether to apply lateral convs to image features. Defaults to True.

forward(img_feats, pts, pts_feats, img_metas)[source]

Forward function.

Parameters
  • img_feats (list[torch.Tensor]) – Image features.

  • pts (list[torch.Tensor]) – A batch of points with shape N x 3.

  • pts_feats (torch.Tensor) – A tensor consisting of the point features of the whole batch.

  • img_metas (list[dict]) – Meta information of images.

Returns

Fused features of each point.

Return type

torch.Tensor

init_weights()[source]

Initialize the weights of modules.

obtain_mlvl_feats(img_feats, pts, img_metas)[source]

Obtain multi-level features for each point.

Parameters
  • img_feats (list(torch.Tensor)) – Multi-scale image features produced by image backbone in shape (N, C, H, W).

  • pts (list[torch.Tensor]) – Points of each sample.

  • img_metas (list[dict]) – Meta information for each sample.

Returns

Corresponding image features of each point.

Return type

torch.Tensor

sample_single(img_feats, pts, img_meta)[source]

Sample features from single level image feature map.

Parameters
  • img_feats (torch.Tensor) – Image feature map in shape (N, C, H, W).

  • pts (torch.Tensor) – Points of a single sample.

  • img_meta (dict) – Meta information of the single sample.

Returns

Single level image features of each point.

Return type

torch.Tensor

losses

class mmdet3d.models.losses.ChamferDistance(mode='l2', reduction='mean', loss_src_weight=1.0, loss_dst_weight=1.0)[source]

Calculate Chamfer Distance of two sets.

Parameters
  • mode (str) – Criterion mode to calculate distance. The valid modes are smooth_l1, l1 or l2.

  • reduction (str) – Method to reduce losses. The valid reduction method are none, sum or mean.

  • loss_src_weight (float) – Weight of loss_source.

  • loss_dst_weight (float) – Weight of loss_target.

forward(source, target, src_weight=1.0, dst_weight=1.0, reduction_override=None, return_indices=False, **kwargs)[source]

Forward function of loss calculation.

Parameters
  • source (torch.Tensor) – Source set with shape [B, N, C] to calculate Chamfer Distance.

  • target (torch.Tensor) – Destination set with shape [B, M, C] to calculate Chamfer Distance.

  • src_weight (torch.Tensor | float, optional) – Weight of source loss. Defaults to 1.0.

  • dst_weight (torch.Tensor | float, optional) – Weight of destination loss. Defaults to 1.0.

  • reduction_override (str, optional) – Method to reduce losses. The valid reduction method are ‘none’, ‘sum’ or ‘mean’. Defaults to None.

  • return_indices (bool, optional) – Whether to return indices. Defaults to False.

Returns

If return_indices=True, return losses of source and target with their corresponding indices in the order of (loss_source, loss_target, indices1, indices2). If return_indices=False, return (loss_source, loss_target).

Return type

tuple[torch.Tensor]

mmdet3d.models.losses.chamfer_distance(src, dst, src_weight=1.0, dst_weight=1.0, criterion_mode='l2', reduction='mean')[source]

Calculate Chamfer Distance of two sets.

Parameters
  • src (torch.Tensor) – Source set with shape [B, N, C] to calculate Chamfer Distance.

  • dst (torch.Tensor) – Destination set with shape [B, M, C] to calculate Chamfer Distance.

  • src_weight (torch.Tensor or float) – Weight of source loss.

  • dst_weight (torch.Tensor or float) – Weight of destination loss.

  • criterion_mode (str) – Criterion mode to calculate distance. The valid modes are smooth_l1, l1 or l2.

  • reduction (str) – Method to reduce losses. The valid reduction method are ‘none’, ‘sum’ or ‘mean’.

Returns

Source and destination losses with the corresponding indices.

  • loss_src (torch.Tensor): The min distance from source to destination.

  • loss_dst (torch.Tensor): The min distance from destination to source.

  • indices1 (torch.Tensor): Index of the closest point in the destination set for each source point.

  • indices2 (torch.Tensor): Index of the closest point in the source set for each destination point.

Return type

tuple
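
A minimal usage sketch with two random point sets:

    import torch
    from mmdet3d.models.losses import chamfer_distance

    src = torch.rand(1, 4, 3)   # (B, N, C)
    dst = torch.rand(1, 5, 3)   # (B, M, C)
    loss_src, loss_dst, idx1, idx2 = chamfer_distance(
        src, dst, criterion_mode='l2', reduction='sum')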

middle_encoders

class mmdet3d.models.middle_encoders.PointPillarsScatter(in_channels, output_shape)[source]

Point Pillar’s Scatter.

Converts learned features from dense tensor to sparse pseudo image.

Parameters
  • in_channels (int) – Channels of input features.

  • output_shape (list[int]) – Required output shape of features.

forward(voxel_features, coors, batch_size=None)[source]

Forward function to scatter features.

forward_batch(voxel_features, coors, batch_size)[source]

Scatter features of a whole batch.

Parameters
  • voxel_features (torch.Tensor) – Voxel features in shape (N, M, C).

  • coors (torch.Tensor) – Coordinates of each voxel in shape (N, 4). The first column indicates the sample ID.

  • batch_size (int) – Number of samples in the current batch.

forward_single(voxel_features, coors)[source]

Scatter features of single sample.

Parameters
  • voxel_features (torch.Tensor) – Voxel features in shape (N, M, C).

  • coors (torch.Tensor) – Coordinates of each voxel. The first column indicates the sample ID.
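
A rough usage sketch, assuming pillar features of shape (N, C) and coordinates ordered as (batch_idx, z_idx, y_idx, x_idx); the canvas size is illustrative:

    import torch
    from mmdet3d.models.middle_encoders import PointPillarsScatter

    scatter = PointPillarsScatter(in_channels=64, output_shape=[496, 432])
    voxel_features = torch.rand(100, 64)            # 100 encoded pillars
    coors = torch.zeros(100, 4, dtype=torch.long)   # (batch_idx, z, y, x)
    coors[:, 2] = torch.randint(0, 496, (100,))
    coors[:, 3] = torch.randint(0, 432, (100,))
    canvas = scatter(voxel_features, coors, batch_size=1)  # (1, 64, 496, 432)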

class mmdet3d.models.middle_encoders.SparseEncoder(in_channels, sparse_shape, order=('conv', 'norm', 'act'), norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN1d'}, base_channels=16, output_channels=128, encoder_channels=((16,), (32, 32, 32), (64, 64, 64), (64, 64, 64)), encoder_paddings=((1,), (1, 1, 1), (1, 1, 1), ((0, 1, 1), 1, 1)))[source]

Sparse encoder for SECOND and Part-A2.

Parameters
  • in_channels (int) – The number of input channels.

  • sparse_shape (list[int]) – The sparse shape of input tensor.

  • norm_cfg (dict) – Config of normalization layer.

  • base_channels (int) – Out channels for conv_input layer.

  • output_channels (int) – Out channels for conv_out layer.

  • encoder_channels (tuple[tuple[int]]) – Convolutional channels of each encode block.

  • encoder_paddings (tuple[tuple[int]]) – Paddings of each encode block.

forward(voxel_features, coors, batch_size)[source]

Forward of SparseEncoder.

Parameters
  • voxel_features (torch.float32) – Voxel features in shape (N, C).

  • coors (torch.int32) – Coordinates in shape (N, 4), the columns in the order of (batch_idx, z_idx, y_idx, x_idx).

  • batch_size (int) – Batch size.

Returns

Backbone features.

Return type

dict

make_encoder_layers(make_block, norm_cfg, in_channels)[source]

Make encoder layers using sparse convs.

Parameters
  • make_block (method) – A bound function used to build blocks.

  • norm_cfg (dict[str]) – Config of normalization layer.

  • in_channels (int) – The number of encoder input channels.

Returns

The number of encoder output channels.

Return type

int
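
Since this encoder consumes sparse voxel tensors, it is usually instantiated from a detector config rather than called by hand; an illustrative config fragment in a KITTI-like setting (values are examples):

    # Illustrative middle-encoder config; sparse_shape is the (D, H, W)
    # extent of the voxel grid and depends on voxel size and range.
    middle_encoder = dict(
        type='SparseEncoder',
        in_channels=4,
        sparse_shape=[41, 1600, 1408],
        order=('conv', 'norm', 'act'))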

class mmdet3d.models.middle_encoders.SparseUNet(in_channels, sparse_shape, order=('conv', 'norm', 'act'), norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN1d'}, base_channels=16, output_channels=128, encoder_channels=((16,), (32, 32, 32), (64, 64, 64), (64, 64, 64)), encoder_paddings=((1,), (1, 1, 1), (1, 1, 1), ((0, 1, 1), 1, 1)), decoder_channels=((64, 64, 64), (64, 64, 32), (32, 32, 16), (16, 16, 16)), decoder_paddings=((1, 0), (1, 0), (0, 0), (0, 1)))[source]

SparseUNet for PartA^2.

See the paper for more details.

Parameters
  • in_channels (int) – The number of input channels.

  • sparse_shape (list[int]) – The sparse shape of input tensor.

  • norm_cfg (dict) – Config of normalization layer.

  • base_channels (int) – Out channels for conv_input layer.

  • output_channels (int) – Out channels for conv_out layer.

  • encoder_channels (tuple[tuple[int]]) – Convolutional channels of each encode block.

  • encoder_paddings (tuple[tuple[int]]) – Paddings of each encode block.

  • decoder_channels (tuple[tuple[int]]) – Convolutional channels of each decode block.

  • decoder_paddings (tuple[tuple[int]]) – Paddings of each decode block.

decoder_layer_forward(x_lateral, x_bottom, lateral_layer, merge_layer, upsample_layer)[source]

Forward of upsample and residual block.

Parameters
  • x_lateral (SparseConvTensor) – Lateral tensor.

  • x_bottom (SparseConvTensor) – Feature from bottom layer.

  • lateral_layer (SparseBasicBlock) – Convolution for lateral tensor.

  • merge_layer (SparseSequential) – Convolution for merging features.

  • upsample_layer (SparseSequential) – Convolution for upsampling.

Returns

Upsampled feature.

Return type

SparseConvTensor

forward(voxel_features, coors, batch_size)[source]

Forward of SparseUNet.

Parameters
  • voxel_features (torch.float32) – Voxel features in shape [N, C].

  • coors (torch.int32) – Coordinates in shape [N, 4], the columns in the order of (batch_idx, z_idx, y_idx, x_idx).

  • batch_size (int) – Batch size.

Returns

Backbone features.

Return type

dict[str, torch.Tensor]

make_decoder_layers(make_block, norm_cfg, in_channels)[source]

Make decoder layers using sparse convs.

Parameters
  • make_block (method) – A bound function used to build blocks.

  • norm_cfg (dict[str]) – Config of normalization layer.

  • in_channels (int) – The number of encoder input channels.

Returns

The number of decoder output channels.

Return type

int

make_encoder_layers(make_block, norm_cfg, in_channels)[source]

Make encoder layers using sparse convs.

Parameters
  • make_block (method) – A bound function used to build blocks.

  • norm_cfg (dict[str]) – Config of normalization layer.

  • in_channels (int) – The number of encoder input channels.

Returns

The number of encoder output channels.

Return type

int

static reduce_channel(x, out_channels)[source]

Reduce channels for element-wise addition.

Parameters
  • x (SparseConvTensor) – Sparse tensor, x.features are in shape (N, C1).

  • out_channels (int) – The number of channel after reduction.

Returns

Channel reduced feature.

Return type

SparseConvTensor

model_utils

class mmdet3d.models.model_utils.VoteModule(in_channels, vote_per_seed=1, gt_per_seed=3, conv_channels=(16, 16), conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, norm_feats=True, vote_loss=None)[source]

Vote module.

Generate votes from seed point features.

Parameters
  • in_channels (int) – Number of channels of seed point features.

  • vote_per_seed (int) – Number of votes generated from each seed point.

  • gt_per_seed (int) – Number of ground truth votes generated from each seed point.

  • conv_channels (tuple[int]) – Out channels of vote generating convolution.

  • conv_cfg (dict) – Config of convolution. Default: dict(type=’Conv1d’).

  • norm_cfg (dict) – Config of normalization. Default: dict(type=’BN1d’).

  • norm_feats (bool) – Whether to normalize features. Default: True.

  • vote_loss (dict) – Config of vote loss.

forward(seed_points, seed_feats)[source]

Forward pass.

Parameters
  • seed_points (torch.Tensor) – Coordinate of the seed points in shape (B, N, 3).

  • seed_feats (torch.Tensor) – Features of the seed points in shape (B, C, N).

Returns

  • vote_points: Voted xyz based on the seed points with shape (B, M, 3), M=num_seed*vote_per_seed.

  • vote_features: Voted features based on the seed points with shape (B, C, M) where M=num_seed*vote_per_seed, C=vote_feature_dim.

Return type

tuple[torch.Tensor]
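
A minimal forward sketch generating one vote per seed; the channel sizes are illustrative, and vote_loss is omitted since it is only needed by get_loss:

    import torch
    from mmdet3d.models.model_utils import VoteModule

    vote_module = VoteModule(in_channels=256, vote_per_seed=1,
                             conv_channels=(256, 256))
    seed_xyz = torch.rand(2, 1024, 3)       # (B, N, 3)
    seed_feats = torch.rand(2, 256, 1024)   # (B, C, N)
    vote_xyz, vote_feats = vote_module(seed_xyz, seed_feats)
    # vote_xyz: (B, 1024, 3); vote_feats: (B, 256, 1024)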

get_loss(seed_points, vote_points, seed_indices, vote_targets_mask, vote_targets)[source]

Calculate loss of voting module.

Parameters
  • seed_points (torch.Tensor) – Coordinate of the seed points.

  • vote_points (torch.Tensor) – Coordinate of the vote points.

  • seed_indices (torch.Tensor) – Indices of seed points in raw points.

  • vote_targets_mask (torch.Tensor) – Mask of valid vote targets.

  • vote_targets (torch.Tensor) – Targets of votes.

Returns

Weighted vote loss.

Return type

torch.Tensor