Our evaluation table ranks all methods according to the Average Precision (AP) over 10 IoU thresholds, ranging from 0.5 to 0.95 with a step size of 0.05. The IoU is weighted by the confidence as \(\text{IoU} = \frac{\sum_{i\in{\{\text{TP}\}}}c_{i}}{\sum_{i\in{\{\text{TP, FP, FN}\}}}c_{i}}\) where \(\{\text{TP}\}\) and \(\{\text{TP, FP, FN}\}\) are the set of image pixels in the intersection and the union of one instance, respectively. \(c_i \in [0, 1]\) denotes the confidence value at pixel \(i\). In constrast to standard evaluation where \(c_i=1\) for all pixels, we adopt confidence weighted evaluation metrics leveraging the uncertainty to take into account the ambiguity in our automatically generated annotations.

**AP:**Average Precision over 10 IoU thresholds ranging from 0.5 to 0.95**AP 50:**Average Precision at a threshold of 0.5