New: WildDash 2 with 4256 public frames, new labels & panoptic GT!
See also: RailSem19 dataset for rail scene understanding.
For all metrics, higher scores are better. To participate in the benchmark, check our submission instructions.
Algorithm | Meta AVG AP | Classic AP | Classic AP 50% | Negative AP | Impact (AP): Blur | Impact (AP): Coverage | Impact (AP): Distortion | Impact (AP): Hood | Impact (AP): Occ. | Impact (AP): Overexp. | Impact (AP): Particles | Impact (AP): Screen | Impact (AP): Underexp. | Impact (AP): Var. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RUSH_ROB | 26.6% | 27.6% | 56.5% | 21.2% | -25% | 0% | 0% | -9% | -15% | -46% | -21% | -16% | -9% | -22% |
NL_ROI_ROB | 21.9% | 19.4% | 34.0% | 19.7% | -65% | -21% | -34% | 0% | -36% | -30% | -31% | 0% | 0% | -43% |
MaskRCNN-R-50-FPN-GN | 20.0% | 21.1% | 49.9% | 14.2% | -45% | -5% | -18% | -21% | -17% | -71% | -15% | -22% | -39% | -36% |
MRCNN_CS | 12.2% | 12.4% | 28.3% | 6.9% | -13% | 0% | -56% | -33% | -35% | -65% | -12% | -33% | -53% | -61% |
MRCNN_VSCMLab_ROB | 11.8% | 9.3% | 18.7% | 63.3% | -60% | -13% | -29% | -14% | -57% | -63% | 0% | -54% | -33% | -63% |
MaskRCNN_ROB | 9.0% | 9.0% | 20.2% | 6.6% | -52% | -27% | -36% | -40% | -44% | -43% | -47% | -68% | -38% | -53% |
MRCNN++_VSCMLab_ROB | 8.4% | 7.3% | 14.7% | 40.7% | -55% | -20% | -32% | -35% | -41% | -78% | -32% | -27% | -64% | -67% |
Sem2Ins | 8.3% | 7.7% | 14.5% | 3.9% | -62% | 0% | -26% | -30% | -51% | -58% | -27% | 0% | 0% | -3% |
MRCNN_K | 4.0% | 3.9% | 8.2% | 4.2% | -54% | -13% | -23% | -15% | -47% | -55% | -57% | -27% | -42% | -29% |
BAMRCNN_ROB | 1.7% | 0.8% | 1.9% | 24.9% | -14% | -4% | -83% | -35% | -89% | -51% | -33% | 0% | -43% | -21% |
Methodology:
Our benchmark evaluates the negative Impact of common visual hazards on algorithm output performance. It is calculated by this formula:
impact = min(metric_low, metric_high) / max(metric_none, metric_low) - 1.0
The values metric_none, metric_low, and metric_high are evaluated on subsets of the benchmark dataset that correspond to the identified severity of the hazard (e.g. the subset Blur_high contains images with a lot of visible blur). Positive impacts are truncated to zero.
An impact of -10% for Blur means that the algorithm's performance is expected to degrade by 10 percent on an image with considerable blur, compared with a similar image without noticeable blur.
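The impact calculation can be sketched in a few lines of Python. This is a minimal illustration and not the benchmark's evaluation code: the function name `hazard_impact`, the zero-baseline handling, and the AP values in the usage example are all assumptions made for demonstration.

```python
def hazard_impact(metric_none: float, metric_low: float, metric_high: float) -> float:
    """Relative impact of one hazard, following the formula in the text.

    Each argument is the metric (e.g. AP) measured on the subset of images
    with no, low, or high severity of the hazard.
    """
    baseline = max(metric_none, metric_low)
    if baseline <= 0.0:
        # Assumption: the text does not say how a zero baseline is handled,
        # so this sketch reports "no measurable impact" in that case.
        return 0.0
    impact = min(metric_low, metric_high) / baseline - 1.0
    # Positive impacts are truncated to zero, as stated in the text.
    # For non-negative metrics the ratio above never exceeds 1, so this
    # clamp is only a safeguard.
    return min(impact, 0.0)


# Hypothetical AP values for the three Blur subsets (not benchmark data).
blur_impact = hazard_impact(metric_none=0.30, metric_low=0.28, metric_high=0.27)
print(f"Blur impact: {blur_impact:.0%}")
```

With these made-up inputs the high-severity subset scores 0.27 against a baseline of 0.30, which corresponds to an impact of about -10%.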
These are all currently evaluated hazards:
Blur: Image is noticeably affected by blur (e.g. motion blur, defocusing, compression artifacts...)
Coverage: Normally visible parts of the road are covered (e.g. unusual lane markings, snow, leaves...)
Distortion: Visible lens distortion
Hood: Non-windscreen parts of the ego-vehicle are visible (e.g. car hood, mirrors)
Occ.: Objects are partially occluded or cut off by the image border
Overexp.: The scene is overexposed
Particles: Particles in the air obstruct the view (e.g. heavy rain, snow, fog)
Screen: The windscreen is interfering (e.g. interior reflections, wipers, rain on the windscreen...)
Underexp.: The image is underexposed
Var.: Intra-class variations within the image (i.e. unusual representations of labels, such as unique cars)
More details on evaluation metrics and negative test cases can also be found on the FAQ page.