WildDash 2 Benchmark

For all metrics, higher scores are better. To participate in the benchmark, check our submission instructions.

	Meta AVG	Classic				Negative	Impact (IoU class)
Algorithm	IoU Class	IoU Class	iIoU Class	IoU Cat.	iIoU Cat.	IoU Class	Blur	Coverage	Distortion	Hood	Occ.	Overexp.	Particles	Screen	Underexp.	Var.
MSeg_1080	48.3%	49.8%	43.1%	63.3%	56.0%	65.0%	-7%	-10%	0%	-20%	0%	-20%	-7%	-13%	-16%	-9%
LDN_BIN_768	46.9%	48.8%	42.8%	63.6%	59.3%	47.7%	-10%	-10%	-1%	-18%	0%	-23%	-6%	-8%	-25%	-7%
MSeg	43.0%	42.2%	31.0%	59.5%	51.9%	51.8%	-5%	-7%	-7%	-11%	-3%	-16%	0%	-5%	-20%	-3%
LDN_OE	42.7%	43.3%	31.9%	60.7%	50.3%	52.8%	-11%	-13%	-7%	-10%	-5%	-24%	0%	-6%	-30%	-7%
LDN_BIN	41.8%	43.8%	37.3%	58.6%	53.3%	54.3%	-14%	-14%	-22%	-14%	-3%	-35%	-3%	-9%	-25%	-8%
DN169_CAT_DUAL	41.0%	41.7%	34.4%	57.7%	49.7%	52.6%	-4%	-7%	-11%	-10%	-5%	-24%	-7%	-4%	-26%	-9%
MSeg_low_res	40.5%	42.2%	34.0%	55.7%	42.2%	43.1%	-1%	-17%	-5%	-20%	0%	-23%	-14%	-11%	-22%	-13%
AHiSS_ROB	39.0%	41.0%	32.2%	53.9%	39.3%	43.6%	-11%	-12%	-2%	-24%	0%	-27%	-13%	-13%	-28%	-16%
MapillaryAI_ROB	38.9%	41.3%	38.0%	60.5%	57.6%	25.0%	-15%	-5%	-4%	-23%	0%	-23%	-12%	-21%	-25%	-6%
PSP-IBN-SA_ROB	38.5%	39.4%	33.6%	60.6%	51.0%	65.3%	-18%	-3%	-5%	-18%	-3%	-27%	-17%	-13%	-27%	-12%
DN_2_4_CWVI_BIN_SEG	36.6%	37.9%	30.9%	52.5%	43.7%	63.5%	-16%	-7%	0%	-15%	-2%	-30%	-9%	-10%	-41%	-14%
IBN-PSP-SA_ROB	33.6%	34.7%	30.8%	55.1%	38.9%	68.5%	-8%	0%	0%	-22%	0%	-27%	-23%	-23%	-36%	-8%
IBN-PSA-SA_ROB	32.5%	33.6%	30.1%	53.8%	39.3%	69.5%	-9%	-1%	0%	-25%	0%	-28%	-25%	-20%	-32%	-11%
LDN2_ROB	32.1%	34.4%	30.7%	56.6%	47.6%	29.9%	-7%	-0%	-11%	-36%	0%	-37%	-16%	-24%	-42%	-6%
LDN_ROB	32.1%	34.4%	30.7%	56.6%	47.6%	29.9%	-7%	-0%	-11%	-36%	0%	-37%	-16%	-24%	-42%	-6%
BatMAN_ROB	31.7%	31.4%	17.4%	51.9%	37.3%	36.3%	-9%	-8%	-11%	-20%	-11%	-29%	-5%	-10%	-37%	-6%
Mapillary_ROB	31.6%	32.7%	27.5%	55.2%	51.1%	22.7%	-12%	-7%	-15%	-23%	-1%	-26%	-12%	-28%	-31%	-3%
HiSS_ROB	31.3%	31.0%	16.3%	50.3%	34.6%	44.1%	-11%	-10%	-11%	-25%	-10%	-32%	-2%	-10%	-44%	-0%
DeepLabv3+_CS	30.6%	34.2%	24.6%	49.0%	38.6%	15.7%	-13%	-15%	-15%	-34%	0%	-55%	-17%	-23%	-53%	-6%
AdapNet2_ROB	29.5%	28.7%	16.5%	51.5%	38.0%	43.6%	-15%	-10%	-20%	-24%	-14%	-21%	-8%	-7%	-37%	-7%
AdapNetv2_ROB	29.5%	28.7%	16.5%	51.5%	38.0%	43.6%	-15%	-10%	-20%	-24%	-14%	-21%	-8%	-7%	-37%	-7%
VlocNet++_ROB	29.2%	28.4%	16.4%	51.3%	37.3%	39.4%	-19%	-8%	-17%	-23%	-14%	-23%	-4%	-9%	-36%	-11%
M_DN	29.1%	29.6%	22.9%	55.8%	48.0%	16.7%	-15%	-9%	-13%	-23%	-7%	-26%	-16%	-14%	-37%	-6%
DRN_MPC	28.3%	29.1%	13.9%	49.2%	29.2%	15.9%	-17%	-8%	-15%	-32%	-5%	-47%	-3%	-12%	-34%	-9%
VENUS_ROB_update	28.2%	29.8%	22.7%	51.5%	35.0%	50.6%	-3%	-0%	0%	-32%	0%	-42%	-15%	-31%	-43%	-21%
DN_2_4_CITY_WD	27.2%	28.3%	18.2%	50.6%	38.6%	17.5%	-5%	-3%	-10%	-40%	0%	-45%	-15%	-23%	-44%	0%
DRN_MPS	26.3%	27.4%	11.9%	47.5%	27.1%	12.9%	-19%	-12%	-14%	-32%	-8%	-51%	-9%	-12%	-45%	-14%
VENUS_ROB	25.1%	26.4%	19.8%	46.9%	29.8%	54.4%	-2%	-0%	0%	-37%	0%	-49%	-17%	-30%	-48%	-16%
GoogLeNetV1_ROB	22.9%	22.4%	17.3%	36.7%	36.6%	50.7%	-21%	-21%	-43%	-26%	-9%	-29%	-21%	-28%	-46%	-2%
APMoE_seg_ROB	22.2%	22.5%	12.6%	48.1%	35.2%	22.8%	-11%	-2%	-23%	-23%	-4%	-44%	-12%	-11%	-46%	0%
PAG_ROB	22.1%	21.7%	12.5%	48.8%	35.6%	34.1%	-9%	-10%	-20%	-27%	-3%	-35%	-6%	-8%	-41%	-3%
DRN_CS	14.8%	15.4%	7.1%	28.9%	14.2%	7.2%	-43%	-9%	-29%	-29%	-15%	-27%	-18%	-24%	-74%	-35%
FCN101_ROB	12.2%	11.1%	2.1%	29.3%	8.3%	38.7%	0%	-7%	-26%	-27%	-11%	-49%	-17%	-4%	-32%	-10%
PSPNetv0	8.3%	8.5%	5.5%	17.7%	15.5%	10.1%	-17%	-33%	-10%	-20%	0%	-34%	-26%	-52%	-30%	-32%

Cached July 12, 2025, 8:16 a.m. UTC+0

Methodology:
Our benchmark evaluates the negative impact of common visual hazards on algorithm performance. The impact of each hazard is calculated with the following formula:
impact = min(metric_low,metric_high) / max(metric_none,metric_low) - 1.0
Here, metric_none, metric_low and metric_high are the metric values evaluated on the subsets of the benchmark dataset in which the hazard is absent, present at low severity, and present at high severity, respectively (e.g. the subset Blur_high contains images with strong visible blur). Positive impacts are truncated to zero.
For example, an impact of -10% for Blur means that the algorithm's performance is expected to drop by 10% relative to its reference score (not by 10 percentage points) on an image with considerable blur, compared with a similar image without noticeable blur.
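The formula can be sketched in Python as follows. This is a minimal illustration, not the benchmark's evaluation code: the function name `hazard_impact` is hypothetical, and the three metric values are assumed to be non-negative scores such as IoU expressed as fractions.

```python
def hazard_impact(metric_none: float, metric_low: float, metric_high: float) -> float:
    """Relative performance change caused by a hazard (hypothetical helper).

    Implements impact = min(metric_low, metric_high) / max(metric_none, metric_low) - 1.0
    and truncates positive impacts to zero.
    """
    # Worst score observed on the subsets where the hazard is present.
    worst_with_hazard = min(metric_low, metric_high)
    # Reference score: the better of the hazard-free and low-severity subsets.
    reference = max(metric_none, metric_low)
    if reference <= 0.0:
        raise ValueError("reference metric must be positive")
    impact = worst_with_hazard / reference - 1.0
    # Positive impacts are truncated to zero.
    return min(impact, 0.0)


# Hypothetical subset scores: 50% IoU without blur, 48% with low blur, 45% with high blur.
impact = hazard_impact(metric_none=0.50, metric_low=0.48, metric_high=0.45)
print(f"Blur impact: {impact:.0%}")  # Blur impact: -10%
```

The result is a relative change: here 45% IoU is compared against a 50% reference, a drop of 5 percentage points that corresponds to an impact of -10%. Note that min(metric_low, metric_high) ≤ metric_low ≤ max(metric_none, metric_low), so for non-negative metrics the ratio never exceeds 1; the final clamp simply mirrors the stated truncation rule as a safeguard.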
The following hazards are currently evaluated:
Blur: The image is noticeably affected by blur (e.g. motion blur, defocusing, compression artifacts)
Coverage: Normally visible parts of the road are covered (e.g. unusual lane markings, snow, leaves)
Distortion: Visible lens distortion
Hood: Parts of the ego-vehicle other than the windscreen are visible (e.g. car hood, mirrors)
Occlusion (Occ.): Objects are partially occluded or cut off by the image border
Overexp.: The scene is overexposed
Particles: Particles in the air obstruct the view (e.g. heavy rain, snow, fog)
Screen: The windscreen interferes with the view (e.g. interior reflections, wipers, rain on the windscreen)
Underexp.: The image is underexposed
Variation (Var.): Intra-class variation within the image (e.g. unusual instances of a class, such as unique cars)
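To make the per-subset evaluation concrete, the sketch below partitions a set of images by the severity label of each hazard and applies the impact formula per hazard. It rests on stated assumptions: the record layout (a per-image `score` plus a `hazards` dict of severity labels) and the function `per_hazard_impacts` are hypothetical, a hazard missing from a record is treated as severity `none`, and each subset metric is approximated by the mean per-image score, whereas a real IoU evaluation would accumulate intersections and unions over all pixels of the subset.

```python
from collections import defaultdict
from statistics import mean

SEVERITIES = ("none", "low", "high")


def hazard_impact(metric_none, metric_low, metric_high):
    """impact = min(low, high) / max(none, low) - 1, truncated to at most zero."""
    impact = min(metric_low, metric_high) / max(metric_none, metric_low) - 1.0
    return min(impact, 0.0)


def per_hazard_impacts(records, hazards):
    """Compute the impact of each hazard from per-image records (hypothetical format)."""
    results = {}
    for hazard in hazards:
        # Split the images into none/low/high subsets for this hazard.
        subsets = defaultdict(list)
        for record in records:
            severity = record["hazards"].get(hazard, "none")
            subsets[severity].append(record["score"])
        # Skip hazards that lack images at some severity level.
        if not all(subsets[s] for s in SEVERITIES):
            continue
        # Stand-in subset metric (assumption): mean of per-image scores.
        metric = {s: mean(subsets[s]) for s in SEVERITIES}
        results[hazard] = hazard_impact(metric["none"], metric["low"], metric["high"])
    return results


# Toy data, not benchmark results.
records = [
    {"score": 0.80, "hazards": {"Blur": "none", "Underexp.": "none"}},
    {"score": 0.60, "hazards": {"Blur": "low", "Underexp.": "none"}},
    {"score": 0.40, "hazards": {"Blur": "high", "Underexp.": "low"}},
    {"score": 0.20, "hazards": {"Blur": "none", "Underexp.": "high"}},
]
impacts = per_hazard_impacts(records, ["Blur", "Underexp."])
print({name: f"{value:.0%}" for name, value in impacts.items()})
# {'Blur': '-33%', 'Underexp.': '-71%'}
```

In this toy data the Blur_none subset averages 50%, lower than Blur_low at 60%. This shows why the formula uses max(metric_none, metric_low) as the reference: the high-severity score of 40% is measured against 60% rather than 50%.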
More details on evaluation metrics and negative test cases can also be found on the FAQ page.