WildDash 2 Benchmark

For all metrics, higher scores are better. To participate in the benchmark, check our submission instructions.

	Meta AVG	Classic				Negative	Impact (IoU class)
Algorithm	IoU Class	IoU Class	iIoU Class	IoU Cat.	iIoU Cat.	IoU Class	Blur	Coverage	Distortion	Hood	Occ.	Overexp.	Particles	Screen	Underexp.	Var.
ltbgnn_trainwd	50.2%	53.2%	49.2%	73.2%	69.5%	43.6%	-9%	-11%	-6%	-6%	-3%	-3%	-11%	-21%	-6%	-16%
ltbgnn2_rvc	50.0%	52.2%	47.5%	72.4%	68.6%	44.6%	-8%	-11%	-6%	-8%	-3%	-4%	-8%	-18%	-7%	-14%
MIX6D_RVC	48.5%	51.2%	46.5%	72.4%	66.1%	40.8%	-7%	-5%	-6%	-7%	-4%	-7%	-7%	-17%	-10%	-11%
test_RVC_1	47.5%	50.8%	44.0%	74.2%	67.5%	34.4%	-5%	-4%	-4%	-6%	-5%	-1%	-9%	-17%	-9%	-17%
FAN_NV_RVC	47.5%	50.8%	44.0%	74.2%	67.5%	34.4%	-5%	-4%	-4%	-6%	-5%	-1%	-9%	-17%	-9%	-17%
UNIV_CNP_RVC_UE	46.9%	51.6%	45.9%	72.8%	67.5%	29.0%	-7%	-6%	-3%	-7%	-0%	-6%	-5%	-14%	-7%	-8%
SN_DN161_fat_pyrx8	46.8%	51.0%	43.9%	71.4%	65.5%	32.6%	-7%	-11%	-5%	-9%	-3%	-2%	-7%	-22%	-8%	-8%
MIX6D_old	46.6%	48.6%	43.3%	70.7%	64.7%	41.6%	-9%	-9%	-3%	-8%	-2%	-0%	-10%	-17%	-10%	-13%
segformer-data5+1	46.6%	48.6%	43.3%	70.7%	64.7%	41.6%	-9%	-9%	-3%	-8%	-2%	-0%	-10%	-17%	-10%	-13%
UNIV_CNP_RVC	46.3%	50.4%	44.7%	71.3%	65.9%	32.0%	-6%	-9%	-3%	-8%	-1%	-5%	-7%	-15%	-7%	-8%
SN_DN161s3pyrx8	45.6%	49.8%	41.6%	71.3%	65.3%	31.0%	-10%	-6%	-6%	-10%	-3%	-3%	-6%	-20%	-9%	-10%
UDSSEG_RVC	45.5%	51.0%	44.3%	72.1%	66.2%	25.4%	-5%	-6%	-5%	-7%	-4%	-0%	-10%	-19%	-11%	-10%
Anonymous	45.5%	51.0%	44.3%	72.1%	66.2%	25.4%	-5%	-6%	-5%	-7%	-4%	-0%	-10%	-19%	-11%	-10%
SN_RN152pyrx8_RVC	45.4%	48.9%	42.7%	70.1%	64.8%	32.5%	-6%	-7%	-5%	-7%	-1%	-2%	-7%	-19%	-11%	-3%
StudentNetwork	45.3%	50.6%	44.2%	71.9%	66.7%	26.5%	-5%	-5%	-6%	-5%	-5%	-1%	-12%	-21%	-10%	-16%
mmseg segformer22	44.5%	46.9%	38.4%	70.4%	64.1%	37.7%	-5%	-8%	-2%	-7%	-3%	-6%	-10%	-19%	-11%	-14%
ltbgnn2_fixbug	40.5%	41.2%	33.7%	65.4%	54.2%	43.4%	-18%	-16%	-3%	-12%	-1%	-20%	-11%	-25%	-12%	-5%
UniSeg	39.4%	41.7%	35.3%	65.8%	57.4%	34.8%	-18%	-12%	-4%	-13%	-3%	-11%	-9%	-26%	-13%	-20%
ltbgnn2	38.5%	42.2%	35.1%	65.7%	56.4%	27.4%	-16%	-14%	-4%	-12%	-1%	-20%	-10%	-23%	-11%	-3%
SIW_new	37.9%	41.9%	41.6%	65.7%	54.1%	27.2%	-15%	-10%	-5%	-16%	-2%	-12%	-11%	-22%	-8%	-15%
seamseg_rvcsubset	37.9%	41.2%	37.2%	63.1%	58.1%	30.5%	-16%	-17%	0%	-7%	-4%	-14%	-18%	-31%	-14%	-7%
Tong	37.2%	41.0%	41.2%	65.2%	53.5%	26.0%	-18%	-9%	-5%	-16%	-2%	-13%	-12%	-24%	-10%	-1%
ltbgnn_new	37.2%	40.6%	31.8%	65.0%	54.4%	26.7%	-15%	-13%	-3%	-12%	-0%	-21%	-7%	-23%	-12%	-3%
seamseg_mvd_ss	37.1%	41.3%	36.9%	63.4%	55.7%	26.6%	-15%	-14%	0%	-11%	-4%	-11%	-30%	-36%	-20%	-10%
U_test	36.9%	39.1%	32.6%	63.2%	51.2%	32.2%	-19%	-12%	-5%	-11%	-4%	-11%	-6%	-22%	-12%	-17%
ltbgnn2_fix	36.7%	40.5%	35.3%	62.5%	57.3%	25.6%	-17%	-14%	-4%	-13%	-1%	-22%	-11%	-24%	-11%	-2%
SIW	36.5%	41.0%	38.6%	65.8%	53.1%	24.1%	-16%	-17%	-6%	-14%	-2%	-7%	-19%	-23%	-10%	-6%
ltbgnn	36.4%	38.3%	31.1%	64.1%	52.4%	30.7%	-11%	-10%	-3%	-12%	-1%	-14%	-4%	-28%	-13%	-10%
UniSeg Baseline	36.0%	39.0%	33.4%	63.7%	53.5%	27.9%	-23%	-14%	-6%	-15%	-2%	-19%	-8%	-26%	-12%	-16%
hs1	35.7%	40.0%	38.0%	64.8%	52.3%	23.0%	-17%	-10%	-8%	-18%	-1%	-15%	-11%	-27%	-9%	-9%
MSeg1080_RVC	35.2%	38.7%	35.4%	65.1%	50.7%	24.7%	-15%	-11%	-9%	-19%	-3%	-14%	-6%	-25%	-8%	-13%
w_test	35.0%	39.1%	37.6%	64.5%	52.3%	22.4%	-20%	-9%	-8%	-18%	-0%	-12%	-14%	-30%	-11%	0%
BASE-DeepLabV2	35.0%	39.5%	28.9%	65.6%	53.0%	18.7%	-7%	-8%	-9%	-11%	-6%	-14%	-5%	-19%	-7%	-6%
DeepLabV2@ResNet50	34.9%	39.4%	28.7%	65.6%	53.7%	18.7%	-8%	-5%	-10%	-12%	-3%	-10%	-3%	-19%	-8%	-4%
tong_test	34.6%	38.7%	36.3%	63.6%	50.8%	22.5%	-17%	-7%	-8%	-15%	-1%	-12%	-9%	-24%	-11%	-20%
hs	34.4%	38.4%	36.2%	64.2%	52.1%	22.3%	-19%	-11%	-8%	-18%	0%	-13%	-15%	-29%	-11%	-6%
submit_test	34.0%	36.6%	31.2%	61.4%	48.2%	26.6%	-20%	-13%	-6%	-14%	-3%	-16%	-7%	-21%	-9%	-17%
test_base	33.8%	37.8%	36.1%	63.1%	50.6%	22.1%	-17%	-11%	-7%	-18%	-1%	-12%	-14%	-28%	-13%	-14%
EffPS_b1bs4sem_RVC	32.2%	35.7%	24.4%	63.8%	56.0%	20.4%	-10%	-6%	-4%	-7%	-1%	-7%	-10%	-25%	-8%	-6%
CARB	16.8%	19.1%	13.8%	45.8%	35.8%	10.0%	-24%	-2%	-5%	-25%	-2%	-26%	-15%	-33%	-20%	-6%
DeepTrain	16.4%	17.5%	1.1%	32.2%	30.0%	11.8%	-15%	-7%	0%	-5%	-7%	-2%	-14%	-18%	-5%	-1%
WSSS-CLIP-ES	13.0%	14.6%	7.1%	40.4%	25.6%	8.1%	-18%	0%	-9%	-28%	-2%	-16%	-13%	-32%	-16%	-2%
FAN_RVC1	7.2%	2.5%	6.2%	7.5%	23.8%	28.1%	-36%	-19%	-98%	-32%	-2%	-11%	-25%	-45%	-30%	-2%
FAN_RVC	7.0%	2.5%	6.2%	7.6%	23.8%	26.9%	-36%	-14%	-98%	-31%	-3%	-10%	-25%	-45%	-29%	0%
test_RVC	6.5%	2.6%	6.3%	10.5%	34.6%	24.1%	-37%	-12%	-98%	-31%	-3%	-9%	-31%	-43%	-30%	-1%
WSSS-CLIMS	1.2%	1.3%	0.0%	4.5%	6.6%	1.0%	-2%	-13%	-80%	-6%	-15%	-28%	-15%	0%	-26%	-17%

Cached Aug. 21, 2025, 4:20 a.m. UTC+0

Methodology:
Our benchmark evaluates the negative impact of common visual hazards on algorithm performance. The impact is calculated with this formula:
impact = min(metric_low, metric_high) / max(metric_none, metric_low) - 1.0
Each metric_{none/low/high} is evaluated on the subset of the benchmark dataset that corresponds to the identified severity of the hazard (e.g. the subset Blur_high contains images with a lot of visible blur). Positive impacts are truncated to zero.
An impact of -10% at Blur means the algorithm's performance is expected to degrade by 10 percent on an image with considerable blur, compared with a similar image without noticeable blur.
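As a rough illustration of the formula above, the following Python sketch computes the impact for a single hazard. The function name `hazard_impact` and the three input values are hypothetical and are not taken from the results table or from the benchmark's own evaluation code. The sketch also assumes that at least one of metric_none and metric_low is non-zero.

```python
def hazard_impact(metric_none: float, metric_low: float, metric_high: float) -> float:
    """Relative change of a metric under one hazard, following the formula above.

    Assumes max(metric_none, metric_low) > 0; otherwise the division fails.
    """
    impact = min(metric_low, metric_high) / max(metric_none, metric_low) - 1.0
    # Positive impacts are truncated to zero.
    return min(impact, 0.0)


# Hypothetical IoU values for the Blur_none, Blur_low and Blur_high subsets.
blur_impact = hazard_impact(metric_none=0.50, metric_low=0.48, metric_high=0.45)
print(f"Blur impact: {blur_impact:.0%}")  # prints "Blur impact: -10%"

# An algorithm that scores better on the hazard subsets gets an impact of zero.
no_impact = hazard_impact(metric_none=0.50, metric_low=0.55, metric_high=0.60)
```

With these made-up inputs, the worst hazard score (0.45) is compared against the best of the none/low scores (0.50), giving a 10 percent relative drop.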
These are all currently evaluated hazards:
Blur: Image is noticeably affected by blur (e.g. motion blur, defocusing, compression artifacts...)
Coverage: Normally visible parts of the road are covered (e.g. unusual lane markings, snow, leaves...)
Distortion: Visible lens distortion
Hood: Non-windscreen parts of the ego-vehicle are visible (e.g. car hood, mirrors)
Occ.: Objects are partially occluded or cut off by the image border
Overexp.: The scene is overexposed
Particles: Particles in the air obstruct the view (e.g. heavy rain, snow, fog)
Screen: The windscreen is interfering (e.g. interior reflections, wipers, rain on the windscreen...)
Underexp.: The image is underexposed
Var.: Intra-class variations within the image (i.e. unusual representations of labels like unique cars)
More details on evaluation metrics and negative test cases can also be found on the FAQ page.