Most end-to-end autonomous driving methods rely solely on instantaneous sensor observations, limiting them to reactive behavior without the anticipatory foresight human drivers employ through prior experience. We introduce geospatial visual priors, street-level visual context anchored to the intended driving route, providing visual-spatial foresight independent of real-time sensors. We propose a memory augmentation module featuring a dual-memory architecture and an adaptive memory gate, which can be easily integrated into existing end-to-end approaches. This design pairs a contextual memory for retrieved priors with a persistent fallback memory, and dynamically regulates the influence of memories based on current state compatibility. Evaluated on the NAVSIM-v2 benchmark, our approach consistently improves performance across diverse end-to-end baselines. Furthermore, because these priors are independent of onboard sensors, our method inherently improves robustness against sensor corruption, while the dual-memory design ensures safe fallback when the retrieved priors themselves become unreliable.
The Memory Augmentation Module is our core component: a dual-memory architecture with an adaptive memory gate that plugs into existing end-to-end approaches. It augments the current state (S) into an enhanced state (S′) for the downstream planner. The contextual memory (C) embeds street-level visual priors (V) together with their spatial anchors (X) via projection and positional embedding, supplying location-specific context retrieved along the route. The persistent memory (P) is a learned fallback that stays reliable when the retrieved priors are corrupted. The current state queries both memories through cross-attention, and the memory gate dynamically regulates how strongly the retrieved priors influence S′, according to how compatible they are with the current state.
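The data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the released implementation: the sinusoidal `positional_embedding`, the single-head `cross_attention`, the sigmoid gate computed from the state and its contextual read, and the residual blend `S + g * read_c + (1 - g) * read_p` are all hypothetical choices, as are the parameter names `W_proj`, `w_gate`, and `b_gate`.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def positional_embedding(X, d):
    # Assumption: sinusoidal embedding of 2-D spatial anchors X, shape (n_c, 2).
    # d must be divisible by 4.
    freqs = 1.0 / (100.0 ** (np.arange(d // 4) / (d // 4)))
    angles = X[:, :, None] * freqs                       # (n_c, 2, d/4)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(X.shape[0], d)                    # (n_c, d)

def cross_attention(Q, M):
    # Single-head dot-product attention; the memory M serves as keys and values.
    d = Q.shape[-1]
    return softmax(Q @ M.T / np.sqrt(d)) @ M

def augment_state(S, V, X, P, W_proj, w_gate, b_gate=0.0):
    """Map the current state S to an enhanced state S' (hypothetical sketch)."""
    d = S.shape[-1]
    # Contextual memory C: projected priors V plus embedded anchors X.
    C = V @ W_proj + positional_embedding(X, d)
    read_c = cross_attention(S, C)   # read from contextual memory
    read_p = cross_attention(S, P)   # read from persistent (fallback) memory
    # Assumption: the gate scores compatibility from [S, read_c] per state token.
    compat = np.concatenate([S, read_c], axis=-1) @ w_gate + b_gate
    g = 1.0 / (1.0 + np.exp(-compat))                    # (n_s,), values in (0, 1)
    S_prime = S + g[:, None] * read_c + (1.0 - g[:, None]) * read_p
    return S_prime, g

rng = np.random.default_rng(0)
d, n_s, n_c, n_p, d_v = 8, 4, 6, 2, 16
S = rng.normal(size=(n_s, d))        # current state tokens
V = rng.normal(size=(n_c, d_v))      # street-level prior features
X = rng.normal(size=(n_c, 2))        # spatial anchors of the priors
P = rng.normal(size=(n_p, d))        # persistent memory (learned in practice)
W_proj = rng.normal(size=(d_v, d)) * 0.1
w_gate = rng.normal(size=2 * d) * 0.1
S_prime, g = augment_state(S, V, X, P, W_proj, w_gate)
print(S_prime.shape, g.round(2))
```

In this sketch, a gate value near 0 means the enhanced state relies almost entirely on the persistent memory, which mirrors the fallback behavior described above.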
Across diverse end-to-end driving baselines (LTF, GTRS-DP, GTRS-Dense, and DrivoR), integrating PriorEye consistently improves EPDMS within the NAVSIM-v2 framework. The gains hold on both the navhard-two-stage (Table 1) and navtest (Table 2) splits, indicating that the benefit of geospatial visual priors is not tied to a particular baseline architecture.
Table 1. Results on the NAVSIM-v2 navhard-two-stage benchmark.
Table 2. Results on the NAVSIM-v2 navtest split.
Lane-keeping scenario in which the retrieved visual priors stay clear even though the current image is shadowed. With geospatial visual priors, the model generates a more confident trajectory.
Left-turn scenario with a crosswalk not yet visible to onboard sensors. Relying only on instantaneous observations, the baseline (red) drives at 24.2 km/h, while our method (green) anticipates the crosswalk from retrieved visual priors (10-15) and slows to a more cautious 18.2 km/h.
Qualitative comparison under sensor degradation (Mud, Heavy). The baseline (GTRS-Dense, left) generates a trajectory that deviates toward the road boundary, while our method (GTRS-Dense + PriorEye, right) plans a safe, lane-keeping trajectory. The retrieved geospatial visual priors are unaffected by the sensor corruption, providing reliable scene context despite the degraded onboard cameras.
(a) Normal operation
(b) Corrupted priors
A natural question when using priors is what happens if the priors themselves are corrupted. The dual-memory mechanism is designed for this case, and this example shows its effect. (a) Under normal operation, contextual memory dominates attention (0.71). (b) When priors are retrieved from ∼500 m away, the proposed module assigns full attention to persistent memory (1.00), ignoring the irrelevant context. The planned trajectory remains similar in both cases, demonstrating graceful degradation under the corrupted-prior condition.
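The caption does not state how the attention values are computed. One plausible reading, assumed here purely for illustration, is the share of softmax attention mass that the current state places on each memory when both are attended jointly. The function name `attention_share` and the toy inputs below are hypothetical.

```python
import numpy as np

def attention_share(S, C, P):
    """Share of joint-softmax attention mass on contextual (C) vs persistent (P) memory."""
    d = S.shape[-1]
    logits = S @ np.concatenate([C, P], axis=0).T / np.sqrt(d)
    logits = logits - logits.max(axis=-1, keepdims=True)
    w = np.exp(logits)
    w = w / w.sum(axis=-1, keepdims=True)
    # Sum the weights on contextual tokens, then average over state tokens.
    share_c = float(w[:, : C.shape[0]].sum(axis=-1).mean())
    return share_c, 1.0 - share_c

S = np.full((1, 4), 3.0)         # toy current state
P = np.zeros((1, 4))             # toy persistent memory token
C_matched = np.full((2, 4), 3.0) # priors aligned with the current state
C_mismatched = -C_matched        # priors that conflict with the current state

print(attention_share(S, C_matched, P))
print(attention_share(S, C_mismatched, P))
```

With matched priors nearly all attention mass falls on the contextual memory, and with mismatched priors it shifts to the persistent memory, which is the qualitative pattern the figure reports.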
To appear in ECCV 2026.
@inproceedings{yeon2026prioreye,
title={PriorEye: Geospatial Visual Priors for End-to-End Autonomous Driving},
author={Yeon, Kyuhwan and Ramtoula, Benjamin and De Martini, Daniele},
year={2026},
booktitle={ECCV},
}