PriorEye: Geospatial Visual Priors for End-to-End Autonomous Driving

Mobile Robotics Group, Oxford Robotics Institute, University of Oxford
ECCV 2026

Why do most autonomous driving methods rely only on the current frame and a few seconds of recent images? PriorEye shows how to leverage street-level images of the same place captured at different times to improve end-to-end autonomous driving.

Abstract

Most end-to-end autonomous driving methods rely solely on instantaneous sensor observations, limiting them to reactive behavior without the anticipatory foresight human drivers employ through prior experience. We introduce geospatial visual priors, street-level visual context anchored to the intended driving route, providing visual-spatial foresight independent of real-time sensors. We propose a memory augmentation module featuring a dual-memory architecture and an adaptive memory gate, which can be easily integrated into existing end-to-end approaches. This design pairs a contextual memory for retrieved priors with a persistent fallback memory, and dynamically regulates the influence of memories based on current state compatibility. Evaluated on the NAVSIM-v2 benchmark, our approach consistently improves performance across diverse end-to-end baselines. Furthermore, because these priors are independent of onboard sensors, our method inherently improves robustness against sensor corruption, while the dual-memory design ensures safe fallback when the retrieved priors themselves become unreliable.

Memory Augmentation

The Memory Augmentation Module is our core component: a dual-memory architecture with an adaptive memory gate that can be easily integrated into existing end-to-end approaches. It augments the current state (S) into an enhanced state (S′) for the downstream planner. The contextual memory (C) embeds street-level visual priors (V) together with their spatial anchors (X) via projection and positional embedding, supplying location-specific context retrieved along the route. The persistent memory (P) is a learned fallback that stays reliable when the retrieved priors are corrupted. The current state queries both memories through cross-attention, and the memory gate dynamically regulates the influence of the retrieved priors based on current-state compatibility.
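The data flow described above can be sketched for a single state token as follows. This is a minimal illustration, not the authors' implementation: the helper names (`embed_prior`, `attend`, `augment_state`), the sinusoidal anchor embedding, the fixed 4-dimensional feature size, and the convex sigmoid gate are all assumptions made for the example, since this page does not specify those details.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    """Scaled dot-product cross-attention of one query over memory slots."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, slot) / scale for slot in memory])
    return [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(query))]

def embed_prior(v, x, proj):
    """Contextual slot: projected prior feature v plus an embedding of its
    2D spatial anchor x. The sinusoidal form is a hypothetical choice and
    fixes the feature size of this toy example to 4."""
    projected = [dot(row, v) for row in proj]
    pos = [math.sin(x[0]), math.cos(x[0]), math.sin(x[1]), math.cos(x[1])]
    return [p + e for p, e in zip(projected, pos)]

def augment_state(s, contextual, persistent, gate_scale=1.0, gate_bias=0.0):
    """Return (S', g): the state plus a gated mix of the two memory readouts."""
    ctx_read = attend(s, contextual)
    per_read = attend(s, persistent)
    # Assumed gate: sigmoid of the compatibility between the current state
    # and the contextual readout; g near 0 means "ignore retrieved priors".
    compat = dot(s, ctx_read) / math.sqrt(len(s))
    g = 1.0 / (1.0 + math.exp(-(gate_scale * compat + gate_bias)))
    s_prime = [si + g * c + (1.0 - g) * p
               for si, c, p in zip(s, ctx_read, per_read)]
    return s_prime, g

# Toy inputs: two priors V with anchors X, a 4x2 projection, two persistent slots.
proj = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
X = [(0.0, 0.0), (5.0, 2.0)]
C = [embed_prior(v, x, proj) for v, x in zip(V, X)]
P = [[0.1, 0.1, 0.1, 0.1], [0.0, 0.2, 0.0, 0.2]]
S = [1.0, 0.0, 0.5, 0.0]

S_prime, g = augment_state(S, C, P)
```

In this sketch the gate interpolates between the two readouts, so pushing it toward zero (for example with a strongly negative `gate_bias`) leaves only the persistent memory contributing to S′. A real module would operate on batches of tokens with learned projections.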

Results

Across diverse end-to-end driving baselines (LTF, GTRS-DP, GTRS-Dense, and DrivoR), integrating PriorEye consistently improves EPDMS within the NAVSIM-v2 framework. The gains hold on both the navhard-two-stage (Table 1) and navtest (Table 2) splits, confirming that the benefit of geospatial visual priors is model-agnostic.

Table 1. Results on the NAVSIM-v2 navhard-two-stage benchmark.

Table 2. Results on the NAVSIM-v2 navtest split.

Qualitative Results

Qualitative result 1

Lane-keeping scenario in which the retrieved visual priors remain clear even though the current image is shadowed. With geospatial visual priors, the model generates a more confident trajectory.

Qualitative result 2

Left-turn scenario with a crosswalk not yet visible to onboard sensors. Relying only on instantaneous observations, the baseline (red) drives at 24.2 km/h, while our method (green) anticipates the crosswalk from retrieved visual priors (10-15) and slows to a more cautious 18.2 km/h.

Sensor Robustness

Qualitative comparison under sensor degradation (Mud, Heavy). The baseline (GTRS-Dense, left) generates a trajectory that deviates toward the road boundary, while our method (GTRS-Dense + PriorEye, right) plans a safe, lane-keeping trajectory. The retrieved geospatial visual priors are unaffected by the sensor corruption, providing reliable scene context despite the degraded onboard cameras.

Geospatial Visual Prior Corruption

(a) Normal operation

(b) Corrupted priors

A natural question when relying on priors is what happens when the priors themselves are corrupted. The dual-memory mechanism is designed for this case, as this example illustrates. (a) Under normal operation, contextual memory dominates attention (0.71). (b) When priors are retrieved from ∼500 m away, the module assigns full attention to persistent memory (1.00), ignoring the irrelevant context. The planned trajectory remains similar in both cases, demonstrating graceful degradation under corrupted priors.
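One way to read the attention figures above is as the share of attention mass that falls on each memory. The sketch below assumes a single softmax taken jointly over the contextual and persistent slots, so the two shares sum to one. That joint-softmax form, the function name `memory_attention_shares`, and the toy vectors are assumptions for illustration; the page does not state how the reported values are computed.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memory_attention_shares(query, contextual, persistent):
    """Return (contextual_share, persistent_share) of the attention mass,
    assuming one softmax over the concatenated memory slots."""
    scale = math.sqrt(len(query))
    slots = contextual + persistent
    logits = [sum(q * k for q, k in zip(query, slot)) / scale for slot in slots]
    weights = softmax(logits)
    ctx_share = sum(weights[:len(contextual)])
    return ctx_share, 1.0 - ctx_share

# Toy 2D example (hypothetical values).
state = [4.0, 0.0]
persistent = [[1.0, 0.0]]
matching_prior = [[4.0, 0.0]]     # prior compatible with the current state
mismatched_prior = [[-4.0, 0.0]]  # prior from an unrelated place

ctx_ok, per_ok = memory_attention_shares(state, matching_prior, persistent)
ctx_bad, per_bad = memory_attention_shares(state, mismatched_prior, persistent)
```

With a compatible prior most of the mass lands on the contextual slot. With an incompatible one the mass shifts almost entirely to the persistent slot, which mirrors the fallback behavior described above.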

BibTeX

To appear in ECCV 2026.

@inproceedings{yeon2026prioreye,
    title={PriorEye: Geospatial Visual Priors for End-to-End Autonomous Driving},
    author={Yeon, Kyuhwan and Ramtoula, Benjamin and De Martini, Daniele},
    year={2026},
    booktitle={ECCV},
}

Acknowledgements

We acknowledge the open-source projects that made this work possible: NAVSIM, GTRS, DrivoR, and SimScale.