Maintaining clear impressions of the environment for autonomous systems even in adverse weather conditions

Abstract

Autonomous systems require continuous and dependable environment perception for navigation and decision making, which is best achieved by combining different sensor types. Radar continues to function robustly in compromised circumstances in which cameras become impaired, guaranteeing a steady inflow of information. Yet camera images provide a more intuitive and readily applicable impression of the world. This work combines the complementary strengths of both sensor types in a unique self-learning fusion approach for probabilistic scene reconstruction in adverse surrounding conditions. After reducing the memory requirements of the synchronized measurements through a decoupled stochastic self-supervised compression technique, the proposed algorithm exploits similarities and establishes correspondences between both domains at different feature levels during training. At inference time, relying exclusively on radio frequencies, the model then successively predicts camera constituents in an autoregressive and self-contained process. These discrete tokens are finally transformed into an instructive view of the respective surroundings, allowing potential dangers to be perceived visually for important downstream tasks.

Without any explicit annotation, the model relies exclusively on radar-based environment sensing to construct intuitive camera views of the surroundings

Camera view generation based solely on radar-frequency information. The synchronized camera ground truth is supplied for visual reference only. The model generally succeeds in inferring the essential characteristics and captures the key features of the underlying real-world scenery. It is less confident about the exact localization of dynamic objects in rapidly changing environments, particularly when they are visible to only one of the two sensors and therefore lack cross-modal correspondence.

Approach


This work addresses two fundamental aspects of applied modern deep learning research: compressing memory-intensive low-level sensor streams into compact representations, and modeling cross-modal correspondences between heterogeneous sensors without manual annotations.

To tackle both problems, the proposed algorithm comprises two stages, both trained end-to-end in a self-supervised fashion on low-level radar data without the need for expensive and time-consuming annotations. Once both stages have been trained, probabilistic predictions about the environment are performed.

Stage 1: Probabilistic Measurement Compression

Both memory-intensive sensor streams are compressed through categorical variational autoencoders into stochastic integer sequences. Each contained token takes on one of 256 categories, representing a square input patch of either domain. The reconstruction quality of these quantized representations is a measure of the model's discretization capabilities and is used as part of the training objective. The animations show how the networks assign different regions of the sensor outputs to distinct latent categories.
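As a rough illustration of this first stage, the sketch below shows a minimal categorical VAE that maps square input patches to one of 256 latent categories via a Gumbel-softmax draw and reconstructs the input from the looked-up code vectors. Layer sizes, the 8×8 patch granularity implied by three stride-2 convolutions, and the plain reconstruction loss are illustrative assumptions, not the exact architecture or objective of this work.

```python
# Minimal sketch of a per-modality categorical VAE for measurement compression.
# All layer sizes are assumptions; a KL/regularization term toward a uniform
# categorical prior would typically accompany the reconstruction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CategoricalVAE(nn.Module):
    def __init__(self, in_ch=3, num_categories=256, hidden=64):
        super().__init__()
        # Three stride-2 convolutions: each spatial position of the latent map
        # corresponds to an 8x8 input patch and receives logits over 256 categories.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, num_categories, 4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(num_categories, hidden)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(hidden, in_ch, 4, stride=2, padding=1),
        )

    def forward(self, x, tau=1.0):
        logits = self.encoder(x)                                       # (B, 256, H/8, W/8)
        one_hot = F.gumbel_softmax(logits, tau=tau, hard=True, dim=1)  # stochastic category choice
        z = torch.einsum("bkhw,kd->bdhw", one_hot, self.codebook.weight)
        recon = self.decoder(z)                                        # reconstruction drives the objective
        tokens = one_hot.argmax(dim=1).flatten(1)                      # compressed integer sequence
        return recon, tokens

# Reconstruction error serves as the self-supervised training signal, e.g.:
# vae = CategoricalVAE(); recon, tokens = vae(batch); loss = F.mse_loss(recon, batch)
```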





Stage 2: Cross-Modal Modeling of Sensor Constituents

Using the memory-reduced domain representations, an autoregressive transformer model finds links between radar and camera measurements in latent space and learns to recognize correlations between both modalities. The incorporated attention mechanism is used to condition camera tokens on discretized radar information. The animation below shows the inter-modal attention span for every head in every layer of the model; each matrix denotes the strength with which camera tokens attend to radar tokens.
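A minimal sketch of how such cross-modal conditioning can be realized is given below: radar and camera token sequences are concatenated and passed through a causally masked, decoder-only transformer, and the next-token loss is evaluated on camera positions only so that camera predictions must draw on radar evidence. Model dimensions and the use of PyTorch's built-in transformer layers are assumptions for illustration.

```python
# Sketch of the cross-modal autoregressive stage (dimensions are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalTransformer(nn.Module):
    def __init__(self, vocab=256, d_model=512, n_head=8, n_layer=8, max_len=1024):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, radar_tokens, camera_tokens):
        # Radar tokens come first, so every camera position can attend to the
        # full radar sequence while remaining causal among camera tokens.
        seq = torch.cat([radar_tokens, camera_tokens], dim=1)
        pos = torch.arange(seq.size(1), device=seq.device)
        causal = nn.Transformer.generate_square_subsequent_mask(seq.size(1)).to(seq.device)
        h = self.blocks(self.tok(seq) + self.pos(pos), mask=causal)
        return self.head(h)                          # logits over 256 categories per position

def camera_token_loss(model, radar_tokens, camera_tokens):
    # Next-token cross-entropy restricted to camera positions: the model must
    # explain camera content from radar context plus previously seen camera tokens.
    logits = model(radar_tokens, camera_tokens[:, :-1])
    cam_logits = logits[:, radar_tokens.size(1) - 1:, :]   # predictions for c_1 ... c_C
    return F.cross_entropy(cam_logits.reshape(-1, cam_logits.size(-1)),
                           camera_tokens.reshape(-1))
```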


Range-Doppler Conditioned Synthesis of Camera Views


The trained model successively outputs probability mass functions over camera constituents. Sampling from these distributions then predicts camera content based exclusively on robust radar sequences, regardless of weather conditions. Appending the sampled tokens to the radar sequence makes the model increasingly confident about the composition of the environment in latent space. Once the prediction is complete, the constructed camera sequence is decompressed by the categorical decoder into an instructive view of the surroundings.
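The sampling loop itself might look like the following sketch, which builds on the hypothetical classes from the sketches above: starting from the radar sequence alone, the predicted probability mass function for the next camera token is sampled, the draw is appended, and the loop repeats until the latent camera sequence is complete and can be decoded.

```python
# Sketch of radar-conditioned camera synthesis (names are illustrative and
# build on the hypothetical CrossModalTransformer / CategoricalVAE above).
import torch

@torch.no_grad()
def synthesize_camera_view(transformer, decode_tokens, radar_tokens,
                           num_camera_tokens, temperature=1.0):
    camera_tokens = torch.empty(radar_tokens.size(0), 0,
                                dtype=torch.long, device=radar_tokens.device)
    for _ in range(num_camera_tokens):
        logits = transformer(radar_tokens, camera_tokens)            # (B, L, 256)
        pmf = torch.softmax(logits[:, -1, :] / temperature, dim=-1)  # PMF over next camera token
        nxt = torch.multinomial(pmf, num_samples=1)                  # stochastic draw
        camera_tokens = torch.cat([camera_tokens, nxt], dim=1)       # grow latent camera sequence
    # `decode_tokens` is assumed to look up codebook vectors, restore the
    # spatial layout, and run the categorical decoder back to pixel space.
    return decode_tokens(camera_tokens)
```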








Stepwise camera view generation based solely on radar-frequency information. The synchronized camera ground truth is supplied for visual reference only. At times, the model is thrown off track by suboptimal first camera samples and dreams up completely artificial environments. Most often, though, it succeeds in reproducing the global structure of the surroundings and for the most part manages to compile a realistic rendering of its central components.



Exploring the conditional sample space with temperature sweeps

Constraining the sample space while varying the sampling temperature allows the quality of the camera samples to be controlled. The animations below show probabilistic results and contrast the generated views for improved visual intuition.









Nucleus inference with only a limited number of categories, K̂ = 25, to sample camera constituents from. The temporal context underlines the differences in synthesis quality as the sampling temperature varies. Unconstrained category selection over the entire camera sample space and top-1 sampling with K̂ = 1 serve as basic visual references.
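The constrained sampling can be approximated by the small helper sketched below, which keeps only the K̂ most probable categories, rescales them with a temperature, and draws from the renormalized distribution; K̂ = 1 collapses to top-1 decoding, while K̂ = 256 leaves the sample space unconstrained. Function and argument names are illustrative.

```python
# Sketch of temperature-controlled sampling from a restricted category set.
import torch

def sample_constrained(logits, k_hat=25, temperature=1.0):
    """logits: (B, 256) unnormalized scores for the next camera token."""
    topk_vals, topk_idx = (logits / temperature).topk(k_hat, dim=-1)  # keep K̂ categories
    pmf = torch.softmax(topk_vals, dim=-1)                            # renormalize survivors
    choice = torch.multinomial(pmf, num_samples=1)                    # draw in reduced space
    return topk_idx.gather(-1, choice)                                # map back to token ids
```

Sweeping the temperature then trades diversity against fidelity of the generated views, which is what the contrasted animations above visualize.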



The designed method succeeds in capturing the integral objects of a scene and reconstructs crucial entities in the sensors' vicinity.

Acknowledgment

The author would like to thank the EleutherAI community and the members of the EleutherAI Discord channels for fruitful and interesting discussions along the way of composing this paper. Additional thanks go to Phil Wang (lucidrains) for his tireless efforts in making attention-based algorithms accessible to the humble deep learning research community.