MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data

Published in WACV 2026, 2025

Recommended citation: Labatie A., Vaccaro M., Lardiere N., Garioud A., and Gonthier N. (2026). "MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data." WACV. https://arxiv.org/pdf/2508.10894.pdf

Antoine Labatie, Michael Vaccaro, Nina Lardiere, Anatol Garioud and Nicolas Gonthier


Abstract

Self-supervised learning holds great promise for remote sensing, but standard self-supervised methods must be adapted to the unique characteristics of Earth observation data. We take a step in this direction by conducting a comprehensive benchmark of fusion strategies and reconstruction target normalization schemes for multimodal, multitemporal, and multispectral Earth observation data. Based on our findings, we propose MAESTRO, a novel adaptation of the Masked Autoencoder, featuring optimized fusion strategies and a tailored target normalization scheme that introduces a spectral prior as a self-supervisory signal. Evaluated on four Earth observation datasets, MAESTRO sets a new state-of-the-art on tasks that strongly rely on multitemporal dynamics, while remaining highly competitive on tasks dominated by a single mono-temporal modality.

Keywords

Self-Supervised Learning
Multimodal Learning
Satellite Image Time Series
Masked Autoencoder

Recommended citation: Labatie A., Vaccaro M., Lardiere N., Garioud A., and Gonthier N. (2025). “MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data” Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026.


