Foundation models for Satellite Imaging and Climate
Adapt state-of-the-art vision-language models (VLMs) to geospatial imagery analysis. This project focuses on pretraining and fine-tuning general-purpose image VLMs for remote-sensing applications that require spatial reasoning and interpretability. Representative downstream tasks include visual question answering, scene classification, and visual grounding.

Tasks
Collect datasets and train baseline models to match the current SOTA. (Done)
Improve the baseline GeoVLM (e.g. architecture tweaks, dataset curation).
Design and implement a robust pipeline for model evaluation.
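
For the visual grounding task, one building block an evaluation pipeline might include is an IoU-based accuracy score. The sketch below is illustrative only: the function names, the (x_min, y_min, x_max, y_max) box format, and the 0.5 threshold are assumptions rather than details of the project, and oriented boxes (common in remote sensing) are not handled.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(predictions, references, threshold=0.5):
    """Fraction of predicted boxes whose IoU with the reference meets the threshold."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(box_iou(p, r) >= threshold for p, r in zip(predictions, references))
    return hits / len(predictions)
```

Each prediction is paired with exactly one reference box here; a real benchmark with several targets per query would need a matching step first.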

Perceiver-VAE: A Universal Latent Codec for Weather Fields
Description
Perceiver-VAE applies the Perceiver IO approach, used successfully in the Aurora foundation weather model, to encode meteorological variables into a single shared VAE latent space. The study centers on fine-tuning pretrained Perceiver-VAE models for new types of weather variables. The primary applications are compression of weather fields and building forecasting models directly on the universal VAE latents.


Tasks
Unify & tokenize data: dataloaders for reanalyses, surface fields, satellite swaths; encode (lat, lon, level, variable, time) with positional/spherical features; handle masks/missing values.
Train the codec: Perceiver-IO VAE with rate–distortion objective; add physics-aware and spectral regularizers; curriculum from coarse→fine; mixed-precision + activation checkpointing.
Query-driven decoding & evaluation: design variable/level-aware query heads; support arbitrary output grids; evaluate RMSE/ACC/CRPS, power spectra.
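
As an illustration of the positional/spherical features mentioned in the first task, the sketch below maps (lat, lon) to a unit vector on the sphere and appends Fourier features of it. The function name, the number of frequencies, and the feature layout are assumptions made for this example, not the project's actual encoding; level, variable, and time would need their own embeddings.

```python
import numpy as np


def spherical_features(lat_deg, lon_deg, num_freqs=4):
    """Encode latitude/longitude (degrees) as sphere-aware features.

    Returns an array of shape (..., 3 + 6 * num_freqs).
    """
    lat = np.deg2rad(np.asarray(lat_deg, dtype=np.float64))
    lon = np.deg2rad(np.asarray(lon_deg, dtype=np.float64))
    # Unit vector on the sphere: continuous across the date line and at the poles.
    xyz = np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )
    # Octave-spaced frequencies 1, 2, 4, ... applied to each Cartesian component.
    freqs = 2.0 ** np.arange(num_freqs)
    angles = xyz[..., None] * freqs * np.pi            # (..., 3, num_freqs)
    fourier = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    fourier = fourier.reshape(*xyz.shape[:-1], -1)     # (..., 6 * num_freqs)
    return np.concatenate([xyz, fourier], axis=-1)
```

Working from the Cartesian unit vector rather than raw degrees means that longitudes 180 and -180 receive the same features, which a plain sinusoidal encoding of the longitude value would not guarantee.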

ChronoWeather: Temporal Interpolation of Weather Dynamics
Description
ChronoWeather is a physics-guided temporal interpolation project: recover T+1…T+5 from only T and T+6 via endpoint-conditioned interpolation. We use a ModAFNO baseline as the primary time-conditioned spectral operator, with alternatives based on continuous-time latent dynamics (Neural ODE/CDE) and flow-matching bridges. The study combines endpoint conditioning with physics constraints (mass/energy/moisture budgets, advection/divergence) to improve the representation of extremes and cross-variable coherence. Applications include cadence upsampling for nowcasting, denser timelines for assimilation, and lower error when stepping down from 6-hour synoptic cycles to hourly forecasts.
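
The simplest endpoint-conditioned interpolator is per-grid-point linear blending of the two endpoint states. The sketch below shows that reference point, assuming hourly targets inside a 6-hour gap; the function name and array layout are illustrative assumptions, not project code.

```python
import numpy as np


def linear_interpolate(field_t0, field_t6, gap_hours=6):
    """Linearly interpolate the intermediate hourly states between two endpoints.

    Returns an array of shape (gap_hours - 1, *field.shape) holding T+1 ... T+(gap_hours-1).
    """
    x0 = np.asarray(field_t0, dtype=np.float64)
    x1 = np.asarray(field_t6, dtype=np.float64)
    if x0.shape != x1.shape:
        raise ValueError("endpoint fields must have the same shape")
    # Fractional offsets 1/gap, 2/gap, ..., (gap-1)/gap within the interval.
    alphas = np.arange(1, gap_hours) / gap_hours
    # Add trailing singleton axes so the offsets broadcast over the field dimensions.
    alphas = alphas.reshape(-1, *([1] * x0.ndim))
    return (1.0 - alphas) * x0 + alphas * x1
```

Linear blending cannot move features between the two endpoint states; it only cross-fades them, so it serves as a lower bar for learned, time-conditioned models.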

Tasks
Data & baselines: build (T, T+6) pairs with verified hourly ground truth; establish linear/spline and AFNO/FourCastNet baselines; define conservation diagnostics.
Time-conditioned spectral operator: implement ModAFNO head conditioned on the target offset; train with reconstruction + physics penalties.
Continuous-time latents: evolve VAE latents via Neural ODE/CDE; compare fixed-step vs adaptive solvers; ablate latent dimensionality vs stability.
Bridging by generation: add flow-matching bridges constrained by endpoints (T, T+6); evaluate RMSE/ACC/CRPS, spectral fidelity, extremes skill.
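
Two of the listed metrics, RMSE and ACC, are commonly computed with latitude weighting on regular lat-lon grids. The sketch below shows one such formulation, assuming fields of shape (lat, lon) and cos(latitude) weights normalized to mean 1. The function names are hypothetical, and ACC conventions vary (some definitions also subtract the mean anomaly), so treat it as one possible choice rather than the project's definition.

```python
import numpy as np


def latitude_weights(lat_deg):
    """cos(latitude) weights, normalized so that they average to 1."""
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=np.float64)))
    return w / w.mean()


def weighted_rmse(pred, truth, lat_deg):
    """Latitude-weighted RMSE for fields of shape (lat, lon)."""
    w = latitude_weights(lat_deg)[:, None]
    err = np.asarray(pred, dtype=np.float64) - np.asarray(truth, dtype=np.float64)
    return float(np.sqrt(np.mean(w * err ** 2)))


def weighted_acc(pred, truth, climatology, lat_deg):
    """Latitude-weighted anomaly correlation coefficient (no mean-anomaly removal)."""
    w = latitude_weights(lat_deg)[:, None]
    p = np.asarray(pred, dtype=np.float64) - climatology
    t = np.asarray(truth, dtype=np.float64) - climatology
    num = np.sum(w * p * t)
    den = np.sqrt(np.sum(w * p ** 2) * np.sum(w * t ** 2))
    # Undefined (division by zero) if either anomaly field is identically zero.
    return float(num / den)
```

The weighting keeps the densely sampled polar rows of a regular grid from dominating the score.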

Participants
Ilya Makarov
Team lead
Daniil Sukhorukov
Research engineer