Multi-Modal Foundation Models for Space-Air-Ground Integrated 6G and Beyond Networks: A Survey and Tutorial


Khan W. U., Adil M., Malik J., Sheemar C. K., Chatzinotas S., Alqahtani S. A., ...

IEEE Open Journal of the Communications Society, 2026 (ESCI, Scopus)

  • Publication Type: Article / Review
  • Publication Date: 2026
  • DOI: 10.1109/ojcoms.2026.3687570
  • Journal Name: IEEE Open Journal of the Communications Society
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: 6G, foundation models (FMs), multimodal learning, non-terrestrial networks (NTN), space-air-ground integrated networks (SAGIN)
  • Affiliated with İstanbul Gelişim Üniversitesi: Yes

Abstract

Space-air-ground integrated networks (SAGINs) are emerging as a key architectural paradigm for 6G and beyond wireless systems, enabling seamless connectivity by integrating terrestrial radio access networks with aerial platforms and non-terrestrial networks (NTNs). The operation of SAGINs, however, departs significantly from conventional terrestrial assumptions due to high mobility, Doppler dynamics, intermittent connectivity, long and variable round-trip times, gateway bottlenecks, and strong cross-layer coupling. These characteristics motivate intelligence frameworks that leverage heterogeneous contextual information beyond traditional radio frequency (RF) measurements. This paper presents a comprehensive survey and tutorial on multi-modal foundation models (FMs) for SAGIN-centric 6G systems. The survey first introduces a structured taxonomy of SAGIN data modalities, including RF/channel state information (CSI), topology graphs, trajectories and ephemeris, telemetry and traffic traces, weather and atmospheric context, and sensing signals enabled by integrated sensing and communication (ISAC). It then reviews representation and tokenization strategies for heterogeneous wireless modalities and surveys emerging architecture families for multi-modal FMs, including early, late, and hybrid fusion designs, cross-attention mechanisms, mixture-of-experts models, hierarchical multi-timescale architectures, and retrieval-augmented frameworks. The paper further summarizes training paradigms such as masked modeling, cross-modal contrastive learning, and predictive pretraining, together with adaptation strategies including parameter-efficient fine-tuning, continual learning, and federated updates. In addition, the survey discusses system-level considerations for integrating multi-modal FMs into network control loops and deployment environments, covering distributed inference, model compression, intermittency-aware operation, and safety constraints. 
Finally, the paper outlines benchmarking principles for evaluating multi-modal intelligence in SAGINs and identifies key open research challenges and future directions for building robust, deployable AI-native 6G networks.