Investigation of the USV-AUV Cooperative Environment via Reinforcement Learning and Its Impact on Data Collection and Energy Efficiency

Metehan Türkoğlu, MURAT; Akyüz, Emre

doi:10.1016/j.oceaneng.2025.124004

Metehan Türkoğlu M. M., Akyüz E.

Ocean Engineering, vol. 348, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume Number: 348
Publication Date: 2026
DOI: 10.1016/j.oceaneng.2025.124004
Journal Name: Ocean Engineering
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, Environment Index, Geobase, ICONDA Bibliographic, INSPEC
Keywords: AUV, Curriculum Learning, PPO, Reinforcement Learning, Reliability Analysis, USV
Open Archive Collection: AVESİS Open Access Collection
Affiliated with İstanbul Gelişim Üniversitesi: Yes

Abstract

Autonomous Underwater Vehicles (AUVs) and Unmanned Surface Vehicles (USVs) are key enablers for underwater sensing, yet their performance degrades under turbulent currents and wave-induced disturbances. We present a unified simulation framework for cooperative USV-AUV missions that operationalizes the Fisher Information Matrix (FIM) within multi-agent reinforcement learning: FIM-guided USV positioning feeds into the agents' observations and, when applicable, reward signals to reduce localization uncertainty and improve coordination. We evaluate three policies under a single, reproducible setup: Proximal Policy Optimization (PPO), Curriculum Reinforcement Learning (CRL), and Twin Delayed Deep Deterministic Policy Gradient with FIM features (TD3-FIM). The evaluation covers two regimes (normal and extreme sea states), with training and testing performed in both, and uses task-level metrics (data rate, total throughput), resource metrics (energy), system health (overflow events), and tracking error. A composite Reliability Index (RI) summarizes multi-objective performance. Results show that PPO consistently achieves higher reliability and more stable data collection than CRL in both regimes, while FIM-guided USV adaptation markedly lowers tracking error versus static baselines. The three-arm comparison establishes a practical benchmark for USV-AUV cooperation with physically motivated sea dynamics. Limitations include a simulation-based evaluation and simplified acoustic assumptions; future work will consider multi-USV coordination, latency/packet-loss models, and hardware-in-the-loop trials.
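
As a rough illustration of why FIM-guided positioning can reduce localization uncertainty, the sketch below computes a textbook 2D Fisher Information Matrix for range-only measurements with independent Gaussian noise. The abstract does not state the paper's measurement model, so the range-only assumption, the 2D geometry, and the names `range_fim_2d`, `det2`, and `sigma` are all hypothetical and not taken from the paper.

```python
import math


def range_fim_2d(usv_positions, auv_position, sigma=1.0):
    """FIM of a 2D AUV position from range measurements taken at several
    USV positions (ASSUMED model: independent Gaussian noise, std sigma).

    Each measurement contributes u * u^T / sigma^2, where u is the unit
    vector pointing from the USV position to the AUV.
    """
    fxx = fxy = fyy = 0.0
    ax, ay = auv_position
    for ux, uy in usv_positions:
        dx, dy = ax - ux, ay - uy
        dist = math.hypot(dx, dy)
        cx, cy = dx / dist, dy / dist  # unit line-of-sight vector
        fxx += cx * cx / sigma**2
        fxy += cx * cy / sigma**2
        fyy += cy * cy / sigma**2
    return [[fxx, fxy], [fxy, fyy]]


def det2(m):
    """Determinant of a 2x2 matrix (a common D-optimality score)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


auv = (0.0, 0.0)
# Hypothetical geometries: two USV measurement points on one line
# through the AUV versus two points at right angles to each other.
collinear = [(100.0, 0.0), (200.0, 0.0)]
orthogonal = [(100.0, 0.0), (0.0, 100.0)]

det_collinear = det2(range_fim_2d(collinear, auv))
det_orthogonal = det2(range_fim_2d(orthogonal, auv))
```

Under these assumptions the collinear geometry yields a singular FIM (no information across the line of sight), while the orthogonal geometry does not. A positioning rule that favors a larger determinant would therefore prefer the second geometry.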