This is the official TensorFlow 2.2 implementation of the Spatial-Channel Attention based Memory-guided Transformer (SCAMET) framework for remote sensing image captioning. SCAMET is an encoder-decoder CNN-Transformer approach designed to describe multi-spectral, multi-resolution, multi-directional remote sensing images.
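The actual model definitions live in this repository's training scripts; the snippet below is only a minimal sketch of the general idea of applying channel-wise and spatial attention to CNN feature maps (CBAM-style). The layer sizes, reduction ratio, and class name are illustrative assumptions, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

class SpatialChannelAttention(layers.Layer):
    """Minimal sketch: channel attention followed by spatial attention
    over a CNN feature map of shape (batch, H, W, C). Illustrative only."""

    def __init__(self, channels, reduction=8, **kwargs):
        super().__init__(**kwargs)
        # Shared MLP producing per-channel weights (squeeze-and-excite style).
        self.fc1 = layers.Dense(channels // reduction, activation='relu')
        self.fc2 = layers.Dense(channels, activation='sigmoid')
        # 7x7 convolution producing a single-channel spatial attention map.
        self.spatial_conv = layers.Conv2D(1, 7, padding='same', activation='sigmoid')

    def call(self, x):
        # ----- Channel attention: re-weight each feature channel -----
        avg_pool = tf.reduce_mean(x, axis=[1, 2])            # (batch, C)
        channel_w = self.fc2(self.fc1(avg_pool))              # (batch, C)
        x = x * channel_w[:, tf.newaxis, tf.newaxis, :]       # broadcast over H, W

        # ----- Spatial attention: re-weight each spatial location -----
        avg_map = tf.reduce_mean(x, axis=-1, keepdims=True)   # (batch, H, W, 1)
        max_map = tf.reduce_max(x, axis=-1, keepdims=True)    # (batch, H, W, 1)
        spatial_w = self.spatial_conv(tf.concat([avg_map, max_map], axis=-1))
        return x * spatial_w


# Example: refine CNN backbone features before passing them to the Transformer decoder.
features = tf.random.normal([2, 7, 7, 512])     # e.g. backbone output
attended = SpatialChannelAttention(512)(features)
print(attended.shape)                            # (2, 7, 7, 512)
```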
- CUDA > 10
- TensorFlow 2.2
- Matplotlib
- Pillow (PIL)
- NLTK
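A quick sanity check of the environment (a minimal sketch; it only verifies the installed TensorFlow version and GPU visibility, not the exact CUDA release):

```python
import tensorflow as tf

# Confirm the TensorFlow version and that a CUDA-capable GPU is visible.
print("TensorFlow:", tf.__version__)                     # expected: 2.2.x
print("GPUs:", tf.config.list_physical_devices('GPU'))
print("Built with CUDA:", tf.test.is_built_with_cuda())
```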
- Download and store the remote sensing images and captions of the three datasets (Sydney-Captions, UCM-Captions, and RSICD) from https://github.com/201528014227051/RSICD_optimal (a loading sketch follows below).
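A minimal sketch of loading image-caption pairs once the data is stored locally. The directory layout and the JSON field names below (Karpathy-style splits) are assumptions; adapt them to the annotation files you actually download.

```python
import json
import os

# Hypothetical layout -- adjust paths and field names to the files obtained
# from the RSICD_optimal repository (the structure below assumes the commonly
# distributed Karpathy-style split JSON).
DATA_DIR = 'data/RSICD'
with open(os.path.join(DATA_DIR, 'dataset_rsicd.json'), 'r') as f:
    annotations = json.load(f)

pairs = []
for img in annotations['images']:
    for sent in img['sentences']:
        pairs.append((os.path.join(DATA_DIR, 'images', img['filename']),
                      sent['raw']))

print(len(pairs), 'image-caption pairs')
```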
- Qualitative analysis shows that the proposed SCAMET produces more reliable captions than the baseline across diverse remote sensing images.
- Attention heatmaps illustrate the individual abilities of the spatial and channel-wise attention incorporated into the CNN to select pertinent objects in remote sensing images (a visualization sketch follows below).
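A rough sketch of rendering such a heatmap with Matplotlib and PIL. The image path and the 2-D attention map are hypothetical inputs, and the repository's actual visualization code may differ.

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def overlay_attention(image_path, attention_map, out_path='heatmap.png'):
    """Upsample a low-resolution attention map and overlay it on the image.
    `attention_map` is assumed to be a 2-D numpy array (e.g. 7x7)."""
    img = Image.open(image_path).convert('RGB')
    att = Image.fromarray(np.uint8(255 * attention_map / attention_map.max()))
    att = att.resize(img.size, resample=Image.BILINEAR)

    plt.figure(figsize=(6, 6))
    plt.imshow(img)
    plt.imshow(np.asarray(att), cmap='jet', alpha=0.4)  # semi-transparent heatmap
    plt.axis('off')
    plt.savefig(out_path, bbox_inches='tight')
    plt.close()

# Example (hypothetical inputs):
# overlay_attention('data/RSICD/images/airport_1.jpg', np.random.rand(7, 7))
```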
Our research work is published in Engineering Applications of Artificial Intelligence, an international scientific journal published by Elsevier.
Cite it as:
```bibtex
@article{gajbhiye2022generating,
  title={Generating the captions for remote sensing images: A spatial-channel attention based memory-guided transformer approach},
  author={Gajbhiye, Gaurav O and Nandedkar, Abhijeet V},
  journal={Engineering Applications of Artificial Intelligence},
  volume={114},
  pages={105076},
  year={2022},
  publisher={Elsevier}
}
```