ECW-EGNet: Exploring Cross-Modal Weighting and Edge-Guided Decoder Network for RGB-D Salient Object Detection

Chenxing Xia (1, 2, 3), Feng Yang (1), Songsong Duan (4), Xiuju Gao (5), Bin Ge (1), Kuan-Ching Li (6), Xianjin Fang (1, 7), Yan Zhang (8) and Ke Yang (2)

  1. College of Computer Science and Engineering, Anhui University of Science and Technology,
    Huainan, 232001, China
    2021201237@aust.edu.cn
  2. The First Affiliated Hospital of Anhui University of Science and Technology
    Huainan First People’s Hospital
    cxxia@aust.edu.cn
  3. Anhui Purvar Bigdata Technology Co. Ltd,
    Huainan, 232001, China
  4. State Key Laboratory of Integrated Services Networks,
    School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
  5. College of Electrical and Information Engineering, Anhui University of Science and Technology,
    Huainan, Anhui, China
  6. Department of Computer Science and Information Engineering,
    Providence University, Taiwan
  7. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
  8. The School of Electronics and Information Engineering,
    Anhui University, Hefei, Anhui, China

Abstract

Existing RGB-D salient object detection (SOD) techniques concentrate on combining information from multiple modalities (e.g., RGB and depth) and extracting multi-scale features for improved saliency reasoning. However, they frequently perform poorly because of low-quality depth maps and the lack of correlation between the extracted multi-scale features. In this paper, we propose an Exploring Cross-Modal Weighting and Edge-Guided Decoder Network (ECW-EGNet) for RGB-D SOD, which includes three prominent components. Firstly, we deploy a Cross-Modality Weighting Fusion (CMWF) module that uses a Channel-Spatial Attention Feature Enhancement (CSAE) mechanism and a Depth-Quality Assessment (DQA) mechanism to achieve cross-modal feature interaction. The former applies channel attention and spatial attention in parallel to enhance the features extracted by the RGB and depth streams, while the latter assesses depth quality to reduce the detrimental influence of low-quality depth maps during cross-modal fusion. Then, to effectively integrate high-level multi-scale features and produce salient objects with precise locations, we construct a Bi-directional Scale-Correlation Convolution (BSCC) module with a bi-directional structure. Finally, we construct an Edge-Guided (EG) decoder that uses an edge detection operator to obtain edge masks, which guide the enhancement of edge details in the saliency map. Comprehensive experiments on five benchmark RGB-D SOD datasets demonstrate that the proposed ECW-EGNet outperforms 21 state-of-the-art (SOTA) saliency detectors on four widely used evaluation metrics.
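The abstract states that the EG decoder derives edge masks with an edge detection operator but does not name the operator. The sketch below assumes the classic 3x3 Sobel operator, applied to a saliency map with values in [0, 1]. It only illustrates how an edge mask can be extracted from a map and does not reproduce the authors' decoder. The function names `correlate3x3` and `edge_mask` and the default `threshold` of 0.5 are hypothetical choices made for illustration.

```python
from typing import List, Sequence

Grid = List[List[float]]

# Assumed operator: standard 3x3 Sobel kernels (the abstract does not name one).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))  # horizontal gradient
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))  # vertical gradient


def correlate3x3(img: Sequence[Sequence[float]], kernel) -> Grid:
    """Apply a 3x3 kernel with replicate padding, so uniform regions
    (including the image border) produce a zero response."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # Clamp coordinates to the image (replicate padding).
                    y = min(max(i + di, 0), h - 1)
                    x = min(max(j + dj, 0), w - 1)
                    acc += kernel[di + 1][dj + 1] * img[y][x]
            out[i][j] = acc
    return out


def edge_mask(saliency: Sequence[Sequence[float]],
              threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical helper: binary edge mask from a saliency map.
    A pixel is marked 1 when its Sobel gradient magnitude exceeds threshold."""
    gx = correlate3x3(saliency, SOBEL_X)
    gy = correlate3x3(saliency, SOBEL_Y)
    return [
        [1 if (gx[i][j] ** 2 + gy[i][j] ** 2) ** 0.5 > threshold else 0
         for j in range(len(saliency[0]))]
        for i in range(len(saliency))
    ]


if __name__ == "__main__":
    # 7x7 map with a 3x3 salient square in the middle.
    sal = [[1.0 if 2 <= r <= 4 and 2 <= c <= 4 else 0.0 for c in range(7)]
           for r in range(7)]
    for row in edge_mask(sal):
        print("".join("#" if v else "." for v in row))
```

On the 7x7 example, the mask marks a ring two pixels thick around the square (one pixel inside the object boundary and one outside). The uniform interior and the distant background stay at zero. Replicate padding is used so that objects touching the image border do not produce spurious edges along it.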

Key words

cross-modality fusion, depth-quality, edge-guided, RGB-D images, salient object detection

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS231206022X

Publication information

Volume 21, Issue 3 (June 2024)
Year of Publication: 2024
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium


How to cite

Xia, C., Yang, F., Duan, S., Gao, X., Ge, B., Li, K., Fang, X., Zhang, Y., Yang, K.: ECW-EGNet: Exploring Cross-Modal Weighting and Edge-Guided Decoder Network for RGB-D Salient Object Detection. Computer Science and Information Systems, Vol. 21, No. 3, 947-969. (2024), https://doi.org/10.2298/CSIS231206022X