SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection

Peng, Yanbin and Zhai, Zhinian and Feng, Mingkun (2024) SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection. Sensors, 24 (4). p. 1117. ISSN 1424-8220

Text: sensors-24-01117.pdf - Published Version

Download (3MB)

Authors: Yanbin Peng, Zhinian Zhai, and Mingkun Feng
Affiliation: School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China

Abstract

Salient Object Detection (SOD) in RGB-D images plays a crucial role in computer vision, its central aim being to identify and segment the most visually striking objects within a scene. However, optimizing the fusion of multi-modal and multi-scale features to improve detection performance remains a challenge. To address this issue, we propose a network model based on semantic localization and multi-scale fusion (SLMSF-Net), designed specifically for RGB-D SOD. First, we design a Deep Attention Module (DAM), which extracts valuable depth feature information from both the channel and spatial perspectives and efficiently merges it with the RGB features. Next, a Semantic Localization Module (SLM) is introduced to enhance the top-level modality-fusion features, enabling precise localization of salient objects. Finally, a Multi-Scale Fusion Module (MSF) performs inverse decoding on the modality-fusion features, restoring the detailed information of the objects and generating high-precision saliency maps. Our approach has been validated on six RGB-D salient object detection datasets. The experimental results show improvements of 0.20~1.80%, 0.09~1.46%, 0.19~1.05%, and 0.0002~0.0062 in the maxF, maxE, S, and MAE metrics, respectively, compared with the best competing methods (AFNet, DCMF, and C2DFNet).
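As a rough illustration of the attention module (DAM) described in the abstract, the following PyTorch sketch shows one plausible way to weight a depth feature map along the channel and spatial dimensions before merging it with the corresponding RGB feature. The class name, reduction ratio, 7x7 spatial-attention convolution, and the multiply-then-add fusion rule are assumptions made for illustration; this is a minimal sketch, not the authors' published implementation.

# Hedged sketch (not the released SLMSF-Net code): one plausible reading of the
# DAM described in the abstract. It reweights a depth feature map with channel
# and spatial attention, then merges the attended depth cue into the RGB stream.
# Layer sizes and the fusion rule are assumptions.
import torch
import torch.nn as nn

class DepthAttentionModule(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, predict per-channel weights.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: compress channels, predict a per-pixel weight map.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # Reweight depth features along the channel dimension.
        d = depth_feat * self.channel_mlp(depth_feat)
        # Reweight along the spatial dimension using avg- and max-pooled maps.
        avg_map = d.mean(dim=1, keepdim=True)
        max_map = d.max(dim=1, keepdim=True).values
        d = d * self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        # Merge the attended depth cue into the RGB stream.
        return rgb_feat + rgb_feat * d

# Shape check with hypothetical backbone feature sizes:
# dam = DepthAttentionModule(256)
# out = dam(torch.randn(1, 256, 44, 44), torch.randn(1, 256, 44, 44))  # (1, 256, 44, 44)

In the full network, such a block would presumably be applied at each backbone level before the SLM and MSF stages, but the exact placement and design follow the paper rather than this sketch.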
Published: 08 February 2024 (Article 1117)
DOI: 10.3390/s24041117
Official URL: https://www.mdpi.com/1424-8220/24/4/1117
License: Creative Commons Attribution 4.0 (https://creativecommons.org/licenses/by/4.0/)
Funding: National Natural Science Foundation of China (61972357); Basic Public Welfare Research Program of Zhejiang Province (LGF22F020017, GG21F010013); Natural Science Foundation of Zhejiang Province (Y21F020030)

Item Type: Article
Subjects: Science Repository > Multidisciplinary
Depositing User: Managing Editor
Date Deposited: 09 Feb 2024 06:03
Last Modified: 09 Feb 2024 06:03
URI: http://research.manuscritpub.com/id/eprint/3956
