ALFormer: Attribute Localization Transformer in Pedestrian Attribute Recognition

Yuxin Liu1, Mingzhe Wang1, Chao Li1 and Shuoyan Liu1

  1. Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited
    Beijing, China
    {yxin l,zzh355997367,lchao1234}@163.com,06112062@bjtu.edu.cn

Abstract

Pedestrian attribute recognition is an important task for intelligent video surveillance. However, existing methods struggle to accurately localize the discriminative region for each attribute. We propose the Attribute Localization Transformer (ALFormer), a novel framework that improves spatial localization through two key components. First, we introduce Mask Contrast Learning (MCL) to suppress spurious correlations among regional features, forcing the model to focus on the intrinsic spatial area of each attribute. Second, we design an Attribute Spatial Memory (ASM) module to generate reliable attention maps that capture the inherent location of each attribute. Extensive experiments on two benchmark datasets demonstrate that ALFormer achieves state-of-the-art performance. Ablation studies and visualizations verify the effectiveness of the proposed modules in improving attribute localization. Our work provides a simple yet effective approach to exploiting spatial consistency for enhanced pedestrian attribute recognition.
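The abstract only summarizes the method, so the following is a minimal sketch of the general masking-plus-consistency idea behind a component like MCL, not the authors' implementation: randomly mask spatial positions of a feature map, then penalize disagreement between attribute predictions from the masked and unmasked features. All function names, shapes, and the choice of a mean-squared consistency loss are assumptions for illustration.

```python
import numpy as np

def random_spatial_mask(feat, mask_ratio=0.3, rng=None):
    """Zero out a random subset of spatial positions in an (H, W, C) feature map.
    Hypothetical stand-in for the masking step of Mask Contrast Learning."""
    rng = rng or np.random.default_rng(0)
    h, w, _ = feat.shape
    keep = rng.random((h, w)) >= mask_ratio   # True where the position is kept
    return feat * keep[..., None]             # broadcast mask over channels

def consistency_loss(logits_full, logits_masked):
    """Encourage attribute logits from masked features to match those from the
    full features, pushing the model toward each attribute's intrinsic region.
    A simple MSE is used here; the paper's actual contrast loss may differ."""
    return float(np.mean((logits_full - logits_masked) ** 2))

# Toy usage: a 4x4 feature map with 2 channels.
feat = np.ones((4, 4, 2))
masked = random_spatial_mask(feat, mask_ratio=0.5, rng=np.random.default_rng(1))
loss = consistency_loss(np.array([0.9, 0.1]), np.array([0.7, 0.2]))
```

In a training loop, this loss would be added to the usual multi-label attribute classification loss, so that predictions remain stable even when parts of the pedestrian are occluded by the mask.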

Key words

spatial attention, attribute localization, contrast loss, random mask

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS231015048L

Publication information

Volume 21, Issue 4 (September 2024)
Year of Publication: 2024
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

Full text

Available in PDF (Portable Document Format)

How to cite

Liu, Y., Wang, M., Li, C., Liu, S.: ALFormer: Attribute Localization Transformer in Pedestrian Attribute Recognition. Computer Science and Information Systems, Vol. 21, No. 4, 1567–1582. (2024), https://doi.org/10.2298/CSIS231015048L