AGMS-GCN: Attention-guided multi-scale graph convolutional networks for skeleton-based action recognition


Kilic U., Karadag O. O., Ozyer G. T.

Knowledge-Based Systems, vol. 311, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 311
  • Publication Date: 2025
  • DOI: 10.1016/j.knosys.2025.113045
  • Journal Name: Knowledge-Based Systems
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, Library and Information Science Abstracts, Library, Information Science & Technology Abstracts (LISTA)
  • Keywords: Action recognition, Attention mechanism, Graph convolutional networks, Multi-scale, Skeletal data
  • Atatürk University Affiliated: Yes

Abstract

Graph Convolutional Networks (GCNs) can model non-Euclidean data highly effectively, which is why they perform well on standard benchmarks for skeleton-based action recognition (SBAR). In particular, spatial–temporal graph convolutional networks (ST-GCNs) learn spatial–temporal relationships on skeletal graphs effectively. ST-GCN models, however, use a fixed skeletal graph across all layers and obtain spatial–temporal features by performing standard convolution on this fixed topology within a local neighborhood bounded by the kernel size. Such a kernel can only model dependencies between nearby joints and short-range temporal dependencies; it fails to capture long-range temporal information and dependencies between distant joints. Effectively capturing these dependencies is key to improving the performance of ST-GCN models. In this study, we propose AGMS-GCN, an attention-guided multi-scale graph convolutional network that dynamically determines the weights of inter-joint dependencies. In the proposed AGMS-GCN architecture, new adjacency matrices representing action-specific joint relationships are generated by applying an attention mechanism to the feature maps extracted with spatial–temporal graph convolutions, yielding spatial–temporal dependencies directly from the data. This enables the extraction of features that account for both short- and long-range spatial–temporal relationships among action-specific joints. This data-driven graph construction provides a more robust graph representation for capturing subtle differences between actions. In addition, actions arise from the coordinated movement of multiple body joints, yet most existing SBAR approaches overlook this coordination and consider the skeletal graph from a single-scale perspective. Consequently, these methods miss high-level contextual features needed to distinguish actions. The AGMS-GCN architecture addresses this shortcoming with its multi-scale structure. Comprehensive experiments demonstrate that our proposed method attains state-of-the-art (SOTA) performance on the NTU RGB+D 60 and Northwestern-UCLA datasets, and achieves performance competitive with SOTA on the NTU RGB+D 120 dataset. The source code of the proposed AGMS-GCN model is available at: https://github.com/ugrkilc/AGMS-GCN.
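
The abstract describes two core ideas: an attention-guided, data-driven adjacency matrix inferred from the current feature maps, and multi-scale spatial aggregation over the skeletal graph. The PyTorch sketch below illustrates the general flavour of such a block; it is not the official AGMS-GCN implementation (see the GitHub repository above), and all layer names, tensor shapes, and hyperparameters are illustrative assumptions.

# A minimal sketch (not the official AGMS-GCN code) of a spatial graph
# convolution with (1) an attention-guided, sample-specific adjacency
# inferred from the feature maps and (2) multi-scale (k-hop) aggregation
# that also reaches joints beyond the 1-hop skeletal neighbourhood.
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalize_adjacency(a: torch.Tensor) -> torch.Tensor:
    """Symmetrically normalise an adjacency matrix with self-loops added."""
    a = a + torch.eye(a.size(0), device=a.device)
    d_inv_sqrt = a.sum(dim=1).clamp(min=1e-6).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)


class AttentionGuidedMultiScaleGC(nn.Module):
    """One spatial block: fixed k-hop skeletal graphs plus a learned,
    attention-guided graph that encodes action-specific joint relations."""

    def __init__(self, in_channels, out_channels, skeleton_adjacency,
                 num_scales=3, embed_channels=16):
        super().__init__()
        self.num_scales = num_scales
        # Fixed k-hop reachability graphs (binarised, then normalised).
        a_bin = (skeleton_adjacency > 0).float()
        power, hops = torch.eye(a_bin.size(0)), []
        for _ in range(num_scales):
            power = (power @ a_bin > 0).float()
            hops.append(normalize_adjacency(power))
        self.register_buffer("hop_adjacency", torch.stack(hops))  # (K, V, V)
        # Embeddings used to score joint-to-joint attention from the features.
        self.theta = nn.Conv2d(in_channels, embed_channels, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, embed_channels, kernel_size=1)
        # One 1x1 transform per scale, summed after aggregation.
        self.transforms = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, kernel_size=1)
            for _ in range(num_scales)
        ])
        self.alpha = nn.Parameter(torch.zeros(1))  # gate on the attention graph

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) = (batch, channels, frames, joints)
        # Attention-guided adjacency: score every joint pair from the
        # temporally pooled feature maps, so the graph is action-specific.
        q = self.theta(x).mean(dim=2)                              # (N, E, V)
        k = self.phi(x).mean(dim=2)                                # (N, E, V)
        attention = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # (N, V, V)
        out = 0
        for s in range(self.num_scales):
            # Fixed s-hop skeletal graph combined with the learned graph.
            adj = self.hop_adjacency[s].unsqueeze(0) + self.alpha * attention
            aggregated = torch.einsum("nctv,nvw->nctw", x, adj)
            out = out + self.transforms[s](aggregated)
        return F.relu(out)


if __name__ == "__main__":
    # Toy usage: 25 joints (NTU RGB+D layout), 64 frames, 3-channel input,
    # with a hypothetical chain skeleton just to keep the example self-contained.
    v = 25
    skeleton = torch.zeros(v, v)
    for i in range(v - 1):
        skeleton[i, i + 1] = skeleton[i + 1, i] = 1
    layer = AttentionGuidedMultiScaleGC(3, 64, skeleton)
    clip = torch.randn(8, 3, 64, v)
    print(layer(clip).shape)  # torch.Size([8, 64, 64, 25])

In this sketch the gate parameter alpha starts at zero, so training begins from the fixed skeletal topology and gradually admits the learned, action-specific joint relationships; the actual fusion strategy used by AGMS-GCN may differ and should be checked against the released source code.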