2023 Innovations in Intelligent Systems and Applications Conference, ASYU 2023, Sivas, Türkiye, 11 - 13 October 2023
Skeleton-based action recognition has attracted much attention in recent years because skeleton data is robust to variations in body scale, dynamic camera views, illumination changes and complex backgrounds. The human skeleton is naturally represented as a graph, with joints as nodes and bones as edges. Many researchers have therefore applied graph convolutional networks to the skeleton-based action recognition task. In particular, spatial-temporal graph convolutional networks (ST-GCN) have proven effective in learning both spatial and temporal dependencies in skeleton graph data. Although ST-GCN models perform well on skeleton data, the topology of the graph representing the human body is defined manually and fixed across all layers, which limits the ability to learn more complex and richer representations. In contrast, spatial-temporal attention graph convolutional networks (STA-GCN) take into account the critical connections between joints and the importance of each joint in each frame, depending on the action. In this way, they can identify the joint relations that are important for a specific action. In this study, four network models with different layer depths were designed based on the ST-GCN and STA-GCN architectures. These models were then trained for various numbers of epochs, and the effects of layer depth and number of epochs on skeleton-based action recognition performance were investigated experimentally. According to the experimental results, the proposed model built from three STA-GCN blocks (3xSTA-GCN) achieved an accuracy of 86.71% on the cross-subject (CS) test set and 92.76% on the cross-view (CV) test set.
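The contrast between a fixed graph topology and a data-dependent one can be illustrated with a minimal NumPy sketch. It is not the architecture evaluated in this study. The 5-joint chain skeleton, the tensor sizes, and the function names (`normalize_adjacency`, `spatial_graph_conv`, `attention_adjacency`) are all hypothetical. The attention term is a generic softmax over joint feature similarity, used here only as a stand-in for the idea of learning action-specific joint relations; it is not the STA-GCN attention mechanism itself.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a joint graph."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def spatial_graph_conv(x, adj_norm, weight):
    """Spatial graph convolution applied independently to every frame.

    x: (T, V, C_in) joint features, adj_norm: (V, V), weight: (C_in, C_out).
    Returns an array of shape (T, V, C_out).
    """
    return np.einsum("uv,tvc,cd->tud", adj_norm, x, weight)


def attention_adjacency(x, adj_norm):
    """Assumed illustration: add a data-dependent joint-to-joint term.

    Scores are dot products of time-averaged joint features, turned into
    row-wise softmax weights and added to the fixed skeleton graph.
    """
    pooled = x.mean(axis=0)  # (V, C_in)
    scores = pooled @ pooled.T / np.sqrt(pooled.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return adj_norm + attn


# Hypothetical toy skeleton: 5 joints connected in a chain.
num_joints = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
adj = np.zeros((num_joints, num_joints))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
x = rng.standard_normal((4, num_joints, 3))  # 4 frames, 3 input channels
weight = rng.standard_normal((3, 8))         # 8 output channels

adj_norm = normalize_adjacency(adj)
out_fixed = spatial_graph_conv(x, adj_norm, weight)  # fixed topology
adj_attn = attention_adjacency(x, adj_norm)
out_attn = spatial_graph_conv(x, adj_attn, weight)   # data-dependent topology
```

With the fixed graph, each joint aggregates features only from itself and its physically connected neighbours. With the added attention term, every joint can also draw on joints that are not adjacent in the skeleton, which is the kind of flexibility the fixed topology lacks.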