Skeleton-based deep pose feature learning for action quality assessment on figure skating videos (2024)

research-article

Authors:
Huiying Li, School of Computer Science and Technology, Huaqiao University, Xiamen, 361000, China
Qing Lei, School of Computer Science and Technology, Huaqiao University, Xiamen, 361000, China
Hongbo Zhang, School of Computer Science and Technology, Huaqiao University, Xiamen, 361000, China
Jixiang Du, School of Computer Science and Technology, Huaqiao University, Xiamen, 361000, China
Shangce Gao, Faculty of Engineering, University of Toyama, Toyama-shi, 930-8555, Japan

Journal of Visual Communication and Image Representation, Volume 89, Issue C, Nov 2022
https://doi.org/10.1016/j.jvcir.2022.103625

Published: 01 November 2022

Abstract

Most existing Action Quality Assessment (AQA) methods for scoring sports videos focus on evaluating a single action, or a short predefined sequence of actions, performed in short sports videos such as diving and vault. They extract features directly from RGB videos with 3D ConvNets, which mixes the features with ambiguous scene information. To investigate how effective deep pose feature learning is for automatically evaluating complicated activities in long-duration sports videos, such as figure skating and artistic gymnastics, we propose a skeleton-based deep pose feature learning method. For pose feature extraction, a spatial–temporal pose extraction (STPE) module is built to capture subtle changes in human body movement and to obtain detailed representations of the skeletal data in the space and time dimensions. For temporal information representation, an inter-action temporal relation extraction (ATRE) module is implemented with a recurrent neural network to model the dynamic temporal structure of skeletal subsequences. We evaluate the proposed method on the figure skating activity of the MIT-Skate and FIS-V datasets. The experimental results show that the proposed method is more effective than RGB video-based deep feature learning methods, including SENet and C3D. A significant improvement in Spearman Rank Correlation (SRC) is achieved on the MIT-Skate dataset. On the FIS-V dataset, for both the Total Element Score (TES) and the Program Component Score (PCS), the predicted scores achieve better SRC and mean squared error (MSE) against the judges' scores than the SENet and C3D feature methods.
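The two evaluation metrics named in the abstract, SRC and MSE, can be illustrated with a short dependency-free Python sketch. The score lists below are made-up values for illustration only and are not results from the paper; the function names are likewise assumptions introduced here. SRC is computed as the Pearson correlation of the two rank vectors, with tied scores given their average rank.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rank_correlation(predicted, judged):
    """Spearman's rho: Pearson correlation between the two rank vectors."""
    if len(predicted) != len(judged) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rp, rj = average_ranks(predicted), average_ranks(judged)
    mp, mj = sum(rp) / len(rp), sum(rj) / len(rj)
    cov = sum((a - mp) * (b - mj) for a, b in zip(rp, rj))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_j = sum((b - mj) ** 2 for b in rj)
    if var_p == 0 or var_j == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_p * var_j)


def mean_squared_error(predicted, judged):
    """Average squared difference between predicted and judged scores."""
    return sum((p - j) ** 2 for p, j in zip(predicted, judged)) / len(predicted)


# Hypothetical TES values for five programs (illustrative, not from the paper).
judge_tes = [42.1, 35.6, 50.3, 28.9, 39.4]
predicted_tes = [40.0, 36.5, 47.8, 30.2, 41.0]

src = spearman_rank_correlation(predicted_tes, judge_tes)
mse = mean_squared_error(predicted_tes, judge_tes)
print(f"SRC = {src:.3f}, MSE = {mse:.3f}")
```

A higher SRC means the predicted ranking of performances agrees more closely with the judges' ranking, while a lower MSE means the predicted score values themselves are closer; the two can disagree, which is presumably why the abstract reports both for FIS-V.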

Highlights

•	A skeleton-based pose feature method for AQA of complicated activities in long videos.
•	Effective spatial–temporal pose features and inter-action relations are learned.
•	The method outperforms SENet and C3D feature methods on the MIT-Skate and FIS-V datasets.
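The data flow described above (skeleton sequence, then per-subsequence pose features, then a recurrent model over the subsequences) can be sketched in plain Python. This is only an illustration of the flow under loud assumptions: the abstract does not specify the layers of STPE or ATRE, so the sketch substitutes a hand-written displacement feature for the learned pose feature and an untrained Elman recurrence for the recurrent network. Every name, size, and weight below is hypothetical.

```python
import math
import random


def split_into_subsequences(frames, length, stride):
    """Cut a long skeleton sequence into fixed-length, possibly overlapping windows."""
    return [frames[s:s + length] for s in range(0, len(frames) - length + 1, stride)]


def motion_feature(window):
    """Stand-in for a learned pose feature (assumption): mean displacement
    of each joint between consecutive frames of the window."""
    num_joints = len(window[0])
    feature = [0.0] * num_joints
    for prev, cur in zip(window, window[1:]):
        for j in range(num_joints):
            feature[j] += math.dist(prev[j], cur[j])
    return [v / (len(window) - 1) for v in feature]


def rnn_final_state(features, w_in, w_rec):
    """Elman recurrence h_t = tanh(W_in x_t + W_rec h_{t-1}); returns the last state."""
    hidden = [0.0] * len(w_rec)
    for x in features:
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[k], x))
                + sum(wr * hj for wr, hj in zip(w_rec[k], hidden))
            )
            for k in range(len(hidden))
        ]
    return hidden


rng = random.Random(0)
NUM_FRAMES, NUM_JOINTS, HIDDEN = 40, 4, 3  # toy sizes, not from the paper

# Toy 2D skeleton sequence: frames[t][j] is the (x, y) position of joint j.
frames = [[(rng.random(), rng.random()) for _ in range(NUM_JOINTS)]
          for _ in range(NUM_FRAMES)]

windows = split_into_subsequences(frames, length=16, stride=8)
features = [motion_feature(w) for w in windows]

# Untrained random weights; a real model would learn these from judged videos.
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(NUM_JOINTS)] for _ in range(HIDDEN)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]

hidden = rnn_final_state(features, w_in, w_rec)
score = sum(w * h for w, h in zip(w_out, hidden))  # linear readout to one score
print(len(windows), "subsequences, score:", score)
```

Because the recurrence consumes one feature vector per subsequence, the same code handles videos of any length, which is the property that matters for long programs such as figure skating routines.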

Index Terms

Skeleton-based deep pose feature learning for action quality assessment on figure skating videos
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
        Activity recognition and understanding
        Scene understanding
  2. Machine learning

Index terms have been assigned to the content through auto-classification.

Published in

Journal of Visual Communication and Image Representation Volume 89, Issue C
Nov 2022
530 pages
ISSN:1047-3203

Elsevier Inc.
Publisher
Academic Press, Inc.
United States
Publication History
- Published: 1 November 2022
Author Tags
Action quality assessment
Figure skating sport videos
Spatial–temporal pose feature extraction
Action relation learning