Skeleton-based deep pose feature learning for action quality assessment on figure skating videos
- Authors:
- Huiying Li, School of Computer Science and Technology, Huaqiao University, Xiamen, 361000, China
- Qing Lei, School of Computer Science and Technology, Huaqiao University, Xiamen, 361000, China
- Hongbo Zhang, School of Computer Science and Technology, Huaqiao University, Xiamen, 361000, China
- Jixiang Du, School of Computer Science and Technology, Huaqiao University, Xiamen, 361000, China
- Shangce Gao, Faculty of Engineering, University of Toyama, Toyama-shi, 930-8555, Japan
Journal of Visual Communication and Image Representation, Volume 89, Issue C, Nov 2022
https://doi.org/10.1016/j.jvcir.2022.103625
Published: 01 November 2022
Abstract
Most existing Action Quality Assessment (AQA) methods for scoring sports videos focus on evaluating a single action, or a few actions performed in a defined sequence, in short sports videos such as diving and vault. They extract features directly from RGB videos with 3D ConvNets, which mixes the features with ambiguous scene information. To investigate how effective deep pose feature learning is for automatically evaluating complex activities in long-duration sports videos, such as figure skating and artistic gymnastics, we propose a skeleton-based deep pose feature learning method. For pose feature extraction, a spatial–temporal pose extraction module (STPE) is built to capture subtle changes in human body movement and obtain detailed representations of skeletal data in the spatial and temporal dimensions. For temporal information representation, an inter-action temporal relation extraction module (ATRE) is implemented with a recurrent neural network to model the dynamic temporal structure of skeletal subsequences. We evaluate the proposed method on the figure skating activities of the MIT-Skate and FIS-V datasets. The experimental results show that the proposed method is more effective than RGB video-based deep feature learning methods, including SENet and C3D. A significant improvement in Spearman Rank Correlation (SRC) is achieved on the MIT-Skate dataset. On the FIS-V dataset, for both the Total Element Score (TES) and the Program Component Score (PCS), the predicted scores achieve better SRC and MSE against the judges' scores than the SENet and C3D feature methods.
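The abstract describes ATRE only at a high level: a recurrent neural network that models the temporal structure of skeletal subsequences. The toy sketch below illustrates that data flow (frames, then subsequences, then a recurrent state, then a score) under explicit assumptions. It is not the authors' STPE or ATRE. The fixed-length non-overlapping windows, the mean pooling that stands in for learned STPE features, the plain tanh (Elman) recurrence chosen for brevity, the linear scoring head, and all helper names (`split_subsequences`, `mean_pool`, `rnn_step`, `score_sequence`) are hypothetical, and the weights are random and untrained.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def split_subsequences(frames: List[Vector], window: int) -> List[List[Vector]]:
    """Cut a skeleton sequence (one feature vector per frame) into consecutive
    non-overlapping windows. Assumption: fixed window length; a shorter final
    window is kept rather than dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [frames[i:i + window] for i in range(0, len(frames), window)]


def mean_pool(subsequence: List[Vector]) -> Vector:
    """Average the per-frame vectors of one subsequence. Hypothetical stand-in
    for learned spatial-temporal pose features."""
    dim = len(subsequence[0])
    return [sum(frame[d] for frame in subsequence) / len(subsequence) for d in range(dim)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(h: Vector, x: Vector, w_h: Matrix, w_x: Matrix, b: Vector) -> Vector:
    """One Elman RNN update: h_new = tanh(W_h h + W_x x + b)."""
    return [math.tanh(p + q + r) for p, q, r in zip(matvec(w_h, h), matvec(w_x, x), b)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def score_sequence(frames: List[Vector], window: int, hidden: int, seed: int = 0) -> float:
    """Frames -> subsequences -> recurrent state -> scalar score.
    Weights are random and untrained, so the value only demonstrates the data flow."""
    if not frames:
        raise ValueError("need at least one frame")
    rng = random.Random(seed)
    w_h = random_matrix(hidden, hidden, rng)
    w_x = random_matrix(hidden, len(frames[0]), rng)
    b = [0.0] * hidden
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
    h = [0.0] * hidden  # initial hidden state
    for sub in split_subsequences(frames, window):
        h = rnn_step(h, mean_pool(sub), w_h, w_x, b)
    # Linear regression head on the final hidden state.
    return sum(w * v for w, v in zip(w_out, h))


if __name__ == "__main__":
    # Hypothetical input: 10 frames of 18 joints x (x, y) = 36 values per frame.
    data_rng = random.Random(1)
    frames = [[data_rng.random() for _ in range(36)] for _ in range(10)]
    print(len(split_subsequences(frames, 4)))  # 3 windows: 4 + 4 + 2 frames
    print(score_sequence(frames, window=4, hidden=8))
```

Processing pooled subsequences rather than raw frames keeps the recurrent chain short for long programs such as figure skating routines; a trained system would replace the pooling with learned pose features and fit the weights to judges' scores.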
Highlights
• A skeleton-based pose feature method for AQA of complex activities in long videos.
• Effective spatial–temporal pose features and inter-action relations are learned.
• The method outperforms SENet and C3D feature methods on the MIT-Skate and FIS-V datasets.
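The comparison with SENet and C3D is reported as Spearman Rank Correlation (SRC) and mean squared error (MSE) between predicted scores and the judges' scores. A minimal pure-Python sketch of both metrics follows, assuming their standard definitions (Spearman as the Pearson correlation of average ranks, so tied scores are handled). The helper names are illustrative, and the paper's own evaluation code is not described on this page.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(pred: Sequence[float], target: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the rank vectors."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(pred), average_ranks(target))


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and judge-assigned scores."""
    if len(pred) != len(target) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    # Made-up illustrative scores, not results from the paper.
    predicted = [70.2, 55.1, 81.0, 62.4]
    judges = [72.0, 50.5, 85.3, 60.0]
    print(spearman(predicted, judges))  # 1.0: both lists have the same ordering
    print(mse(predicted, judges))
```

Higher SRC and lower MSE are better. The two metrics are complementary: in the example, SRC is perfect because the ranking matches, while MSE stays nonzero because the predicted values are off in absolute terms.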
Index Terms
Skeleton-based deep pose feature learning for action quality assessment on figure skating videos
Computing methodologies
Artificial intelligence
Computer vision
Computer vision problems
Computer vision tasks
Activity recognition and understanding
Scene understanding
Machine learning
Index terms have been assigned to the content through auto-classification.
Published in
Journal of Visual Communication and Image Representation, Volume 89, Issue C
Nov 2022
530 pages
ISSN: 1047-3203
Elsevier Inc.
Publisher
Academic Press, Inc.
United States
Publication History
- Published: 1 November 2022
Author Tags
- Action quality assessment
- Figure skating sport videos
- Spatial–temporal pose feature extraction
- Action relation learning
Qualifiers
- research-article