Computer-Vision-Enabled Worker Video Analysis for Motion Amount Quantification

Hari Iyer, Neel Macwan, Shenghan Guo, and Hee** Jeong

The corresponding author, Hee** Jeong, is from The Polytechnic School at Arizona State University, Mesa, AZ, 85212, USA. E-mail: [email protected]. The first author, Hari Iyer, is from The Polytechnic School at Arizona State University, Mesa, AZ, 85212, USA. Neel Macwan and Shenghan Guo are from The School of Manufacturing Systems and Networks at Arizona State University, Mesa, AZ, 85212.
Abstract

The performance of physical workers is significantly influenced by the quantity of their motions. However, monitoring and assessing these motions is challenging due to the complexities of motion sensing, tracking, and quantification. Recent advancements have utilized in-situ video analysis for real-time observation of worker behaviors, enabling data-driven quantification of motion amounts. Nevertheless, there are limitations to monitoring worker movements using video data. This paper introduces a novel framework based on computer vision to track and quantify the motion of workers’ upper and lower limbs, issuing alerts when the motion reaches critical thresholds. Using joint position data from posture estimation, the framework employs Hotelling’s $T^{2}$ statistic to quantify and monitor motion amounts, integrating computer vision tools to address challenges in automated worker training and enhance exploratory research in this field. We collected data from participants performing lifting and moving tasks with large boxes and small wooden cubes, to simulate macro and micro assembly tasks respectively. It was found that the correlation between workers’ joint motion amount and the Hotelling’s $T^{2}$ statistic was approximately 35% greater for micro tasks compared to macro tasks, highlighting the framework’s ability to identify fine-grained motion differences. This study demonstrates the effectiveness of the proposed system in real-time applications across various industry settings. It provides a tool for enhancing worker safety and productivity through precision motion analysis and proactive ergonomic adjustments.

Index Terms:
Computer vision, Hotelling’s $T^{2}$, In-situ video, Joint motion amount, Posture estimation

I Introduction

Worker motion analysis involves studying human movement within different workplace environments. Sustained awkward postures and muscular overexertion occur commonly in the construction industry, leading to work-related injuries [1]. Observational methods centered on external causes are repetitive and subjective while gauging internal causes directly can be invasive and limit productivity. The primary motivation for analyzing worker motion is driven by safety concerns [2, 3]. By monitoring and analyzing the movements of workers, safety professionals and ergonomists can detect potential risks, such as injuries caused by repetitive muscular strain [4] and awkward postures [5]. Implementing safety measures and suggesting ergonomic improvements through feedback can help mitigate the risk of such injuries.

Analyzing how tasks are performed by physical workers, especially in labor-intensive industries like construction, manufacturing, and agriculture, can reveal redundancies and wasteful movements within workflows. Gouett (2010) [6] conducted a study to analyze activity for continuous productivity improvement in construction and found that applying this process continually to a construction site can significantly enhance direct-work rates throughout a project’s duration. This optimization not only increases productivity but also reduces unnecessary strain on workers. Worker motion analysis is closely associated with ergonomics [7, 2], which is the science of designing the workplace to fit the worker. For example, Golabchi et al. (2015) [8] developed a programmed simulation of biomechanical systems for design analysis of workplace ergonomics. The findings demonstrated that this approach allows for the detection and reduction of uncomfortable worker postures in virtual models, thereby decreasing ergonomic hazards during workplace design. By studying how workers move and interact with their environment, ergonomists can provide valuable inputs for designing workspaces, tools, and equipment that minimize physical strain during tasks. Understanding how workers interact with products and equipment allows designers to create tools that are easier and more comfortable to use. This user-centric approach not only enhances user satisfaction but also contributes to improved productivity and efficiency across various industries.

Worker motion analysis serves as a valuable tool for training and skill development. By analyzing the movements of experienced workers, organizations can develop detailed training programs that teach new employees the most efficient and safe ways to perform their duties. Additionally, motion analysis can identify areas where workers may require additional support or training to enhance their skills and performance levels. A critical aspect of worker motion analysis is the capability to collect human data. With recent advances in sensing technologies, real-time sensor data can be collected from human workers while they perform various tasks. Data science and artificial intelligence (AI) methods have been developed to process and analyze the sensor data, leading to breakthroughs in the understanding and evaluation of worker motions. For example, the posture- and position-based workers’ safety risk evaluation framework developed by Chen et al. (2019) [9] demonstrates that automated evaluation of safety risk extent, combining spatial and postural factors, is almost 85% accurate, compared with approximately 80% accuracy for single-feature risk examination using location and 55% accuracy using posture. This proactive approach can help mitigate the risk of musculoskeletal injuries. According to Ryu et al. (2018) [10], motion data can be automatically collected and analyzed to gain insights into the working postures of workers with varying levels of experience, and this data can be used as a training tool for apprenticeship programs. Worker motion analysis is therefore essential for ensuring safety and efficiency and for identifying shortcomings in postural ergonomics and redundant activity in task execution.

Figure 1: Architecture pipelines of posture estimation and worker motion analysis from the videos captured of assembly tasks.

Nonetheless, the existing methods for processing and analyzing human worker sensor data have limited effectiveness and efficiency. Monitoring and evaluating physical workers’ performance through joint motion quantification is particularly challenging. Moran and Wallace (2007) [11] compared studies that examined the amount of joint motion involved in activities such as jumping and moving. The results were diverse and sometimes contradictory, which could partly be attributed to the different eccentric loading conditions that offered a range of joint motion. Videos and image frames have become common for human motion detection and monitoring, advancing traditional observational methodologies [12, 13]. To develop Computer Vision (CV)-based processing for image data in real time, posture estimation and machine learning algorithms are used to extract actionable information from visual data. Mehrizi et al. (2018) [14] used CV to estimate the 3D posture of symmetrical lifting. They calculated joint motion and kinematics using a CV-based motion capture method and compared them with those from a surface marker-based motion capture method. The study found that the joint-motion kinematics assessed using the CV-driven method were almost comparable to those obtained by the surface marker-based method. The intersection of video processing with nuanced behavioral analysis necessitates a detailed understanding of contextual and cultural factors influencing human actions. Real-time sensing of human behavior can provide valuable insights into human engagement in work, demand, and workload. One such attempt is that of Zheng et al. (2012) [15], whose study suggests that eyeblinks could be associated with the level of cognitive load that surgeons experience. Beyond basic motion detection, significant advancements have been made in CV for human motion and behavioral analysis. Seo et al. (2016) [16] developed a CV model to classify posture for automated ergonomic assessment, achieving over 90% classification accuracy. Similarly, many techniques have been proposed and developed to extract valuable information from visual data, with the choice of method depending on the specific industry and task under consideration.

While in-situ videos have emerged as a valuable resource for real-time sensing of human behavior at work, there are limitations in effectively processing these videos, extracting posture information, and scaling the analysis. To overcome these shortcomings, we propose an automated framework (see Fig. 1). The framework utilizes CV techniques [17, 18] to extract a worker’s body features from the task videos through posture estimation with landmark points. These extracted features serve as evidence for posture estimation. The framework then applies a statistical control chart, specifically the Hotelling’s $T^{2}$ chart (Montgomery, 2019) [19]. A CV tool named MediaPipe [18] was adopted to extract the human body’s landmarks. Out of the 33 body landmarks output by MediaPipe, we selected those associated with joint movements to be the “key features” for further motion quantification and analysis. A Hotelling’s $T^{2}$ control chart is built upon the “key features” to detect significant motions, termed “anomalies” in a control chart context. A correlation analysis between the detected anomalies and ground-truth motion information (from the experiment designs) validates the accuracy of the motion detections (“anomalies”). The proposed framework can be implemented in real time to quantify and monitor worker motions.

This study makes several contributions. First, the proposed framework integrates open-access, high-performance CV tools with statistical control techniques to trace worker motions from in-situ videos of human work. The methodology can be implemented either offline or in real time to generate accurate motion quantification and analytic results. Its effectiveness supplements the existing studies for in-situ human videos [20, 21, 22] and data-driven worker motion analysis. Second, the proposed framework helps establish a new technique for using in-situ videos for human worker monitoring and performance evaluation, both in controlled-lab settings and industrial environments. Third, the proposed framework contributes to advancing worker safety by providing a sophisticated tool for preemptive risk mitigation. Through the integration of CV and statistical control techniques, the system can detect anomalies in worker motions in real time or offline. This proactive approach helps identify potential safety risks, such as awkward postures or repetitive strain, allowing for timely intervention and the implementation of preventive measures. By enhancing the precision of safety risk evaluation, the framework contributes directly to creating safer work environments, reducing the likelihood of musculoskeletal injuries, and promoting overall worker well-being. Fourth, this study serves as an effort toward AI-aided worker training. By leveraging CV tools and real-time motion analysis, the framework offers a novel training interface. The automated accumulation and examination of motion information not only provides insights into the working postures of experienced workers but also offers a valuable resource for developing AI-assisted training programs. This approach would allow organizations to create tailored training modules that focus on efficient and safe work practices based on real-world movement patterns.

The rest of this paper is organized as follows. Section II reviews state-of-the-art literature on worker behavioral analysis and data-science methods adopted in this field. Section III provides the experimental designs and data collection procedure. Section IV elaborates on the technical details of the proposed framework. Section V presents the results and discusses new findings. Section VI wraps up the paper with concluding remarks and future directions.

II Literature Review

II-A Hierarchical Approaches to Learning Human Behavior

Recognizing human activities in videos is a challenging task that has been addressed by various approaches. Robertson and Reid (2006) [23] proposed a hierarchical approach that combined trajectory information and local motion descriptors. However, this method requires a large database and accurate motion evaluation, making it less practical. Bregler (1997) [24] utilized Expectation Maximization clustering, dynamical systems, and Hidden Markov Models to learn human dynamics in videos. Meanwhile, Duchenne et al. (2009) [25] introduced a weakly-supervised learning algorithm that extracts features from videos using scripts for weak supervision and clusters them based on feature similarity. A temporal action detector is then trained on these clusters. To detect moving humans, Han and Bhanu (2007) [26] employed a hierarchical genetic algorithm to find correspondence between colored and thermal images. Hoai et al. (2011) [27] introduced a framework for video segmentation and action recognition that leverages multi-class SVM and dynamic programming. Probabilistic methods were integrated into this domain by the initial frameworks [28]. Human video analysis faces a significant challenge due to the rapid movements of humans, which led to a substantial body of research focused on this aspect.

In our case, we use CV to extract feature landmarks in videos, treating it as a pose estimation problem.

II-B CV Approaches for Human Task Video Analysis

With the emergence of deep learning and GPU Computing, numerous architectures have been developed and utilized, enabling the flexibility to design networks tailored to specific tasks. Particularly in domains like human action recognition or human behavior understanding, various models have demonstrated cutting-edge performance [29, 30]. In the initial phases of deep learning, Ji et al. (2013) [31] used a 3D Convolutional Neural Network approach for recognizing human actions. This method is advantageous as it can extract spatial and temporal features, but it has a high computational cost during the training process. In contrast, Wang et al. (2018) [32] presented a Temporal Segment Networks architecture that samples segments out of videos, letting the model focus on important parts instead of the whole network to reduce redundancy. MVFNet [33] deals with the three-dimensional signals from video from different viewpoints to leverage the dynamics of video for recognizing activity. Charles et al. (2016) [34] proposed using a detector to annotate some frames in videos to detect posture. Kanazawa et al. (2019) [35] proposed a pre-trained detector and temporal encoder-based method to analyze videos for 3D humans. Recently, several models have been capable of detecting poses in real-time. Such models [17, 36, 37, 38] utilize deep learning architectures for pose estimation. Yu et al. (2018) [39] and Yu et al. (2019) [40] focused on action recognition and introduced the utilization of CV and insole pressure sensors for assessing the ergonomic workload of construction workers. In contrast, we suggest a technique for quantifying motion levels, assuming that the actions stem from a comparable distribution. In this paper, we propose leveraging a highly accurate model as the foundation for our techniques. Specifically, we opt for MediaPipe’s pose estimator [18], which employs BlazePose [17] to fulfill the task.

II-C Research Gaps in Worker Motion Analysis

Worker motion analysis plays a key role in ensuring safety and efficiency within diverse workplace environments. This section reviews the current state of the literature, highlighting advancements and, importantly, pinpointing the existing research gaps that motivate the proposed framework. Several studies have delved into the ergonomic aspects of worker motion, emphasizing the design of workspaces that accommodate the worker’s movements [7, 8]. Gouett’s (2010) [6] analysis of the continuous improvement of productivity in construction shows the practicability of optimizing workflows to enhance worker productivity and well-being. While existing research has made significant strides in understanding worker motion in relation to safety and ergonomics, there are notable gaps. Moran and Wallace (2007) [11] highlight the challenges in effectively monitoring and evaluating the physical performance of workers, indicating a need for more robust methodologies. Conventional observational methods, as discussed in Section I, are acknowledged for their subjectivity and limitations. The literature recognizes the intrusive nature of direct measurement of internal causes, emphasizing the necessity for non-intrusive, yet effective, approaches [2, 3]. Moreover, the current literature lacks comprehensive solutions for real-time worker motion monitoring and analysis. Despite advancements in sensing technologies, there is still a gap in utilizing them to proactively mitigate risks in real-world work conditions [9]. The potential of AI-aided worker training, another identified gap, is hinted at by Ryu et al. (2018) [10], yet a comprehensive framework that integrates CV tools and real-time motion analysis is yet to be fully realized. In essence, the literature review exposes these gaps by illustrating the advancements made in understanding worker motion, ergonomic considerations, and the limitations of existing methodologies. The proposed framework aims to address these gaps by providing a sophisticated, real-time solution that integrates CV and statistical control techniques, setting a new standard for worker monitoring and safety evaluation.

III Data Collection and Description

TABLE I: Experimental Data Parameters
Parameter | Description
Camera setup | High-definition cameras at three angles
Frame rate | 30 frames per second
Anonymization | Protocols for data privacy and ethics compliance
Tasks | 8 tasks that involve diverse working styles

The experimental design (see Table I) comprised three components, resulting in a total of eight tasks per participant (2 × 2 × 2). Participants engaged in tasks involving different sizes of objects, specifically, either large boxes (L) or small wooden cubes (S) (see Fig. 2). Additionally, they followed distinct techniques, either guided (G) with a numbering/alphabet system or unguided (U) with random placements. The participants performed actions involving the insertion (I) of objects into a cavity or box or the placement (P) of objects on top of a surface. The codes in parentheses (L, S, G, U, I, and P) were utilized as a standardized naming convention (see Fig. 2) for the corresponding videos associated with each combination of the variables. Video recordings from two healthy males (age 27 ± 1.41 years; height 1.74 ± 0.014 meters; arm length 0.70 ± 0.014 meters) were used in the study. From the task videos, 30 frames per second were retrieved and used for the processing. This amounts to around 5,000 frames of human posture estimation per participant. The Institutional Review Board (IRB) at Arizona State University approved the experimental protocol (STUDY 00016442). Before participating in the experiment, participants read and signed an IRB-approved informed consent form.
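
The 2 × 2 × 2 design described above can be enumerated programmatically. The following is a minimal sketch: the factor codes come from the text, but the function name `task_codes` and the size-technique-action ordering of the letters are assumptions made for illustration, since the exact naming order of the video files is not stated here.

```python
from itertools import product

# Factor levels and their codes, as given in the text.
OBJECT_SIZE = {"L": "large boxes", "S": "small wooden cubes"}
TECHNIQUE = {"G": "guided", "U": "unguided"}
ACTION = {"I": "insertion", "P": "placement"}


def task_codes():
    """Enumerate every size x technique x action combination.

    Assumption: the code letters are concatenated in
    size-technique-action order (hypothetical convention).
    """
    return ["".join(combo) for combo in product(OBJECT_SIZE, TECHNIQUE, ACTION)]


codes = task_codes()
print(len(codes), codes)
```

Each of the eight codes identifies one task per participant, so a lookup keyed on these codes is one simple way to organize the corresponding videos.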

The experiment included guided and unguided techniques to test how explicit guidance affects participants’ performance. Guided tasks used a numbering system to see if structured instructions improved efficiency and accuracy. In contrast, unguided tasks relied on random placements to observe spontaneous decision-making processes and potential challenges without explicit guidance. The experimental space purposely did not have any obstacles, in order to isolate the effects of object size, technique, and action type from other factors. This allowed for a clear examination and visual inclusion of the specific variables under investigation. Camera angles played an important role in capturing participants’ interactions with objects. Multiple angles were used to minimize potential blind spots and to facilitate a nuanced analysis of participants’ task performance strategies. The videos recorded for each task were standardized in dimensions and resolution, and underwent detailed preprocessing (e.g., video time synchronization, denoising) for analysis.

IV Motion Amount Quantification and Data Analysis Methodology

IV-A Body Landmark Extraction with CV Tools

We utilize MediaPipe [18] for our CV-based analysis of landmark extraction from video frames. This process involves identifying and localizing key points, or landmarks, within the images or frames of a video sequence. These landmarks represent joint locations of interest across the upper and lower limbs. The difference in location between subsequent frames is used to calculate the motion amount. We perform this calculation as shown in equation (3) for steps (0, 2, 4) of frame progression. While the diversity of data points can be limited by choosing specific landmarks, we have observed increased sensitivity when using all the landmarks in our approach. We treat this data as a continuous stream of multivariate data from the same distribution, as shown in equation (4), because we are collecting data for a specific task and the ideal movements should belong to the same distribution. Each video frame yields a feature vector representing the spatial information of the identified landmarks.

TABLE II: Landmark Points for S and L Tasks
Index | Body Part | Task Type
11 | Left shoulder | L
12 | Right shoulder | L
13 | Left elbow | S and L
14 | Right elbow | S and L
15 | Left wrist | S
16 | Right wrist | S
17 | Left pinky | S
18 | Right pinky | S
19 | Left index | S
20 | Right index | S
21 | Left thumb | S
22 | Right thumb | S
25 | Left knee | L
26 | Right knee | L
Figure 2: (a) Standing assembly task with feet obscured. (b) Posture estimation with MediaPipe to detect obscured feet. (c) Sitting assembly task with trunk and lower limbs obscured. (d) Posture estimation with MediaPipe to detect obscured trunk and lower limbs. (e) L task, where the participant moves big cardboard boxes. (f) S task, where the participant moves small wooden cubes. (g) G task, where the participant follows a guided order as seen in the specific placement of the boxes. (h) U task, where the participant follows an unguided order as seen in the serial (random) placement of the boxes. (i) I task, where the participant inserts the box or cube inside the cavity. (j) P task, where the participant places the box or cube on the platform. The tasks involve guided or unguided insertion or placement of boxes and cubes.

The landmark points designated for S and L tasks (see Table II) are key to analyzing and understanding the specific joint movements involved in different assembly tasks. For S tasks, which involve more precision and detailed hand movements with smaller cubes, landmark points such as the left and right elbows (Indexes 13, 14), wrists (15, 16), and various finger joints including pinkies (17, 18), indexes (19, 20), and thumbs (21, 22) are used. These points are selected to capture the finer motor skills involved in handling smaller objects where accuracy plays a significant role. Conversely, L tasks, which are typically associated with broader movements involving larger boxes, utilize landmark points like the left and right shoulders (11, 12) and knees (25, 26). These points help in analyzing movements that require larger muscle groups and potentially involve lifting, carrying, or placing heavier items, thus focusing on the motion of larger limb movements that are important for tasks involving higher physical exertion. Understanding the differences in movement and strain between tasks involving small and large objects is vital for designing safer and more efficient work environments. By specifically targeting these different sets of landmark points, researchers and ergonomists can develop more targeted interventions aimed at reducing workplace injuries and improving task efficiency, tailored to the unique demands of tasks classified by the size and manipulation requirements of the task-based equipment involved.
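
The task-specific landmark selection can be expressed as a small lookup. The sketch below transcribes the indices and task assignments from Table II; the names `LANDMARKS` and `landmarks_for` are hypothetical and introduced only for illustration.

```python
# Landmark index -> (body part, task types), transcribed from Table II.
LANDMARKS = {
    11: ("left shoulder", {"L"}),
    12: ("right shoulder", {"L"}),
    13: ("left elbow", {"S", "L"}),
    14: ("right elbow", {"S", "L"}),
    15: ("left wrist", {"S"}),
    16: ("right wrist", {"S"}),
    17: ("left pinky", {"S"}),
    18: ("right pinky", {"S"}),
    19: ("left index", {"S"}),
    20: ("right index", {"S"}),
    21: ("left thumb", {"S"}),
    22: ("right thumb", {"S"}),
    25: ("left knee", {"L"}),
    26: ("right knee", {"L"}),
}


def landmarks_for(task_type):
    """Return the sorted landmark indices used for task type 'S' or 'L'."""
    if task_type not in {"S", "L"}:
        raise ValueError("task_type must be 'S' or 'L'")
    return sorted(i for i, (_, tasks) in LANDMARKS.items() if task_type in tasks)


print(landmarks_for("S"))
print(landmarks_for("L"))
```

Keeping the selection in one table-like structure makes it easy to compare the two landmark sets, or to switch to all landmarks, without touching the downstream motion computation.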

To calculate the motion amount, we first define the position vector for each landmark in frame $t$ as shown in (1).

$$\boldsymbol{v}_{i,t} = (x_{i,t}, y_{i,t}, z_{i,t}) \tag{1}$$

where:

  • $\boldsymbol{v}_{i,t}$ represents the position vector of landmark $i$ at frame $t$.

  • $x_{i,t}$, $y_{i,t}$, and $z_{i,t}$ are the Cartesian coordinates of landmark $i$ in 3D space at frame $t$.

The overall motion amount $M_{t}$ in frame $t$ is computed as the sum of magnitudes of individual motions:

$$M_{t} = \sum_{i} \|\boldsymbol{m}_{i,t}\| \tag{2}$$

where:

  • $M_{t}$ is the total motion amount in frame $t$.

  • $\boldsymbol{m}_{i,t}$ is the motion vector for landmark $i$ at frame $t$, as shown in equation (3).

  • The sum is taken over all landmarks considered in the analysis.

The calculation of motion amount, as shown in equation (2), for each frame in a motion analysis study includes defining and computing vectors that represent the position and movement of designated landmarks on a subject’s body. To begin, we define the position vector for each landmark $i$ within a specific frame $t$ using the Cartesian coordinates in three-dimensional space. The position vector, denoted as $\boldsymbol{v}_{i,t}$, is composed of $x_{i,t}$, $y_{i,t}$, and $z_{i,t}$. These coordinates represent the landmark’s location along the $x$, $y$, and $z$ axes, respectively, at any given frame $t$. This approach allows for a detailed and quantifiable measurement of each landmark’s position within the recording space, capturing the nuances of human movement with high accuracy. The overall motion amount within each frame, represented by $M_{t}$, is computed as the sum of the magnitudes of individual motion vectors across all the landmarks being analyzed. Each motion vector, $\boldsymbol{m}_{i,t}$, signifies the change in position of a landmark $i$ from one frame to the next. This vector denotes the extent of each landmark’s movement between consecutive frames. By summing the magnitudes of these motion vectors, $M_{t}$ quantifies the total motion exhibited by all landmarks in the frame, providing a measure of the overall activity during the recorded task video’s frame sequence.
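
The motion amount in equation (2), with motion vectors taken as the frame-to-frame position differences of equation (3), can be sketched as follows. This is an illustrative sketch rather than the implementation used in the study: the function name `motion_amount` and the frame data layout (one dictionary per frame mapping a landmark index to an `(x, y, z)` tuple) are assumptions.

```python
import math


def motion_amount(frames, t, step=1):
    """Sum of Euclidean displacements of all landmarks between frames t-step and t.

    frames: list of dicts, each mapping landmark index -> (x, y, z).
    This layout is a hypothetical choice made for the example.
    """
    if step < 0 or t - step < 0 or t >= len(frames):
        raise ValueError("frames t and t - step must both exist")
    current, previous = frames[t], frames[t - step]
    # ||v_{i,t} - v_{i,t-step}|| summed over the landmarks present in frame t.
    return sum(math.dist(current[i], previous[i]) for i in current)


# Two landmarks over two frames: landmark 13 moves, landmark 14 stays still.
frames = [
    {13: (0.0, 0.0, 0.0), 14: (1.0, 1.0, 1.0)},
    {13: (3.0, 4.0, 0.0), 14: (1.0, 1.0, 1.0)},
]
print(motion_amount(frames, t=1))  # 5.0
```

In the example, only landmark 13 moves, by a displacement of length 5, so the stationary landmark contributes nothing to the total.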

IV-B Motion Amount Monitoring and Warning

In this study, we propose the use of a control chart for systematic monitoring of worker demand statistics. Specifically, the Hotelling’s $T^{2}$ control chart [19] is recommended for its efficacy in tracking relevant features. This control chart incorporates a widely adopted upper control limit (UCL), which serves as a benchmark for identifying outlier statistics indicative of a potential warning demand level. We utilize the Hotelling’s $T^{2}$ statistic for warnings based on the joint motion calculated, as further detailed in Section 3.3. Williams et al. (2006) [41] showed that using $T^{2}$ statistics derived from the successive differences estimator results in an approximate distribution with higher accuracy for determining the UCL for individual observations during a Phase I analysis. This statistic provides a measure for monitoring the multivariate distribution of the selected features over the frames derived from the workers’ task videos.
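
For individual observations, the Hotelling’s $T^{2}$ statistic takes the standard form $(\boldsymbol{x} - \bar{\boldsymbol{x}})^{T} \boldsymbol{S}^{-1} (\boldsymbol{x} - \bar{\boldsymbol{x}})$. The sketch below illustrates it under several stated assumptions: it uses the ordinary sample covariance rather than the successive differences estimator discussed by Williams et al., the UCL is a placeholder empirical quantile rather than a distribution-based limit, the data are synthetic, and the function names `hotelling_t2` and `flag_warnings` are hypothetical.

```python
import numpy as np


def hotelling_t2(X):
    """T^2 value for each row of X, an (n_frames, p) array of feature vectors."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Ordinary sample covariance S (assumption: not the successive differences estimator).
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Solve S z = (x - mean) for every frame instead of inverting S explicitly.
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)


def flag_warnings(t2, ucl):
    """Indices of frames whose T^2 value exceeds the upper control limit."""
    return np.flatnonzero(np.asarray(t2) > ucl)


# Synthetic stand-in for per-frame features, with one exaggerated frame.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[50] += 10.0
t2 = hotelling_t2(X)
ucl = np.quantile(t2, 0.99)  # placeholder limit, for illustration only
flagged = flag_warnings(t2, ucl)
print(flagged)
```

Solving the linear system avoids forming the inverse covariance explicitly, which tends to be more stable numerically when features are strongly correlated.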

The motion $\boldsymbol{m}_{i,t}$ for each landmark is then quantified by the vector difference between consecutive frames:

$$\boldsymbol{m}_{i,t} = \boldsymbol{v}_{i,t} - \boldsymbol{v}_{i,t-\mathrm{step}} \tag{3}$$

where:

  • $\boldsymbol{m}_{i,t}$ is the motion vector of landmark $i$ between frame $t-\mathrm{step}$ and frame $t$.

  • $\boldsymbol{v}_{i,t}$ and $\boldsymbol{v}_{i,t-\mathrm{step}}$ are the position vectors of landmark $i$ at frames $t$ and $t-\mathrm{step}$, respectively.

In multivariate statistical quality control, the behavior of a continuous quality characteristic is represented by a multivariate normal distribution. The probability density function of the multivariate normal distribution is defined as:

$$f(\boldsymbol{x}_{i})=\frac{1}{(2\pi)^{\frac{p}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}}e^{-\frac{1}{2}(\boldsymbol{x}_{i}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}_{i}-\boldsymbol{\mu})} \tag{4}$$

$$-\infty<\boldsymbol{x}_{i}<\infty,\quad i=1,2,3,\ldots,p$$

where:

  • $\boldsymbol{x}$ represents the vector of landmark points.

  • $\boldsymbol{\mu}$ is the mean vector, indicating the average value for each variable in the vector $\boldsymbol{x}$.

  • $\boldsymbol{\Sigma}$ is the covariance matrix, representing the covariance between each pair of elements within the vector $\boldsymbol{x}$.

  • $p$ denotes the total number of landmark points.

  • $(2\pi)^{\frac{p}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}$ is the normalization factor required for the probability density function to integrate to one.

We monitor $p$ landmark points in each frame, represented as $\boldsymbol{X}=[\boldsymbol{x}_{1},\boldsymbol{x}_{2},\ldots,\boldsymbol{x}_{p}]$, focusing on landmarks that significantly contribute to body motion during tasks. For example, for monitoring hand movements, landmarks such as $(x_{13},y_{13},z_{13}),(x_{14},y_{14},z_{14}),\ldots$ are considered.

IV-C Data Collection and Analysis

Consider the matrix $\boldsymbol{X}$ shown in equation (5), which represents the positions of landmarks in three-dimensional space over $n$ frames, where each frame contains data for landmark indexes 13 to 22. The matrix is defined as follows:

$$\boldsymbol{X}=\left[\begin{array}{cccc}(x_{13},y_{13},z_{13})&(x_{14},y_{14},z_{14})&\cdots&(x_{22},y_{22},z_{22})\\ (x_{13},y_{13},z_{13})&(x_{14},y_{14},z_{14})&\cdots&(x_{22},y_{22},z_{22})\\ \vdots&\vdots&\ddots&\vdots\\ (x_{13},y_{13},z_{13})&(x_{14},y_{14},z_{14})&\cdots&(x_{22},y_{22},z_{22})\end{array}\right]_{n\times p} \tag{5}$$

where each entry $(x_{i},y_{i},z_{i})$ denotes the Cartesian coordinates of the $i$-th landmark in a given frame, and each row corresponds to one frame. The dimensions of the matrix are $n\times p$, with $n$ representing the number of frames and $p$ representing the number of coordinates times the number of landmarks (from 13 to 22). The mean vector of the data is given in equation (6).

$$\boldsymbol{m}=\frac{1}{n}\sum_{i=1}^{n}\boldsymbol{x}_{i} \tag{6}$$

where:

  • $\boldsymbol{m}$ represents the sample mean vector.

  • $\boldsymbol{x}_{i}$ is the vector of detected landmark coordinates in frame $i$ (the $i$-th row of $\boldsymbol{X}$).

  • $n$ is the total number of frames.
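As a rough illustration of equations (5) and (6), the sketch below flattens the ten landmarks (indexes 13 to 22) of each frame into one row of 3 × 10 = 30 coordinates and then averages the rows. The dictionary layout of a frame, the helper names `frame_to_row` and `mean_vector`, and the sample values are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

LANDMARK_IDS = list(range(13, 23))  # landmark indexes 13..22, as in the text


def frame_to_row(frame: Dict[int, Point]) -> List[float]:
    """Flatten one frame into [x13, y13, z13, ..., x22, y22, z22]."""
    row: List[float] = []
    for landmark_id in LANDMARK_IDS:
        row.extend(frame[landmark_id])
    return row


def mean_vector(rows: List[List[float]]) -> List[float]:
    """Column-wise sample mean over all frames (equation 6)."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


# Two hypothetical frames: landmark k sits at (k, 0, 0), then at (k, 2, 4).
frames = [
    {k: (float(k), 0.0, 0.0) for k in LANDMARK_IDS},
    {k: (float(k), 2.0, 4.0) for k in LANDMARK_IDS},
]
X = [frame_to_row(f) for f in frames]  # n = 2 rows, p = 30 columns
m = mean_vector(X)
```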

The covariance matrix is defined as:

$$\boldsymbol{S}=\frac{1}{n-1}\sum_{i=1}^{n}(\boldsymbol{x}_{i}-\boldsymbol{m})(\boldsymbol{x}_{i}-\boldsymbol{m})^{T} \tag{7}$$

where:

  • $\boldsymbol{S}$ denotes the covariance matrix.

  • $\boldsymbol{x}_{i}-\boldsymbol{m}$ is the deviation of each data point from the mean.

  • $n-1$ is the degrees of freedom.

The monitoring statistic for each instance is calculated as:

$$t_{i}=(\boldsymbol{x}_{i}-\boldsymbol{\bar{x}})\boldsymbol{S}^{-1}(\boldsymbol{x}_{i}-\boldsymbol{\bar{x}})^{T} \tag{8}$$

where:

  • $t_{i}$ is the monitoring statistic value for instance $i$, assessing the worker motion’s extremeness or anomaly status relative to the distribution.

  • $\boldsymbol{x}_{i}$ represents the landmark point values of the $i$-th instance.

  • $\boldsymbol{\bar{x}}$ represents the sample mean, that is, the vector $\boldsymbol{m}$ of equation (6).

  • $\boldsymbol{S}^{-1}$ is the inverse of the covariance matrix defined in equation (7).
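The chain from equation (6) to equation (8) can be sketched as follows. This is a minimal standard-library illustration, not the implementation used in the study: the helper names and the sample data are hypothetical, and a linear solve replaces the explicit inverse $\boldsymbol{S}^{-1}$, which gives the same quadratic form. In practice a numerical library would normally be used, and a covariance matrix built from many strongly correlated coordinates may be close to singular.

```python
from typing import List

Matrix = List[List[float]]


def mean_vector(rows: Matrix) -> List[float]:
    """Column-wise sample mean (equation 6)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows: Matrix, mean: List[float]) -> Matrix:
    """Sample covariance matrix with n - 1 in the denominator (equation 7)."""
    p = len(mean)
    total = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        for a in range(p):
            for b in range(p):
                total[a][b] += dev[a] * dev[b]
    return [[value / (len(rows) - 1) for value in line] for line in total]


def solve(matrix: Matrix, rhs: List[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [line[:] + [value] for line, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def hotelling_t2(rows: Matrix) -> List[float]:
    """One T^2 value per row (frame): dev S^{-1} dev^T (equation 8)."""
    mean = mean_vector(rows)
    cov = covariance(rows, mean)
    stats = []
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        weighted = solve(cov, dev)  # S^{-1} dev without forming the inverse
        stats.append(sum(d * w for d, w in zip(dev, weighted)))
    return stats


# Hypothetical data: four frames, two monitored coordinates.
frames = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
t2_values = hotelling_t2(frames)
```

A useful sanity check on any such implementation is that the statistics computed on the same sample used to estimate the mean and covariance always sum to $(n-1)p$.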

Calculating a control value based on equation (8) for each instance in frames gives us the following vector:

$$\boldsymbol{T}=\begin{bmatrix}t_{1}\\ t_{2}\\ \vdots\\ t_{n}\end{bmatrix} \tag{9}$$

where:

  • $t_{i}$ represents the monitoring statistic for the $i$-th instance.

  • $n$ is the total number of frames.

It is not necessary to calculate the mean of the mean vectors since the data is not divided into subgroups. Thus, our control values are derived as above in equation (9). We next derive control limits for Phase I, also known as retrospective analysis. The UCL and the lower control limit (LCL) are given by equations (10) and (11), respectively.

$$\text{UCL}=\frac{p(n-1)}{(n-p)}f(x) \tag{10}$$

$$\text{LCL}=0 \tag{11}$$

where:

  • $p$ is the number of landmark points.

  • $n$ is the total number of frames.

  • $f(x)$ is a function dependent on the distribution of the control chart data.
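The arithmetic of equation (10) and the resulting threshold warnings can be illustrated as below. Because the text leaves $f(x)$ unspecified, the sketch takes its value as a plain number supplied by the caller (for example, a distribution quantile looked up separately); the function names, the value 2.5, and the sample statistics are hypothetical.

```python
from typing import List, Sequence


def upper_control_limit(p: int, n: int, f_value: float) -> float:
    """Equation (10): UCL = p (n - 1) / (n - p) * f(x)."""
    if n <= p:
        raise ValueError("need more frames than monitored variables (n > p)")
    return p * (n - 1) / (n - p) * f_value


def warning_frames(t2_values: Sequence[float], ucl: float) -> List[int]:
    """Indexes of frames whose statistic exceeds the UCL (the LCL is 0)."""
    return [i for i, t in enumerate(t2_values) if t > ucl]


# Hypothetical inputs: 3 monitored variables, 13 frames, assumed f(x) = 2.5.
ucl = upper_control_limit(p=3, n=13, f_value=2.5)  # 3 * 12 / 10 * 2.5
flagged = warning_frames([1.2, 9.5, 4.0, 0.3], ucl)
```

With these numbers the limit works out to 9, so only the second statistic (9.5) would raise a warning.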

IV-D Control Limits

UCL and LCL are crucial for identifying significant variation decreases. The LCL is especially important as it helps us identify whether the process meets appropriate standards or if there is a significant decrease in variation. In the current case, the LCL represents the quality of the motion. The application of control chart techniques involves two phases: Phase I (offline) and Phase II (online) monitoring. To implement this approach, a lengthy video can be strategically partitioned into two parts. The first part is designated for offline training, which allows for the establishment of baseline patterns. The second part is allocated for online monitoring, enabling real-time evaluation of worker demand levels during operational activities.

IV-E Correlation Analysis to Validate Significant Motions

The analysis of motion data revealed that smaller objects (S tasks) involved higher Root Mean Square Deviations (RMSD) [42] due to more intricate and frequent movements, quantified by:

$$\text{RMSD}_{\text{task}}=\sqrt{\frac{1}{n}\sum_{t=1}^{n}(M_{t}-\overline{M})^{2}} \tag{12}$$

where:

  • $\text{RMSD}_{\text{task}}$ is the Root Mean Square Deviation for each task.

  • $M_{t}$ is the total motion amount in frame $t$.

  • $\overline{M}$ is the mean motion amount across all frames.

  • $n$ is the number of frames analyzed.
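Equation (12) translates directly into code. The sketch below is only illustrative; the function name `rmsd` and the sample motion amounts are made up for the example.

```python
import math
from typing import Sequence


def rmsd(motion_amounts: Sequence[float]) -> float:
    """Root mean square deviation of per-frame motion amounts (equation 12)."""
    n = len(motion_amounts)
    mean = sum(motion_amounts) / n
    return math.sqrt(sum((m - mean) ** 2 for m in motion_amounts) / n)


# Hypothetical per-frame motion amounts alternating around a mean of 2.
sample = [1.0, 3.0, 1.0, 3.0]
value = rmsd(sample)
```

Note that equation (12) divides by $n$ rather than $n-1$, so the result is the population form of the standard deviation of $M_{t}$.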

V Results

Figure 3: Trajectories of joint motion based on landmark points in 3D space as a participant performed the four S tasks: (a) SUP (b) SGI (c) SGP (d) SUI.
Figure 4: Trajectories of joint motion based on landmark points in 3D space as a participant performed the four L tasks: (a) LUP (b) LGI (c) LGP (d) LUI.
Figure 5: Control charts for SGI: (a) Step 0 (b) Step 2 (c) Step 4 and SGP: (d) Step 0 (e) Step 2 (f) Step 4.
Figure 6: Control charts for SUI: (a) Step 0 (b) Step 2 (c) Step 4 and SUP: (d) Step 0 (e) Step 2 (f) Step 4.
Figure 7: Control charts for LGI: (a) Step 0 (b) Step 2 (c) Step 4 and LGP: (d) Step 0 (e) Step 2 (f) Step 4.
Figure 8: Control charts for LUI: (a) Step 0 (b) Step 2 (c) Step 4 and LUP: (d) Step 0 (e) Step 2 (f) Step 4.

It was found that the RMSD of the calculated aggregate joint motion amount, described in equation (12), was greater for S tasks (0.072 meters) than for L tasks (0.035 meters). This could be attributed to the lack of camera visibility and clarity in the definition of motion in S tasks, which may have hindered the CV algorithm’s ability to efficiently decode posture change. The Pearson correlation coefficient (PCC) [43] was calculated between the motion amount and the Hotelling’s $T^{2}$ statistic. This correlation was around 35% higher for S tasks than for L tasks. Perceived motion demand, rated on a scale of 0-100, was collected from the participants for all 8 tasks. The averages for L and S tasks were 69.9 and 21.9, respectively. The perceived motion demand indicates that the motion amount for L tasks was much larger than that for S tasks. This is supported by the trajectories plotted for the sample S (see Fig. 3) and L (see Fig. 4) tasks. For LGI, the landmark points (11, 12, 13, 14, 25, 26) show more deviation compared to SGI’s landmark points (13 through 22). The joint motion amount was calculated and subsequently used to compute the Hotelling’s $T^{2}$ statistic between frames (for steps 0, 2, and 4). In our study, motion trajectories serve several purposes. They provide the data needed for the quantification of motion. By analyzing the trajectories, we measured the extent of movement for each task, including the total displacement, velocity, and acceleration of the landmark points. These metrics provide a more detailed understanding of the physical demands of each task. Additionally, motion trajectories plotted from videos showing standard human motions serve as benchmarks for evaluating future motion videos.
By comparing the trajectories of new tasks to these benchmarks, we can assess the consistency and efficiency of task performance. This benchmarking is important for identifying deviations and areas for worker motion improvement. The trajectories also help in identifying specific movements that are pertinent to task performance. For example, in tasks requiring precision (S tasks, see Figure 3), the trajectories can highlight the smoothness and accuracy of hand movements. Similarly, for tasks involving larger body movements (L tasks, see Figure 4) the trajectories can reveal the coordination and balance required. Figure 3 presents the motion trajectories for the S tasks. The trajectories show the path of each landmark point over time, using which we analyzed the motion patterns in detail. By examining these trajectories, we can evaluate the consistency and precision of how the participant placed or inserted the cubes in the designated molds. The trajectories show the lifting motion, highlighting the elevation of the cube and the stabilization of the dominant upper limb. These details are important for assessing the ergonomic impact of the task. Figure 4 illustrates the motion trajectories for the L tasks. These tasks are characterized by their unique motion patterns to lift big boxes and insert or place them into each other. The box insertion task involves extending the arm and bending the body. The trajectories provide a clear visualization of the reach and the body’s balance. This analysis helps in understanding the ergonomic risks associated with the task. The analysis of motion trajectories is not limited to quantifying motion; it also serves as a diagnostic tool for identifying potential issues in task performance. By comparing the trajectories of different individuals performing the same task, we can detect variations and deviations that may indicate inefficiencies or risks. 
For instance, if the trajectories of a lifting task show significant variability between individuals, it may suggest differences in technique or strength. Addressing these variations through training or ergonomic interventions can improve task performance and reduce the risk of injury.
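The Pearson correlation coefficient between the per-frame motion amount and the control chart statistic, referred to above, can be computed as in the following sketch. The function name `pearson` and the two short series are hypothetical; they only illustrate the calculation and are not data from the study.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cross / (spread_x * spread_y)


# Hypothetical per-frame motion amounts and monitoring statistics.
motion = [0.1, 0.2, 0.3, 0.4]
statistic = [1.0, 2.1, 2.9, 4.2]
r = pearson(motion, statistic)
```

Computing this coefficient separately for the S-task and L-task series and comparing the two values is one way to arrive at a relative difference like the one reported above.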

In the assessment of worker motion, quantitative analysis of velocity and motion amount provides information about worker behavior under different task conditions. At Step 0, the analysis includes every frame, showing a mean velocity (see Table IV) of approximately 0.0118 with a standard deviation of 0.0104, which indicates moderate variability in task initiation or modification. The highest velocity at this stage reaches about 0.1095, suggesting occasional rapid movements which may be necessary for dynamic tasks or sudden accelerations. When advancing to Step 2, the analysis steps over every second frame, leading to a mean velocity of 0.0032 and a standard deviation of 0.0041. At Step 4, where every fourth frame is analyzed, the mean velocity further decreases to approximately 0.0017 and the standard deviation to 0.0022. These trends indicate a reduction in fluctuations of observed velocity, which may be due to the less frequent analysis capturing fewer instances of rapid movement changes over longer intervals. At Step 0, the mean motion amount is about 0.0966 with a standard deviation of 0.0857, pointing to diverse and possibly strenuous movements. The maximum motion amount (see Table III) reaches up to 0.8997, indicating intense activity during this frame-by-frame analysis. At Step 2, the mean and standard deviation of motion amount decrease to 0.0525 and 0.0676 respectively, and by Step 4, these values adjust to a mean of 0.0564 and a standard deviation of 0.0718. This progression suggests a decrease in the variability and intensity of movements as the analysis skips more frames, reflecting more averaged and possibly subdued motion characteristics. At Step 2, a notable reduction in the standard deviation of acceleration (see Table V) to 0.0784 reflects a more controlled range of motion. This change is significant as it suggests an adjustment in task execution, potentially influenced by preliminary feedback or interventions aimed at reducing ergonomic risks. 
The moderation in both maximum and minimum acceleration values, now capped at 0.5529 and -0.6504 respectively, further supports the inference that task modifications or ergonomic interventions have been effective in mitigating extreme movements. These observed changes across the different steps indicate that stepping over frames in the analysis can affect the detection of variability and extremes in motion data, potentially smoothing out short-term fluctuations and highlighting more gradual movement trends.
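The summary statistics discussed above (count, mean, standard deviation, minimum, maximum) can be reproduced for any series with a sketch like the one below. Two assumptions are made that the text does not confirm: the standard deviation uses the sample form with $n-1$ in the denominator, and acceleration is taken as the first difference of successive velocity values, which would at least be consistent with the acceleration counts in Table V being one lower than the velocity counts in Table IV. The helper names and sample values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Count, mean, sample standard deviation, minimum and maximum."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return {"count": n, "mean": mean, "std": std,
            "min": min(values), "max": max(values)}


def first_difference(values: Sequence[float]) -> List[float]:
    """Differences between successive values (one element shorter)."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical velocity series; acceleration is assumed to be its difference.
velocity = [0.0, 1.0, 3.0, 6.0]
acceleration = first_difference(velocity)
stats = summarize(acceleration)
```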

TABLE III: Statistics of Motion Amount Data at Different Steps

| Statistic | Step 0 | Step 2 | Step 4 |
| --- | --- | --- | --- |
| Count | 1,099 | 1,097 | 1,095 |
| Mean | 0.096586 | 0.052472 | 0.056417 |
| Std. Deviation | 0.085703 | 0.067579 | 0.071829 |
| Minimum | 0.001516 | 0.000032 | 0.000008 |
| Maximum | 0.899662 | 0.725200 | 0.801686 |

TABLE IV: Statistics of Velocity Data at Different Steps

| Statistic | Step 0 | Step 2 | Step 4 |
| --- | --- | --- | --- |
| Count | 1,099 | 1,097 | 1,095 |
| Mean | 0.011757 | 0.003193 | 0.001717 |
| Std. Deviation | 0.010432 | 0.004113 | 0.002186 |
| Minimum | 0.000185 | 0.000002 | 0.000000 |
| Maximum | 0.109509 | 0.044136 | 0.024396 |

TABLE V: Statistics of Acceleration Data at Different Steps

| Statistic | Step 0 | Step 2 | Step 4 |
| --- | --- | --- | --- |
| Count | 1,098 | 1,096 | 1,094 |
| Mean | 0.000681 | 0.000036 | -0.000006 |
| Std. Deviation | 0.310759 | 0.078411 | 0.020015 |
| Minimum | -2.495035 | -0.650380 | -0.151858 |
| Maximum | 2.846712 | 0.552975 | 0.167224 |

The summary statistics derived from control chart data for the eight designated tasks (SGI, SGP, SUI, SUP, LGI, LGP, LUI, and LUP) provide information about the joint motion dynamics associated with varying assembly tasks. These statistics (see Table VII) are particularly revealing when considering the impact of different variables such as object size (Large vs. Small), guidance (Guided vs. Unguided), and the nature of the activity (Insert vs. Place). For the tasks involving small objects, namely SGI and SGP (see Fig. 5) and SUI and SUP (see Fig. 6), the mean control chart values for motion, which range from 29.55 (SGI step 4) to 29.92 (SGP step 0), are consistently higher than those for large object tasks, which exhibit lower means such as 17.97 (LGP step 4) and 17.94 (LUP step 4). This indicates that small tasks, potentially due to their nature requiring precise movements, generally involve higher levels of activity. Additionally, the standard deviation for small tasks like SGP step 2, which is 16.34, suggests a high variability in how these tasks are performed by different workers, possibly due to individual techniques or task-specific challenges. In contrast, tasks involving large objects, such as LGI and LGP (see Fig. 7) and LUI and LUP (see Fig. 8), show some of the highest maximum motion values observed in the study, such as 392.99 (LGP step 2) and 231.44 (LGI step 0). These peaks might occur during actions that require significant effort or movement amplitude, such as lifting or carrying heavy items. These tasks also have substantial instances of threshold warnings, with LUI and LUP marking the highest counts at 39 and 30 respectively, across all steps. This could be indicative of critical ergonomic or safety issues that necessitate focused intervention to manage risks associated with high-motion activities. The frequency of threshold warnings is a pivotal metric in assessing potential ergonomic risks.
Tasks like SGP (19 warnings at step 2) and LUI (39 warnings at step 0) that frequently exceed the motion thresholds could be flagged for ergonomic assessment and potential redesign to mitigate the risks associated with excessive or improper motions. These high frequencies suggest that these tasks either inherently involve risky movements or are being performed in a manner that deviates from ergonomic norms. The variability in motion, as quantified by the standard deviation, provides another layer of analysis. For instance, the standard deviation for LUI step 0 is notably higher at 14.18 compared to other tasks, highlighting significant variation in motion amount which could correlate with non-standardized task execution leading to increased physical strain and a higher risk of injury. Additionally, the comparison between guided (G) and unguided (U) tasks reveals significant differences in motion consistency and safety. Guided tasks, such as LGI with lower standard deviations and fewer threshold warnings, suggest that providing clear instructions or pathways can help reduce unnecessary motions, thereby optimizing performance and reducing ergonomic risk. Unguided tasks, such as LUI, show higher variability and more frequent threshold warnings, indicating that the lack of structured guidance leads to more varied and potentially unsafe worker motions. The data analyzed here not only aids in understanding the quantitative aspects of worker motions across different tasks but also highlights the critical need for targeted ergonomic interventions. By aligning the quantitative motion analysis with the specific task requirements and observed ergonomics, organizations can significantly improve productivity and worker safety.

TABLE VI: Statistics for Landmark Point Movement

| Task | Count | Mean X | Mean Y | Mean Z | Std Dev X | Std Dev Y | Std Dev Z | Range X | Range Y | Range Z |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SGP | 668 | 0.611 | 0.527 | 0.023 | 0.011 | 0.020 | 0.047 | 0.595-0.645 | 0.473-0.548 | -0.147-0.106 |
| SGI | 619 | 0.604 | 0.546 | 0.011 | 0.018 | 0.021 | 0.035 | 0.586-0.628 | 0.504-0.590 | -0.110-0.056 |
| SUI | 668 | 0.609 | 0.523 | 0.015 | 0.012 | 0.022 | 0.050 | 0.590-0.630 | 0.480-0.570 | -0.135-0.090 |
| SUP | 667 | 0.607 | 0.530 | 0.020 | 0.013 | 0.019 | 0.045 | 0.588-0.626 | 0.492-0.568 | -0.130-0.100 |
| LGI | 600 | 0.534 | 0.378 | -0.134 | 0.080 | 0.055 | 0.140 | 0.400-0.670 | 0.290-0.460 | -0.350-0.210 |
| LGP | 600 | 0.537 | 0.385 | -0.120 | 0.085 | 0.065 | 0.150 | 0.410-0.660 | 0.300-0.470 | -0.360-0.220 |
| LUI | 600 | 0.532 | 0.370 | -0.125 | 0.075 | 0.050 | 0.130 | 0.405-0.655 | 0.285-0.455 | -0.340-0.200 |
| LUP | 605 | 0.529 | 0.365 | -0.115 | 0.070 | 0.045 | 0.120 | 0.400-0.650 | 0.280-0.450 | -0.330-0.190 |

TABLE VII: Statistics for Control Chart Data

| Task | Step | Mean | Median | Max | Min | Std Dev | Warnings |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SGI | 0 | 29.68 | 27.74 | 59.36 | 14.90 | 8.50 | 2 |
| SGI | 2 | 29.71 | 28.87 | 66.43 | 14.95 | 9.65 | 0 |
| SGI | 4 | 29.55 | 27.75 | 54.96 | 17.25 | 8.31 | 0 |
| SGP | 0 | 29.92 | 27.89 | 111.70 | 8.50 | 14.24 | 18 |
| SGP | 2 | 29.90 | 27.10 | 91.66 | 3.38 | 16.34 | 19 |
| SGP | 4 | 29.88 | 28.35 | 73.11 | 3.55 | 15.14 | 14 |
| SUI | 0 | 29.80 | 26.13 | 85.04 | 8.32 | 14.92 | 6 |
| SUI | 2 | 29.76 | 27.23 | 84.88 | 10.25 | 13.65 | 1 |
| SUI | 4 | 29.72 | 28.42 | 64.78 | 10.20 | 10.50 | 0 |
| SUP | 0 | 29.85 | 26.06 | 95.18 | 8.08 | 15.41 | 11 |
| SUP | 2 | 29.83 | 27.08 | 108.24 | 6.73 | 15.56 | 8 |
| SUP | 4 | 29.81 | 28.62 | 88.22 | 7.60 | 11.79 | 5 |
| LGI | 0 | 17.98 | 13.09 | 231.44 | 2.45 | 18.37 | 75 |
| LGI | 2 | 17.97 | 9.57 | 226.00 | 0.46 | 24.74 | 88 |
| LGI | 4 | 17.97 | 9.39 | 190.19 | 0.65 | 23.77 | 75 |
| LGP | 0 | 17.97 | 14.27 | 191.47 | 2.46 | 18.94 | 58 |
| LGP | 2 | 17.97 | 8.98 | 392.99 | 0.38 | 27.31 | 75 |
| LGP | 4 | 17.97 | 9.63 | 166.89 | 0.43 | 22.63 | 77 |
| LUI | 0 | 17.96 | 14.65 | 132.31 | 2.56 | 14.18 | 39 |
| LUI | 2 | 17.95 | 12.57 | 195.53 | 0.59 | 19.67 | 37 |
| LUI | 4 | 17.94 | 13.92 | 103.58 | 0.68 | 15.49 | 35 |
| LUP | 0 | 17.94 | 12.99 | 168.21 | 2.76 | 18.36 | 24 |
| LUP | 2 | 17.94 | 10.28 | 194.38 | 0.30 | 23.53 | 30 |
| LUP | 4 | 17.94 | 11.08 | 168.21 | 1.15 | 22.43 | 26 |

This ensures that the workplace is efficient and supportive of long-term health and safety objectives, facilitating the design of interventions that are both effective and tailored to the specific needs of the workspace. The tasks labeled SGP, SGI, SUI, SUP, LGI, LGP, LUI, and LUP provide a detailed compilation of 3D landmark point data (see Table VI), each offering unique insights into worker motion and ergonomics across various work settings. These tasks are categorized based on size, guidance, and action. The mean positions of landmark points vary significantly between the tasks, reflecting the different spatial demands of each activity. For small guided tasks like SGP and SGI, the mean X and Y coordinates are relatively high, suggesting that these tasks are performed close to the median workspace. Specifically, SGP shows mean positions of 0.611 for X and 0.527 for Y, while SGI shows 0.604 for X and 0.546 for Y. This indicates a concentration of movements around a central region, which is characteristic of tasks involving precise and controlled gestures. In contrast, large guided tasks such as LGI and LGP exhibit lower mean positions for X and Y coordinates. LGI has mean values of 0.534 for X and 0.378 for Y, while LGP has 0.537 for X and 0.385 for Y. These lower values suggest that the movements involved in large tasks are more dispersed and may require reaching further distances or navigating a broader area, particularly when interacting with larger objects. The variability in the positions of landmark points, as indicated by the standard deviations, differs across the tasks. For small unguided tasks such as SUI and SUP, the standard deviations for the X coordinate are relatively low, with SUI at 0.012 and SUP at 0.013. This low variability indicates that these tasks involve consistent horizontal movements. 
The standard deviations for the Y coordinate are slightly higher, with SUI at 0.022 and SUP at 0.019, highlighting moderate variability in vertical movements. Large unguided tasks such as LUI and LUP show higher standard deviations across all coordinates. For instance, LUI has a standard deviation of 0.075 for X, 0.050 for Y, and 0.130 for Z, indicating significant variability in all dimensions. LUP exhibits similar patterns with 0.070 for X, 0.045 for Y, and 0.120 for Z. This increased variability suggests that these tasks involve more extensive and varied movements, likely due to the complexity and physical demands of interacting with larger boxes. The range of movements, defined by the span between the minimum and maximum observed positions, further emphasizes the differences in task demands. For small guided tasks like SGP and SGI, the X coordinate ranges from approximately 0.595 to 0.645 and 0.586 to 0.628, respectively. The Y coordinate ranges from 0.473 to 0.548 for SGP and 0.504 to 0.590 for SGI. These ranges suggest that the movements in these tasks are relatively contained within a specific spatial boundary, reflecting controlled and repetitive actions. In comparison, large tasks such as LGI (guided) and LUI (unguided) show broader ranges. LGI ranges from 0.400 to 0.670 for X, 0.290 to 0.460 for Y, and -0.350 to -0.210 for Z. LUI ranges from 0.405 to 0.655 for X, 0.285 to 0.455 for Y, and -0.340 to -0.200 for Z. These broader ranges indicate that these tasks involve movements that cover a larger spatial area, which is consistent with the physical demands of larger tasks. Understanding the mean positions, variability, and range of movements in different tasks provides crucial insights into the ergonomic and safety considerations for each activity. Tasks with lower variability and contained ranges, such as small guided tasks (SGP, SGI), are likely to be less physically demanding and may pose fewer ergonomic risks.
These tasks involve precise and repetitive motions, which can be managed with proper ergonomic design and interventions. Large tasks (LGI, LUI) with higher variability and broader movement ranges require careful attention to safety protocols. The extensive and varied movements in these tasks suggest a higher risk of physical strain and potential for injury. Ergonomic interventions, such as adjustable support structures and proper training, are essential to mitigate these risks. The spatial characteristics of each task can inform the design of tools and workspaces. For instance, ensuring that tools and materials are within easy reach for small guided tasks can enhance efficiency and reduce physical strain. For large unguided tasks, designing workspaces that accommodate a wide range of movements and provide adequate support can improve safety and performance. The analysis of 3D landmark point movement statistics across various tasks highlights the distinct spatial and ergonomic demands of each activity. By analyzing the mean positions, standard deviations, and ranges, we can gain a deeper understanding of the movement patterns and potential risks associated with each task. These details are essential for developing targeted ergonomic interventions, improving task design, and enhancing workplace safety and efficiency.

VI Conclusion and Future Work

The findings of this study can be useful for ergonomic practitioners and workplace designers [44] who aim to create environments that support healthy and efficient worker movements. By analyzing these tasks further, professionals can identify high-risk movements and design interventions tailored to the specific needs of the workplace. For example, if a particular task requires frequent reaching movements [45] that are identified as risky (noted from the Z-coordinate analysis), mechanical aids or rearranged workspace layouts could be introduced to reduce the need for such movements. Moreover, motion data from these tasks allow different workplace layouts and task assignments to be simulated in a virtual environment [46], enabling ergonomic assessment before physical changes are made to the workspace. Such preemptive ergonomics can reduce the risk of injury and improve productivity by designing workspaces with human motion in mind.

Long-term tracking and analysis of motion patterns can help in monitoring the effectiveness of ergonomic solutions. Organizations that rely on physical labor can continually assess motion data to see how changes in workspaces or processes affect worker movement and adjust their strategies accordingly. This ongoing evaluation is critical for tasks that are physically demanding or require high precision. Analyzing motion data from tasks not only helps in understanding current ergonomic conditions but also helps predict and prevent potential future issues related to worker movement. A thorough examination of mean values, standard deviations, and ranges in all three coordinates provides a complete picture of how workers interact with their environments, highlighting areas for improvement and intervention to support worker health and productivity.

The CV-based joint motion amount assessment is effective in identifying how workers perceive the motion demand for various tasks. The consistency observed in trajectories formed by the movements of landmark points validates this approach. However, the study has certain limitations. It was conducted retrospectively and with videos taken from a specific angle. Integrating videos taken from different angles could provide more comprehensive data for worker motion analysis, but video fusion is beyond the scope of this study. Future research will include designing more complicated human subject experiments and tasks to evaluate the CV method. In conclusion, the findings highlight the capability of the CV-based technique to provide reliable representations of motion and trajectories.

Acknowledgments

We are thankful for Boyang Xu’s participation in the project’s initialization and for Bryan Havens’s initial insights into data collection. N.M. acknowledges the Master’s Opportunity for Research in Engineering (MORE) program at ASU. We would also like to express our gratitude to the recruited participants who performed the experiments.

References

  • [1] X. Yang, Y. Yu, H. Li, X. Luo, and F. Wang, “Motion-based analysis for construction workers using biomechanical methods,” Frontiers of Engineering Management, vol. 4, no. 1, pp. 84–91, 2017.
  • [2] S. J. Ray and J. Teizer, “Real-time construction worker posture analysis for ergonomics training,” Advanced Engineering Informatics, vol. 26, no. 2, pp. 439–455, 2012.
  • [3] E. Valero, A. Sivanathan, F. Bosché, and M. Abdel-Wahab, “Analysis of construction trade worker body motions using a wearable and wireless motion sensor network,” Automation in Construction, vol. 83, pp. 48–55, 2017.
  • [4] A. Yassi, “Repetitive strain injuries,” The Lancet, vol. 349, no. 9056, pp. 943–947, 1997.
  • [5] S. Palikhe, M. Yirong, B. Y. Choi, and D.-E. Lee, “Analysis of musculoskeletal disorders and muscle stresses on construction workers’ awkward postures using simulation,” Sustainability, vol. 12, no. 14, p. 5693, 2020.
  • [6] M. C. Gouett, “Activity analysis for continuous productivity improvement in construction,” Master’s thesis, University of Waterloo, 2010.
  • [7] M. Bortolini, M. Faccio, M. Gamberi, and F. Pilati, “Motion analysis system (mas) for production and ergonomics assessment in the manufacturing processes,” Computers & Industrial Engineering, vol. 139, p. 105485, 2020.
  • [8] A. Golabchi, S. Han, J. Seo, S. Han, S. Lee, and M. Al-Hussein, “An automated biomechanical simulation approach to ergonomic job analysis for workplace design,” Journal of Construction Engineering and Management, vol. 141, no. 8, p. 04015020, 2015.
  • [9] H. Chen, X. Luo, Z. Zheng, and J. Ke, “A proactive workers’ safety risk evaluation framework based on position and posture data fusion,” Automation in Construction, vol. 98, pp. 275–288, 2019.
  • [10] J. Ryu, L. Zhang, C. T. Haas, and E. Abdel-Rahman, “Motion data based construction worker training support tool: Case study of masonry work,” in ISARC. Proceedings of the International Symposium on Automation and Robotics in Construction, vol. 35.   IAARC Publications, 2018, pp. 1–6.
  • [11] K. A. Moran and E. S. Wallace, “Eccentric loading and range of knee joint motion effects on performance enhancement in vertical jumping,” Human movement science, vol. 26, no. 6, pp. 824–840, 2007.
  • [12] R. Gonsalves and J. Teizer, “Human motion analysis using 3d range imaging technology,” in Int. Symp. on Automation and Robotics in Construction, 2009.
  • [13] W. Lao, J. Han, and P. H. De With, “Automatic video-based human motion analyzer for consumer surveillance system,” IEEE Transactions on Consumer Electronics, vol. 55, no. 2, pp. 591–598, 2009.
  • [14] R. Mehrizi, X. Peng, X. Xu, S. Zhang, D. Metaxas, and K. Li, “A computer vision based method for 3d posture estimation of symmetrical lifting,” Journal of biomechanics, vol. 69, pp. 40–46, 2018.
  • [15] B. Zheng, X. Jiang, G. Tien, A. Meneghetti, O. N. M. Panton, and M. S. Atkins, “Workload assessment of surgeons: correlation between nasa tlx and blinks,” Surgical endoscopy, vol. 26, no. 10, pp. 2746–2750, 2012.
  • [16] J. Seo, K. Yin, and S. Lee, “Automated postural ergonomic assessment using a computer vision-based posture classification,” in Construction research congress 2016, 2016, pp. 809–818.
  • [17] V. Bazarevsky, I. Grishchenko, K. Raveendran, T. Zhu, F. Zhang, and M. Grundmann, “Blazepose: On-device real-time body pose tracking,” arXiv preprint arXiv:2006.10204, 2020.
  • [18] C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C.-L. Chang, M. G. Yong, J. Lee et al., “Mediapipe: A framework for building perception pipelines,” arXiv preprint arXiv:1906.08172, 2019.
  • [19] D. C. Montgomery, Introduction to statistical quality control.   John Wiley & Sons, 2019.
  • [20] M. Dicks, C. Button, and K. Davids, “Examination of gaze behaviors under in situ and video simulation task constraints reveals differences in information pickup for perception and action,” Attention, Perception, & Psychophysics, vol. 72, pp. 706–720, 2010.
  • [21] H. Cho, M. L. Komar, and D. Lindlbauer, “Realityreplay: Detecting and replaying temporal changes in situ using mixed reality,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 7, no. 3, pp. 1–25, 2023.
  • [22] P. Reipschläger, F. Brudy, R. Dachselt, J. Matejka, G. Fitzmaurice, and F. Anderson, “Avatar: An immersive analysis environment for human motion data combining interactive 3d avatars and trajectories,” in Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, pp. 1–15.
  • [23] N. Robertson and I. Reid, “A general method for human activity recognition in video,” Computer Vision and Image Understanding, vol. 104, no. 2-3, pp. 232–248, 2006.
  • [24] C. Bregler, “Learning and recognizing human dynamics in video sequences,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.   IEEE, 1997, pp. 568–574.
  • [25] O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, “Automatic annotation of human actions in video,” in 2009 IEEE 12th International conference on computer vision.   IEEE, 2009, pp. 1491–1498.
  • [26] J. Han and B. Bhanu, “Fusion of color and infrared video for moving human detection,” Pattern Recognition, vol. 40, no. 6, pp. 1771–1784, 2007.
  • [27] M. Hoai, Z.-Z. Lan, and F. De la Torre, “Joint segmentation and classification of human actions in video,” in CVPR 2011.   IEEE, 2011, pp. 3265–3272.
  • [28] N. M. Oliver, B. Rosario, and A. P. Pentland, “A bayesian computer vision system for modeling human interactions,” IEEE transactions on pattern analysis and machine intelligence, vol. 22, no. 8, pp. 831–843, 2000.
  • [29] W. Huang, L. Zhang, W. Gao, F. Min, and J. He, “Shallow convolutional neural networks for human activity recognition using wearable sensors,” IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–11, 2021.
  • [30] K. Maswadi, N. A. Ghani, S. Hamid, and M. B. Rasheed, “Human activity classification using decision tree and naive bayes classifiers,” Multimedia Tools and Applications, vol. 80, no. 14, pp. 21 709–21 726, 2021.
  • [31] S. Ji, W. Xu, M. Yang, and K. Yu, “3d convolutional neural networks for human action recognition,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 1, pp. 221–231, 2012.
  • [32] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, “Temporal segment networks for action recognition in videos,” IEEE transactions on pattern analysis and machine intelligence, vol. 41, no. 11, pp. 2740–2755, 2018.
  • [33] W. Wu, D. He, T. Lin, F. Li, C. Gan, and E. Ding, “Mvfnet: Multi-view fusion network for efficient video recognition,” in Proceedings of the AAAI conference on artificial intelligence, vol. 35, no. 4, 2021, pp. 2943–2951.
  • [34] J. Charles, T. Pfister, D. Magee, D. Hogg, and A. Zisserman, “Personalizing human video pose estimation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 3063–3072.
  • [35] A. Kanazawa, J. Y. Zhang, P. Felsen, and J. Malik, “Learning 3d human dynamics from video,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 5614–5623.
  • [36] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2d pose estimation using part affinity fields,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7291–7299.
  • [37] J. Carreira, P. Agrawal, K. Fragkiadaki, and J. Malik, “Human pose estimation with iterative error feedback,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4733–4742.
  • [38] A. Toshev and C. Szegedy, “Deeppose: Human pose estimation via deep neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1653–1660.
  • [39] Y. Yu, H. Li, X. Yang, and W. Umer, “Estimating construction workers’ physical workload by fusing computer vision and smart insole technologies,” in ISARC. Proceedings of the International Symposium on Automation and Robotics in Construction, vol. 35.   IAARC Publications, 2018, pp. 1–8.
  • [40] Y. Yu, H. Li, W. Umer, C. Dong, X. Yang, M. Skitmore, and A. Y. Wong, “Automatic biomechanical workload estimation for construction workers by computer vision and smart insoles,” Journal of Computing in Civil Engineering, vol. 33, no. 3, p. 04019010, 2019.
  • [41] J. D. Williams, W. H. Woodall, J. B. Birch, and J. H. Sullivan, “Distribution of Hotelling’s T² statistic based on the successive differences estimator,” Journal of Quality Technology, vol. 38, no. 3, pp. 217–229, 2006.
  • [42] O. Wada, T. Asai, Y. Hiyama, S. Nitta, and K. Mizuno, “Root mean square of lower trunk acceleration during walking in patients with unilateral total hip replacement,” Gait & posture, vol. 58, pp. 19–22, 2017.
  • [43] I. Cohen, Y. Huang, J. Chen, and J. Benesty, “Pearson correlation coefficient,” Noise reduction in speech processing, pp. 1–4, 2009.
  • [44] R. Seim, O. Broberg, and V. Andersen, “Ergonomics in design processes: the journey from ergonomist toward workspace designer,” Human Factors and Ergonomics in Manufacturing & Service Industries, vol. 24, no. 6, pp. 656–670, 2014.
  • [45] A. d’Avella and F. Lacquaniti, “Control of reaching movements by muscle synergy combinations,” Frontiers in computational neuroscience, vol. 7, p. 42, 2013.
  • [46] H. Takemura and F. Kishino, “Cooperative work environment using virtual workspace,” in Proceedings of the 1992 ACM conference on Computer-supported cooperative work, 1992, pp. 226–232.