-
Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training
Authors:
Xingyu Song,
Zhan Li,
Shi Chen,
Kazuyuki Demachi
Abstract:
3D human pose estimation is a vital task in computer vision, involving the prediction of human joint positions from images or videos to reconstruct a skeleton of a human in three-dimensional space. This technology is pivotal in various fields, including animation, security, human-computer interaction, and automotive safety, where it promotes both technological progress and enhanced human well-bein…
▽ More
3D human pose estimation is a vital task in computer vision, involving the prediction of human joint positions from images or videos to reconstruct a skeleton of a human in three-dimensional space. This technology is pivotal in various fields, including animation, security, human-computer interaction, and automotive safety, where it promotes both technological progress and enhanced human well-being. The advent of deep learning significantly advances the performance of 3D pose estimation by incorporating temporal information for predicting the spatial positions of human joints. However, traditional methods often fall short as they primarily focus on the spatial coordinates of joints and overlook the orientation and rotation of the connecting bones, which are crucial for a comprehensive understanding of human pose in 3D space. To address these limitations, we introduce Quater-GCN (Q-GCN), a directed graph convolutional network tailored to enhance pose estimation by orientation. Q-GCN excels by not only capturing the spatial dependencies among node joints through their coordinates but also integrating the dynamic context of bone rotations in 2D space. This approach enables a more sophisticated representation of human poses by also regressing the orientation of each bone in 3D space, moving beyond mere coordinate prediction. Furthermore, we complement our model with a semi-supervised training strategy that leverages unlabeled data, addressing the challenge of limited orientation ground truth data. Through comprehensive evaluations, Q-GCN has demonstrated outstanding performance against current state-of-the-art methods.
△ Less
Submitted 30 April, 2024;
originally announced April 2024.
-
An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video
Authors:
Xingyu Song,
Zhan Li,
Shi Chen,
Xin-Qiang Cai,
Kazuyuki Demachi
Abstract:
Action recognition, an essential component of computer vision, plays a pivotal role in multiple applications. Despite significant improvements brought by Convolutional Neural Networks (CNNs), these models suffer performance declines when trained with discontinuous video frames, which is a frequent scenario in real-world settings. This decline primarily results from the loss of temporal continuity,…
▽ More
Action recognition, an essential component of computer vision, plays a pivotal role in multiple applications. Despite significant improvements brought by Convolutional Neural Networks (CNNs), these models suffer performance declines when trained with discontinuous video frames, which is a frequent scenario in real-world settings. This decline primarily results from the loss of temporal continuity, which is crucial for understanding the semantics of human actions. To overcome this issue, we introduce the 4A (Action Animation-based Augmentation Approach) pipeline, which employs a series of sophisticated techniques: starting with 2D human pose estimation from RGB videos, followed by Quaternion-based Graph Convolution Network for joint orientation and trajectory prediction, and Dynamic Skeletal Interpolation for creating smoother, diversified actions using game engine technology. This innovative approach generates realistic animations in varied game environments, viewed from multiple viewpoints. In this way, our method effectively bridges the domain gap between virtual and real-world data. In experimental evaluations, the 4A pipeline achieves comparable or even superior performance to traditional training approaches using real-world data, while requiring only 10% of the original data volume. Additionally, our approach demonstrates enhanced performance on In-the-wild videos, marking a significant advancement in the field of action recognition.
△ Less
Submitted 30 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Respiratory motion forecasting with online learning of recurrent neural networks for safety enhancement in externally guided radiotherapy
Authors:
Michel Pohl,
Mitsuru Uesaka,
Hiroyuki Takahashi,
Kazuyuki Demachi,
Ritu Bhusal Chhatkuli
Abstract:
In lung radiotherapy, infrared cameras can record the location of reflective objects on the chest to infer the position of the tumor moving due to breathing, but treatment system latencies hinder radiation beam precision. Real-time recurrent learning (RTRL), is a potential solution as it can learn patterns within non-stationary respiratory data but has high complexity. This study assesses the capa…
▽ More
In lung radiotherapy, infrared cameras can record the location of reflective objects on the chest to infer the position of the tumor moving due to breathing, but treatment system latencies hinder radiation beam precision. Real-time recurrent learning (RTRL), is a potential solution as it can learn patterns within non-stationary respiratory data but has high complexity. This study assesses the capabilities of resource-efficient online RNN algorithms, namely unbiased online recurrent optimization (UORO), sparse-1 step approximation (SnAp-1), and decoupled neural interfaces (DNI) to forecast respiratory motion during radiotherapy treatment accurately. We use time series containing the 3D position of external markers on the chest of healthy subjects. We propose efficient implementations for SnAp-1 and DNI based on compression of the influence and immediate Jacobian matrices and an accurate update of the linear coefficients used in credit assignment estimation, respectively. The original sampling frequency was 10Hz; we performed resampling at 3.33Hz and 30Hz. We use UORO, SnAp-1, and DNI to forecast each marker's 3D position with horizons (the time interval in advance for which the prediction is made) h<=2.1s and compare them with RTRL, least mean squares, and linear regression. RNNs trained online achieved similar or better accuracy than most previous works using larger training databases and deep learning, even though we used only the first minute of each sequence to predict motion within that exact sequence. SnAp-1 had the lowest normalized root mean square errors (nRMSE) averaged over the horizon values considered, equal to 0.335 and 0.157, at 3.33Hz and 10.0Hz, respectively. Similarly, UORO had the highest accuracy at 30Hz, with an nRMSE of 0.0897. DNI's inference time, equal to 6.8ms per time step at 30Hz (Intel Core i7-13700 CPU), was the lowest among the RNN methods examined.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
GTAutoAct: An Automatic Datasets Generation Framework Based on Game Engine Redevelopment for Action Recognition
Authors:
Xingyu Song,
Zhan Li,
Shi Chen,
Kazuyuki Demachi
Abstract:
Current datasets for action recognition tasks face limitations stemming from traditional collection and generation methods, including the constrained range of action classes, absence of multi-viewpoint recordings, limited diversity, poor video quality, and labor-intensive manually collection. To address these challenges, we introduce GTAutoAct, a innovative dataset generation framework leveraging…
▽ More
Current datasets for action recognition tasks face limitations stemming from traditional collection and generation methods, including the constrained range of action classes, absence of multi-viewpoint recordings, limited diversity, poor video quality, and labor-intensive manually collection. To address these challenges, we introduce GTAutoAct, a innovative dataset generation framework leveraging game engine technology to facilitate advancements in action recognition. GTAutoAct excels in automatically creating large-scale, well-annotated datasets with extensive action classes and superior video quality. Our framework's distinctive contributions encompass: (1) it innovatively transforms readily available coordinate-based 3D human motion into rotation-orientated representation with enhanced suitability in multiple viewpoints; (2) it employs dynamic segmentation and interpolation of rotation sequences to create smooth and realistic animations of action; (3) it offers extensively customizable animation scenes; (4) it implements an autonomous video capture and processing pipeline, featuring a randomly navigating camera, with auto-trimming and labeling functionalities. Experimental results underscore the framework's robustness and highlights its potential to significantly improve action recognition model training.
△ Less
Submitted 24 January, 2024;
originally announced January 2024.
-
Prediction of the motion of chest internal points using a recurrent neural network trained with real-time recurrent learning for latency compensation in lung cancer radiotherapy
Authors:
Michel Pohl,
Mitsuru Uesaka,
Kazuyuki Demachi,
Ritu Bhusal Chhatkuli
Abstract:
During the radiotherapy treatment of patients with lung cancer, the radiation delivered to healthy tissue around the tumor needs to be minimized, which is difficult because of respiratory motion and the latency of linear accelerator systems. In the proposed study, we first use the Lucas-Kanade pyramidal optical flow algorithm to perform deformable image registration of chest computed tomography sc…
▽ More
During the radiotherapy treatment of patients with lung cancer, the radiation delivered to healthy tissue around the tumor needs to be minimized, which is difficult because of respiratory motion and the latency of linear accelerator systems. In the proposed study, we first use the Lucas-Kanade pyramidal optical flow algorithm to perform deformable image registration of chest computed tomography scan images of four patients with lung cancer. We then track three internal points close to the lung tumor based on the previously computed deformation field and predict their position with a recurrent neural network (RNN) trained using real-time recurrent learning (RTRL) and gradient clip**. The breathing data is quite regular, sampled at approximately 2.5Hz, and includes artificial drift in the spine direction. The amplitude of the motion of the tracked points ranged from 12.0mm to 22.7mm. Finally, we propose a simple method for recovering and predicting 3D tumor images from the tracked points and the initial tumor image based on a linear correspondence model and Nadaraya-Watson non-linear regression. The root-mean-square error, maximum error, and jitter corresponding to the RNN prediction on the test set were smaller than the same performance measures obtained with linear prediction and least mean squares (LMS). In particular, the maximum prediction error associated with the RNN, equal to 1.51mm, is respectively 16.1% and 5.0% lower than the maximum error associated with linear prediction and LMS. The average prediction time per time step with RTRL is equal to 119ms, which is less than the 400ms marker position sampling time. The tumor position in the predicted images appears visually correct, which is confirmed by the high mean cross-correlation between the original and predicted images, equal to 0.955.
△ Less
Submitted 13 July, 2022;
originally announced July 2022.
-
Cross-Task Consistency Learning Framework for Multi-Task Learning
Authors:
Akihiro Nakano,
Shi Chen,
Kazuyuki Demachi
Abstract:
Multi-task learning (MTL) is an active field in deep learning in which we train a model to jointly learn multiple tasks by exploiting relationships between the tasks. It has been shown that MTL helps the model share the learned features between tasks and enhance predictions compared to when learning each task independently. We propose a new learning framework for 2-task MTL problem that uses the p…
▽ More
Multi-task learning (MTL) is an active field in deep learning in which we train a model to jointly learn multiple tasks by exploiting relationships between the tasks. It has been shown that MTL helps the model share the learned features between tasks and enhance predictions compared to when learning each task independently. We propose a new learning framework for 2-task MTL problem that uses the predictions of one task as inputs to another network to predict the other task. We define two new loss terms inspired by cycle-consistency loss and contrastive learning, alignment loss and cross-task consistency loss. Both losses are designed to enforce the model to align the predictions of multiple tasks so that the model predicts consistently. We theoretically prove that both losses help the model learn more efficiently and that cross-task consistency loss is better in terms of alignment with the straight-forward predictions. Experimental results also show that our proposed model achieves significant performance on the benchmark Cityscapes and NYU dataset.
△ Less
Submitted 28 November, 2021;
originally announced November 2021.
-
Prediction of the Position of External Markers Using a Recurrent Neural Network Trained With Unbiased Online Recurrent Optimization for Safe Lung Cancer Radiotherapy
Authors:
Michel Pohl,
Mitsuru Uesaka,
Hiroyuki Takahashi,
Kazuyuki Demachi,
Ritu Bhusal Chhatkuli
Abstract:
During lung radiotherapy, the position of infrared reflective objects on the chest can be recorded to estimate the tumor location. However, radiotherapy systems have a latency inherent to robot control limitations that impedes the radiation delivery precision. Prediction with online learning of recurrent neural networks (RNN) allows for adaptation to non-stationary respiratory signals, but classic…
▽ More
During lung radiotherapy, the position of infrared reflective objects on the chest can be recorded to estimate the tumor location. However, radiotherapy systems have a latency inherent to robot control limitations that impedes the radiation delivery precision. Prediction with online learning of recurrent neural networks (RNN) allows for adaptation to non-stationary respiratory signals, but classical methods such as RTRL and truncated BPTT are respectively slow and biased. This study investigates the capabilities of unbiased online recurrent optimization (UORO) to forecast respiratory motion and enhance safety in lung radiotherapy.
We used 9 observation records of the 3D position of 3 external markers on the chest and abdomen of healthy individuals breathing during intervals from 73s to 222s. The sampling frequency was 10Hz, and the amplitudes of the recorded trajectories range from 6mm to 40mm in the superior-inferior direction. We forecast the 3D location of each marker simultaneously with a horizon value between 0.1s and 2.0s, using an RNN trained with UORO. We compare its performance with an RNN trained with RTRL, LMS, and offline linear regression. We provide closed-form expressions for quantities involved in the loss gradient calculation in UORO, thereby making its implementation efficient. Training and cross-validation were performed during the first minute of each sequence.
On average over the horizon values considered and the 9 sequences, UORO achieves the lowest root-mean-square (RMS) error and maximum error among the compared algorithms. These errors are respectively equal to 1.3mm and 8.8mm, and the prediction time per time step was lower than 2.8ms (Dell Intel core i9-9900K 3.60 GHz). Linear regression has the lowest RMS error for the horizon values 0.1s and 0.2s, followed by LMS for horizon values between 0.3s and 0.5s, and UORO for horizon values greater than 0.6s.
△ Less
Submitted 21 November, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.