-
A Machine Learning Approach to Predicting Single Event Upsets
Authors:
Archit Gupta,
Chong Yock Eng,
Deon Lim Meng Wee,
Rashna Analia Ahmed,
See Min Sim
Abstract:
A single event upset (SEU) is a critical soft error that occurs in semiconductor devices on exposure to ionising particles from space environments. SEUs cause bit flips in the memory component of semiconductors. This creates a multitude of safety hazards as stored information becomes less reliable. Currently, SEUs are only detected several hours after their occurrence. CREMER, the model presented in this paper, predicts SEUs in advance using machine learning. CREMER uses only positional data to predict SEU occurrence, making it robust, inexpensive and scalable. Upon implementation, the improved reliability of memory devices will create a digitally safer environment onboard space vehicles.
Submitted 9 October, 2023;
originally announced October 2023.
-
Masked Autoencoder for Unsupervised Video Summarization
Authors:
Minho Shim,
Taeoh Kim,
Jinhyung Kim,
Dongyoon Wee
Abstract:
Summarizing a video requires a diverse understanding of the video, ranging from recognizing scenes to evaluating how much each frame is essential enough to be selected as a summary. Self-supervised learning (SSL) is acknowledged for its robustness and flexibility to multiple downstream tasks, but the video SSL has not shown its value for dense understanding tasks like video summarization. We claim an unsupervised autoencoder with sufficient self-supervised learning does not need any extra downstream architecture design or fine-tuning weights to be utilized as a video summarization model. The proposed method to evaluate the importance score of each frame takes advantage of the reconstruction score of the autoencoder's decoder. We evaluate the method in major unsupervised video summarization benchmarks to show its effectiveness under various experimental settings.
Submitted 2 June, 2023;
originally announced June 2023.
-
Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
Authors:
Pilhyeon Lee,
Taeoh Kim,
Minho Shim,
Dongyoon Wee,
Hyeran Byun
Abstract:
Temporal action detection aims to predict the time intervals and the classes of action instances in the video. Despite the promising performance, existing two-stream models exhibit slow inference speed due to their reliance on computationally expensive optical flow. In this paper, we introduce a decomposed cross-modal distillation framework to build a strong RGB-based detector by transferring knowledge of the motion modality. Specifically, instead of direct distillation, we propose to separately learn RGB and motion representations, which are in turn combined to perform action localization. The dual-branch design and the asymmetric training objectives enable effective motion knowledge transfer while preserving RGB information intact. In addition, we introduce a local attentive fusion to better exploit the multimodal complementarity. It is designed to preserve the local discriminability of the features that is important for action localization. Extensive experiments on the benchmarks verify the effectiveness of the proposed method in enhancing RGB-based action detectors. Notably, our framework is agnostic to backbones and detection heads, bringing consistent gains across different model combinations.
Submitted 30 March, 2023;
originally announced March 2023.
-
You Only Train Once: Multi-Identity Free-Viewpoint Neural Human Rendering from Monocular Videos
Authors:
Jaehyeok Kim,
Dongyoon Wee,
Dan Xu
Abstract:
We introduce You Only Train Once (YOTO), a dynamic human generation framework, which performs free-viewpoint rendering of different human identities with distinct motions, via only one-time training from monocular videos. Most prior works for the task require individualized optimization for each input video that contains a distinct human identity, leading to a significant amount of time and resources for the deployment, thereby impeding the scalability and the overall application potential of the system. In this paper, we tackle this problem by proposing a set of learnable identity codes to expand the capability of the framework for multi-identity free-viewpoint rendering, and an effective pose-conditioned code query mechanism to finely model the pose-dependent non-rigid motions. YOTO optimizes neural radiance fields (NeRF) by utilizing designed identity codes to condition the model for learning various canonical T-pose appearances in a single shared volumetric representation. Besides, our joint learning of multiple identities within a unified model incidentally enables flexible motion transfer in high-quality photo-realistic renderings for all learned appearances. This capability expands its potential use in important applications, including Virtual Reality. We present extensive experimental results on ZJU-MoCap and PeopleSnapshot to clearly demonstrate the effectiveness of our proposed model. YOTO shows state-of-the-art performance on all evaluation metrics while showing significant benefits in training and inference efficiency as well as rendering quality. The code and model will be made publicly available soon.
Submitted 10 March, 2023;
originally announced March 2023.
-
MEEV: Body Mesh Estimation On Egocentric Video
Authors:
Nicolas Monet,
Dongyoon Wee
Abstract:
This technical report introduces our solution, MEEV, proposed to the EgoBody Challenge at ECCV 2022. Captured from head-mounted devices, the dataset consists of human body shape and motion of interacting people. The EgoBody dataset poses challenges such as occluded bodies and blurry images. To overcome these challenges, MEEV is designed to exploit multiscale features for rich spatial information. In addition, to overcome the limited size of the dataset, the model is pre-trained on a dataset aggregated from 2D and 3D pose estimation datasets. Achieving 82.30 for MPJPE and 92.93 for MPVPE, MEEV has won the EgoBody Challenge at ECCV 2022, which shows the effectiveness of the proposed method. The code is available at https://github.com/clovaai/meev
Submitted 20 October, 2022;
originally announced October 2022.
-
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Authors:
Taeoh Kim,
Jinhyung Kim,
Minho Shim,
Sangdoo Yun,
Myunggu Kang,
Dongyoon Wee,
Sangyoun Lee
Abstract:
Data augmentation has recently emerged as an essential component of modern training recipes for visual recognition tasks. However, data augmentation for video recognition has been rarely explored despite its effectiveness. The few existing augmentation recipes for video recognition naively extend image augmentation methods by applying the same operations to all frames of a video. Our main idea is that the magnitude of augmentation operations for each frame needs to change over time to capture the temporal variations of real-world videos. These variations should be as diverse as possible while requiring few additional hyper-parameters during training. With this motivation, we propose a simple yet effective video data augmentation framework, DynaAugment. The magnitude of augmentation operations on each frame is changed by an effective mechanism, Fourier Sampling, which parameterizes diverse, smooth, and realistic temporal variations. DynaAugment also includes an extended search space suitable for video for automatic data augmentation methods. DynaAugment experimentally demonstrates that there is additional room for performance improvement over static augmentations on diverse video models. Specifically, we show the effectiveness of DynaAugment on various video datasets and tasks: large-scale video recognition (Kinetics-400 and Something-Something-v2), small-scale video recognition (UCF-101 and HMDB-51), fine-grained video recognition (Diving-48 and FineGym), video action segmentation on Breakfast, video action localization on THUMOS'14, and video object detection on MOT17Det. DynaAugment also enables video models to learn more generalized representations, improving model robustness on corrupted videos.
Submitted 30 June, 2022;
originally announced June 2022.
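The DynaAugment abstract above describes per-frame augmentation magnitudes that vary smoothly over time. A minimal sketch of that idea follows, assuming a sum of a few random sinusoids rescaled to a target magnitude range. The function name `sample_magnitudes`, the number of basis functions, and the frequency range are illustrative assumptions, not the authors' published Fourier Sampling procedure.

```python
import math
import random


def sample_magnitudes(num_frames, low, high, num_basis=3, rng=None):
    """Return one augmentation magnitude per frame, varying smoothly in time.

    Hypothetical sketch: mixes `num_basis` random sinusoids and rescales the
    result to [low, high]. All sampling ranges below are assumptions.
    """
    rng = rng or random.Random()
    span = max(num_frames - 1, 1)  # avoid division by zero for 1-frame clips
    curve = [0.0] * num_frames
    for _ in range(num_basis):
        freq = rng.uniform(0.2, 1.5)            # cycles over the clip (assumed range)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        weight = rng.uniform(0.0, 1.0)
        for t in range(num_frames):
            curve[t] += weight * math.sin(2.0 * math.pi * freq * t / span + phase)
    lo, hi = min(curve), max(curve)
    if hi - lo < 1e-12:
        # Degenerate (constant) curve: fall back to a static magnitude.
        return [(low + high) / 2.0] * num_frames
    # Min-max rescale so the magnitudes span exactly [low, high].
    return [low + (c - lo) / (hi - lo) * (high - low) for c in curve]


# Usage: a 16-frame clip whose brightness-change magnitude drifts over time.
magnitudes = sample_magnitudes(16, low=0.1, high=0.9, rng=random.Random(0))
```

A static augmentation corresponds to returning the same value for every frame; the sketch differs only in letting that value follow a smooth random curve.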
-
Out of Sight, Out of Mind: A Source-View-Wise Feature Aggregation for Multi-View Image-Based Rendering
Authors:
Geonho Cha,
Chaehun Shin,
Sungroh Yoon,
Dongyoon Wee
Abstract:
To estimate the volume density and color of a 3D point in the multi-view image-based rendering, a common approach is to inspect the consensus existence among the given source image features, which is one of the informative cues for the estimation procedure. To this end, most of the previous methods utilize equally-weighted aggregation features. However, this could make it hard to check the consensus existence when some outliers, which frequently occur by occlusions, are included in the source image feature set. In this paper, we propose a novel source-view-wise feature aggregation method, which facilitates us to find out the consensus in a robust way by leveraging local structures in the feature set. We first calculate the source-view-wise distance distribution for each source feature for the proposed aggregation. After that, the distance distribution is converted to several similarity distributions with the proposed learnable similarity mapping functions. Finally, for each element in the feature set, the aggregation features are extracted by calculating the weighted means and variances, where the weights are derived from the similarity distributions. In experiments, we validate the proposed method on various benchmark datasets, including synthetic and real image scenes. The experimental results demonstrate that incorporating the proposed features improves the performance by a large margin, resulting in the state-of-the-art performance.
Submitted 10 June, 2022;
originally announced June 2022.
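The aggregation steps in the abstract above (distances, then similarities, then weighted means and variances) can be sketched as follows. This is an illustration under stated assumptions: the fixed mapping `exp(-alpha * distance)` stands in for the paper's learnable similarity mapping functions, and the names `aggregate` and `alpha` are hypothetical.

```python
import math


def aggregate(features, alpha=1.0):
    """For each source-view feature, return a (weighted mean, weighted variance) pair.

    Assumption: similarity = exp(-alpha * Euclidean distance). The paper
    learns its mapping functions; this fixed form is only a stand-in.
    """
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        # Distance from source view i to every source view (including itself).
        dists = [math.dist(features[i], f) for f in features]
        sims = [math.exp(-alpha * d) for d in dists]
        total = sum(sims)
        weights = [s / total for s in sims]
        mean = [sum(weights[j] * features[j][c] for j in range(n)) for c in range(dim)]
        var = [
            sum(weights[j] * (features[j][c] - mean[c]) ** 2 for j in range(n))
            for c in range(dim)
        ]
        out.append((mean, var))
    return out


# Usage: two consistent views and one outlier (e.g. an occluded view).
views = [[1.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
mean_0, var_0 = aggregate(views)[0]
```

With equal weights the mean of the first channel would be 4.0; in this sketch the outlier receives a near-zero weight from the two consistent views, so their aggregated mean stays close to 1.0.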
-
Self-Supervised Depth Estimation with Isometric-Self-Sample-Based Learning
Authors:
Geonho Cha,
Ho-Deok Jang,
Dongyoon Wee
Abstract:
Managing the dynamic regions in the photometric loss formulation has been a main issue for handling the self-supervised depth estimation problem. Most previous methods have alleviated this issue by removing the dynamic regions in the photometric loss formulation based on the masks estimated from another module, making it difficult to fully utilize the training images. In this paper, to handle this problem, we propose an isometric self-sample-based learning (ISSL) method to fully utilize the training images in a simple yet effective way. The proposed method provides additional supervision during training using self-generated images that comply with pure static scene assumption. Specifically, the isometric self-sample generator synthesizes self-samples for each training image by applying random rigid transformations on the estimated depth. Thus both the generated self-samples and the corresponding training image always follow the static scene assumption. We show that plugging our ISSL module into several existing models consistently improves the performance by a large margin. In addition, it also boosts the depth accuracy over different types of scene, i.e., outdoor scenes (KITTI and Make3D) and indoor scene (NYUv2), validating its high effectiveness.
Submitted 20 May, 2022;
originally announced May 2022.
-
Detection Recovery in Online Multi-Object Tracking with Sparse Graph Tracker
Authors:
Jeongseok Hyun,
Myunggu Kang,
Dongyoon Wee,
Dit-Yan Yeung
Abstract:
In existing joint detection and tracking methods, pairwise relational features are used to match previous tracklets to current detections. However, the features may not be discriminative enough for a tracker to identify a target from a large number of detections. Selecting only high-scored detections for tracking may lead to missed detections whose confidence score is low. Consequently, in the online setting, this results in disconnections of tracklets which cannot be recovered. In this regard, we present Sparse Graph Tracker (SGT), a novel online graph tracker using higher-order relational features which are more discriminative by aggregating the features of neighboring detections and their relations. SGT converts video data into a graph where detections, their connections, and the relational features of two connected nodes are represented by nodes, edges, and edge features, respectively. The strong edge features allow SGT to track targets with tracking candidates selected by top-K scored detections with large K. As a result, even low-scored detections can be tracked, and the missed detections are also recovered. The robustness of K value is shown through the extensive experiments. In the MOT16/17/20 and HiEve Challenge, SGT outperforms the state-of-the-art trackers with real-time inference speed. Especially, a large improvement in MOTA is shown in the MOT20 and HiEve Challenge. Code is available at https://github.com/HYUNJS/SGT.
Submitted 19 September, 2023; v1 submitted 2 May, 2022;
originally announced May 2022.
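The SGT abstract above contrasts keeping only high-scored detections with keeping the top-K scored detections for a large K. A minimal sketch of the two selection rules follows; the function names and the example scores are illustrative assumptions and are not taken from the SGT code base.

```python
from typing import List


def select_by_threshold(scores: List[float], threshold: float) -> List[int]:
    """Indices of detections whose confidence is at least `threshold`."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def select_top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring detections, best first."""
    if k <= 0:
        return []
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical confidence scores for five detections in one frame.
scores = [0.92, 0.15, 0.48, 0.71, 0.30]
high_only = select_by_threshold(scores, 0.5)  # [0, 3]
candidates = select_top_k(scores, 4)          # [0, 3, 2, 4]
```

Detection 2 (score 0.48) is dropped by the threshold rule but survives top-K selection, which is the kind of low-scored candidate the abstract says can still be tracked when edge features are discriminative enough.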
-
Frequency Selective Augmentation for Video Representation Learning
Authors:
Jinhyung Kim,
Taeoh Kim,
Minho Shim,
Dongyoon Han,
Dongyoon Wee,
Junmo Kim
Abstract:
Recent self-supervised video representation learning methods focus on maximizing the similarity between multiple augmented views from the same video and largely rely on the quality of generated views. However, most existing methods lack a mechanism to prevent representation learning from bias towards static information in the video. In this paper, we propose frequency augmentation (FreqAug), a spatio-temporal data augmentation method in the frequency domain for video representation learning. FreqAug stochastically removes specific frequency components from the video so that learned representation captures essential features more from the remaining information for various downstream tasks. Specifically, FreqAug pushes the model to focus more on dynamic features rather than static features in the video via dropping spatial or temporal low-frequency components. To verify the generality of the proposed method, we experiment with FreqAug on multiple self-supervised learning frameworks along with standard augmentations. Transferring the improved representation to five video action recognition and two temporal action localization downstream tasks shows consistent improvements over baselines.
Submitted 6 December, 2022; v1 submitted 8 April, 2022;
originally announced April 2022.
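The FreqAug abstract above describes stochastically dropping low-frequency components. A minimal sketch of the temporal case follows, applied to the intensity of a single pixel over time. It uses a naive discrete Fourier transform to stay dependency-free; the names `drop_low_temporal_freq` and `maybe_freq_aug`, the `cutoff` convention, and the default probability are assumptions rather than the authors' implementation.

```python
import cmath
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of `dft`."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def drop_low_temporal_freq(signal, cutoff):
    """Zero every frequency bin whose index distance from DC is <= cutoff."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if min(k, n - k) <= cutoff:  # min() also covers negative frequencies
            spectrum[k] = 0
    return [v.real for v in idft(spectrum)]


def maybe_freq_aug(signal, cutoff=0, prob=0.5, rng=None):
    """Apply the filter with probability `prob` (assumed default), else pass through."""
    rng = rng or random.Random()
    if rng.random() < prob:
        return drop_low_temporal_freq(signal, cutoff)
    return list(signal)


# A static pixel carries only the DC component, so filtering removes it entirely;
# a flickering pixel keeps its variation around zero.
static_pixel = drop_low_temporal_freq([5.0, 5.0, 5.0, 5.0], cutoff=0)
moving_pixel = drop_low_temporal_freq([0.0, 1.0, 0.0, 1.0], cutoff=0)
```

With `cutoff=0` the filter is equivalent to subtracting the temporal mean, which is the simplest way to see why static content is suppressed while motion is preserved.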
-
Electronic, vibrational and transport properties of pnictogen substituted ternary skutterudites
Authors:
Dmitri Volja,
Boris Kozinsky,
An Li,
Daehyun Wee,
Nicola Marzari,
Marco Fornari
Abstract:
First principles calculations are used to investigate electronic band structure and vibrational spectra of pnictogen substituted ternary skutterudites. We compare the results with the prototypical binary composition CoSb$_3$ to identify the effects of substitutions on the Sb site, and evaluate the potential of ternary skutterudites for thermoelectric applications. Electronic transport coefficients are computed within the Boltzmann transport formalism assuming a constant relaxation time, using a new methodology based on maximally localized Wannier function interpolation. Our results point to a large sensitivity of the electronic transport coefficients to carrier concentration and to scattering mechanisms associated with the enhanced polarity. The ionic character of the bonds is used to explain the detrimental effect on the thermoelectric properties.
Submitted 7 December, 2011;
originally announced December 2011.