-
The Continuous Tensor Abstraction: Where Indices are Real
Authors:
Jaeyeon Won,
Willow Ahrens,
Joel S. Emer,
Saman Amarasinghe
Abstract:
This paper introduces the continuous tensor abstraction, allowing indices to take real-number values (e.g., A[3.14]), and provides a continuous loop construct that iterates over the infinitely large set of real numbers. This paper expands the existing tensor abstraction to include continuous tensors that exhibit a piecewise-constant property, enabling the transformation of an infinite amount of co…
▽ More
This paper introduces the continuous tensor abstraction, allowing indices to take real-number values (e.g., A[3.14]), and provides a continuous loop construct that iterates over the infinitely large set of real numbers. This paper expands the existing tensor abstraction to include continuous tensors that exhibit a piecewise-constant property, enabling the transformation of an infinite amount of computation into a finite amount. Additionally, we present a new tensor format abstraction for storing continuous tensors and a code generation technique that automatically generates kernels for the continuous tensor abstraction. Our approach introduces a novel method for loop-level reasoning in domains like computational geometry and computer graphics, traditionally unexplored in tensor programming models. Our approach demonstrates comparable performance to hand-optimized kernels in leading libraries across diverse applications. Compared to hand-implemented libraries on a CPU, our compiler-based implementation achieves an average speedup of 9.20x on 2D radius search with ~100x fewer lines of code (LoC), 1.22x on genomic interval overlap** queries (with ~26x LoC saving), and 1.69x on trilinear interpolation in Neural Radiance Field (with ~9x LoC saving).
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
A Spatiotemporal Illumination Model for 3D Image Fusion in Optical Coherence Tomography
Authors:
Stefan Ploner,
Jungeun Won,
Julia Schottenhamml,
Jessica Girgis,
Kenneth Lam,
Nadia Waheed,
James Fujimoto,
Andreas Maier
Abstract:
Optical coherence tomography (OCT) is a non-invasive, micrometer-scale imaging modality that has become a clinical standard in ophthalmology. By raster-scanning the retina, sequential cross-sectional image slices are acquired to generate volumetric data. In-vivo imaging suffers from discontinuities between slices that show up as motion and illumination artifacts. We present a new illumination mode…
▽ More
Optical coherence tomography (OCT) is a non-invasive, micrometer-scale imaging modality that has become a clinical standard in ophthalmology. By raster-scanning the retina, sequential cross-sectional image slices are acquired to generate volumetric data. In-vivo imaging suffers from discontinuities between slices that show up as motion and illumination artifacts. We present a new illumination model that exploits continuity in orthogonally raster-scanned volume data. Our novel spatiotemporal parametrization adheres to illumination continuity both temporally, along the imaged slices, as well as spatially, in the transverse directions. Yet, our formulation does not make inter-slice assumptions, which could have discontinuities. This is the first optimization of a 3D inverse model in an image reconstruction context in OCT. Evaluation in 68 volumes from eyes with pathology showed reduction of illumination artifacts in 88\% of the data, and only 6\% showed moderate residual illumination artifacts. The method enables the use of forward-warped motion corrected data, which is more accurate, and enables supersampling and advanced 3D image reconstruction in OCT.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
Leveraging Human-Machine Interactions for Computer Vision Dataset Quality Enhancement
Authors:
Esla Timothy Anzaku,
Hyesoo Hong,
**-Woo Park,
Wonjun Yang,
Kangmin Kim,
JongBum Won,
Deshika Vinoshani Kumari Herath,
Arnout Van Messem,
Wesley De Neve
Abstract:
Large-scale datasets for single-label multi-class classification, such as \emph{ImageNet-1k}, have been instrumental in advancing deep learning and computer vision. However, a critical and often understudied aspect is the comprehensive quality assessment of these datasets, especially regarding potential multi-label annotation errors. In this paper, we introduce a lightweight, user-friendly, and sc…
▽ More
Large-scale datasets for single-label multi-class classification, such as \emph{ImageNet-1k}, have been instrumental in advancing deep learning and computer vision. However, a critical and often understudied aspect is the comprehensive quality assessment of these datasets, especially regarding potential multi-label annotation errors. In this paper, we introduce a lightweight, user-friendly, and scalable framework that synergizes human and machine intelligence for efficient dataset validation and quality enhancement. We term this novel framework \emph{Multilabelfy}. Central to Multilabelfy is an adaptable web-based platform that systematically guides annotators through the re-evaluation process, effectively leveraging human-machine interactions to enhance dataset quality. By using Multilabelfy on the ImageNetV2 dataset, we found that approximately $47.88\%$ of the images contained at least two labels, underscoring the need for more rigorous assessments of such influential datasets. Furthermore, our analysis showed a negative correlation between the number of potential labels per image and model top-1 accuracy, illuminating a crucial factor in model evaluation and selection. Our open-source framework, Multilabelfy, offers a convenient, lightweight solution for dataset enhancement, emphasizing multi-label proportions. This study tackles major challenges in dataset integrity and provides key insights into model performance evaluation. Moreover, it underscores the advantages of integrating human expertise with machine capabilities to produce more robust models and trustworthy data development. The source code for Multilabelfy will be available at https://github.com/esla/Multilabelfy.
\keywords{Computer Vision \and Dataset Quality Enhancement \and Dataset Validation \and Human-Computer Interaction \and Multi-label Annotation.}
△ Less
Submitted 31 January, 2024;
originally announced January 2024.
-
On the Correctness of the Generalized Isotonic Recursive Partitioning Algorithm
Authors:
Joong-Ho Won,
Jihan Jung
Abstract:
This paper presents an in-depth analysis of the generalized isotonic recursive partitioning (GIRP) algorithm for fitting isotonic models under separable convex losses, proposed by Luss and Rosset [J. Comput. Graph. Statist., 23 (2014), pp. 192--201] for differentiable losses and extended by Painsky and Rosset [IEEE Trans. Pattern Anal. Mach. Intell., 38 (2016), pp. 308-321] for nondifferentiable l…
▽ More
This paper presents an in-depth analysis of the generalized isotonic recursive partitioning (GIRP) algorithm for fitting isotonic models under separable convex losses, proposed by Luss and Rosset [J. Comput. Graph. Statist., 23 (2014), pp. 192--201] for differentiable losses and extended by Painsky and Rosset [IEEE Trans. Pattern Anal. Mach. Intell., 38 (2016), pp. 308-321] for nondifferentiable losses. The GIRP algorithm poseses an attractive feature that in each step of the algorithm, the intermediate solution satisfies the isotonicity constraint. The paper begins with an example showing that the GIRP algorithm as described in the literature may fail to produce an isotonic model, suggesting that the existence and uniqueness of the solution to the isotonic regression problem must be carefully addressed. It proceeds with showing that, among possibly many solutions, there indeed exists a solution that can be found by recursive binary partitioning of the set of observed data. A small modification of the GIRP algorithm suffices to obtain a correct solution and preserve the desired property that all the intermediate solutions are isotonic. This proposed modification includes a proper choice of intermediate solutions and a simplification of the partitioning step from ternary to binary.
△ Less
Submitted 10 January, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
$t^3$-Variational Autoencoder: Learning Heavy-tailed Data with Student's t and Power Divergence
Authors:
Juno Kim,
Jaehyuk Kwon,
Mincheol Cho,
Hyunjong Lee,
Joong-Ho Won
Abstract:
The variational autoencoder (VAE) typically employs a standard normal prior as a regularizer for the probabilistic latent encoder. However, the Gaussian tail often decays too quickly to effectively accommodate the encoded points, failing to preserve crucial structures hidden in the data. In this paper, we explore the use of heavy-tailed models to combat over-regularization. Drawing upon insights f…
▽ More
The variational autoencoder (VAE) typically employs a standard normal prior as a regularizer for the probabilistic latent encoder. However, the Gaussian tail often decays too quickly to effectively accommodate the encoded points, failing to preserve crucial structures hidden in the data. In this paper, we explore the use of heavy-tailed models to combat over-regularization. Drawing upon insights from information geometry, we propose $t^3$VAE, a modified VAE framework that incorporates Student's t-distributions for the prior, encoder, and decoder. This results in a joint model distribution of a power form which we argue can better fit real-world datasets. We derive a new objective by reformulating the evidence lower bound as joint optimization of KL divergence between two statistical manifolds and replacing with $γ$-power divergence, a natural alternative for power families. $t^3$VAE demonstrates superior generation of low-density regions when trained on heavy-tailed synthetic data. Furthermore, we show that $t^3$VAE significantly outperforms other models on CelebA and imbalanced CIFAR-100 datasets.
△ Less
Submitted 3 March, 2024; v1 submitted 2 December, 2023;
originally announced December 2023.
-
MOCHA: Real-Time Motion Characterization via Context Matching
Authors:
Deok-Kyeong Jang,
Yuting Ye,
Jungdam Won,
Sung-Hee Lee
Abstract:
Transforming neutral, characterless input motions to embody the distinct style of a notable character in real time is highly compelling for character animation. This paper introduces MOCHA, a novel online motion characterization framework that transfers both motion styles and body proportions from a target character to an input source motion. MOCHA begins by encoding the input motion into a motion…
▽ More
Transforming neutral, characterless input motions to embody the distinct style of a notable character in real time is highly compelling for character animation. This paper introduces MOCHA, a novel online motion characterization framework that transfers both motion styles and body proportions from a target character to an input source motion. MOCHA begins by encoding the input motion into a motion feature that structures the body part topology and captures motion dependencies for effective characterization. Central to our framework is the Neural Context Matcher, which generates a motion feature for the target character with the most similar context to the input motion feature. The conditioned autoregressive model of the Neural Context Matcher can produce temporally coherent character features in each time frame. To generate the final characterized pose, our Characterizer network incorporates the characteristic aspects of the target motion feature into the input motion feature while preserving its context. This is achieved through a transformer model that introduces the adaptive instance normalization and context map**-based cross-attention, effectively injecting the character feature into the source feature. We validate the performance of our framework through comparisons with prior work and an ablation study. Our framework can easily accommodate various applications, including characterization with only sparse input and real-time characterization. Additionally, we contribute a high-quality motion dataset comprising six different characters performing a range of motions, which can serve as a valuable resource for future research.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
DROP: Dynamics Responses from Human Motion Prior and Projective Dynamics
Authors:
Yifeng Jiang,
Jungdam Won,
Yuting Ye,
C. Karen Liu
Abstract:
Synthesizing realistic human movements, dynamically responsive to the environment, is a long-standing objective in character animation, with applications in computer vision, sports, and healthcare, for motion prediction and data augmentation. Recent kinematics-based generative motion models offer impressive scalability in modeling extensive motion data, albeit without an interface to reason about…
▽ More
Synthesizing realistic human movements, dynamically responsive to the environment, is a long-standing objective in character animation, with applications in computer vision, sports, and healthcare, for motion prediction and data augmentation. Recent kinematics-based generative motion models offer impressive scalability in modeling extensive motion data, albeit without an interface to reason about and interact with physics. While simulator-in-the-loop learning approaches enable highly physically realistic behaviors, the challenges in training often affect scalability and adoption. We introduce DROP, a novel framework for modeling Dynamics Responses of humans using generative mOtion prior and Projective dynamics. DROP can be viewed as a highly stable, minimalist physics-based human simulator that interfaces with a kinematics-based generative motion prior. Utilizing projective dynamics, DROP allows flexible and simple integration of the learned motion prior as one of the projective energies, seamlessly incorporating control provided by the motion prior with Newtonian dynamics. Serving as a model-agnostic plug-in, DROP enables us to fully leverage recent advances in generative motion models for physics-based motion synthesis. We conduct extensive evaluations of our model across different motion tasks and various physical perturbations, demonstrating the scalability and diversity of responses.
△ Less
Submitted 24 September, 2023;
originally announced September 2023.
-
Wasserstein Geodesic Generator for Conditional Distributions
Authors:
Young-geun Kim,
Kyungbok Lee,
Youngwon Choi,
Joong-Ho Won,
Myunghee Cho Paik
Abstract:
Generating samples given a specific label requires estimating conditional distributions. We derive a tractable upper bound of the Wasserstein distance between conditional distributions to lay the theoretical groundwork to learn conditional distributions. Based on this result, we propose a novel conditional generation algorithm where conditional distributions are fully characterized by a metric spa…
▽ More
Generating samples given a specific label requires estimating conditional distributions. We derive a tractable upper bound of the Wasserstein distance between conditional distributions to lay the theoretical groundwork to learn conditional distributions. Based on this result, we propose a novel conditional generation algorithm where conditional distributions are fully characterized by a metric space defined by a statistical distance. We employ optimal transport theory to propose the Wasserstein geodesic generator, a new conditional generator that learns the Wasserstein geodesic. The proposed method learns both conditional distributions for observed domains and optimal transport maps between them. The conditional distributions given unobserved intermediate domains are on the Wasserstein geodesic between conditional distributions given two observed domain labels. Experiments on face images with light conditions as domain labels demonstrate the efficacy of the proposed method.
△ Less
Submitted 28 August, 2023; v1 submitted 19 August, 2023;
originally announced August 2023.
-
Physics-based Motion Retargeting from Sparse Inputs
Authors:
Daniele Reda,
Jungdam Won,
Yuting Ye,
Michiel van de Panne,
Alexander Winkler
Abstract:
Avatars are important to create interactive and immersive experiences in virtual worlds. One challenge in animating these characters to mimic a user's motion is that commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose. Another challenge is that an avatar might have a different skeleton structure than a human and the map** bet…
▽ More
Avatars are important to create interactive and immersive experiences in virtual worlds. One challenge in animating these characters to mimic a user's motion is that commercial AR/VR products consist only of a headset and controllers, providing very limited sensor data of the user's pose. Another challenge is that an avatar might have a different skeleton structure than a human and the map** between them is unclear. In this work we address both of these challenges. We introduce a method to retarget motions in real-time from sparse human sensor data to characters of various morphologies. Our method uses reinforcement learning to train a policy to control characters in a physics simulator. We only require human motion capture data for training, without relying on artist-generated animations for each avatar. This allows us to use large motion capture datasets to train general policies that can track unseen users from real and sparse data in real-time. We demonstrate the feasibility of our approach on three characters with different skeleton structure: a dinosaur, a mouse-like creature and a human. We show that the avatar poses often match the user surprisingly well, despite having no sensor information of the lower body available. We discuss and ablate the important components in our framework, specifically the kinematic retargeting step, the imitation, contact and action reward as well as our asymmetric actor-critic observations. We further explore the robustness of our method in a variety of settings including unbalancing, dancing and sports motions.
△ Less
Submitted 4 July, 2023;
originally announced July 2023.
-
FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning
Authors:
Rishub Tamirisa,
John Won,
Chengjun Lu,
Ron Arel,
Andy Zhou
Abstract:
Recent advancements in federated learning (FL) seek to increase client-level performance by fine-tuning client parameters on local data or personalizing architectures for the local task. Existing methods for such personalization either prune a global model or fine-tune a global model on a local client distribution. However, these existing methods either personalize at the expense of retaining impo…
▽ More
Recent advancements in federated learning (FL) seek to increase client-level performance by fine-tuning client parameters on local data or personalizing architectures for the local task. Existing methods for such personalization either prune a global model or fine-tune a global model on a local client distribution. However, these existing methods either personalize at the expense of retaining important global knowledge, or predetermine network layers for fine-tuning, resulting in suboptimal storage of global knowledge within client models. Enlightened by the lottery ticket hypothesis, we first introduce a hypothesis for finding optimal client subnetworks to locally fine-tune while leaving the rest of the parameters frozen. We then propose a novel FL framework, FedSelect, using this procedure that directly personalizes both client subnetwork structure and parameters, via the simultaneous discovery of optimal parameters for personalization and the rest of parameters for global aggregation during training. We show that this method achieves promising results on CIFAR-10.
△ Less
Submitted 8 June, 2024; v1 submitted 22 June, 2023;
originally announced June 2023.
-
QuestEnvSim: Environment-Aware Simulated Motion Tracking from Sparse Sensors
Authors:
Sunmin Lee,
Sebastian Starke,
Yuting Ye,
Jungdam Won,
Alexander Winkler
Abstract:
Replicating a user's pose from only wearable sensors is important for many AR/VR applications. Most existing methods for motion tracking avoid environment interaction apart from foot-floor contact due to their complex dynamics and hard constraints. However, in daily life people regularly interact with their environment, e.g. by sitting on a couch or leaning on a desk. Using Reinforcement Learning,…
▽ More
Replicating a user's pose from only wearable sensors is important for many AR/VR applications. Most existing methods for motion tracking avoid environment interaction apart from foot-floor contact due to their complex dynamics and hard constraints. However, in daily life people regularly interact with their environment, e.g. by sitting on a couch or leaning on a desk. Using Reinforcement Learning, we show that headset and controller pose, if combined with physics simulation and environment observations can generate realistic full-body poses even in highly constrained environments. The physics simulation automatically enforces the various constraints necessary for realistic poses, instead of manually specifying them as in many kinematic approaches. These hard constraints allow us to achieve high-quality interaction motions without typical artifacts such as penetration or contact sliding. We discuss three features, the environment representation, the contact reward and scene randomization, crucial to the performance of the method. We demonstrate the generality of the approach through various examples, such as sitting on chairs, a couch and boxes, step** over boxes, rocking a chair and turning an office chair. We believe these are some of the highest-quality results achieved for motion tracking from sparse sensor with scene interaction.
△ Less
Submitted 9 June, 2023;
originally announced June 2023.
-
Bidirectional GaitNet: A Bidirectional Prediction Model of Human Gait and Anatomical Conditions
Authors:
Jungnam Park,
Moon Seok Park,
Jehee Lee,
Jungdam Won
Abstract:
We present a novel generative model, called Bidirectional GaitNet, that learns the relationship between human anatomy and its gait. The simulation model of human anatomy is a comprehensive, full-body, simulation-ready, musculoskeletal model with 304 Hill-type musculotendon units. The Bidirectional GaitNet consists of forward and backward models. The forward model predicts a gait pattern of a perso…
▽ More
We present a novel generative model, called Bidirectional GaitNet, that learns the relationship between human anatomy and its gait. The simulation model of human anatomy is a comprehensive, full-body, simulation-ready, musculoskeletal model with 304 Hill-type musculotendon units. The Bidirectional GaitNet consists of forward and backward models. The forward model predicts a gait pattern of a person with specific physical conditions, while the backward model estimates the physical conditions of a person when his/her gait pattern is provided. Our simulation-based approach first learns the forward model by distilling the simulation data generated by a state-of-the-art predictive gait simulator and then constructs a Variational Autoencoder (VAE) with the learned forward model as its decoder. Once it is learned its encoder serves as the backward model. We demonstrate our model on a variety of healthy/impaired gaits and validate it in comparison with physical examination data of real patients.
△ Less
Submitted 7 June, 2023;
originally announced June 2023.
-
Simulation and Retargeting of Complex Multi-Character Interactions
Authors:
Yunbo Zhang,
Deepak Gopinath,
Yuting Ye,
Jessica Hodgins,
Greg Turk,
Jungdam Won
Abstract:
We present a method for reproducing complex multi-character interactions for physically simulated humanoid characters using deep reinforcement learning. Our method learns control policies for characters that imitate not only individual motions, but also the interactions between characters, while maintaining balance and matching the complexity of reference data. Our approach uses a novel reward for…
▽ More
We present a method for reproducing complex multi-character interactions for physically simulated humanoid characters using deep reinforcement learning. Our method learns control policies for characters that imitate not only individual motions, but also the interactions between characters, while maintaining balance and matching the complexity of reference data. Our approach uses a novel reward formulation based on an interaction graph that measures distances between pairs of interaction landmarks. This reward encourages control policies to efficiently imitate the character's motion while preserving the spatial relationships of the interactions in the reference motion. We evaluate our method on a variety of activities, from simple interactions such as a high-five greeting to more complex interactions such as gymnastic exercises, Salsa dancing, and box carrying and throwing. This approach can be used to ``clean-up'' existing motion capture data to produce physically plausible interactions or to retarget motion to new characters with different sizes, kinematics or morphologies while maintaining the interactions in the original data.
△ Less
Submitted 31 May, 2023;
originally announced May 2023.
-
ACE: Adversarial Correspondence Embedding for Cross Morphology Motion Retargeting from Human to Nonhuman Characters
Authors:
Tianyu Li,
Jungdam Won,
Alexander Clegg,
Jeonghwan Kim,
Akshara Rai,
Sehoon Ha
Abstract:
Motion retargeting is a promising approach for generating natural and compelling animations for nonhuman characters. However, it is challenging to translate human movements into semantically equivalent motions for target characters with different morphologies due to the ambiguous nature of the problem. This work presents a novel learning-based motion retargeting framework, Adversarial Corresponden…
▽ More
Motion retargeting is a promising approach for generating natural and compelling animations for nonhuman characters. However, it is challenging to translate human movements into semantically equivalent motions for target characters with different morphologies due to the ambiguous nature of the problem. This work presents a novel learning-based motion retargeting framework, Adversarial Correspondence Embedding (ACE), to retarget human motions onto target characters with different body dimensions and structures. Our framework is designed to produce natural and feasible robot motions by leveraging generative-adversarial networks (GANs) while preserving high-level motion semantics by introducing an additional feature loss. In addition, we pretrain a robot motion prior that can be controlled in a latent embedding space and seek to establish a compact correspondence. We demonstrate that the proposed framework can produce retargeted motions for three different characters -- a quadrupedal robot with a manipulator, a crab character, and a wheeled manipulator. We further validate the design choices of our framework by conducting baseline comparisons and a user study. We also showcase sim-to-real transfer of the retargeted motions by transferring them to a real Spot robot.
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
PMP: Learning to Physically Interact with Environments using Part-wise Motion Priors
Authors:
**seok Bae,
Jungdam Won,
Donggeun Lim,
Cheol-Hui Min,
Young Min Kim
Abstract:
We present a method to animate a character incorporating multiple part-wise motion priors (PMP). While previous works allow creating realistic articulated motions from reference data, the range of motion is largely limited by the available samples. Especially for the interaction-rich scenarios, it is impractical to attempt acquiring every possible interacting motion, as the combination of physical…
▽ More
We present a method to animate a character incorporating multiple part-wise motion priors (PMP). While previous works allow creating realistic articulated motions from reference data, the range of motion is largely limited by the available samples. Especially for the interaction-rich scenarios, it is impractical to attempt acquiring every possible interacting motion, as the combination of physical parameters increases exponentially. The proposed PMP allows us to assemble multiple part skills to animate a character, creating a diverse set of motions with different combinations of existing data. In our pipeline, we can train an agent with a wide range of part-wise priors. Therefore, each body part can obtain a kinematic insight of the style from the motion captures, or at the same time extract dynamics-related information from the additional part-specific simulation. For example, we can first train a general interaction skill, e.g. gras**, only for the dexterous part, and then combine the expert trajectories from the pre-trained agent with the kinematic priors of other limbs. Eventually, our whole-body agent learns a novel physical interaction skill even with the absence of the object trajectories in the reference motion sequence.
△ Less
Submitted 4 May, 2023;
originally announced May 2023.
-
Unsupervised detection of small hyperreflective features in ultrahigh resolution optical coherence tomography
Authors:
Marcel Reimann,
Jungeun Won,
Hiroyuki Takahashi,
Antonio Yaghy,
Yunchan Hwang,
Stefan Ploner,
Junhong Lin,
Jessica Girgis,
Kenneth Lam,
Siyu Chen,
Nadia K. Waheed,
Andreas Maier,
James G. Fujimoto
Abstract:
Recent advances in optical coherence tomography such as the development of high speed ultrahigh resolution scanners and corresponding signal processing techniques may reveal new potential biomarkers in retinal diseases. Newly visible features are, for example, small hyperreflective specks in age-related macular degeneration. Identifying these new markers is crucial to investigate potential associa…
▽ More
Recent advances in optical coherence tomography such as the development of high speed ultrahigh resolution scanners and corresponding signal processing techniques may reveal new potential biomarkers in retinal diseases. Newly visible features are, for example, small hyperreflective specks in age-related macular degeneration. Identifying these new markers is crucial to investigate potential association with disease progression and treatment outcomes. Therefore, it is necessary to reliably detect these features in 3D volumetric scans. Because manual labeling of entire volumes is infeasible a need for automatic detection arises. Labeled datasets are often not publicly available and there are usually large variations in scan protocols and scanner types. Thus, this work focuses on an unsupervised approach that is based on local peak-detection and random walker segmentation to detect small features on each B-scan of the volume.
△ Less
Submitted 26 March, 2023;
originally announced March 2023.
-
Self-Supervised Predictive Coding with Multimodal Fusion for Patient Deterioration Prediction in Fine-grained Time Resolution
Authors:
Kwanhyung Lee,
John Won,
Heejung Hyun,
Sangchul Hahn,
Edward Choi,
Joohyung Lee
Abstract:
Accurate time prediction of patients' critical events is crucial in urgent scenarios where timely decision-making is important. Though many studies have proposed automatic prediction methods using Electronic Health Records (EHR), their coarse-grained time resolutions limit their practical usage in urgent environments such as the emergency department (ED) and intensive care unit (ICU). Therefore, i…
▽ More
Accurate time prediction of patients' critical events is crucial in urgent scenarios where timely decision-making is important. Though many studies have proposed automatic prediction methods using Electronic Health Records (EHR), their coarse-grained time resolutions limit their practical usage in urgent environments such as the emergency department (ED) and intensive care unit (ICU). Therefore, in this study, we propose an hourly prediction method based on self-supervised predictive coding and multi-modal fusion for two critical tasks: mortality and vasopressor need prediction. Through extensive experiments, we prove significant performance gains from both multi-modal fusion and self-supervised predictive regularization, most notably in far-future prediction, which becomes especially important in practice. Our uni-modal/bi-modal/bi-modal self-supervision scored 0.846/0.877/0.897 (0.824/0.855/0.886) and 0.817/0.820/0.858 (0.807/0.81/0.855) with mortality (far-future mortality) and with vasopressor need (far-future vasopressor need) prediction data in AUROC, respectively.
△ Less
Submitted 13 April, 2023; v1 submitted 29 October, 2022;
originally announced October 2022.
-
Leveraging Demonstrations with Latent Space Priors
Authors:
Jonas Gehring,
Deepak Gopinath,
Jungdam Won,
Andreas Krause,
Gabriel Synnaeve,
Nicolas Usunier
Abstract:
Demonstrations provide insight into relevant state or action space regions, bearing great potential to boost the efficiency and practicality of reinforcement learning agents. In this work, we propose to leverage demonstration datasets by combining skill learning and sequence modeling. Starting with a learned joint latent space, we separately train a generative model of demonstration sequences and…
▽ More
Demonstrations provide insight into relevant state or action space regions, bearing great potential to boost the efficiency and practicality of reinforcement learning agents. In this work, we propose to leverage demonstration datasets by combining skill learning and sequence modeling. Starting with a learned joint latent space, we separately train a generative model of demonstration sequences and an accompanying low-level policy. The sequence model forms a latent space prior over plausible demonstration behaviors to accelerate learning of high-level policies. We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning on transfer tasks. Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance. We benchmark our approach on a set of challenging sparse-reward environments with a complex, simulated humanoid, and on offline RL benchmarks for navigation and object manipulation. Videos, source code and pre-trained models are available at the corresponding project website at https://facebookresearch.github.io/latent-space-priors .
△ Less
Submitted 13 March, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars
Authors:
Alexander Winkler,
Jungdam Won,
Yuting Ye
Abstract:
Real-time tracking of human body motion is crucial for interactive and immersive experiences in AR/VR. However, very limited sensor data about the body is available from standalone wearable devices such as HMDs (Head Mounted Devices) or AR glasses. In this work, we present a reinforcement learning framework that takes in sparse signals from an HMD and two controllers, and simulates plausible and p…
▽ More
Real-time tracking of human body motion is crucial for interactive and immersive experiences in AR/VR. However, very limited sensor data about the body is available from standalone wearable devices such as HMDs (Head Mounted Devices) or AR glasses. In this work, we present a reinforcement learning framework that takes in sparse signals from an HMD and two controllers, and simulates plausible and physically valid full body motions. Using high quality full body motion as dense supervision during training, a simple policy network can learn to output appropriate torques for the character to balance, walk, and jog, while closely following the input signals. Our results demonstrate surprisingly similar leg motions to ground truth without any observations of the lower body, even when the input is only the 6D transformations of the HMD. We also show that a single policy can be robust to diverse locomotion styles, different body sizes, and novel environments.
△ Less
Submitted 19 September, 2022;
originally announced September 2022.
-
A Spatiotemporal Model for Precise and Efficient Fully-automatic 3D Motion Correction in OCT
Authors:
Stefan Ploner,
Siyu Chen,
Jungeun Won,
Lennart Husvogt,
Katharina Breininger,
Julia Schottenhamml,
James Fujimoto,
Andreas Maier
Abstract:
Optical coherence tomography (OCT) is a micrometer-scale, volumetric imaging modality that has become a clinical standard in ophthalmology. OCT instruments image by raster-scanning a focused light spot across the retina, acquiring sequential cross-sectional images to generate volumetric data. Patient eye motion during the acquisition poses unique challenges: Non-rigid, discontinuous distortions ca…
▽ More
Optical coherence tomography (OCT) is a micrometer-scale, volumetric imaging modality that has become a clinical standard in ophthalmology. OCT instruments image by raster-scanning a focused light spot across the retina, acquiring sequential cross-sectional images to generate volumetric data. Patient eye motion during the acquisition poses unique challenges: Non-rigid, discontinuous distortions can occur, leading to gaps in data and distorted topographic measurements. We present a new distortion model and a corresponding fully-automatic, reference-free optimization strategy for computational motion correction in orthogonally raster-scanned, retinal OCT volumes. Using a novel, domain-specific spatiotemporal parametrization of forward-war** displacements, eye motion can be corrected continuously for the first time. Parameter estimation with temporal regularization improves robustness and accuracy over previous spatial approaches. We correct each A-scan individually in 3D in a single map**, including repeated acquisitions used in OCT angiography protocols. Specialized 3D forward image war** reduces median runtime to < 9 s, fast enough for clinical use. We present a quantitative evaluation on 18 subjects with ocular pathology and demonstrate accurate correction during microsaccades. Transverse correction is limited only by ocular tremor, whereas submicron repeatability is achieved axially (0.51 um median of medians), representing a dramatic improvement over previous work. This allows assessing longitudinal changes in focal retinal pathologies as a marker of disease progression or treatment response, and promises to enable multiple new capabilities such as supersampled/super-resolution volume reconstruction and analysis of pathological eye motion occuring in neurological diseases.
△ Less
Submitted 15 September, 2022;
originally announced September 2022.
-
LAVOLUTION: Measurement of Non-target Structural Displacement Calibrated by Structured Light
Authors:
Jongbin Won,
Minhyuk Song,
Gunhee Kim,
Jong-Woong Park,
Haemin Jeon
Abstract:
Displacement is an important measurement for the assessment of structural conditions, but its field measurement is often hindered by difficulties associated with sensor installation and measurement accuracy. To overcome the disadvantages of conventional displacement measurement, computer vision (CV)-based methods have been implemented due to their remote sensing capabilities and accuracy. This pap…
▽ More
Displacement is an important measurement for the assessment of structural conditions, but its field measurement is often hindered by difficulties associated with sensor installation and measurement accuracy. To overcome the disadvantages of conventional displacement measurement, computer vision (CV)-based methods have been implemented due to their remote sensing capabilities and accuracy. This paper presents a strategy for non-target structural displacement measurement that makes use of CV to avoid the need to install a target on the structure while calibrating the displacement using structured light. The proposed system called as LAVOLUTION calculates the relative position of the camera with regard to the structure using four equally spaced beams of structured light and obtains a scale factor to convert pixel movement into structural displacement. A jig for the four beams of structured light is designed and a corresponding alignment process is proposed. A method for calculating the scale factor using the designed jig for tunable structured-light is proposed and validated via numerical simulations and lab-scale experiments. To confirm the feasibility of the proposed displacement measurement process, experiments on a shaking table and a full-scale bridge are conducted and the accuracy of the proposed method is compared with that of a reference laser doppler vibrometer.
△ Less
Submitted 15 September, 2022;
originally announced September 2022.
-
The Sparse Abstract Machine
Authors:
Olivia Hsu,
Maxwell Strange,
Ritvik Sharma,
Jaeyeon Won,
Kunle Olukotun,
Joel Emer,
Mark Horowitz,
Fredrik Kjolstad
Abstract:
We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primitives that encompass a large space of scheduled tensor algebra expressions. SAM dataflow graphs naturally separate tensor formats from algorithms and are expressi…
▽ More
We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primitives that encompass a large space of scheduled tensor algebra expressions. SAM dataflow graphs naturally separate tensor formats from algorithms and are expressive enough to incorporate arbitrary iteration orderings and many hardware-specific optimizations. We also present Custard, a compiler from a high-level language to SAM that demonstrates SAM's usefulness as an intermediate representation. We automatically bind from SAM to a streaming dataflow simulator. We evaluate the generality and extensibility of SAM, explore the performance space of sparse tensor algebra optimizations using SAM, and show SAM's ability to represent dataflow hardware.
△ Less
Submitted 23 March, 2023; v1 submitted 30 August, 2022;
originally announced August 2022.
-
Abnormal Local Clustering in Federated Learning
Authors:
Jihwan Won
Abstract:
Federated learning is a model for privacy without revealing private data by transfer models instead of personal and private data from local client devices. While, in the global model, it's crucial to recognize each local data is normal. This paper suggests one method to separate normal locals and abnormal locals by Euclidean similarity clustering of vectors extracted by inputting dummy data in loc…
▽ More
Federated learning is a model for privacy without revealing private data by transfer models instead of personal and private data from local client devices. While, in the global model, it's crucial to recognize each local data is normal. This paper suggests one method to separate normal locals and abnormal locals by Euclidean similarity clustering of vectors extracted by inputting dummy data in local models. In a federated classification model, this method divided locals into normal and abnormal.
△ Less
Submitted 26 August, 2022;
originally announced August 2022.
-
Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert
Authors:
Yoonhyung Lee,
Sungdong Lee,
Joong-Ho Won
Abstract:
The implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature due to its stability over (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely proximal Robbins-Monro (proxRM) and proximal Poylak-Ruppert (proxPR) procedures, for their use in statistical inference on model paramet…
▽ More
The implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature due to its stability over (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely proximal Robbins-Monro (proxRM) and proximal Poylak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. Specifically, we derive non-asymptotic point estimation error bounds of both proxRM and proxPR iterates and their limiting distributions, and propose on-line estimators of their asymptotic covariance matrices that require only a single run of ISGD. The latter estimators are used to construct valid confidence intervals for the model parameters. Our analysis is free of the generalized linear model assumption that has limited the preceding analyses, and employs feasible procedures. Our on-line covariance matrix estimators appear to be the first of this kind in the ISGD literature.
△ Less
Submitted 28 June, 2022; v1 submitted 25 June, 2022;
originally announced June 2022.
-
Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems
Authors:
Mohammad Kachuee,
**seok Nam,
Sarthak Ahuja,
**-Myung Won,
Sung** Lee
Abstract:
Skill routing is an important component in large-scale conversational systems. In contrast to traditional rule-based skill routing, state-of-the-art systems use a model-based approach to enable natural conversations. To provide supervision signal required to train such models, ideas such as human annotation, replication of a rule-based system, relabeling based on user paraphrases, and bandit-based…
▽ More
Skill routing is an important component in large-scale conversational systems. In contrast to traditional rule-based skill routing, state-of-the-art systems use a model-based approach to enable natural conversations. To provide supervision signal required to train such models, ideas such as human annotation, replication of a rule-based system, relabeling based on user paraphrases, and bandit-based learning were suggested. However, these approaches: (a) do not scale in terms of the number of skills and skill on-boarding, (b) require a very costly expert annotation/rule-design, (c) introduce risks in the user experience with each model update. In this paper, we present a scalable self-learning approach to explore routing alternatives without causing abrupt policy changes that break the user experience, learn from the user interaction, and incrementally improve the routing via frequent model refreshes. To enable such robust frequent model updates, we suggest a simple and effective approach that ensures controlled policy updates for individual domains, followed by an off-policy evaluation for making deployment decisions without any need for lengthy A/B experimentation. We conduct various offline and online A/B experiments on a commercial large-scale conversational system to demonstrate the effectiveness of the proposed method in real-world production settings.
△ Less
Submitted 14 April, 2022;
originally announced April 2022.
-
Transformer Inertial Poser: Real-time Human Motion Reconstruction from Sparse IMUs with Simultaneous Terrain Generation
Authors:
Yifeng Jiang,
Yuting Ye,
Deepak Gopinath,
Jungdam Won,
Alexander W. Winkler,
C. Karen Liu
Abstract:
Real-time human motion reconstruction from a sparse set of (e.g. six) wearable IMUs provides a non-intrusive and economic approach to motion capture. Without the ability to acquire position information directly from IMUs, recent works took data-driven approaches that utilize large human motion datasets to tackle this under-determined problem. Still, challenges remain such as temporal consistency,…
▽ More
Real-time human motion reconstruction from a sparse set of (e.g. six) wearable IMUs provides a non-intrusive and economic approach to motion capture. Without the ability to acquire position information directly from IMUs, recent works took data-driven approaches that utilize large human motion datasets to tackle this under-determined problem. Still, challenges remain such as temporal consistency, drifting of global and joint motions, and diverse coverage of motion types on various terrains. We propose a novel method to simultaneously estimate full-body motion and generate plausible visited terrain from only six IMU sensors in real-time. Our method incorporates 1. a conditional Transformer decoder model giving consistent predictions by explicitly reasoning prediction history, 2. a simple yet general learning target named "stationary body points" (SBPs) which can be stably predicted by the Transformer model and utilized by analytical routines to correct joint and global drifting, and 3. an algorithm to generate regularized terrain height maps from noisy SBP predictions which can in turn correct noisy global motion estimation. We evaluate our framework extensively on synthesized and real IMU data, and with real-time live demos, and show superior performance over strong baseline methods.
△ Less
Submitted 8 December, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Conditional Motion In-betweening
Authors:
Jihoon Kim,
Taehyun Byun,
Seungyoun Shin,
Jungdam Won,
Sungjoon Choi
Abstract:
Motion in-betweening (MIB) is a process of generating intermediate skeletal movement between the given start and target poses while preserving the naturalness of the motion, such as periodic footstep motion while walking. Although state-of-the-art MIB methods are capable of producing plausible motions given sparse key-poses, they often lack the controllability to generate motions satisfying the se…
▽ More
Motion in-betweening (MIB) is a process of generating intermediate skeletal movement between the given start and target poses while preserving the naturalness of the motion, such as periodic footstep motion while walking. Although state-of-the-art MIB methods are capable of producing plausible motions given sparse key-poses, they often lack the controllability to generate motions satisfying the semantic contexts required in practical applications. We focus on the method that can handle pose or semantic conditioned MIB tasks using a unified model. We also present a motion augmentation method to improve the quality of pose-conditioned motion generation via defining a distribution over smooth trajectories. Our proposed method outperforms the existing state-of-the-art MIB method in pose prediction errors while providing additional controllability.
△ Less
Submitted 6 October, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
FastMimic: Model-based Motion Imitation for Agile, Diverse and Generalizable Quadrupedal Locomotion
Authors:
Tianyu Li,
Jungdam Won,
Sehoon Ha,
Akshara Rai
Abstract:
Robots operating in human environments need various skills, like slow and fast walking, turning, side-step**, and many more. However, building robot controllers that can exhibit such a large range of behaviors is a challenging problem that requires tedious investigation for every task. We present a unified model-based control algorithm for imitating different animal gaits without expensive simul…
▽ More
Robots operating in human environments need various skills, like slow and fast walking, turning, side-step**, and many more. However, building robot controllers that can exhibit such a large range of behaviors is a challenging problem that requires tedious investigation for every task. We present a unified model-based control algorithm for imitating different animal gaits without expensive simulation training or real-world fine-tuning. Our method consists of stance and swing leg controllers using a centroidal dynamics model augmented with online adaptation techniques. We also develop a whole-body trajectory optimization procedure to fix the kinematic infeasibility of the reference animal motions. We demonstrate that our universal data-driven model-based controller can seamlessly imitate various motor skills, including trotting, pacing, turning, and side-step**. It also shows better tracking capabilities in simulation and the real world against several baselines, including another model-based imitation controller and a learning-based motion imitation technique.
△ Less
Submitted 25 February, 2022; v1 submitted 27 September, 2021;
originally announced September 2021.
-
DistStat.jl: Towards Unified Programming for High-Performance Statistical Computing Environments in Julia
Authors:
Seyoon Ko,
Hua Zhou,
** Zhou,
Joong-Ho Won
Abstract:
The demand for high-performance computing (HPC) is ever-increasing for everyday statistical computing purposes. The downside is that we need to write specialized code for each HPC environment. CPU-level parallelization needs to be explicitly coded for effective use of multiple nodes in cluster supercomputing environments. Acceleration via graphics processing units (GPUs) requires to write kernel c…
▽ More
The demand for high-performance computing (HPC) is ever-increasing for everyday statistical computing purposes. The downside is that we need to write specialized code for each HPC environment. CPU-level parallelization needs to be explicitly coded for effective use of multiple nodes in cluster supercomputing environments. Acceleration via graphics processing units (GPUs) requires to write kernel code. The Julia software package DistStat.jl implements a data structure for distributed arrays that work on both multi-node CPU clusters and multi-GPU environments transparently. This package paves a way to develo** high-performance statistical software in various HPC environments simultaneously. As a demonstration of the transparency and scalability of the package, we provide applications to large-scale nonnegative matrix factorization, multidimensional scaling, and $\ell_1$-regularized Cox proportional hazards model on an 8-GPU workstation and a 720-CPU-core virtual cluster in Amazon Web Services (AWS) cloud. As a case in point, we analyze the on-set of type-2 diabetes from the UK Biobank with 400,000 subjects and 500,000 single nucleotide polymorphisms using the $\ell_1$-regularized Cox proportional hazards model. Fitting a half-million-variate regression model took less than 50 minutes on AWS.
△ Less
Submitted 30 October, 2020;
originally announced October 2020.
-
Drift with Devil: Security of Multi-Sensor Fusion based Localization in High-Level Autonomous Driving under GPS Spoofing (Extended Version)
Authors:
Junjie Shen,
Jun Yeon Won,
Zeyuan Chen,
Qi Alfred Chen
Abstract:
For high-level Autonomous Vehicles (AV), localization is highly security and safety critical. One direct threat to it is GPS spoofing, but fortunately, AV systems today predominantly use Multi-Sensor Fusion (MSF) algorithms that are generally believed to have the potential to practically defeat GPS spoofing. However, no prior work has studied whether today's MSF algorithms are indeed sufficiently…
▽ More
For high-level Autonomous Vehicles (AV), localization is highly security and safety critical. One direct threat to it is GPS spoofing, but fortunately, AV systems today predominantly use Multi-Sensor Fusion (MSF) algorithms that are generally believed to have the potential to practically defeat GPS spoofing. However, no prior work has studied whether today's MSF algorithms are indeed sufficiently secure under GPS spoofing, especially in AV settings. In this work, we perform the first study to fill this critical gap. As the first study, we focus on a production-grade MSF with both design and implementation level representativeness, and identify two AV-specific attack goals, off-road and wrong-way attacks.
To systematically understand the security property, we first analyze the upper-bound attack effectiveness, and discover a take-over effect that can fundamentally defeat the MSF design principle. We perform a cause analysis and find that such vulnerability only appears dynamically and non-deterministically. Leveraging this insight, we design FusionRipper, a novel and general attack that opportunistically captures and exploits take-over vulnerabilities. We evaluate it on 6 real-world sensor traces, and find that FusionRipper can achieve at least 97% and 91.3% success rates in all traces for off-road and wrong-way attacks respectively. We also find that it is highly robust to practical factors such as spoofing inaccuracies. To improve the practicality, we further design an offline method that can effectively identify attack parameters with over 80% average success rates for both attack goals, with the cost of at most half a day. We also discuss promising defense directions.
△ Less
Submitted 12 August, 2020; v1 submitted 18 June, 2020;
originally announced June 2020.
-
Principled learning method for Wasserstein distributionally robust optimization with local perturbations
Authors:
Yongchan Kwon,
Wonyoung Kim,
Joong-Ho Won,
Myunghee Cho Paik
Abstract:
Wasserstein distributionally robust optimization (WDRO) attempts to learn a model that minimizes the local worst-case risk in the vicinity of the empirical data distribution defined by Wasserstein ball. While WDRO has received attention as a promising tool for inference since its introduction, its theoretical understanding has not been fully matured. Gao et al. (2017) proposed a minimizer based on…
▽ More
Wasserstein distributionally robust optimization (WDRO) attempts to learn a model that minimizes the local worst-case risk in the vicinity of the empirical data distribution defined by Wasserstein ball. While WDRO has received attention as a promising tool for inference since its introduction, its theoretical understanding has not been fully matured. Gao et al. (2017) proposed a minimizer based on a tractable approximation of the local worst-case risk, but without showing risk consistency. In this paper, we propose a minimizer based on a novel approximation theorem and provide the corresponding risk consistency results. Furthermore, we develop WDRO inference for locally perturbed data that include the Mixup (Zhang et al., 2017) as a special case. We show that our approximation and risk consistency results naturally extend to the cases when data are locally perturbed. Numerical experiments demonstrate robustness of the proposed method using image classification datasets. Our results show that the proposed method achieves significantly higher accuracy than baseline models on noisy datasets.
△ Less
Submitted 22 June, 2020; v1 submitted 5 June, 2020;
originally announced June 2020.
-
Non-target Structural Displacement Measurement Using Reference Frame Based Deepflow
Authors:
Jongbin Won,
Jong-Woong Park,
Do-Soo Moon
Abstract:
Structural displacement is crucial for structural health monitoring, although it is very challenging to measure in field conditions. Most existing displacement measurement methods are costly, labor intensive, and insufficiently accurate for measuring small dynamic displacements. Computer vision (CV) based methods incorporate optical devices with advanced image processing algorithms to accurately,…
▽ More
Structural displacement is crucial for structural health monitoring, although it is very challenging to measure in field conditions. Most existing displacement measurement methods are costly, labor intensive, and insufficiently accurate for measuring small dynamic displacements. Computer vision (CV) based methods incorporate optical devices with advanced image processing algorithms to accurately, cost-effectively, and remotely measure structural displacement with easy installation. However, non-target based CV methods are still limited by insufficient feature points, incorrect feature point detection, occlusion, and drift induced by tracking error accumulation. This paper presents a reference frame based Deepflow algorithm integrated with masking and signal filtering for non-target based displacement measurements. The proposed method allows the user to select points of interest for images with a low gradient for displacement tracking and directly calculate displacement without drift accumulated by measurement error. The proposed method is experimentally validated on a cantilevered beam under ambient and occluded test conditions. The accuracy of the proposed method is compared with that of a reference laser displacement sensor for validation. The significant advantage of the proposed method is its flexibility in extracting structural displacement in any region on structures that do not have distinct natural features.
△ Less
Submitted 21 March, 2019;
originally announced March 2019.
-
Co-occurrence matrix analysis-based semi-supervised training for object detection
Authors:
Min-Kook Choi,
Jaehyeong Park,
Jihun Jung,
Heechul Jung,
**-Hee Lee,
Woong Jae Won,
Woo Young Jung,
**cheol Kim,
Soon Kwon
Abstract:
One of the most important factors in training object recognition networks using convolutional neural networks (CNNs) is the provision of annotated data accompanying human judgment. Particularly, in object detection or semantic segmentation, the annotation process requires considerable human effort. In this paper, we propose a semi-supervised learning (SSL)-based training methodology for object det…
▽ More
One of the most important factors in training object recognition networks using convolutional neural networks (CNNs) is the provision of annotated data accompanying human judgment. Particularly, in object detection or semantic segmentation, the annotation process requires considerable human effort. In this paper, we propose a semi-supervised learning (SSL)-based training methodology for object detection, which makes use of automatic labeling of un-annotated data by applying a network previously trained from an annotated dataset. Because an inferred label by the trained network is dependent on the learned parameters, it is often meaningless for re-training the network. To transfer a valuable inferred label to the unlabeled data, we propose a re-alignment method based on co-occurrence matrix analysis that takes into account one-hot-vector encoding of the estimated label and the correlation between the objects in the image. We used an MS-COCO detection dataset to verify the performance of the proposed SSL method and deformable neural networks (D-ConvNets) as an object detector for basic training. The performance of the existing state-of-the-art detectors (DConvNets, YOLO v2, and single shot multi-box detector (SSD)) can be improved by the proposed SSL method without using the additional model parameter or modifying the network architecture.
△ Less
Submitted 19 February, 2018;
originally announced February 2018.
-
The YLI-MED Corpus: Characteristics, Procedures, and Plans
Authors:
Julia Bernd,
Damian Borth,
Benjamin Elizalde,
Gerald Friedland,
Heather Gallagher,
Luke Gottlieb,
Adam Janin,
Sara Karabashlieva,
Jocelyn Takahashi,
Jennifer Won
Abstract:
The YLI Multimedia Event Detection corpus is a public-domain index of videos with annotations and computed features, specialized for research in multimedia event detection (MED), i.e., automatically identifying what's happening in a video by analyzing the audio and visual content. The videos indexed in the YLI-MED corpus are a subset of the larger YLI feature corpus, which is being developed by th…
▽ More
The YLI Multimedia Event Detection corpus is a public-domain index of videos with annotations and computed features, specialized for research in multimedia event detection (MED), i.e., automatically identifying what's happening in a video by analyzing the audio and visual content. The videos indexed in the YLI-MED corpus are a subset of the larger YLI feature corpus, which is being developed by the International Computer Science Institute and Lawrence Livermore National Laboratory based on the Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset. The videos in YLI-MED are categorized as depicting one of ten target events, or no target event, and are annotated for additional attributes like language spoken and whether the video has a musical score. The annotations also include degree of annotator agreement and average annotator confidence scores for the event categorization of each video. Version 1.0 of YLI-MED includes 1823 "positive" videos that depict the target events and 48,138 "negative" videos, as well as 177 supplementary videos that are similar to event videos but are not positive examples. Our goal in producing YLI-MED is to be as open about our data and procedures as possible. This report describes the procedures used to collect the corpus; gives detailed descriptive statistics about the corpus makeup (and how video attributes affected annotators' judgments); discusses possible biases in the corpus introduced by our procedural choices and compares it with the most similar existing dataset, TRECVID MED's HAVIC corpus; and gives an overview of our future plans for expanding the annotation effort.
△ Less
Submitted 13 March, 2015;
originally announced March 2015.
-
On the mathematic modeling of non-parametric curves based on cubic Bézier curves
Authors:
Ha Jong Won,
Choe Chun Hwa,
Li Kum Song
Abstract:
Bézier splines are widely available in various systems with the curves and surface designs. In general, the Bézier spline can be specified with the Bézier curve segments and a Bézier curve segment can be fitted to any number of control points. The number of control points determines the degree of the Bézier polynomial. This paper presents a method which determines control points for Bézier curves…
▽ More
Bézier splines are widely available in various systems with the curves and surface designs. In general, the Bézier spline can be specified with the Bézier curve segments and a Bézier curve segment can be fitted to any number of control points. The number of control points determines the degree of the Bézier polynomial. This paper presents a method which determines control points for Bézier curves approximating segments of obtained image outline(non-parametric curve) by using the properties of cubic Bézier curves. Proposed method is a technique to determine the control points that has generality and reduces the error of the Bézier curve approximation. Main advantage of proposed method is that it has higher accuracy and compression rate than previous methods. The cubic Bézier spline is obtained from cubic Bézier curve segments. To demonstrate the various performances of the proposed algorithm, experimental results are compared.
△ Less
Submitted 24 November, 2014;
originally announced November 2014.
-
Definition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading
Authors:
Ha Jong Won,
Li Gwang Chol,
Kim Hyok Chol,
Li Kum Song
Abstract:
In this paper, we defined the viseme (visual speech element) and described about the method of extracting visual feature vector. We defined the 10 visemes based on vowel by analyzing of Korean utterance and proposed the method of extracting the 20-dimensional visual feature vector, combination of static features and dynamic features. Lastly, we took an experiment in recognizing words based on 3-vi…
▽ More
In this paper, we defined the viseme (visual speech element) and described about the method of extracting visual feature vector. We defined the 10 visemes based on vowel by analyzing of Korean utterance and proposed the method of extracting the 20-dimensional visual feature vector, combination of static features and dynamic features. Lastly, we took an experiment in recognizing words based on 3-viseme HMM and evaluated the efficiency.
△ Less
Submitted 15 November, 2014;
originally announced November 2014.