-
Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation
Authors:
Taekyung Ki,
Dongchan Min,
Gyeongsu Chae
Abstract:
In this paper, we present Export3D, a one-shot 3D-aware portrait animation method that can control the facial expression and camera view of a given portrait image. To achieve this, we introduce a tri-plane generator that directly generates a tri-plane of 3D prior by transferring the expression parameters of a 3DMM into the source image. The tri-plane is then decoded into an image of a different view through differentiable volume rendering. Existing portrait animation methods rely heavily on image warping to transfer the expression in the motion space, which makes it challenging to disentangle appearance and expression. In contrast, we propose a contrastive pre-training framework for appearance-free expression parameters, eliminating undesirable appearance swap when transferring a cross-identity expression. Extensive experiments show that our pre-training framework can learn the appearance-free expression representation hidden in 3DMM, and that our model can generate 3D-aware, expression-controllable portrait images without appearance swap in the cross-identity setting.
Submitted 2 April, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
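Tri-plane representations are usually queried by projecting a 3D point onto three axis-aligned feature planes, sampling each plane bilinearly, and aggregating the results. The sketch below illustrates that lookup under stated assumptions: features are summed across planes (an EG3D-style convention assumed here, not stated in the abstract), coordinates are normalized to [-1, 1], and `bilinear`/`query_triplane` are hypothetical helper names. The decoder and the volume rendering that would consume these features are omitted.

```python
from typing import List, Sequence

# A feature plane stored as [H][W][C] nested lists.
Plane = List[List[List[float]]]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample `plane` at normalized coordinates (u, v) in [-1, 1]."""
    h, w = len(plane), len(plane[0])
    u = max(-1.0, min(1.0, u))  # clamp to the plane's extent
    v = max(-1.0, min(1.0, v))
    x = (u + 1.0) / 2.0 * (w - 1)  # continuous column index
    y = (v + 1.0) / 2.0 * (h - 1)  # continuous row index
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    return [
        plane[y0][x0][k] * (1 - dx) * (1 - dy)
        + plane[y0][x1][k] * dx * (1 - dy)
        + plane[y1][x0][k] * (1 - dx) * dy
        + plane[y1][x1][k] * dx * dy
        for k in range(len(plane[0][0]))
    ]


def query_triplane(xy: Plane, xz: Plane, yz: Plane, point: Sequence[float]) -> List[float]:
    """Project a 3D point onto the xy, xz and yz planes and sum the sampled features."""
    x, y, z = point
    samples = [bilinear(xy, x, y), bilinear(xz, x, z), bilinear(yz, y, z)]
    return [sum(channel) for channel in zip(*samples)]


# Usage: a 3x3 single-channel plane whose value equals the column index.
ramp = [[[float(col)] for col in range(3)] for _ in range(3)]
print(bilinear(ramp, 0.0, 0.0))  # -> [1.0] (centre of the plane)
print(query_triplane(ramp, ramp, ramp, (0.5, 0.0, -1.0)))  # -> [4.0]
```

In a real model the planes would come from the generator and the summed feature would be decoded into density and color per sample along each camera ray.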
-
High-precision and low-noise dielectric tensor tomography using a micro-electromechanical system mirror
Authors:
Juheon Lee,
Byung Gyu Chae,
Hyuneui Kim,
MinSung Yoon,
Herve Hugonnet,
YongKeun Park
Abstract:
Dielectric tensor tomography is an imaging technique for mapping three-dimensional distributions of dielectric properties in transparent materials. This work introduces an enhanced illumination strategy that employs a micro-electromechanical system mirror to achieve high precision and reduced noise in imaging. This illumination approach allows precise manipulation of light, significantly improving the accuracy of angle control and minimizing diffraction noise compared to traditional beam-steering approaches. Our experiments successfully reconstructed the dielectric properties of liquid crystal droplets, which are known for their anisotropic structures, while demonstrating a notable reduction in background noise in the images. Additionally, the technique has been applied to more complex samples, revealing its capability to achieve a high signal-to-noise ratio. This development represents a significant step forward in birefringence imaging, offering a powerful tool for the detailed study of materials with anisotropic properties.
Submitted 14 February, 2024;
originally announced February 2024.
-
Incomplete Contrastive Multi-View Clustering with High-Confidence Guiding
Authors:
Guoqing Chao,
Yi Jiang,
Dianhui Chu
Abstract:
Incomplete multi-view clustering has become an important research problem, since multi-view data with missing values are ubiquitous in real-world applications. Although great efforts have been made for incomplete multi-view clustering, some challenges remain: 1) most existing methods do not make full use of multi-view information to deal with missing values; 2) most methods employ only the consistent information within multi-view data and ignore the complementary information; 3) in existing incomplete multi-view clustering methods, incomplete multi-view representation learning and clustering are treated as independent processes, which leads to a performance gap. In this work, we propose a novel Incomplete Contrastive Multi-View Clustering method with high-confidence guiding (ICMVC). First, we propose a multi-view consistency relation transfer plus graph convolutional network to tackle the missing-value problem. Second, instance-level attention fusion and high-confidence guiding are proposed to exploit the complementary information, while instance-level contrastive learning for the latent representation is designed to employ the consistent information. Third, an end-to-end framework is proposed to integrate multi-view missing-value handling, multi-view representation learning, and clustering assignment for joint optimization. Experiments comparing our method with state-of-the-art approaches demonstrate its effectiveness and superiority. Our code is publicly available at https://github.com/liunian-Jay/ICMVC.
Submitted 14 December, 2023;
originally announced December 2023.
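High-confidence guiding is commonly realized by turning only the most confident cluster assignments into pseudo-labels that then supervise other predictions. The sketch below assumes soft assignments from a fused representation, a fixed confidence threshold of 0.9, and a cross-entropy guidance term on one view's assignments; the function names and the threshold are hypothetical and are not taken from the ICMVC code.

```python
import math
from typing import Dict, Sequence


def high_confidence_pseudo_labels(
    fused_probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> Dict[int, int]:
    """Return {sample index: cluster id} for samples whose top assignment >= threshold."""
    labels: Dict[int, int] = {}
    for i, row in enumerate(fused_probs):
        best = max(range(len(row)), key=lambda c: row[c])
        if row[best] >= threshold:
            labels[i] = best
    return labels


def guidance_loss(view_probs: Sequence[Sequence[float]], labels: Dict[int, int]) -> float:
    """Mean cross-entropy of one view's soft assignments against the pseudo-labels."""
    if not labels:
        return 0.0  # no confident samples, so no guidance signal
    eps = 1e-12  # avoid log(0)
    return -sum(math.log(view_probs[i][c] + eps) for i, c in labels.items()) / len(labels)


fused = [[0.95, 0.05], [0.60, 0.40], [0.02, 0.98]]
labels = high_confidence_pseudo_labels(fused)
print(labels)  # {0: 0, 2: 1}: the ambiguous sample 1 is left unguided
print(guidance_loss([[0.5, 0.5]] * 3, labels))
```

Leaving low-confidence samples out of the guidance term is the design point: it keeps noisy early cluster assignments from reinforcing themselves.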
-
Spatial resolution enhancement in holographic imaging via angular spectrum expansion
Authors:
Byung Gyu Chae
Abstract:
Digital holography numerically restores three-dimensional image information from optically captured diffracted waves. At closer distances in the Fresnel diffraction regime, the required bandwidth is larger than that supported by the hologram pixel, which results in the formation of aliased replica patterns in the digital hologram. An analysis of the sampling phenomenon reveals that the replica functions are components of the higher angular spectra of the hologram. An undersampled hologram consists of moiré patterns formed by the modulation of the original function by a complex exponential function. There is a one-to-one correspondence between the replicas in real and Fourier space. The possibility of acquiring high-resolution images over a wide field of view is explored in terms of expanding the angular spectrum by using the replicas. When an optimization algorithm is used, a low-NA hologram captured over a wide field alone restores a high-resolution image. Numerical simulations and optical experiments are performed to investigate the proposed scheme.
Submitted 10 January, 2024; v1 submitted 24 August, 2023;
originally announced August 2023.
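The aliasing condition in the first sentences can be made concrete with the standard sampling argument for a Fresnel (chirp) hologram: the local spatial frequency at position x is x/(λz), so at the hologram edge |x| = Np/2 it reaches Np/(2λz), while a pixel pitch p supports at most 1/(2p). Aliasing-free sampling therefore requires z ≥ Np²/λ; closer than that, frequencies fold back by multiples of 1/p and appear as replicas. The sketch below evaluates these textbook relations; the values of N, p, λ and the helper names are illustrative assumptions, not values from the paper.

```python
import math


def critical_distance(n_pixels: int, pitch: float, wavelength: float) -> float:
    """Smallest distance z at which an N-pixel Fresnel hologram is sampled without aliasing."""
    return n_pixels * pitch ** 2 / wavelength


def max_local_frequency(n_pixels: int, pitch: float, wavelength: float, z: float) -> float:
    """Local fringe frequency x / (lambda z) at the hologram edge x = N p / 2."""
    return n_pixels * pitch / (2.0 * wavelength * z)


def folded_frequency(f: float, pitch: float) -> float:
    """Apparent frequency after sampling: f shifted by a multiple of 1/p into [-1/(2p), 1/(2p))."""
    fs = 1.0 / pitch
    return f - fs * math.floor(f / fs + 0.5)


N, p, lam = 1024, 8e-6, 532e-9  # illustrative: 1024 pixels, 8 um pitch, 532 nm light
z_c = critical_distance(N, p, lam)
print(f"critical distance: {z_c * 100:.2f} cm")  # about 12.32 cm
z = z_c / 1.5  # a closer hologram plane, hence undersampled
f_edge = max_local_frequency(N, p, lam, z)
print(f_edge * p)  # about 0.75 cycles per pixel, beyond the Nyquist limit of 0.5
print(folded_frequency(f_edge, p) * p)  # about -0.25: the component reappears as a replica
```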
-
Disentangling Multi-view Representations Beyond Inductive Bias
Authors:
Guanzhou Ke,
Yang Yu,
Guoqing Chao,
Xiaoli Wang,
Chenyang Xu,
Shengfeng He
Abstract:
Multi-view (or multi-modal) representation learning aims to understand the relationships between different view representations. Existing methods disentangle multi-view representations into consistent and view-specific representations by introducing strong inductive biases, which can limit their generalization ability. In this paper, we propose a novel multi-view representation disentangling method that aims to go beyond inductive biases, ensuring both the interpretability and the generalizability of the resulting representations. Our method is based on the observation that discovering multi-view consistency in advance can determine the disentangling information boundary, leading to a decoupled learning objective. We also found that the consistency can be easily extracted by maximizing the transformation invariance and clustering consistency between views. These observations drive us to propose a two-stage framework. In the first stage, we obtain multi-view consistency by training a consistent encoder to produce semantically consistent representations across views as well as their corresponding pseudo-labels. In the second stage, we disentangle specificity from comprehensive representations by minimizing the upper bound of the mutual information between consistent and comprehensive representations. Finally, we reconstruct the original data by concatenating pseudo-labels and view-specific representations. Our experiments on four multi-view datasets demonstrate that our proposed method outperforms 12 comparison methods in terms of clustering and classification performance. The visualization results also show that the extracted consistency and specificity are compact and interpretable. Our code can be found at https://github.com/Guanzhou-Ke/DMRIB.
Submitted 4 August, 2023; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Viewing-angle expansion of holographic image using enhanced-NA Fresnel hologram
Authors:
Byung Gyu Chae
Abstract:
The expansion of the viewing angle is a crucial factor in holographic displays implemented with a spatial light modulator having a finite space-bandwidth. The enhanced-NA Fresnel hologram reconstructs a holographic image at an angle larger than the diffraction angle of a hologram pixel, but it is difficult to achieve this without interference from high-order noise. This study presents the theoretical foundation for optimizing the enhanced-NA Fresnel hologram to recover the low space-bandwidth. The higher spectral components of the digital hologram beyond the bandwidth exist in the form of replicas. Expanding the angular spectrum through this repetition during the optimization procedure increases the image resolution, resulting in a viewing angle that depends on the hologram numerical aperture. We numerically and experimentally verify our strategy for expanding the viewing angle of a holographic image.
Submitted 22 August, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
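The contrast between a pixel-limited diffraction angle and an NA-limited viewing angle can be illustrated numerically. The sketch assumes the common definitions sin θ = λ/(2p) for the primary diffraction zone of pitch p, and NA = (Np/2)/√(z² + (Np/2)²) for the full hologram aperture seen from an on-axis image point at distance z; the parameter values and function names are illustrative, not taken from the paper.

```python
import math


def pixel_half_angle(pitch: float, wavelength: float) -> float:
    """Half-angle (degrees) of the primary diffraction zone: sin(theta) = lambda / (2 p)."""
    return math.degrees(math.asin(wavelength / (2.0 * pitch)))


def hologram_na(n_pixels: int, pitch: float, z: float) -> float:
    """NA of the whole hologram aperture (width N p) for an on-axis point at distance z."""
    half_width = n_pixels * pitch / 2.0
    return half_width / math.hypot(z, half_width)


def na_half_angle(n_pixels: int, pitch: float, z: float) -> float:
    """Half-angle (degrees) corresponding to the hologram NA."""
    return math.degrees(math.asin(hologram_na(n_pixels, pitch, z)))


N, p, lam = 1024, 8e-6, 532e-9  # illustrative SLM parameters
print(f"pixel-limited half-angle: {pixel_half_angle(p, lam):.2f} deg")
for z in (0.05, 0.10, 0.50):  # hologram-to-image distances in metres
    print(f"z = {z:.2f} m -> NA half-angle {na_half_angle(N, p, z):.2f} deg")
```

Roughly below the critical distance Np²/λ (about 0.12 m here), the NA half-angle exceeds the pixel-limited one; that is the regime an enhanced-NA Fresnel hologram exploits, at the cost of the replicas discussed in the abstract.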
-
Good Neighbors Are All You Need for Chinese Grapheme-to-Phoneme Conversion
Authors:
Jungjun Kim,
Changping Han,
Gyuhyeon Nam,
Gyeongsu Chae
Abstract:
Most Chinese Grapheme-to-Phoneme (G2P) systems employ a three-stage framework that first transforms input sequences into character embeddings, then obtains linguistic information using language models, and finally predicts the phonemes based on global context about the entire input sequence. However, linguistic knowledge alone is often inadequate. Language models frequently encode overly general structures of a sentence and fail to cover the specific cases that require phonetic knowledge. Also, a handcrafted post-processing system is needed to address problems relevant to the tone of the characters. However, this system exhibits inconsistency in the segmentation of word boundaries, which consequently degrades the performance of the G2P system. To address these issues, we propose the Reinforcer, which provides a strong inductive bias for language models by emphasizing the phonological information between neighboring characters to help disambiguate pronunciations. Experimental results show that the Reinforcer boosts cutting-edge architectures by a large margin. We also combine the Reinforcer with a large-scale pre-trained model and demonstrate the validity of using neighboring context in knowledge transfer scenarios.
Submitted 14 March, 2023;
originally announced March 2023.
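Why neighboring characters help can be illustrated with the polyphonic character 行, read hang2 in 银行 (bank) but xing2 in 行走 (walk). The toy sketch below builds left/center/right neighbor windows and resolves a character from a two-character lexicon. The lexicon, the padding token, and the lookup order are hypothetical; the Reinforcer itself learns this inductive bias inside a neural model rather than through a table lookup.

```python
from typing import Dict, List, Optional, Tuple

PAD = "<pad>"

# Hypothetical two-character lexicon: word -> pinyin of each character.
TOY_LEXICON: Dict[str, Tuple[str, str]] = {
    "银行": ("yin2", "hang2"),  # bank
    "行走": ("xing2", "zou3"),  # walk
}


def neighbor_windows(text: str) -> List[Tuple[str, str, str]]:
    """(left, centre, right) window for every character, padded at the edges."""
    padded = [PAD] + list(text) + [PAD]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def pronounce(text: str, i: int, default: Optional[str] = None) -> Optional[str]:
    """Resolve character i using its left neighbour first, then its right neighbour."""
    left_pair = text[i - 1 : i + 1] if i > 0 else ""
    if left_pair in TOY_LEXICON:
        return TOY_LEXICON[left_pair][1]
    right_pair = text[i : i + 2]
    if len(right_pair) == 2 and right_pair in TOY_LEXICON:
        return TOY_LEXICON[right_pair][0]
    return default


print(neighbor_windows("银行"))
print(pronounce("他在银行工作", 3))  # hang2
print(pronounce("行走", 0))  # xing2
```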
-
DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions
Authors:
Geumbyeol Hwang,
Sunwon Hong,
Seunghyun Lee,
Sungwoo Park,
Gyeongsu Chae
Abstract:
For realistic talking head generation, creating natural head motion while maintaining accurate lip synchronization is essential. To fulfill this challenging task, we propose DisCoHead, a novel method to disentangle and control head pose and facial expressions without supervision. DisCoHead uses a single geometric transformation as a bottleneck to isolate and extract head motion from a head-driving video. Either an affine or a thin-plate spline transformation can be used, and both work well as geometric bottlenecks. We enhance the efficiency of DisCoHead by integrating a dense motion estimator and the encoder of a generator, which were originally separate modules. Taking a step further, we also propose a neural mix approach in which dense motion is estimated and applied implicitly by the encoder. After applying the disentangled head motion to a source identity, DisCoHead controls the mouth region according to speech audio, and it blinks the eyes and moves the eyebrows following a separate driving video of the eye region, via weight modulation of convolutional neural networks. Experiments on multiple datasets show that DisCoHead successfully generates realistic audio-and-video-driven talking heads and outperforms state-of-the-art methods. Project page: https://deepbrainai-research.github.io/discohead/
Submitted 14 March, 2023;
originally announced March 2023.
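A single geometric transformation works as a bottleneck because it carries only a handful of parameters, for example the six entries of a 2×3 affine matrix, so it can express global head motion but not local mouth or eye changes. The sketch below applies such a matrix to a normalized sampling grid, as spatial-transformer-style warping typically does; the grid convention and helper names are assumptions, and the thin-plate spline variant and the generator are omitted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def make_grid(h: int, w: int) -> List[Point]:
    """Row-major (x, y) coordinates normalized to [-1, 1]; requires h, w >= 2."""
    return [(-1 + 2 * j / (w - 1), -1 + 2 * i / (h - 1)) for i in range(h) for j in range(w)]


def rotation_translation(angle: float, tx: float, ty: float) -> Affine:
    """2x3 affine matrix for an in-plane rotation (radians) followed by a translation."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, tx), (s, c, ty))


def apply_affine(theta: Affine, grid: List[Point]) -> List[Point]:
    """Map every grid point through the 2x3 affine matrix theta."""
    (a, b, tx), (c, d, ty) = theta
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in grid]


identity: Affine = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
grid = make_grid(3, 3)
# A small head tilt (5 degrees) plus a shift to the right: six numbers describe it all.
warped = apply_affine(rotation_translation(math.radians(5), 0.1, 0.0), grid)
print(warped[4])  # the grid centre (0, 0) moves to (0.1, 0.0)
```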
-
Implications of Personality on Cognitive Workload, Affect, and Task Performance in Remote Robot Control
Authors:
Go-Eum Cha,
Wonse Jo,
Byung-Cheol Min
Abstract:
This paper explores how the personality traits of robot operators can influence their task performance during remote control of robots. It is essential to explore the direct and indirect impact of personal dispositions on information processing when working with robots on specific tasks. To investigate this relationship, we use the open-access multi-modal dataset MOCAS to examine robot operators' personality traits, affect, cognitive load, and task performance. Our objective is to determine whether personality traits have a total effect, including both direct and indirect effects, that significantly impacts operators' performance levels. Specifically, we examine the relationship between task performance and personality traits such as extroversion, conscientiousness, and agreeableness. We conduct a correlation analysis between cognitive load, self-ratings of workload and affect, and quantified individual personality traits along with their experimental scores. The findings show that personality traits do not have a total effect on task performance.
Submitted 1 August, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
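The correlation analysis mentioned above typically reports Pearson's r between paired measurements, for example one personality score and one task score per operator. A dependency-free sketch follows; the variable names and the data are made up for illustration, and the paper's total-effect (direct plus indirect) analysis would require a mediation model beyond this single coefficient.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-operator values, for illustration only.
conscientiousness = [1, 2, 3, 4, 5]
task_score = [2, 1, 4, 3, 5]
print(pearson_r(conscientiousness, task_score))  # 0.8
```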
-
A Clustering-guided Contrastive Fusion for Multi-view Representation Learning
Authors:
Guanzhou Ke,
Guoqing Chao,
Xiaoli Wang,
Chenyang Xu,
Yongqi Zhu,
Yang Yu
Abstract:
The past two decades have seen increasingly rapid advances in multi-view representation learning, owing to its ability to extract useful information from diverse domains and facilitate the development of multi-view applications. However, the community faces two challenges: i) how to learn, from a large amount of unlabeled data, representations that are robust against noise or incomplete views, and ii) how to balance view consistency and complementarity for various downstream tasks. To this end, we use a deep fusion network to fuse view-specific representations into a view-common representation, extracting high-level semantics to obtain a robust representation. In addition, we employ a clustering task to guide the fusion network and prevent it from converging to trivial solutions. Then, to balance consistency and complementarity, we design an asymmetrical contrastive strategy that aligns the view-common representation with each view-specific representation. These modules are incorporated into a unified method known as CLustering-guided cOntrastiVE fusioN (CLOVEN). We quantitatively and qualitatively evaluate the proposed method on five datasets, demonstrating that CLOVEN outperforms 11 competitive multi-view learning methods in clustering and classification. In the incomplete-view scenario, our proposed method resists noise interference better than its competitors. Furthermore, the visualization analysis shows that CLOVEN can preserve the intrinsic structure of view-specific representations while also improving the compactness of the view-common representation. Our source code will be available soon at https://github.com/guanzhou-ke/cloven.
Submitted 4 August, 2023; v1 submitted 28 December, 2022;
originally announced December 2022.
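One way to read the asymmetrical contrastive strategy is an InfoNCE loss in which each sample's view-common representation is the anchor and the same sample's view-specific representation is the positive, with the other samples as negatives, computed only in the common-to-specific direction. The sketch below implements that reading with cosine similarity and an assumed temperature of 0.5; it is an illustration, not the CLOVEN implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector], temperature: float = 0.5) -> float:
    """Mean InfoNCE loss: anchors[i] must pick positives[i] out of all positives."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive pair
    return total / n


def asymmetric_contrastive_loss(common, views, temperature: float = 0.5) -> float:
    """Align the view-common representation with each view-specific one (common -> specific only)."""
    return sum(info_nce(common, view, temperature) for view in views) / len(views)


common = [[1.0, 0.0], [0.0, 1.0]]
aligned_view = [[0.9, 0.1], [0.1, 0.9]]
swapped_view = [[0.1, 0.9], [0.9, 0.1]]
print(asymmetric_contrastive_loss(common, [aligned_view]))  # small
print(asymmetric_contrastive_loss(common, [swapped_view]))  # large
```

Because gradients in a real model would flow only through the common-to-specific term, each view keeps its own structure while the fused representation is pulled toward all of them.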
-
Out of Sight, Out of Mind: A Source-View-Wise Feature Aggregation for Multi-View Image-Based Rendering
Authors:
Geonho Cha,
Chaehun Shin,
Sungroh Yoon,
Dongyoon Wee
Abstract:
To estimate the volume density and color of a 3D point in multi-view image-based rendering, a common approach is to check whether a consensus exists among the given source image features, which is one of the informative cues for the estimation procedure. To this end, most previous methods use equally weighted aggregation features. However, this can make it hard to check for consensus when outliers, which frequently arise from occlusions, are included in the source image feature set. In this paper, we propose a novel source-view-wise feature aggregation method, which finds the consensus robustly by leveraging local structures in the feature set. For the proposed aggregation, we first calculate the source-view-wise distance distribution for each source feature. The distance distribution is then converted to several similarity distributions with the proposed learnable similarity mapping functions. Finally, for each element in the feature set, the aggregation features are extracted by calculating weighted means and variances, where the weights are derived from the similarity distributions. We validate the proposed method on various benchmark datasets, including synthetic and real image scenes. The experimental results demonstrate that incorporating the proposed features improves performance by a large margin, achieving state-of-the-art performance.
Submitted 10 June, 2022;
originally announced June 2022.
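The aggregation steps in this abstract can be sketched for a single 3D point with K source-view features: each source feature computes its distances to all features, the distances are turned into similarity weights, and weighted means and variances are taken. Here a fixed Gaussian kernel exp(-d²/σ²) stands in for the paper's learnable similarity mapping functions, and a single similarity distribution replaces the several used in the paper; both simplifications, and all names, are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Feature = Sequence[float]


def sq_dist(a: Feature, b: Feature) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def source_view_wise_aggregate(
    feats: Sequence[Feature], sigma: float = 1.0
) -> List[Tuple[List[float], List[float]]]:
    """For each source feature k, return the (mean, variance) of all source features,
    weighted by their similarity to feature k."""
    k_views, dim = len(feats), len(feats[0])
    out = []
    for fk in feats:
        # Distance distribution of feature k -> similarity weights (fixed Gaussian kernel here).
        w = [math.exp(-sq_dist(fk, fj) / sigma ** 2) for fj in feats]
        total = sum(w)
        w = [wi / total for wi in w]
        mean = [sum(w[j] * feats[j][c] for j in range(k_views)) for c in range(dim)]
        var = [sum(w[j] * (feats[j][c] - mean[c]) ** 2 for j in range(k_views)) for c in range(dim)]
        out.append((mean, var))
    return out


# Two consistent views and one occluded outlier.
feats = [[0.0], [0.1], [5.0]]
for k, (mean, var) in enumerate(source_view_wise_aggregate(feats)):
    print(k, [round(m, 3) for m in mean], [round(v, 3) for v in var])
```

With equal weights every view would see a mean of 1.7; with similarity weights the two consistent views see a mean near 0.05 and a small variance, so the occluded outlier no longer hides their consensus.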
-
Self-Supervised Depth Estimation with Isometric-Self-Sample-Based Learning
Authors:
Geonho Cha,
Ho-Deok Jang,
Dongyoon Wee
Abstract:
Managing dynamic regions in the photometric loss formulation has been a major issue in self-supervised depth estimation. Most previous methods have alleviated this issue by removing the dynamic regions from the photometric loss based on masks estimated by another module, which makes it difficult to fully utilize the training images. In this paper, we propose an isometric self-sample-based learning (ISSL) method that fully utilizes the training images in a simple yet effective way. The proposed method provides additional supervision during training using self-generated images that comply with the pure static scene assumption. Specifically, the isometric self-sample generator synthesizes self-samples for each training image by applying random rigid transformations to the estimated depth. Thus, both the generated self-samples and the corresponding training image always follow the static scene assumption. We show that plugging our ISSL module into several existing models consistently improves performance by a large margin. In addition, it boosts depth accuracy across different types of scenes, i.e., outdoor scenes (KITTI and Make3D) and indoor scenes (NYUv2), validating its high effectiveness.
Submitted 20 May, 2022;
originally announced May 2022.
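The isometric self-sample idea rests on the fact that a rigid transformation applied to back-projected depth yields geometry that still satisfies the static-scene assumption. The sketch below shows that per-pixel geometry for a pinhole camera: back-project with intrinsics, apply a random small rotation about the vertical axis plus a translation, and re-project. The intrinsics, the sampling ranges, and the helper names are assumptions; rendering the actual self-sample image (warping colors, filling holes) is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def backproject(u: float, v: float, depth: float, fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pixel (u, v) with depth -> 3D point in camera coordinates (pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def project(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """3D point -> (u, v, depth) in the image."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy, z)


def apply_rigid(p: Sequence[float], R: Mat3, t: Sequence[float]) -> Vec3:
    """Rotate by R, then translate by t (distances between points are preserved)."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def random_rigid(rng: random.Random, max_angle: float = 0.05, max_shift: float = 0.1):
    """Random small yaw rotation (radians) and translation (scene units), both assumed ranges."""
    a = rng.uniform(-max_angle, max_angle)
    c, s = math.cos(a), math.sin(a)
    R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    t = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    return R, t


fx = fy = 500.0
cx, cy = 320.0, 240.0
R, t = random_rigid(random.Random(0))
p0 = backproject(100.0, 50.0, 2.0, fx, fy, cx, cy)
u2, v2, d2 = project(apply_rigid(p0, R, t), fx, fy, cx, cy)
print(f"pixel (100, 50) at depth 2.0 -> ({u2:.1f}, {v2:.1f}) at depth {d2:.3f}")
```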
-
Correlation between Unconscious Mouse Actions and Human Cognitive Workload
Authors:
Go-Eum Cha,
Byung-Cheol Min
Abstract:
From a psychological perspective, unconscious behaviors are one of the indicators of the human perception process. As perception responses, hand gestures show behavioral responses to given stimuli. Mouse usage in Human-Computer Interaction (HCI) involves hand gestures that reflect how individuals process information. This paper investigates the correlation between unconscious mouse actions and human cognitive workload. We extracted mouse behaviors from a Robot Operating System (ROS) file-based dataset in which user responses are reproducible. We analyzed redundant mouse movements made while completing a dual n-back game that requires only pressing the left and right buttons. Starting from the hypothesis that unconscious mouse behaviors predict different levels of cognitive load, we statistically analyzed mouse movements. We also validated mouse behaviors against other modalities in the dataset, including self-questionnaire and eye-blinking results. As a result, we found that unconsciously occurring mouse behaviors correlate with human cognitive workload.
Submitted 18 April, 2022;
originally announced April 2022.
-
Expansion of image space in enhanced-NA Fresnel holographic display
Authors:
Byung Gyu Chae
Abstract:
The enhanced-NA Fresnel hologram reconstructs a holographic image at a viewing angle larger than the diffraction angle of a hologram pixel. The image space is limited by the bandwidth of a digital hologram. In this study, we investigate the properties of image formation in the extended image space beyond a diffraction zone. A numerical simulation using the phase Fresnel hologram is carried out to observe the extension of the image space and its effect on the angular field of view. The phase Fresnel hologram, synthesized by restricting the angular view range to a diffraction angle, can reconstruct a uniform image without high-order noise within the primary viewing zone, which is confirmed by optical experiments. On the other hand, the overlapping of high-order images is inevitable when the viewing angle depends on the hologram numerical aperture. The high-order images are distributed in direction cosine space and could be effectively removed with an angular low-pass filter. We discuss the development of a method for expanding the image space while maintaining the viewing angle of a holographic image.
Submitted 26 June, 2022; v1 submitted 13 March, 2022;
originally announced March 2022.
-
PriorityCut: Occlusion-guided Regularization for Warp-based Image Animation
Authors:
Wai Ting Cheung,
Gyeongsu Chae
Abstract:
Image animation generates a video of a source image following the motion of a driving video. State-of-the-art self-supervised image animation approaches warp the source image according to the motion of the driving video and recover the warping artifacts by inpainting. These approaches mostly use vanilla convolution for inpainting, and vanilla convolution does not distinguish between valid and invalid pixels. As a result, visual artifacts are still noticeable after inpainting. CutMix is a state-of-the-art regularization strategy that cuts and mixes patches of images and is widely studied in different computer vision tasks. Warp-based image animation is one of the remaining computer vision fields in which the effects of CutMix have yet to be studied. This paper first presents a preliminary study on the effects of CutMix on warp-based image animation. We observed in our study that CutMix helps improve only pixel values but disturbs the spatial relationships between pixels. Based on this observation, we propose PriorityCut, a novel augmentation approach that uses the top-k percent occluded pixels of the foreground to regularize warp-based image animation. By leveraging domain knowledge of warp-based image animation, PriorityCut significantly reduces the warping artifacts in state-of-the-art warp-based image animation models on diverse datasets.
Submitted 22 March, 2021;
originally announced March 2021.
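The top-k percent selection described in this abstract can be sketched on small grids. The sketch assumes an occlusion map in [0, 1] where lower values mean stronger occlusion (as in occlusion-aware warping models that multiply features by such a map), a binary foreground mask, and CutMix-style mixing that copies pixels from a second image at the selected locations. The function names and the direction of mixing are assumptions, not details given in the abstract.

```python
import math
from typing import List, Sequence


def priority_mask(
    occlusion: Sequence[Sequence[float]], foreground: Sequence[Sequence[int]], k_percent: float
) -> List[List[int]]:
    """Mark (with 1) the top-k percent most occluded foreground pixels.
    Assumes lower occlusion values mean stronger occlusion."""
    candidates = sorted(
        (occlusion[i][j], i, j)
        for i, row in enumerate(foreground)
        for j, is_fg in enumerate(row)
        if is_fg
    )  # ascending, so the most occluded pixels come first
    n_selected = math.ceil(len(candidates) * k_percent / 100)
    mask = [[0] * len(row) for row in occlusion]
    for _, i, j in candidates[:n_selected]:
        mask[i][j] = 1
    return mask


def mix(base, patch_source, mask):
    """CutMix-style mixing: copy patch_source pixels wherever mask == 1."""
    return [
        [patch_source[i][j] if mask[i][j] else base[i][j] for j in range(len(base[0]))]
        for i in range(len(base))
    ]


occlusion = [[0.9, 0.1], [0.5, 0.2]]
foreground = [[1, 1], [1, 0]]
mask = priority_mask(occlusion, foreground, 50)
print(mask)  # [[0, 1], [1, 0]]: pixel (1, 1) is background, so it is never selected
print(mix([[0, 0], [0, 0]], [[9, 9], [9, 9]], mask))  # [[0, 9], [9, 0]]
```

Unlike a random rectangular CutMix box, the selected pixels follow where warping is least reliable, which is the domain knowledge the abstract refers to.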
-
KoDF: A Large-scale Korean DeepFake Detection Dataset
Authors:
Patrick Kwon,
Jaeseong You,
Gyuhyeon Nam,
Sungwoo Park,
Gyeongsu Chae
Abstract:
A variety of effective face-swap and face-reenactment methods have been publicized in recent years, democratizing face synthesis technology to a great extent. Videos generated in this way have come to be called deepfakes, with a negative connotation, because of the various social problems they have caused. Facing the emerging threat of deepfakes, we have built the Korean DeepFake Detection Dataset (KoDF), a large-scale collection of synthesized and real videos focused on Korean subjects. In this paper, we provide a detailed description of the methods used to construct the dataset, experimentally show the discrepancy between the distributions of KoDF and existing deepfake detection datasets, and underline the importance of using multiple datasets for real-world generalization. KoDF is publicly available at https://moneybrain-research.github.io/kodf in its entirety (i.e., real clips, synthesized clips, clips with adversarial attack, and metadata).
Submitted 23 August, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
-
GAN Vocoder: Multi-Resolution Discriminator Is All You Need
Authors:
Jaeseong You,
Dalhyun Kim,
Gyuhyeon Nam,
Geumbyeol Hwang,
Gyeongsu Chae
Abstract:
Several of the latest GAN-based vocoders show remarkable achievements, outperforming autoregressive and flow-based competitors in both qualitative and quantitative measures while synthesizing orders of magnitude faster. In this work, we hypothesize that the common factor underlying their success is the multi-resolution discriminating framework, not the minute details in architecture, loss function, or training strategy. We experimentally test the hypothesis by evaluating six different generators paired with one shared multi-resolution discriminating framework. For all evaluative measures with respect to text-to-speech syntheses and for all perceptual metrics, their performances are not distinguishable from one another, which supports our hypothesis.
Submitted 23 August, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Axial Residual Networks for CycleGAN-based Voice Conversion
Authors:
Jaeseong You,
Gyuhyeon Nam,
Dalhyun Kim,
Gyeongsu Chae
Abstract:
We propose a novel architecture and improved training objectives for non-parallel voice conversion. Our proposed CycleGAN-based model performs a shape-preserving transformation directly on a high frequency-resolution magnitude spectrogram, converting its style (i.e. speaker identity) while preserving the speech content. Throughout the entire conversion process, the model does not resort to compressed intermediate representations of any sort (e.g. mel spectrogram, low resolution spectrogram, decomposed network feature). We propose an efficient axial residual block architecture to support this expensive procedure and various modifications to the CycleGAN losses to stabilize the training process. We demonstrate via experiments that our proposed model outperforms Scyclone and shows a comparable or better performance to that of CycleGAN-VC2 even without employing a neural vocoder.
Submitted 24 August, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Viewing angle analysis of reconstructed image from digital Fresnel hologram with enhanced numerical aperture
Authors:
Byung Gyu Chae
Abstract:
The viewing-angle enlargement of a holographic image is a crucial factor in realizing holographic displays. The numerical aperture (NA) of a digital hologram, rather than its pixel specification, is known to determine the angular field extent of the image. Here, we provide a valid foundation for the dependence of the viewing angle on the hologram numerical aperture by mathematically investigating the internal structure of the sampled point spread function, which shows self-similarity in its modulating curves, and, in particular, by analyzing this scheme within a quantum-mechanical framework. The enhanced-NA Fresnel hologram generates multiple high-resolution images, which can yield a larger viewing angle given by the NA of the whole hologram aperture. Optical experiments show results consistent with the quantum-mechanical description of the viewing angle of holographic images. Finally, we discuss a method that uses this scheme to enlarge the viewing angle of a holographic image without sacrificing image size.
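The dependence on NA can be made concrete with a standard paraxial Fresnel sampling argument (our illustration, not the paper's derivation). For a hologram of N pixels with pitch p recorded at distance z, the local fringe frequency of the Fresnel chirp at the aperture edge is Np/(2λz), which stays below the Nyquist limit 1/(2p) only when z ≥ Np²/λ. At that critical distance the hologram NA equals λ/(2p), the sine of the pixel diffraction half-angle; recording closer gives an enhanced-NA hologram. The function names and example values are hypothetical.

```python
import math

def pixel_diffraction_half_angle(wavelength, pitch):
    """Half-angle (rad) set by the pixel pitch: sin(theta) = wavelength / (2 * pitch)."""
    return math.asin(wavelength / (2 * pitch))

def critical_distance(n_pixels, pitch, wavelength):
    """Distance at which the Fresnel chirp is sampled exactly at Nyquist at the edge."""
    return n_pixels * pitch ** 2 / wavelength

def hologram_na(n_pixels, pitch, z):
    """NA of the whole hologram aperture (width N*p) seen from an on-axis point at z."""
    half_width = n_pixels * pitch / 2
    return half_width / math.hypot(half_width, z)

if __name__ == "__main__":
    n, p, lam = 1000, 8e-6, 532e-9       # example values only
    z_c = critical_distance(n, p, lam)
    print(f"z_c = {z_c:.4f} m, sin(pixel half-angle) = {lam / (2 * p):.4f}")
    for z in (z_c, z_c / 2, z_c / 4):    # closer recording -> larger NA
        na = hologram_na(n, p, z)
        print(f"z = {z:.4f} m: NA = {na:.4f}, half-angle = {math.degrees(math.asin(na)):.2f} deg")
```

At z = z_c the two quantities agree up to the paraxial approximation; halving or quartering z raises the NA well beyond the pixel-pitch limit.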
Submitted 25 March, 2021; v1 submitted 30 November, 2020;
originally announced December 2020.
-
The ELFIN Mission
Authors:
V. Angelopoulos,
E. Tsai,
L. Bingley,
C. Shaffer,
D. L. Turner,
A. Runov,
W. Li,
J. Liu,
A. V. Artemyev,
X. -J. Zhang,
R. J. Strangeway,
R. E. Wirz,
Y. Y. Shprits,
V. A. Sergeev,
R. P. Caron,
M. Chung,
P. Cruce,
W. Greer,
E. Grimes,
K. Hector,
M. J. Lawson,
D. Leneman,
E. V. Masongsong,
C. L. Russell,
C. Wilkins
, et al. (57 additional authors not shown)
Abstract:
The Electron Loss and Fields Investigation with a Spatio-Temporal Ambiguity-Resolving option (ELFIN-STAR, or simply: ELFIN) mission comprises two identical 3-Unit (3U) CubeSats on a polar (~93° inclination), nearly circular, low-Earth (~450 km altitude) orbit. Launched on September 15, 2018, ELFIN is expected to have a >2.5 year lifetime. Its primary science objective is to resolve the mechanism of storm-time relativistic electron precipitation, for which electromagnetic ion cyclotron (EMIC) waves are a prime candidate. From its ionospheric vantage point, ELFIN uses its unique pitch-angle-resolving capability to determine whether measured relativistic electron pitch-angle and energy spectra within the loss cone bear the characteristic signatures of scattering by EMIC waves or whether such scattering may be due to other processes. Pairing identical ELFIN satellites with slowly varying along-track separation allows disambiguation of the spatial and temporal evolution of the precipitation over minutes-to-tens-of-minutes timescales, faster than the orbit period of a single low-altitude satellite (~90 min). Each satellite carries an energetic particle detector for electrons (EPDE) that measures 50 keV to 5 MeV electrons with ΔE/E < 40% and a fluxgate magnetometer (FGM) on a ~72 cm boom that measures magnetic field waves (e.g., EMIC waves) from DC to 5 Hz Nyquist (nominally) with <0.3 nT/√Hz noise at 1 Hz. The spinning satellites (T_spin ≈ 3 s) are equipped with magnetorquers that permit spin-up/down and reorientation maneuvers. The spin axis is placed normal to the orbit plane, allowing full pitch-angle resolution twice per spin. An energetic particle detector for ions (EPDI) measures 250 keV to 5 MeV ions, addressing secondary science. Funded initially by CalSpace and the University Nanosat Program, ELFIN was selected for flight with joint support from NSF and NASA between 2014 and 2018.
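As a quick consistency check on the ~90 min orbit period quoted for a ~450 km low-Earth orbit, Kepler's third law for a circular orbit, T = 2π√(a³/μ), can be evaluated directly, together with the number of ~3 s spins per orbit. The Earth radius and gravitational parameter are standard reference values we supply; they do not come from the abstract.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418   # Earth's standard gravitational parameter (km^3/s^2)
R_EARTH_KM = 6371.0             # mean Earth radius (km)

def circular_orbit_period_min(altitude_km):
    """Period of a circular orbit at the given altitude, in minutes."""
    a = R_EARTH_KM + altitude_km              # semi-major axis of a circular orbit
    return 2 * math.pi * math.sqrt(a ** 3 / MU_EARTH_KM3_S2) / 60

period = circular_orbit_period_min(450)
print(f"orbit period: {period:.1f} min")      # about 93.4 min
spins_per_orbit = period * 60 / 3.0           # T_spin ~ 3 s
# Full pitch-angle resolution twice per spin -> two distributions per spin.
print(f"~{spins_per_orbit:.0f} spins, ~{2 * spins_per_orbit:.0f} pitch-angle distributions per orbit")
```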
Submitted 16 June, 2020; v1 submitted 13 June, 2020;
originally announced June 2020.
-
ROSbag-based Multimodal Affective Dataset for Emotional and Cognitive States
Authors:
Wonse Jo,
Shyam Sundar Kannan,
Go-Eum Cha,
Ahreum Lee,
Byung-Cheol Min
Abstract:
This paper introduces a new ROSbag-based multimodal affective dataset for emotional and cognitive states, generated using the Robot Operating System (ROS). We used images and sounds from the International Affective Picture System (IAPS) and the International Affective Digitized Sounds (IADS) to stimulate targeted emotions (happiness, sadness, anger, fear, surprise, disgust, and neutral), and a dual N-back game to stimulate different levels of cognitive workload. Thirty human subjects participated in the user study: their physiological data were collected using the latest commercial wearable sensors, their behavioral data were collected using hardware devices such as cameras, and subjective assessments were carried out through questionnaires. All data were stored in single ROSbag files rather than in conventional comma-separated values (CSV) files. This not only ensures synchronization of the signals and videos in the dataset, but also allows researchers to easily analyze and verify their algorithms by connecting directly to the dataset through ROS. The generated affective dataset consists of 1,602 ROSbag files, about 787 GB in total, and is publicly available. We expect that our dataset can be a valuable resource for researchers in affective computing, HCI, and HRI.
Submitted 20 October, 2020; v1 submitted 9 June, 2020;
originally announced June 2020.
-
A ROS-based Framework for Monitoring Human and Robot Conditions in a Human-Multi-robot Team
Authors:
Wonse Jo,
Shyam Sundar Kannan,
Go-Eum Cha,
Ahreum Lee,
Byung-Cheol Min
Abstract:
This paper presents a framework for monitoring human and robot conditions in human-multi-robot interactions. The proposed framework consists of four modules: 1) a human and robot conditions monitoring interface, 2) a synchronization time filter, 3) a data feature extraction interface, and 4) a condition monitoring interface. The framework is based on the Robot Operating System (ROS), and it supports physiological and behavioral sensors and devices, robot systems, and custom programs. It also synchronizes the monitored conditions and shares them simultaneously. To validate the proposed framework, we present experimental results and analysis from a user study with 30 human subjects and from simulated robot experiments.
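The abstract does not describe how the synchronization time filter works. One common approach, sketched here purely as an assumption, pairs messages from two sensor streams whose timestamps fall within a tolerance ("slop"), similar in spirit to the approximate-time policy of ROS `message_filters` but written in plain Python. `synchronize` and its parameters are hypothetical.

```python
from bisect import bisect_left

def synchronize(stream_a, stream_b, slop):
    """Pair each (t, value) in stream_a with the nearest unused entry of stream_b.

    Both streams must be sorted by timestamp t (seconds). Pairs whose
    timestamps differ by more than `slop` seconds are dropped.
    """
    times_b = [t for t, _ in stream_b]
    used = set()
    pairs = []
    for t_a, value_a in stream_a:
        i = bisect_left(times_b, t_a)
        # Candidates: the stream_b entries just before and just after t_a.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times_b) and j not in used]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times_b[k] - t_a))
        if abs(times_b[j] - t_a) <= slop:
            used.add(j)
            pairs.append((t_a, value_a, stream_b[j][1]))
    return pairs

if __name__ == "__main__":
    heart_rate = [(0.00, 72), (1.00, 74), (2.00, 75)]                 # e.g. a wearable sensor
    robot_state = [(0.02, "idle"), (0.98, "moving"), (2.40, "idle")]  # e.g. a robot topic
    print(synchronize(heart_rate, robot_state, slop=0.05))
    # [(0.0, 72, 'idle'), (1.0, 74, 'moving')]  (the 2.00 s sample has no partner within 0.05 s)
```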
Submitted 6 June, 2020;
originally announced June 2020.
-
Double-Side Cocatalytic Activation of Anodic TiO$_2$ Nanotube Membranes with Sputter-Coated Pt for Photocatalytic H$_2$ Generation from Water-Ethanol Mixtures
Authors:
Gihoon Cha,
Marco Altomare,
Nhat Truong Nguyen,
Nicola Taccardi,
Kiyoung Lee,
Patrik Schmuki
Abstract:
Self-standing TiO$_2$ nanotube layers in the form of membranes are fabricated by self-organizing anodization of Ti metal and a potential shock technique. The membranes are then decorated by sputtering different amounts of Pt i) only at the top, ii) only at the bottom, or iii) at both the top and bottom of the tube layers. The Pt-decorated membranes are transferred onto FTO slides in either a tube-top-up or a tube-top-down configuration and, after crystallization, investigated as photocatalysts for H$_2$ generation under front- or back-side light irradiation. Double-side Pt decoration of the tube membranes leads to higher H$_2$ generation rates (independent of tube and light irradiation configuration) than decoration at only one side with similar overall Pt amounts. The results suggest that this effect is due not to the overall amount of Pt cocatalyst as such but rather to its distribution at both tube extremities. This leads to optimized light absorption and electron diffusion/transfer dynamics: the central part of the membrane acts as a light-harvesting zone, and electrons generated there can diffuse towards the Pt/TiO$_2$ active zones (the tube extremities), where they can react with the environment and generate H$_2$ gas.
Submitted 14 April, 2020;
originally announced June 2020.
-
TiO$_2$ nanotubes with different spacing, Fe$_2$O$_3$ decoration and their evaluation for Li-ion battery application
Authors:
Selda Ozkan,
Gihoon Cha,
Anca Mazare,
Patrik Schmuki
Abstract:
In the present work, we report on the use of organized TiO$_2$ nanotube layers with a regular intertube spacing for the growth of highly defined $α$-Fe$_2$O$_3$ nano-needles in the interspace. These $α$-Fe$_2$O$_3$-decorated TiO$_2$ NTs are then explored for Li-ion battery applications and compared with classic close-packed NTs; both types are decorated with various amounts of nanoscale $α$-Fe$_2$O$_3$. We show that nanotubes with tube-to-tube spacing allow a uniform decoration of individual nanotubes with regular arrangements of hematite nano-needles. The tube spacing also facilitates electrolyte penetration and yields better ion diffusion. While bare close-packed NTs show a higher areal capacity (e.g., 71 $μ$Ah cm$^{-2}$) than bare spaced NTs (e.g., 54 $μ$Ah cm$^{-2}$), the hierarchical decoration with a secondary metal oxide, $α$-Fe$_2$O$_3$, remarkably enhances the Li-ion battery performance. Namely, spaced nanotubes with $α$-Fe$_2$O$_3$ decoration have an areal capacity of 477 $μ$Ah cm$^{-2}$, i.e., up to ~8 times higher. In contrast, the areal capacity of close-packed NTs with $α$-Fe$_2$O$_3$ decoration saturates at 208 $μ$Ah cm$^{-2}$, i.e., an increase limited to ~3 times.
Submitted 14 April, 2020;
originally announced April 2020.
-
Wide viewing-angle holographic display based on enhanced-NA Fresnel hologram
Authors:
Byung Gyu Chae
Abstract:
The viewing-angle enlargement of a holographic image is a crucial factor in realizing holographic displays. The numerical aperture (NA) of a digital hologram, rather than its pixel specification, is known to determine the angular field extent of the image. Here, we provide a valid foundation for the dependence of the viewing angle on the hologram numerical aperture by mathematically investigating the internal structure of the sampled point spread function, which shows self-similarity in its modulating curves. The enhanced-NA Fresnel hologram reconstructs images at a viewing angle larger than the diffraction angle set by the hologram pixel pitch; this angle is expressed in terms of the NA of the whole hologram aperture, as systematically observed in optical hologram imaging. Finally, we find that the aliased replica noise generated in the enhanced-NA Fresnel diffraction regime is effectively suppressed within the diffraction range of a digitized pixel. This characteristic enables us to overcome image reduction and to remove interference from high-order images, leading to a wide-viewing-angle holographic display. Optical experiments are consistent with the results of numerical simulation.
Submitted 5 December, 2021; v1 submitted 26 April, 2020;
originally announced April 2020.
-
Analysis on image recovery for digital Fresnel hologram with aliased fringe generated from self-similarity of point spread function
Authors:
Byung Gyu Chae
Abstract:
We analyze the aliasing phenomenon in digital Fresnel holograms with an enhanced numerical aperture (NA). An enhanced-NA digital hologram, acquired computationally or optically at a shorter distance from the object, contains an aliased fringe generated by undersampling of the Fresnel prefactor. When sampled, the point spread function, known as the Fresnel factor, reveals a self-similar envelope, which is the key mechanism producing this type of aliased hologram fringe. We show that, because the enhanced-NA hologram already contains the complementary aliased fringe that would otherwise arise in the reconstruction process, the object image can be recovered robustly. These behaviors are confirmed through numerical simulation. Based on this analysis of the aliased hologram fringe, we provide a method for reconstructing the object image from the enhanced-NA digital Fresnel hologram without shrinking the image.
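Where the Fresnel prefactor becomes undersampled can be located with a simple paraxial argument (an illustration, not the paper's self-similarity analysis): the sampled chirp exp(iπx²/(λz)) has local frequency |x|/(λz), so pixels with |x| > λz/(2p) exceed the Nyquist limit 1/(2p) and their fringes alias. The hypothetical helper below counts those pixels for a centred N-pixel hologram.

```python
def aliased_fraction(n_pixels, pitch, wavelength, z):
    """Fraction of pixels where the Fresnel chirp's local frequency exceeds 1/(2p)."""
    limit = wavelength * z / (2 * pitch)          # |x| beyond this position aliases
    aliased = 0
    for n in range(n_pixels):
        x = (n - (n_pixels - 1) / 2) * pitch      # pixel centres, symmetric about 0
        if abs(x) > limit:
            aliased += 1
    return aliased / n_pixels

if __name__ == "__main__":
    n, p, lam = 1000, 8e-6, 532e-9                # example values only
    z_c = n * p ** 2 / lam                        # Nyquist-limited recording distance
    for z in (z_c, z_c / 2, z_c / 4):
        print(f"z = {z:.4f} m -> aliased fraction {aliased_fraction(n, p, lam, z):.2f}")
```

Recording at half or a quarter of the critical distance leaves the outer half or three quarters of the aperture aliased, which is the regime in which the enhanced-NA hologram carries aliased fringes.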
Submitted 26 May, 2020; v1 submitted 18 November, 2019;
originally announced November 2019.
-
Performance of Recommender Systems: Based on Content Navigator and Collaborative Filtering
Authors:
Keum Gang Cha,
Soo-Ryeon Lee,
Jung-Woo Lee,
Seung Bin Baik
Abstract:
In the world of big data, many people find it difficult to access the information they need quickly and accurately. To overcome this, research on systems that accurately recommend information to users is continuously being conducted. Collaborative filtering is one of the most widely used algorithms in industry. However, collaborative filtering is difficult to use in online systems because its recommendation quality is highly volatile and it requires computation over large matrices. To overcome this problem, this paper proposes a method similar to database queries and a clustering method originating from complex networks (Contents Navigator).
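For reference, a minimal user-based collaborative-filtering baseline (the kind of method the paper contrasts with, not its Contents Navigator) predicts a missing rating as a similarity-weighted average of other users' ratings. The full user-item matrix it scans is what makes the approach costly online. Cosine similarity over co-rated items and the toy ratings are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity over items both users rated (0 means 'not rated')."""
    common = [(a, b) for a, b in zip(u, v) if a and b]
    if not common:
        return 0.0
    dot = sum(a * b for a, b in common)
    norm_u = math.sqrt(sum(a * a for a, _ in common))
    norm_v = math.sqrt(sum(b * b for _, b in common))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item` (None if nobody rated it)."""
    num = den = 0.0
    for other, row in enumerate(ratings):
        if other == user or not row[item]:
            continue
        sim = cosine(ratings[user], row)
        num += sim * row[item]
        den += abs(sim)
    return num / den if den else None

ratings = [        # rows: users, columns: items, 0 = unrated (toy data)
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
]
print(round(predict(ratings, user=1, item=1), 3))
```

Every prediction touches every other user's row, so the cost grows with the size of the rating matrix, which matches the scalability concern raised above.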
Submitted 18 September, 2019;
originally announced September 2019.
-
DeepCopy: Grounded Response Generation with Hierarchical Pointer Networks
Authors:
Semih Yavuz,
Abhinav Rastogi,
Guan-Lin Chao,
Dilek Hakkani-Tur
Abstract:
Recent advances in neural sequence-to-sequence models have led to promising results for several language generation tasks, including dialogue response generation, summarization, and machine translation. However, these models are known to have several problems, especially in the context of chit-chat dialogue systems: they tend to generate short and dull responses that are often too generic. Furthermore, these models do not ground conversational responses in knowledge and facts, resulting in turns that are not accurate, informative, or engaging for users. In this paper, we propose and experiment with a series of response generation models for the general scenario in which, in addition to the dialogue context, relevant unstructured external knowledge in the form of text is also assumed to be available for the models to harness. Our proposed approach extends pointer-generator networks (See et al., 2017) by allowing the decoder to hierarchically attend to and copy from external knowledge in addition to the dialogue context. We empirically show the effectiveness of the proposed model compared to several baselines, including (Ghazvininejad et al., 2018; Zhang et al., 2018), through both automatic evaluation metrics and human evaluation on the CONVAI2 dataset.
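The copy mechanism being extended is the pointer-generator mixture of See et al. (2017): P(w) = p_gen·P_vocab(w) + (1 − p_gen)·Σ_{i: x_i = w} a_i. The sketch below adds a hierarchical knowledge copy under our own assumptions rather than the paper's exact formulation: attention over knowledge sentences scales token attention within each sentence, and generation, context copy, and knowledge copy are mixed with given weights. All names and numbers are illustrative.

```python
from collections import defaultdict

def copy_distribution(tokens, weights):
    """Scatter attention weights onto the word types they point at."""
    dist = defaultdict(float)
    for tok, w in zip(tokens, weights):
        dist[tok] += w
    return dist

def hierarchical_knowledge_copy(sentences, sentence_attn, token_attn):
    """Token weight = P(sentence) * P(token | sentence); sums to 1 if the inputs do."""
    dist = defaultdict(float)
    for sent, s_w, t_ws in zip(sentences, sentence_attn, token_attn):
        for tok, t_w in zip(sent, t_ws):
            dist[tok] += s_w * t_w
    return dist

def final_distribution(p_vocab, ctx_copy, kn_copy, mix):
    """Mix generate / copy-from-context / copy-from-knowledge; `mix` sums to 1."""
    g, c, k = mix
    words = set(p_vocab) | set(ctx_copy) | set(kn_copy)
    return {w: g * p_vocab.get(w, 0.0) + c * ctx_copy.get(w, 0.0) + k * kn_copy.get(w, 0.0)
            for w in words}

p_vocab = {"i": 0.4, "like": 0.3, "music": 0.3}
ctx = copy_distribution(["do", "you", "like", "jazz"], [0.1, 0.1, 0.3, 0.5])
kn = hierarchical_knowledge_copy(
    [["i", "play", "saxophone"], ["jazz", "is", "great"]],
    sentence_attn=[0.7, 0.3],
    token_attn=[[0.1, 0.2, 0.7], [0.6, 0.2, 0.2]],
)
final = final_distribution(p_vocab, ctx, kn, mix=(0.5, 0.2, 0.3))
print(max(final, key=final.get), round(sum(final.values()), 6))
```

Because each component is a proper distribution and the mixing weights sum to 1, the result is again a distribution, and out-of-vocabulary knowledge words such as "saxophone" receive probability mass only through copying.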
Submitted 28 August, 2019;
originally announced August 2019.
-
Methods for extending viewing-angle of holographic image by using digital hologram with high numerical aperture
Authors:
Byung Gyu Chae
Abstract:
We investigate the angular field of view (AFOV) of a holographic image reconstructed from a digital Fresnel hologram in a holographic display. Theoretical analysis reveals that the AFOV of a holographic image is fundamentally determined by the hologram numerical aperture (HNA) rather than by the diffraction angle of the pixel pitch of a pixelated modulator. This property is demonstrated for various types of digital holograms by numerical simulation and optical experiments. The high-HNA hologram reconstructs the image with a wide viewing angle, although image contraction is inevitable due to the Nyquist sampling criterion. We propose a method for extending the viewing angle of a holographic image by increasing the object size during high-HNA hologram synthesis and removing the high-order aliasing images.
Submitted 2 February, 2020; v1 submitted 25 August, 2019;
originally announced August 2019.
-
Learning Question-Guided Video Representation for Multi-Turn Video Question Answering
Authors:
Guan-Lin Chao,
Abhinav Rastogi,
Semih Yavuz,
Dilek Hakkani-Tür,
**dong Chen,
Ian Lane
Abstract:
Understanding and conversing about dynamic scenes is one of the key capabilities of AI agents that navigate the environment and convey useful information to humans. Video question answering is a specific scenario of such AI-human interaction where an agent generates a natural language response to a question regarding the video of a dynamic scene. Incorporating features from multiple modalities, which often provide supplementary information, is one of the challenging aspects of video question answering. Furthermore, a question often concerns only a small segment of the video, hence encoding the entire video sequence using a recurrent neural network is not computationally efficient. Our proposed question-guided video representation module efficiently generates the token-level video summary guided by each word in the question. The learned representations are then fused with the question to generate the answer. Through empirical evaluation on the Audio Visual Scene-aware Dialog (AVSD) dataset, our proposed models in single-turn and multi-turn question answering achieve state-of-the-art performance on several automatic natural language generation evaluation metrics.
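One plausible reading of "token-level video summary guided by each word in the question", offered as an assumption rather than the authors' architecture, is scaled dot-product attention in which every question-token embedding attends over the frame features, so each token gets its own weighted summary of the video. The tiny vectors and the function name are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)                                   # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def question_guided_summaries(question_vecs, frame_vecs):
    """For each question token, attend over video frames and return a weighted summary."""
    dim = len(frame_vecs[0])
    scale = 1 / math.sqrt(dim)
    summaries = []
    for q in question_vecs:
        scores = [scale * sum(a * b for a, b in zip(q, f)) for f in frame_vecs]
        weights = softmax(scores)
        summary = [sum(w * f[d] for w, f in zip(weights, frame_vecs)) for d in range(dim)]
        summaries.append(summary)
    return summaries   # one summary per question token: len(question_vecs) x dim

frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]    # 3 frames with 2-d features
question = [[4.0, 0.0], [0.0, 4.0]]              # 2 question tokens
for s in question_guided_summaries(question, frames):
    print([round(x, 3) for x in s])
```

Each token's summary leans toward the frames most similar to it, which is how a question about a short segment can ignore the rest of the video.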
Submitted 30 July, 2019;
originally announced July 2019.
-
BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer
Authors:
Guan-Lin Chao,
Ian Lane
Abstract:
An important yet rarely tackled problem in dialogue state tracking (DST) is scalability for dynamic ontologies (e.g., movie, restaurant) and unseen slot values. We focus on a specific condition, where the ontology is unknown to the state tracker, but the target slot value (except for none and dontcare), possibly unseen during training, can be found as a word segment in the dialogue context. Prior approaches often rely on candidate generation from n-gram enumeration or slot tagger outputs, which can be inefficient or suffer from error propagation. We propose BERT-DST, an end-to-end dialogue state tracker that directly extracts slot values from the dialogue context. We use BERT as the dialogue context encoder, whose contextualized language representations are suitable for scalable DST to identify slot values from their semantic context. Furthermore, we employ encoder parameter sharing across all slots, which has two advantages: (1) the number of parameters does not grow linearly with the ontology, and (2) language representation knowledge can be transferred among slots. Empirical evaluation shows that BERT-DST with cross-slot parameter sharing outperforms prior work on the benchmark scalable DST datasets Sim-M and Sim-R, and achieves competitive performance on the standard DSTC2 and WOZ 2.0 datasets.
Submitted 5 July, 2019;
originally announced July 2019.
-
Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments
Authors:
Guan-Lin Chao,
William Chan,
Ian Lane
Abstract:
Speech recognition in cocktail-party environments remains a significant challenge for state-of-the-art speech recognition systems, as it is extremely difficult to extract the acoustic signal of an individual speaker from a background of overlapping speech with similar frequency and temporal characteristics. We propose the use of speaker-targeted acoustic and audio-visual models for this task. We complement the acoustic features in a hybrid DNN-HMM model with information about the target speaker's identity as well as visual features from the mouth region of the target speaker. Experiments were performed using simulated cocktail-party data generated from the GRID audio-visual corpus by overlapping two speakers' speech on a single acoustic channel. Our audio-only baseline achieved a WER of 26.3%. The audio-visual model improved the WER to 4.4%. Introducing speaker identity information had an even more pronounced effect, improving the WER to 3.6%. Combining both approaches, however, did not significantly improve performance further. Our work demonstrates that speaker-targeted models can significantly improve speech recognition in cocktail-party environments.
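Word error rate, the metric behind the 26.3%, 4.4%, and 3.6% figures, is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A standard dynamic-programming implementation is sketched below; the example sentences are made up.

```python
def wer(reference, hypothesis):
    """Word error rate = (S + D + I) / N via Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)

# One substituted word out of six reference words.
print(wer("bin blue at f two now", "bin blue at s two now"))
```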
Submitted 13 June, 2019;
originally announced June 2019.
-
Importance of vdW and long-range exchange interactions to DFT-predicted docking energies between plumbagin and cyclodextrins
Authors:
Tom Ichibha,
Ornin Srihakulung,
Guo Chao,
Adie Tri Hanindriyo,
Luckhana Lawtrakul,
Kenta Hongo,
Ryo Maezono
Abstract:
We calculated the docking energies between plumbagin and cyclodextrins using density functional theory (DFT) with several functionals, as well as several semi-empirical methods. Our DFT results reveal that the GD3 dispersion correction significantly improves the reliability of the prediction. A sufficient amount of long-range exchange further improves reliability, in agreement with previous work on the argon dimer. Among the semi-empirical methods, PM6 and PM7 qualitatively reproduce the stabilization by docking, yet underestimate and overestimate the docking energies, respectively, by ~10 kcal/mol.
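The docking energy compared across methods is conventionally the energy of the host-guest complex minus the energies of the isolated host and guest. The helper below applies that definition to total energies in hartree and converts to kcal/mol. The input energies are placeholders, not results from the paper, and corrections such as basis-set superposition error are ignored.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509  # 1 hartree expressed in kcal/mol

def docking_energy_kcal(e_complex, e_host, e_guest):
    """E_dock = E(complex) - E(host) - E(guest); a negative value means docking stabilizes."""
    return (e_complex - e_host - e_guest) * HARTREE_TO_KCAL_PER_MOL

# Placeholder total energies (hartree) for a host, a guest, and their complex.
e_dock = docking_energy_kcal(e_complex=-4913.120, e_host=-4274.550, e_guest=-638.540)
print(f"{e_dock:.1f} kcal/mol")
```

An error of ~10 kcal/mol, as reported for PM6 and PM7, corresponds to only about 0.016 hartree in the total energies, which is why the energy differences must be computed with care.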
Submitted 4 April, 2019;
originally announced April 2019.
-
Supervised Nonnegative Matrix Factorization to Predict ICU Mortality Risk
Authors:
Guoqing Chao,
Chengsheng Mao,
Fei Wang,
Yuan Zhao,
Yuan Luo
Abstract:
ICU mortality risk prediction is a difficult yet important task. On the one hand, because of the complex temporal data collected, it is difficult to identify effective features and interpret them easily; on the other hand, good prediction can help clinicians take timely action to prevent mortality. These correspond to the interpretability and accuracy problems. Most existing methods lack interpretability, but Subgraph Augmented Nonnegative Matrix Factorization (SANMF) has recently been applied successfully to time series data, providing a path to interpreting the features well. We therefore adopted this approach as the backbone for analyzing the patient data. One limitation of the raw SANMF method is its poor predictive ability due to its unsupervised nature. To address this problem, we proposed a supervised SANMF algorithm that integrates the logistic regression loss function into the NMF framework, and solved it with an alternating optimization procedure. We used simulated data to verify the effectiveness of this method, then applied it to ICU mortality risk prediction and demonstrated its superiority over other conventional supervised NMF methods.
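The supervised objective can be sketched as a reconstruction term plus a logistic-regression loss on the patient factors, minimized by alternating updates. This is a minimal illustration under our own assumptions, not the authors' algorithm: plain Frobenius NMF instead of SANMF's subgraph features, Lee-Seung multiplicative updates for H, projected gradient steps for W, gradient steps for the regression weights, no bias term, and hypothetical hyperparameters.

```python
import math
import random

def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def objective(X, y, W, H, beta, alpha):
    """||X - WH||_F^2 + alpha * logistic loss of labels y predicted from rows of W."""
    WH = matmul(W, H)
    recon = sum((x - r) ** 2 for xr, rr in zip(X, WH) for x, r in zip(xr, rr))
    logloss = 0.0
    for w, t in zip(W, y):
        p = sigmoid(sum(a * b for a, b in zip(w, beta)))
        p = min(max(p, 1e-12), 1 - 1e-12)                  # avoid log(0)
        logloss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return recon + alpha * logloss

def supervised_nmf(X, y, k, alpha=0.5, lr_w=1e-3, lr_beta=0.1, iters=200, seed=0):
    """Alternate: H (multiplicative), W (projected gradient), beta (gradient)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    beta = [0.0] * k
    history = [objective(X, y, W, H, beta, alpha)]
    for _ in range(iters):
        # 1) H step: Lee-Seung multiplicative update; keeps H >= 0 when X >= 0.
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + 1e-12) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # 2) W step: gradient of the full objective, then projection onto W >= 0.
        R = [[r - x for r, x in zip(rr, xr)] for rr, xr in zip(matmul(W, H), X)]
        RHt = matmul(R, transpose(H))           # (WH - X) H^T, half the reconstruction gradient
        new_W = []
        for w, g, t in zip(W, RHt, y):
            err = sigmoid(sum(a * b for a, b in zip(w, beta))) - t
            new_W.append([max(0.0, wi - lr_w * (2 * gi + alpha * err * bj))
                          for wi, gi, bj in zip(w, g, beta)])
        W = new_W
        # 3) beta step: logistic-regression gradient with W fixed.
        errs = [sigmoid(sum(a * b for a, b in zip(w, beta))) - t for w, t in zip(W, y)]
        beta = [bj - lr_beta * alpha * sum(e * w[j] for e, w in zip(errs, W))
                for j, bj in enumerate(beta)]
        history.append(objective(X, y, W, H, beta, alpha))
    return W, H, beta, history

if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.random() for _ in range(10)] for _ in range(20)]   # toy nonnegative data
    y = [1 if row[0] > 0.5 else 0 for row in X]                   # toy binary labels
    W, H, beta, history = supervised_nmf(X, y, k=3)
    print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

The rows of W play the role of interpretable patient factors, and the logistic term pushes them to also separate the labels, which is the supervised twist on plain NMF.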
Submitted 8 October, 2018; v1 submitted 27 September, 2018;
originally announced September 2018.
-
A Study on Deep Learning Based Sauvegrain Method for Measurement of Puberty Bone Age
Authors:
Seung Bin Baik,
Keum Gang Cha
Abstract:
This study applies a technique that expands the number of images to a level sufficient for deep learning, and examines the applicability of the Sauvegrain method through deep learning with relatively few elbow X-rays. The study was composed of processes similar to physicians' bone age assessment procedures. The selected reference images were learned without being included in the evaluation data, and at the same time, the data were extended to accommodate the number of cases. In addition, we enhanced the X-ray images using U-Net and selected the ROI with RPN+ so that bone age could be estimated with a CNN. The mean absolute error (MAE) of the deep-learning-based Sauvegrain method is 2.8 months, and the mean absolute percentage error (MAPE) is 0.018. This result shows that X-ray analysis using the Sauvegrain method achieves high accuracy for the pubertal age group even when based on deep learning. This means that, with X-ray images extended by the image data extension technique, the deep-learning-based Sauvegrain method can measure bone age at a level similar to that of an expert. Finally, we applied the Sauvegrain method to deep learning for accurate measurement of bone age at puberty. Compared with expert evaluations, we confirmed that this deep-learning-based approach can overcome the limitations of machine-learning-based bone age measurement with TW3 or Greulich & Pyle, which arose from a lack of X-ray images. We also presented the Sauvegrain method as applicable to adolescents.
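The two reported metrics are standard: mean absolute error in months and mean absolute percentage error, computed here as a fraction, which appears to be how the 0.018 above is reported (0.018 would correspond to 1.8%). The sketch computes both; the ages are invented for illustration, not the study's data.

```python
def mae(true_ages, predicted_ages):
    """Mean absolute error, in the same unit as the ages (months here)."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)

def mape(true_ages, predicted_ages):
    """Mean absolute percentage error as a fraction (multiply by 100 for percent)."""
    return sum(abs(t - p) / t for t, p in zip(true_ages, predicted_ages)) / len(true_ages)

true_months = [132, 150, 168, 144]   # hypothetical reference bone ages (months)
pred_months = [135, 147, 170, 141]   # hypothetical model estimates (months)
print(mae(true_months, pred_months), round(mape(true_months, pred_months), 4))
```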
Submitted 18 September, 2018;
originally announced September 2018.
-
Interactive Text2Pickup Network for Natural Language based Human-Robot Collaboration
Authors:
Hyemin Ahn,
Sungjoon Choi,
Nuri Kim,
Geonho Cha,
Songhwai Oh
Abstract:
In this paper, we propose the Interactive Text2Pickup (IT2P) network for human-robot collaboration, which enables effective interaction with a human user despite ambiguity in the user's commands. We focus on the task in which a robot is expected to pick up an object instructed by a human and to interact with the human when the given instruction is vague. The proposed network first understands the command from the human user and estimates the position of the desired object. To handle the inherent ambiguity in human language commands, it generates a suitable question that can resolve the ambiguity. The user's answer to the question is combined with the initial command and fed back to the network, resulting in a more accurate estimate. The experimental results show that, given unambiguous commands, the proposed method estimates the position of the requested object with an accuracy of 98.49% on our test dataset. Given ambiguous language commands, we show that the accuracy of the pick-up task increases by a factor of 1.94 after incorporating the information obtained from the interaction.
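The abstract describes an interaction loop: estimate, ask when the command is ambiguous, then refine with the answer. The sketch below illustrates only that loop under stated assumptions. The confidence threshold, the answer likelihoods, and the Bayes-style product fusion are hypothetical stand-ins, not the IT2P network, which instead feeds the answer back into the network together with the original command.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a distribution over candidate objects."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def needs_question(probs, confidence_threshold=0.8):
    """Hypothetical trigger: ask for clarification when no candidate is confident enough."""
    return max(probs) < confidence_threshold


def combine_with_answer(probs, answer_likelihood):
    """Hypothetical fusion: elementwise product of the command-only estimate and the answer."""
    return normalize([p * a for p, a in zip(probs, answer_likelihood)])


# Hypothetical scene: "pick up the cup" with two cups and a plate on the table.
initial = [0.45, 0.45, 0.10]  # red cup, blue cup, plate
refined = initial
if needs_question(initial):
    # Robot asks "Which cup, the red one or the blue one?"; the user answers "the red one".
    answer_likelihood = [0.9, 0.1, 0.1]
    refined = combine_with_answer(initial, answer_likelihood)
```

After the answer, the red cup dominates and the entropy of the distribution drops, which is the qualitative effect the interaction is meant to achieve.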
Submitted 28 May, 2018;
originally announced May 2018.
-
Deep Pose Consensus Networks
Authors:
Geonho Cha,
Minsik Lee,
Jungchan Cho,
Songhwai Oh
Abstract:
In this paper, we address the problem of estimating a 3D human pose from a single image, which is important but difficult for many reasons, such as self-occlusions, wild appearance changes, and the inherent ambiguity of 3D estimation from a 2D cue. These difficulties make the problem ill-posed, which has led to increasingly complex estimators being required to improve performance. However, most existing methods try to handle this problem with a single complex estimator, which might not be a good solution. To resolve this issue, we propose a multiple-partial-hypothesis-based framework for estimating 3D human pose from a single image, which can be fine-tuned in an end-to-end fashion. We first select several joint groups from a human joint model using the proposed sampling scheme and estimate the 3D pose of each joint group separately with deep neural networks. The partial estimates are then aggregated into the final 3D pose using the proposed robust optimization formula. The overall procedure can be fine-tuned end to end, resulting in better performance. In the experiments, the proposed framework achieves state-of-the-art performance on the popular benchmark datasets Human3.6M and HumanEva, demonstrating its effectiveness.
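The aggregation step combines several partial hypotheses, each covering only some joints, into one full pose. The abstract does not give the paper's robust optimization formula, so this sketch swaps in a simpler robust aggregator, the coordinate-wise median over the joint groups that cover each joint. The data layout (one dict of joint index to (x, y, z) per group) is an assumption for illustration.

```python
from statistics import median


def aggregate_partial_poses(partial_estimates, num_joints):
    """Fuse per-group partial 3D poses into one pose.

    partial_estimates: list of dicts {joint_index: (x, y, z)}, one dict per joint group
    (hypothetical layout). Each joint gets the coordinate-wise median over the groups
    that cover it, which limits the influence of a single outlying group.
    """
    pose = []
    for j in range(num_joints):
        votes = [est[j] for est in partial_estimates if j in est]
        if not votes:
            raise ValueError(f"joint {j} is not covered by any joint group")
        pose.append(tuple(median(v[axis] for v in votes) for axis in range(3)))
    return pose


# Hypothetical estimates from three joint groups; group 3 is badly wrong for joint 0.
group_estimates = [
    {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0)},
    {0: (0.1, 0.0, 0.0), 1: (1.1, 0.0, 0.0), 2: (2.0, 0.0, 0.0)},
    {0: (5.0, 5.0, 5.0), 2: (2.2, 0.0, 0.0)},
]
pose = aggregate_partial_poses(group_estimates, num_joints=3)
```

Unlike a plain average, the median keeps joint 0 near (0.1, 0, 0) despite the outlier; a differentiable robust loss would be needed for the end-to-end fine-tuning the abstract mentions.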
Submitted 7 October, 2019; v1 submitted 21 March, 2018;
originally announced March 2018.
-
Multi-View Sparse Vector Decomposition to Deal With Missing Values in Alcohol Dependence Study
Authors:
Guoqing Chao
Abstract:
Because the alcohol dependence (AD) phenotype defined by the Diagnostic and Statistical Manual of Mental Disorders (DSM) IV is heterogeneous, it is not an optimal basis for identifying the genetic variation that underlies the risk for AD, and identifying subtypes of AD has become an important topic. Traditional unsupervised cluster analysis and latent class analysis are the most commonly used methods for obtaining subtypes, but without the guidance of genetic information, these methods may yield subtypes of little utility in genetic analysis. Recently, several multi-view co-clustering methods have been proposed to ameliorate this drawback. However, these methods do not take missing values in the data into account. To overcome this limitation, we extended one of the multi-view methods to handle missing values and perform clustering simultaneously. We applied this method to a sample of 2,230 European Americans and found that the association of the well-known genetic variant rs1229984 (in the ADH1B candidate gene) with the subtype is more significant than in the corresponding case-control association test. Finally, we verified this on a replication sample of 1,707 and found it significant as well.
Submitted 29 May, 2018; v1 submitted 26 December, 2017;
originally announced December 2017.
-
A Survey on Multi-View Clustering
Authors:
Guoqing Chao,
Shiliang Sun,
**bo Bi
Abstract:
With advances in information acquisition technologies, multi-view data have become ubiquitous. Multi-view learning has thus become increasingly popular in the machine learning and data mining fields. Multi-view unsupervised or semi-supervised learning, such as co-training and co-regularization, has gained considerable attention. Although multi-view clustering (MVC) methods have developed rapidly in recent years, there has been no survey summarizing and analyzing the current progress. Therefore, this paper reviews the common strategies for combining multiple views of data and, based on this summary, proposes a novel taxonomy of MVC approaches. We further discuss the relationships between MVC and multi-view representation, ensemble clustering, multi-task clustering, and multi-view supervised and semi-supervised learning. Several representative real-world applications are elaborated. To promote the future development of MVC, we envision several open problems that may require further investigation and thorough examination.
Submitted 3 April, 2018; v1 submitted 17 December, 2017;
originally announced December 2017.
-
Neural network image reconstruction for magnetic particle imaging
Authors:
Byung Gyu Chae
Abstract:
We investigate neural network image reconstruction for magnetic particle imaging. The network performance depends strongly on the convolution effects in the spectral input data. The larger convolution effect that appears at relatively small nanoparticle sizes obstructs network training. The trained single-layer network reveals a weighting matrix consisting of basis vectors in the form of Chebyshev polynomials of the second kind. The weighting matrix corresponds to an inverse system matrix, where the incoherence of the basis vectors due to a low convolution effect, together with a nonlinear activation function, plays a crucial role in retrieving the matrix elements. Test images are well reconstructed by trained networks with an inverse kernel matrix. We also confirm that a multi-layer network with one hidden layer improves performance. A neural network architecture that overcomes the low incoherence of the inverse kernel through its classification property will become a better tool for image reconstruction.
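The abstract identifies the learned weighting matrix with basis vectors shaped like Chebyshev polynomials of the second kind, U_n. A small sketch for generating such reference vectors via the standard recurrence U_0 = 1, U_1 = 2x, U_{k+1} = 2x U_k - U_{k-1}, so they could be compared against trained weights. The sampling grid is a hypothetical choice, not one taken from the paper.

```python
import math


def chebyshev_u(n, x):
    """Chebyshev polynomial of the second kind U_n(x) via the three-term recurrence."""
    if n < 0:
        raise ValueError("n must be non-negative")
    u_prev, u_curr = 1.0, 2.0 * x  # U_0, U_1
    if n == 0:
        return u_prev
    for _ in range(n - 1):
        u_prev, u_curr = u_curr, 2.0 * x * u_curr - u_prev
    return u_curr


def basis_matrix(num_basis, xs):
    """Rows are U_0 .. U_{num_basis-1} sampled at the points xs (hypothetical grid)."""
    return [[chebyshev_u(n, x) for x in xs] for n in range(num_basis)]


# Hypothetical grid of 11 points on [-1, 1].
grid = [-1.0 + 2.0 * i / 10 for i in range(11)]
reference_rows = basis_matrix(4, grid)
```

A quick sanity check is the identity U_n(cos θ) = sin((n + 1)θ) / sin θ, which the recurrence satisfies for any θ not a multiple of π.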
Submitted 21 September, 2017;
originally announced September 2017.
-
Free standing membranes to study the optical properties of anodic TiO2 nanotube layers
Authors:
Gihoon Cha,
Patrik Schmuki,
Marco Altomare
Abstract:
In the present work we investigate various optical properties (such as light absorption and reflectance) of anodic TiO2 nanotube layers directly transferred as self-standing membranes onto quartz substrates. This allows investigation in a transmission geometry, which provides significantly more reliable data than measurements on the metallic Ti substrate. Light transmission and reflectance measurements were carried out on layers with thicknesses from 1.8 to 50 µm, and the layers were investigated in both their amorphous and crystalline forms. A series of wavelength-dependent light attenuation coefficients is extrapolated and found to match the photocurrent vs. irradiation wavelength behavior. However, a feature specific to anodic nanotubes is that their intrinsic carbon content causes a sub-bandgap response proportional to the carbon contamination content in the TiO2 nanotubes. Overall, the extracted data provide a valuable basis and understanding for the design of photo-electrochemical devices based on TiO2 nanotubes.
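One common way to obtain an attenuation coefficient from transmission data on layers of different thickness is a Beer-Lambert fit at a fixed wavelength, ln T = ln C - αd, where C lumps thickness-independent losses such as reflection. This is a sketch under that assumption; the abstract does not state the authors' exact procedure, and the transmittance values below are synthetic.

```python
import math


def fit_attenuation_coefficient(thicknesses_um, transmittances):
    """Least-squares fit of ln T = ln C - alpha * d at one wavelength.

    Returns (alpha in 1/um, C). C absorbs thickness-independent losses (e.g. reflection).
    """
    if len(thicknesses_um) != len(transmittances) or len(thicknesses_um) < 2:
        raise ValueError("need at least two (thickness, transmittance) pairs")
    if any(t <= 0 for t in transmittances):
        raise ValueError("transmittance must be positive to take its logarithm")
    ys = [math.log(t) for t in transmittances]
    n = len(ys)
    mean_d = sum(thicknesses_um) / n
    mean_y = sum(ys) / n
    sxx = sum((d - mean_d) ** 2 for d in thicknesses_um)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(thicknesses_um, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_d
    return -slope, math.exp(intercept)


# Synthetic data: alpha = 0.05 / um and C = 0.9 at one hypothetical wavelength.
thicknesses_um = [1.8, 10.0, 25.0, 50.0]
transmittances = [0.9 * math.exp(-0.05 * d) for d in thicknesses_um]
alpha, c = fit_attenuation_coefficient(thicknesses_um, transmittances)
```

Repeating the fit at each wavelength yields the wavelength-dependent series of coefficients the abstract compares with the photocurrent spectrum.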
Submitted 16 October, 2016;
originally announced October 2016.
-
A Comparison of Anodic TiO2 Nanotube Membranes used for Front-side Illuminated Dye-Sensitized Solar Cells
Authors:
Fatemeh Mohammadpour,
Mahmood Moradi,
Gihoon Cha,
Seulgi So,
Kiyoung Lee,
Marco Altomare,
Patrik Schmuki
Abstract:
In the present work we compare TiO2 nanotube lift-off strategies for the construction of front-side illuminated dye-sensitized solar cells (DSSCs). Anodic nanotube layers were detached from the metallic back contact using different techniques and transferred onto an FTO substrate. We show that using an optimized potential step treatment to fabricate the membranes significantly increases DSSC efficiency (>8%). This improved efficiency is ascribed to the higher specific dye loading and enhanced electron transport properties of optimally fabricated TiO2 nanotube membranes.
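The efficiencies quoted for these cells are power conversion efficiencies. A short sketch of the standard relation η = Jsc · Voc · FF / P_in, assuming the usual 100 mW/cm² AM 1.5G test irradiance; the cell parameters in the example are hypothetical, not values reported in the paper.

```python
def power_conversion_efficiency(jsc_ma_per_cm2, voc_v, fill_factor, p_in_mw_per_cm2=100.0):
    """eta = Jsc * Voc * FF / P_in, returned as a fraction.

    With Jsc in mA/cm^2 and Voc in V, the product Jsc * Voc is in mW/cm^2;
    P_in defaults to 100 mW/cm^2, the standard AM 1.5G test irradiance.
    """
    if not 0.0 <= fill_factor <= 1.0:
        raise ValueError("fill factor must lie between 0 and 1")
    if p_in_mw_per_cm2 <= 0:
        raise ValueError("incident power must be positive")
    return jsc_ma_per_cm2 * voc_v * fill_factor / p_in_mw_per_cm2


# Hypothetical cell parameters (not values from the paper).
eta = power_conversion_efficiency(jsc_ma_per_cm2=16.0, voc_v=0.75, fill_factor=0.70)
```

Here eta is about 0.084, i.e. 8.4%, which shows how an efficiency above 8% decomposes into current, voltage, and fill factor.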
Submitted 16 October, 2016;
originally announced October 2016.
-
Enhanced Performance of Dye-Sensitized Solar Cells based on TiO2 Nanotube Membranes using Optimized Annealing Profile
Authors:
F. Mohammadpour,
M. Moradi,
K. Lee,
G. Cha,
S. So,
A. Kahnt,
D. M. Guldi,
M. Altomare,
P. Schmuki
Abstract:
We use free-standing TiO2 nanotube membranes transferred onto FTO slides in front-side illuminated dye-sensitized solar cells (DSSCs). We investigate key parameters for arranging self-ordered anodic TiO2 nanotube layers on the FTO substrate in a solar cell, namely the influence of the annealing procedure on the DSSC light conversion efficiency. The results show that an optimal temperature annealing profile can significantly enhance the DSSC efficiency (to 9.8% in our case), as it leads to a markedly lower density of trapping states in the tube oxide and thus to strongly improved electron transport properties.
Submitted 14 October, 2016;
originally announced October 2016.
-
Topographical study of TiO2 nanostructure surface for photocatalytic hydrogen production
Authors:
Gihoon Cha,
Kiyoung Lee,
JeongEun Yoo,
Manuela S. Killian,
Patrik Schmuki
Abstract:
In the present work we investigate the photocatalytic hydrogen production (water splitting) activity of different Pt-loaded TiO2 nanotube layers. To this end, we fabricate free-standing membranes and fix them in four different configurations, top up (with or without the initiation-layer grass) and bottom up (bottom closed or open), then decorate the tubes with various amounts of Pt and measure the open-circuit photocatalytic H2 production rate. We find a strong influence of the configuration: the open-top morphology shows the highest photocatalytic hydrogen production efficiency, yielding 3.5 times more H2 than the least efficient structure (bottom closed). The work therefore provides valuable guidelines for optimizing TiO2 nanotube layers for photocatalytic applications.
Submitted 13 October, 2016;
originally announced October 2016.
-
City-Identification of Flickr Videos Using Semantic Acoustic Features
Authors:
Benjamin Elizalde,
Guan-Lin Chao,
Ming Zeng,
Ian Lane
Abstract:
City-identification of videos aims to determine the likelihood of a video belonging to a set of cities. In this paper, we present an approach that uses only audio, without any additional modality such as images, user tags, or geo-tags. In this manner, we show to what extent the city location of videos correlates with their acoustic information. Success in this task suggests that audio can be used to complement the other modalities. In particular, we present a method to compute and use semantic acoustic features to perform city-identification, and the features provide semantic evidence for the identification. The semantic evidence is given by a taxonomy of urban sounds and expresses the potential presence of these sounds in the city soundtracks. We used the MediaEval Placing Task set, which contains Flickr videos labeled by city. In addition, we used the UrbanSound8K set, which contains audio clips labeled by sound type. Our method improved on the state-of-the-art performance and provides a novel semantic approach to this task.
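To make the idea of semantic acoustic features concrete, here is a hedged sketch: a video is summarized by the average presence of each UrbanSound8K sound class across its frames, and cities are ranked by cosine similarity to per-city centroid vectors. The frame-level posteriors, the mean pooling, the centroids, and the nearest-centroid classifier are all hypothetical choices for illustration, not the paper's method; only the ten class names come from UrbanSound8K.

```python
import math

URBAN_SOUND_CLASSES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]


def vec(**weights):
    """Build a class-indexed vector from keyword weights (helper for the example)."""
    return [weights.get(name, 0.0) for name in URBAN_SOUND_CLASSES]


def semantic_feature(frame_posteriors):
    """Hypothetical pooling: average per-frame class posteriors into one video vector."""
    n = len(frame_posteriors)
    return [sum(frame[k] for frame in frame_posteriors) / n for k in range(len(URBAN_SOUND_CLASSES))]


def cosine(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def identify_city(feature, city_centroids):
    """Rank cities by cosine similarity of the video feature to each city centroid."""
    return sorted(city_centroids, key=lambda c: cosine(feature, city_centroids[c]), reverse=True)


def top_sounds(feature, k):
    """Semantic evidence: names of the k sound classes most present in the video."""
    order = sorted(range(len(feature)), key=lambda i: feature[i], reverse=True)
    return [URBAN_SOUND_CLASSES[i] for i in order[:k]]


# Hypothetical centroids and frame posteriors.
centroids = {
    "city_a": vec(siren=0.6, car_horn=0.3, engine_idling=0.1),
    "city_b": vec(children_playing=0.5, street_music=0.4, dog_bark=0.1),
}
frames = [vec(siren=0.7, car_horn=0.2, engine_idling=0.1), vec(siren=0.5, car_horn=0.4, street_music=0.1)]
feature = semantic_feature(frames)
ranking = identify_city(feature, centroids)
```

Because each feature dimension names a sound class, the top classes double as a human-readable explanation of the predicted city.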
Submitted 12 July, 2016;
originally announced July 2016.
-
Wide viewing angle realization for sampled hologram by collecting high-order diffraction beams
Authors:
Byung Gyu Chae
Abstract:
We propose that viewing angle expansion of the holographic image can be realized by using the high-order diffraction beams caused by the pixel structure that samples the hologram data. A diffracted beam propagating along a new optical axis plays the role of a modulated carrier, similar to the carrier signal in off-axis holography, which creates a new viewing zone for the reconstructed image. The reconstructed image of a Fresnel hologram is deformed along the new viewing direction, whereas a Fourier hologram makes it possible to retrieve a three-dimensional image from a different perspective. The high-resolution hologram fringe is imaged onto the image plane through an imaging system, and thus simply collecting the diffracted beams increases the viewing zone angle. We verify our proposal through numerical analysis of a sampled hologram showing high-order diffraction beams with various viewing zones.
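The pixel structure that samples the hologram acts like a grating, so its diffraction orders leave at angles given by the grating equation sin θ_m = mλ/p at normal incidence, and one order spans a viewing zone of 2·asin(λ/(2p)). A sketch of where these orders land, assuming normal incidence; the wavelength and pixel pitch are hypothetical example values, not parameters from the paper.

```python
import math


def diffraction_order_angles(wavelength_m, pixel_pitch_m, max_order):
    """Angles (degrees) of propagating orders m = -max_order..max_order of a pixelated
    hologram, from sin(theta_m) = m * wavelength / pitch at normal incidence.
    Orders with |sin| > 1 are evanescent and are skipped."""
    angles = {}
    for m in range(-max_order, max_order + 1):
        s = m * wavelength_m / pixel_pitch_m
        if abs(s) <= 1.0:
            angles[m] = math.degrees(math.asin(s))
    return angles


def single_order_viewing_angle(wavelength_m, pixel_pitch_m):
    """Full viewing-zone angle of one order, 2 * asin(wavelength / (2 * pitch)), in degrees."""
    return math.degrees(2.0 * math.asin(wavelength_m / (2.0 * pixel_pitch_m)))


# Hypothetical example: 532 nm light and an 8 um pixel pitch.
angles = diffraction_order_angles(532e-9, 8e-6, max_order=2)
zone = single_order_viewing_angle(532e-9, 8e-6)
```

With these values a single order covers only about 3.8°, which illustrates why collecting neighbouring orders is an attractive way to widen the total viewing zone.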
Submitted 23 December, 2013; v1 submitted 17 March, 2013;
originally announced March 2013.
-
Charge pairing by quantum entanglement in strongly correlated electron systems
Authors:
Byung Gyu Chae
Abstract:
Various charge pairings in strongly correlated electron systems are interpreted as quantum entanglement of a composite system. Particles in the intermediate phase tend to form a coherent superposition of the localized state and the itinerant state, which induces entanglement of both particles in the bipartite subsystems so as to increase the entropy of the system. The correction to the entropic Coulomb force becomes an immediate cause of charge pairing.
Submitted 16 April, 2012; v1 submitted 5 March, 2012;
originally announced March 2012.
-
Quantum decoherence in strongly correlated electron systems
Authors:
Byung Gyu Chae
Abstract:
Complexity in strongly correlated electron systems is analyzed by considering the decoherence process between the localized state |L> and the itinerant state |I>. The coherent superposition state a|I> + b|L> decoheres into the pointer states near both extremes of the correlation, where the symmetry-breaking ground states of charge pairing emerge. To maximize the entropy of the system, superconducting pairing and the spin density wave coexist within the uncertainty principle, which gives rise to metastable states such as the pseudogap phase and electronic inhomogeneity.
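For readers unfamiliar with the decoherence picture, the generic textbook form is sketched below, assuming |I> and |L> are orthonormal pointer states; the abstract does not give the paper's own equations. Decoherence suppresses the off-diagonal (interference) terms of the density matrix, raising the von Neumann entropy from zero for the pure superposition to the Shannon entropy of the weights.

```latex
\begin{aligned}
|\psi\rangle &= a\,|I\rangle + b\,|L\rangle, \qquad |a|^2 + |b|^2 = 1,\\
\rho = |\psi\rangle\langle\psi| &= |a|^2\,|I\rangle\langle I| + a b^{*}\,|I\rangle\langle L| + a^{*} b\,|L\rangle\langle I| + |b|^2\,|L\rangle\langle L|,\\
\rho_D &= |a|^2\,|I\rangle\langle I| + |b|^2\,|L\rangle\langle L| \quad \text{(off-diagonal terms suppressed)},\\
S(\rho) &= 0, \qquad S(\rho_D) = -|a|^2 \ln |a|^2 - |b|^2 \ln |b|^2 .
\end{aligned}
```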
Submitted 3 January, 2011; v1 submitted 8 November, 2010;
originally announced November 2010.
-
Itinerancy-localization duality of quasiparticles revealed by strong correlation
Authors:
Byung Gyu Chae
Abstract:
The strong interaction between electrons reveals the duality of itinerancy and localization of quasiparticles. The physical phenomena corresponding to each component of the duality can be realized and coexist within the bounds of the uncertainty principle of the carrier dynamics, which may be a strong reason for the complexity appearing in strongly correlated systems. A possible mechanism for high-temperature superconductivity is proposed on the basis of the interplay between the renormalized expectation values of both parts.
Submitted 4 March, 2010; v1 submitted 31 December, 2009;
originally announced January 2010.
-
Electrostatic modification of infrared response in gated structures based on VO2
Authors:
M. M. Qazilbash,
Z. Q. Li,
V. Podzorov,
M. Brehm,
F. Keilmann,
B. G. Chae,
H. T. Kim,
D. N. Basov
Abstract:
We investigate the changes in the infrared response due to charge carriers introduced by electrostatic doping of the correlated insulator vanadium dioxide (VO2) integrated into a field-effect transistor architecture. Accumulation of holes at the VO2 interface with the gate dielectric leads to an increase in infrared absorption. This phenomenon is observed only in the insulator-to-metal transition regime of VO2, where metallic and insulating regions coexist. We postulate that doped holes lead to the growth of the metallic islands, thereby promoting percolation, an effect that persists upon removal of the applied gate voltage.
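The postulated mechanism, metallic islands growing until they form a connected path, is a percolation effect. A toy site-percolation sketch on a square lattice illustrates it: isolated metallic cells do not span the sample, but one growth step connects them. The lattice, the 4-neighbour connectivity, and the grow_islands rule are hypothetical illustrations, not the authors' model of VO2.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def percolates(grid):
    """True if metallic cells (True) form a 4-connected path from the top row to the bottom row."""
    if not grid or not grid[0]:
        return False
    rows, cols = len(grid), len(grid[0])
    queue = deque((0, c) for c in range(cols) if grid[0][c])
    seen = set(queue)
    while queue:  # breadth-first search through metallic cells
        r, c = queue.popleft()
        if r == rows - 1:
            return True
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def grow_islands(grid):
    """Hypothetical growth step: every insulating cell touching a metallic cell becomes metallic."""
    rows, cols = len(grid), len(grid[0])

    def touches_metal(r, c):
        return any(
            0 <= r + dr < rows and 0 <= c + dc < cols and grid[r + dr][c + dc]
            for dr, dc in NEIGHBOURS
        )

    return [[grid[r][c] or touches_metal(r, c) for c in range(cols)] for r in range(rows)]


# Hypothetical sample: two separated metallic islands (True) in an insulating matrix.
islands = [
    [True, False, False],
    [False, False, False],
    [True, False, False],
]
spanning_before = percolates(islands)
spanning_after = percolates(grow_islands(islands))
```

A spanning metallic path opens a conducting channel, which is the kind of change that would raise the infrared absorption described above.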
Submitted 30 June, 2008;
originally announced June 2008.