-
DefakeHop: A Light-Weight High-Performance Deepfake Detector
Authors:
Hong-Shuo Chen,
Mozhdeh Rouhsedaghat,
Hamza Ghani,
Shuowen Hu,
Suya You,
C. -C. Jay Kuo
Abstract:
A light-weight high-performance Deepfake detection method, called DefakeHop, is proposed in this work. State-of-the-art Deepfake detection methods are built upon deep neural networks. DefakeHop extracts features automatically using the successive subspace learning (SSL) principle from various parts of face images. The features are extracted by c/w Saab transform and further processed by our feature distillation module using spatial dimension reduction and soft classification for each channel to get a more concise description of the face. Extensive experiments are conducted to demonstrate the effectiveness of the proposed DefakeHop method. With a small model size of 42,845 parameters, DefakeHop achieves state-of-the-art performance with the area under the ROC curve (AUC) of 100%, 94.95%, and 90.56% on UADFV, Celeb-DF v1 and Celeb-DF v2 datasets, respectively.
Submitted 11 March, 2021;
originally announced March 2021.
-
Successive Subspace Learning: An Overview
Authors:
Mozhdeh Rouhsedaghat,
Masoud Monajatipoor,
Zohreh Azizi,
C. -C. Jay Kuo
Abstract:
Successive Subspace Learning (SSL) offers a light-weight unsupervised feature learning method based on inherent statistical properties of data units (e.g. image pixels and points in point cloud sets). It has shown promising results, especially on small datasets. In this paper, we intuitively explain this method, provide an overview of its development, and point out some open questions and challenges for future research.
Submitted 26 February, 2021;
originally announced March 2021.
-
Terrestrial Probes of Electromagnetically Interacting Dark Radiation
Authors:
Jui-Lin Kuo,
Maxim Pospelov,
Josef Pradler
Abstract:
We study the possibility that dark radiation, sourced through the decay of dark matter in the late Universe, carries electromagnetic interactions. The relativistic flux of particles induces recoil signals in direct detection and neutrino experiments through its interaction with millicharge, electric/magnetic dipole moments, or anapole moment/charge radius. Taking the DM lifetime as 35 times the age of the Universe, as currently cosmologically allowed, we show that direct detection (neutrino) experiments have complementary sensitivity down to $ε\sim 10^{-11}$ $(10^{-12})$, $d_χ/μ_χ\sim 10^{-9}\,μ_B$ $(10^{-13}μ_B)$, and $a_χ/b_χ\sim 10^{-2}\,{\rm GeV}^{-2}$ $(10^{-8}\,{\rm GeV}^{-2})$ on the respective couplings. Finally, we show that such dark radiation can lead to a satisfactory explanation of the recently observed XENON1T excess in the electron recoil signal without being in conflict with other bounds.
Submitted 3 June, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
-
A Machine Learning Approach to Optimal Inverse Discrete Cosine Transform (IDCT) Design
Authors:
Yifan Wang,
Zhanxuan Mei,
Chia-Yang Tsai,
Ioannis Katsavounidis,
C. -C. Jay Kuo
Abstract:
The design of an optimal inverse discrete cosine transform (IDCT) that compensates for the quantization error is proposed for effective lossy image compression in this work. The forward and inverse DCTs are designed in pairs in current image/video coding standards without taking the quantization effect into account. Yet, the distribution of quantized DCT coefficients deviates from that of the original DCT coefficients. This is particularly obvious when the quality factor of JPEG compressed images is small. To address this problem, we first use a set of training images to learn the compound effect of forward DCT, quantization and dequantization in cascade. Then, a new IDCT kernel is learned to reverse the effect of such a pipeline. Experiments demonstrate the advantage of the new method, which has a gain of 0.11-0.30 dB over standard JPEG across a wide range of quality factors.
Submitted 31 January, 2021;
originally announced February 2021.
-
A convolutional-neural-network estimator of CMB constraints on dark matter energy injection
Authors:
Wei-Chih Huang,
Jui-Lin Kuo,
Yue-Lin Sming Tsai
Abstract:
We show that the impact of energy injection by dark matter annihilation on the cosmic microwave background power spectra can be apprehended via a residual likelihood map. By resorting to convolutional neural networks that can fully discover the underlying pattern of the map, we propose a novel way of constraining dark matter annihilation based on the Planck 2018 data. We demonstrate that the trained neural network can efficiently predict the likelihood and accurately place bounds on the annihilation cross-section in a $\textit{model-independent}$ fashion. The machinery will be made public in the near future.
Submitted 3 June, 2021; v1 submitted 25 January, 2021;
originally announced January 2021.
-
Symmetric-Constrained Irregular Structure Inpainting for Brain MRI Registration with Tumor Pathology
Authors:
Xiaofeng Liu,
Fangxu Xing,
Chao Yang,
C. -C. Jay Kuo,
Georges ElFakhri,
Jonghye Woo
Abstract:
Deformable registration of magnetic resonance images between patients with brain tumors and healthy subjects has been an important tool to specify tumor geometry through location alignment and facilitate pathological analysis. Since the tumor region does not match any ordinary brain tissue, it has been difficult to deformably register a patient's brain to a normal one. Many patient images are associated with irregularly distributed lesions, resulting in further distortion of normal tissue structures and complicating the registration's similarity measure. In this work, we follow a multi-step context-aware image inpainting framework to generate synthetic tissue intensities in the tumor region. A coarse image-to-image translation is applied to make a rough inference of the missing parts. Then, a feature-level patch-match refinement module is applied to refine the details by modeling the semantic relevance between patch-wise features. A symmetry constraint reflecting a large degree of anatomical symmetry in the brain is further proposed to achieve better structure understanding. Deformable registration is applied between inpainted patient images and normal brains, and the resulting deformation field is eventually used to deform original patient data for the final alignment. The method was applied to the Multimodal Brain Tumor Segmentation (BraTS) 2018 challenge database and compared against three existing inpainting methods. The proposed method yielded results with increased peak signal-to-noise ratio, structural similarity index, inception score, and reduced L1 error, leading to successful patient-to-normal brain image registration.
Submitted 17 January, 2021;
originally announced January 2021.
-
VoxelHop: Successive Subspace Learning for ALS Disease Classification Using Structural MRI
Authors:
Xiaofeng Liu,
Fangxu Xing,
Chao Yang,
C. -C. Jay Kuo,
Suma Babu,
Georges El Fakhri,
Thomas Jenkins,
Jonghye Woo
Abstract:
Deep learning has great potential for accurate detection and classification of diseases with medical imaging data, but the performance is often limited by the number of training datasets and memory requirements. In addition, many deep learning models are considered a "black-box," thereby often limiting their adoption in clinical applications. To address this, we present a successive subspace learning model, termed VoxelHop, for accurate classification of Amyotrophic Lateral Sclerosis (ALS) using T2-weighted structural MRI data. Compared with popular convolutional neural network (CNN) architectures, VoxelHop has modular and transparent structures with fewer parameters without any backpropagation, so it is well-suited to small dataset size and 3D imaging data. Our VoxelHop has four key components, including (1) sequential expansion of near-to-far neighborhood for multi-channel 3D data; (2) subspace approximation for unsupervised dimension reduction; (3) label-assisted regression for supervised dimension reduction; and (4) concatenation of features and classification between controls and patients. Our experimental results demonstrate that our framework using a total of 20 controls and 26 patients achieves an accuracy of 93.48$\%$ and an AUC score of 0.9394 in differentiating patients from controls, even with a relatively small number of datasets, showing its robustness and effectiveness. Our thorough evaluations also show its validity and superiority to the state-of-the-art 3D CNN classification methods. Our framework can easily be generalized to other classification tasks using different imaging modalities.
Submitted 13 January, 2021;
originally announced January 2021.
-
Protecting Big Data Privacy Using Randomized Tensor Network Decomposition and Dispersed Tensor Computation
Authors:
Jenn-Bing Ong,
Wee-Keong Ng,
Ivan Tjuawinata,
Chao Li,
Jielin Yang,
Sai None Myne,
Huaxiong Wang,
Kwok-Yan Lam,
C. -C. Jay Kuo
Abstract:
Data privacy is an important issue for organizations and enterprises to securely outsource data storage, sharing, and computation on clouds / fogs. However, data encryption is complicated in terms of the key management and distribution; existing secure computation techniques are expensive in terms of computational / communication cost and therefore do not scale to big data computation. Tensor network decomposition and distributed tensor computation have been widely used in signal processing and machine learning for dimensionality reduction and large-scale optimization. However, the potential of distributed tensor networks for big data privacy preservation has not been considered before; this motivates the current study. Our primary intuition is that tensor network representations are mathematically non-unique, unlinkable, and uninterpretable; tensor network representations naturally support a range of multilinear operations for compressed and distributed / dispersed computation. Therefore, we propose randomized algorithms to decompose big data into randomized tensor network representations and analyze the privacy leakage for 1D to 3D data tensors. The randomness mainly comes from the complex structural information commonly found in big data; randomization is based on controlled perturbation applied to the tensor blocks prior to decomposition. The distributed tensor representations are dispersed on multiple clouds / fogs or servers / devices with metadata privacy; this provides both distributed trust and management to seamlessly secure big data storage, communication, sharing, and computation. Experiments show that the proposed randomization techniques are helpful for big data anonymization and efficient for big data storage and computation.
Submitted 4 January, 2021;
originally announced January 2021.
-
GraphHop: An Enhanced Label Propagation Method for Node Classification
Authors:
Tian Xie,
Bin Wang,
C. -C. Jay Kuo
Abstract:
A scalable semi-supervised node classification method on graph-structured data, called GraphHop, is proposed in this work. The graph contains attributes of all nodes but labels of only a few nodes. The classical label propagation (LP) method and the emerging graph convolutional network (GCN) are two popular semi-supervised solutions to this problem. The LP method is not effective in modeling node attributes and labels jointly, and it faces a slow convergence rate on large-scale graphs. GraphHop is proposed to address these shortcomings. With proper initial label vector embeddings, each iteration of GraphHop contains two steps: 1) label aggregation and 2) label update. In Step 1, each node aggregates its neighbors' label vectors obtained in the previous iteration. In Step 2, a new label vector is predicted for each node based on the label of the node itself and the aggregated label information obtained in Step 1. This iterative procedure exploits the neighborhood information and enables GraphHop to perform well in an extremely small label rate setting and scale well for very large graphs. Experimental results show that GraphHop outperforms state-of-the-art graph learning methods on a wide range of tasks (e.g., multi-label and multi-class classification on citation networks, social graphs, and commodity consumption graphs) in graphs of various sizes. Our code is publicly available on GitHub (https://github.com/TianXieUSC/GraphHop).
Submitted 6 January, 2021;
originally announced January 2021.
-
Subtype-aware Unsupervised Domain Adaptation for Medical Diagnosis
Authors:
Xiaofeng Liu,
Xiongchang Liu,
Bo Hu,
Wenxuan Ji,
Fangxu Xing,
Jun Lu,
Jane You,
C. -C. Jay Kuo,
Georges El Fakhri,
Jonghye Woo
Abstract:
Recent advances in unsupervised domain adaptation (UDA) show that transferable prototypical learning presents a powerful means for class conditional alignment, which encourages the closeness of cross-domain class centroids. However, the cross-domain inner-class compactness and the underlying fine-grained subtype structure remain largely underexplored. In this work, we propose to adaptively carry out the fine-grained subtype-aware alignment by explicitly enforcing the class-wise separation and subtype-wise compactness with intermediate pseudo labels. Our key insight is that the unlabeled subtypes of a class can diverge from one another with different conditional and label shifts, while inheriting the local proximity within a subtype. The cases with and without prior information on subtype numbers are investigated to discover the underlying subtype structure in an online fashion. The proposed subtype-aware dynamic UDA achieves promising results on medical diagnosis tasks.
Submitted 11 January, 2021; v1 submitted 1 January, 2021;
originally announced January 2021.
-
Explainable Machine Learning based Transform Coding for High Efficiency Intra Prediction
Authors:
Na Li,
Yun Zhang,
C. -C. Jay Kuo
Abstract:
Machine learning techniques provide a chance to explore the coding performance potential of transforms. In this work, we propose explainable transform-based intra video coding to improve the coding efficiency. Firstly, we model machine learning based transform design as an optimization problem of maximizing the energy compaction or decorrelation capability. The explainable machine learning based transform, i.e., the Subspace Approximation with Adjusted Bias (Saab) transform, is analyzed and compared with the mainstream Discrete Cosine Transform (DCT) on their energy compaction and decorrelation capabilities. Secondly, we propose a Saab transform based intra video coding framework with off-line Saab transform learning. Meanwhile, an intra mode dependent Saab transform is developed. Then, the Rate Distortion (RD) gain of Saab transform based intra video coding is theoretically and experimentally analyzed in detail. Finally, three strategies for integrating the Saab transform and DCT in intra video coding are developed to improve the coding efficiency. Experimental results demonstrate that the proposed 8$\times$8 Saab transform based intra video coding can achieve a Bjøntegaard Delta Bit Rate (BDBR) from -1.19% to -10.00%, and -3.07% on average, as compared with the mainstream 8$\times$8 DCT based coding scheme.
Submitted 21 December, 2020;
originally announced December 2020.
-
Low-Resolution Face Recognition In Resource-Constrained Environments
Authors:
Mozhdeh Rouhsedaghat,
Yifan Wang,
Shuowen Hu,
Suya You,
C. -C. Jay Kuo
Abstract:
A non-parametric low-resolution face recognition model for resource-constrained environments with limited networking and computing is proposed in this work. Such environments often demand a small model capable of being effectively trained on a small number of labeled data samples, with low training complexity, and low-resolution input images. To address these challenges, we adopt an emerging explainable machine learning methodology called successive subspace learning (SSL). SSL offers an explainable non-parametric model that flexibly trades the model size for verification performance. Its training complexity is significantly lower since its model is trained in a one-pass feedforward manner without backpropagation. Furthermore, active learning can be conveniently incorporated to reduce the labeling cost. The effectiveness of the proposed model is demonstrated by experiments on the LFW and the CMU Multi-PIE datasets.
Submitted 23 November, 2020;
originally announced November 2020.
-
SLADE: A Self-Training Framework For Distance Metric Learning
Authors:
Jiali Duan,
Yen-Liang Lin,
Son Tran,
Larry S. Davis,
C. -C. Jay Kuo
Abstract:
Most existing distance metric learning approaches use fully labeled data to learn the sample similarities in an embedding space. We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data. We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data. We then train a student model on both labels and pseudo labels to generate final feature embeddings. We use self-supervised representation learning to initialize the teacher model. To better deal with noisy pseudo labels generated by the teacher network, we design a new feature basis learning component for the student network, which learns basis functions of feature representations for unlabeled data. The learned basis vectors better measure the pairwise similarity and are used to select high-confidence samples for training the student network. We evaluate our method on standard retrieval benchmarks: CUB-200, Cars-196 and In-shop. Experimental results demonstrate that our approach significantly improves the performance over the state-of-the-art methods.
Submitted 29 March, 2021; v1 submitted 20 November, 2020;
originally announced November 2020.
-
End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features
Authors:
Edmilson Morais,
Hong-Kwang J. Kuo,
Samuel Thomas,
Zoltan Tuske,
Brian Kingsbury
Abstract:
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of spoken language understanding (SLU) still need further investigation. In this paper we introduce a modular End-to-End (E2E) SLU architecture based on transformer networks, which allows the use of self-supervised pre-trained acoustic features, pre-trained model initialization and multi-task training. Several SLU experiments for predicting intent and entity labels/values using the ATIS dataset are performed. These experiments investigate the interaction of pre-trained model initialization and multi-task training with either traditional filterbank or self-supervised pre-trained acoustic features. Results show not only that self-supervised pre-trained acoustic features outperform filterbank features in almost all the experiments, but also that when these features are used in combination with multi-task training, they almost eliminate the necessity of pre-trained model initialization.
Submitted 16 November, 2020;
originally announced November 2020.
-
Point Cloud Attribute Compression via Successive Subspace Graph Transform
Authors:
Yueru Chen,
Yiting Shao,
Jing Wang,
Ge Li,
C. -C. Jay Kuo
Abstract:
Inspired by the recently proposed successive subspace learning (SSL) principles, we develop a successive subspace graph transform (SSGT) to address point cloud attribute compression in this work. The octree geometry structure is utilized to partition the point cloud, where every node of the octree represents a point cloud subspace with a certain spatial size. We design a weighted graph with self-loops to describe the subspace and define a graph Fourier transform based on the normalized graph Laplacian. The transforms are applied to large point clouds from the leaf nodes to the root node of the octree recursively, while the represented subspace is expanded from the smallest one to the whole point cloud successively. Experimental results show that the proposed SSGT method offers better R-D performance than the previous Region Adaptive Haar Transform (RAHT) method.
Submitted 28 October, 2020;
originally announced October 2020.
-
Constructing Multilayer Perceptrons as Piecewise Low-Order Polynomial Approximators: A Signal Processing Approach
Authors:
Ruiyuan Lin,
Suya You,
Raghuveer Rao,
C. -C. Jay Kuo
Abstract:
The construction of a multilayer perceptron (MLP) as a piecewise low-order polynomial approximator using a signal processing approach is presented in this work. The constructed MLP contains one input layer, one intermediate layer, and one output layer. Its construction includes the specification of neuron numbers and all filter weights. Through the construction, a one-to-one correspondence between the approximation of an MLP and that of a piecewise low-order polynomial is established. A comparison between piecewise polynomial and MLP approximations is made. Since the approximation capability of piecewise low-order polynomials is well understood, our findings shed light on the universal approximation capability of an MLP.
Submitted 15 October, 2020;
originally announced October 2020.
-
Scalar Dark Matter Candidates -- Revisited
Authors:
Céline Bœhm,
Xiaoyong Chu,
Jui-Lin Kuo,
Josef Pradler
Abstract:
We revisit the possibility of light scalar dark matter, in the MeV to GeV mass bracket and coupled to electrons through fermion or vector mediators, in light of significant experimental and observational advances that probe new physics below the GeV-scale. We establish new limits from electron colliders and fixed-target beams, and derive the strength of loop-induced processes that are probed by precision physics, among other laboratory probes. In addition, we compute the cooling bound from SN1987A, consider self-scattering, structure formation, and cosmological constraints as well as the limits from dark matter-electron scattering in direct detection experiments. We then show that the combination of constraints largely excludes the possibility that the galactic annihilation of these particles may explain the long-standing INTEGRAL excess of 511 keV photons as observed in the galactic bulge. As a caveat to these conclusions, we identify the resonant annihilation regime where the vector mediator goes nearly on-shell.
Submitted 17 March, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
-
End-to-End Spoken Language Understanding Without Full Transcripts
Authors:
Hong-Kwang J. Kuo,
Zoltán Tüske,
Samuel Thomas,
Yinghui Huang,
Kartik Audhkhasi,
Brian Kingsbury,
Gakuto Kurata,
Zvi Kons,
Ron Hoory,
Luis Lastras
Abstract:
An essential component of spoken language understanding (SLU) is slot filling: representing the meaning of a spoken utterance using semantic entity labels. In this paper, we develop end-to-end (E2E) spoken language understanding systems that directly convert speech input to semantic entities and investigate if these E2E SLU models can be trained solely on semantic entity annotations without word-for-word transcripts. Training such models is very useful as they can drastically reduce the cost of data collection. We created two types of such speech-to-entities models, a CTC model and an attention-based encoder-decoder model, by adapting models trained originally for speech recognition. Given that our experiments involve speech input, these systems need to recognize both the entity label and words representing the entity value correctly. For our speech-to-entities experiments on the ATIS corpus, both the CTC and attention models showed impressive ability to skip non-entity words: there was little degradation when trained on just entities versus full transcripts. We also explored the scenario where the entities are in an order not necessarily related to spoken order in the utterance. With its ability to do re-ordering, the attention model did remarkably well, achieving only about 2% degradation in speech-to-bag-of-entities F1 score.
Submitted 29 September, 2020;
originally announced September 2020.
-
Inductive Learning on Commonsense Knowledge Graph Completion
Authors:
Bin Wang,
Guangtao Wang,
Jing Huang,
Jiaxuan You,
Jure Leskovec,
C. -C. Jay Kuo
Abstract:
Commonsense knowledge graph (CKG) is a special type of knowledge graph (KG), where entities are composed of free-form text. However, most existing CKG completion methods focus on the setting where all the entities are presented at training time. Although this setting is standard for conventional KG completion, it has limitations for CKG completion. At test time, entities in CKGs can be unseen because they may have unseen text/names and entities may be disconnected from the training graph, since CKGs are generally very sparse. Here, we propose to study the inductive learning setting for CKG completion where unseen entities may present at test time. We develop a novel learning framework named InductivE. Different from previous approaches, InductiveE ensures the inductive learning capability by directly computing entity embeddings from raw entity attributes/text. InductiveE consists of a free-text encoder, a graph encoder, and a KG completion decoder. Specifically, the free-text encoder first extracts the textual representation of each entity based on the pre-trained language model and word embedding. The graph encoder is a gated relational graph convolutional neural network that learns from a densified graph for more informative entity representation learning. We develop a method that densifies CKGs by adding edges among semantic-related entities and provide more supportive information for unseen entities, leading to better generalization ability of entity embedding for unseen entities. Finally, inductiveE employs Conv-TransE as the CKG completion decoder. Experimental results show that InductiveE significantly outperforms state-of-the-art baselines in both standard and inductive settings on ATOMIC and ConceptNet benchmarks. InductivE performs especially well on inductive scenarios where it achieves above 48% improvement over present methods.
△ Less
Submitted 17 February, 2021; v1 submitted 19 September, 2020;
originally announced September 2020.
-
From Two-Class Linear Discriminant Analysis to Interpretable Multilayer Perceptron Design
Authors:
Ruiyuan Lin,
Zhiruo Zhou,
Suya You,
Raghuveer Rao,
C. -C. Jay Kuo
Abstract:
A closed-form solution exists in two-class linear discriminant analysis (LDA), which discriminates two Gaussian-distributed classes in a multi-dimensional feature space. In this work, we interpret the multilayer perceptron (MLP) as a generalization of a two-class LDA system so that it can handle an input composed of multiple Gaussian modalities belonging to multiple classes. Besides input layer $l_{in}$ and output layer $l_{out}$, the MLP of interest consists of two intermediate layers, $l_1$ and $l_2$. We propose a feedforward design that has three stages: 1) from $l_{in}$ to $l_1$: half-space partitionings accomplished by multiple parallel LDAs, 2) from $l_1$ to $l_2$: subspace isolation where one Gaussian modality is represented by one neuron, 3) from $l_2$ to $l_{out}$: class-wise subspace mergence, where each Gaussian modality is connected to its target class. Through this process, we present an automatic MLP design that can specify the network architecture (i.e., the layer number and the neuron number at a layer) and all filter weights in a feedforward one-pass fashion. This design can be generalized to an arbitrary distribution by leveraging the Gaussian mixture model (GMM). Experiments are conducted to compare the performance of the traditional backpropagation-based MLP (BP-MLP) and the new feedforward MLP (FF-MLP).
Submitted 9 September, 2020;
originally announced September 2020.
-
Noise-Aware Texture-Preserving Low-Light Enhancement
Authors:
Zohreh Azizi,
Xuejing Lei,
C. -C. Jay Kuo
Abstract:
A simple and effective low-light image enhancement method based on a noise-aware texture-preserving retinex model is proposed in this work. The new method, called NATLE, attempts to strike a balance between noise removal and natural texture preservation through a low-complexity solution. Its cost function includes an estimated piece-wise smooth illumination map and a noise-free texture-preserving reflectance map. Afterwards, illumination is adjusted to form the enhanced image together with the reflectance map. Extensive experiments are conducted on common low-light image enhancement datasets to demonstrate the superior performance of NATLE.
Submitted 2 September, 2020;
originally announced September 2020.
-
NITES: A Non-Parametric Interpretable Texture Synthesis Method
Authors:
Xuejing Lei,
Ganning Zhao,
C. -C. Jay Kuo
Abstract:
A non-parametric interpretable texture synthesis method, called the NITES method, is proposed in this work. Although automatic synthesis of visually pleasant texture can be achieved by deep neural networks nowadays, the associated generation models are mathematically intractable and their training demands higher computational cost. NITES offers a new texture synthesis solution to address these shortcomings. NITES is mathematically transparent and efficient in training and inference. The input is a single exemplary texture image. The NITES method crops out patches from the input and analyzes the statistical properties of these texture patches to obtain their joint spatial-spectral representations. Then, the probabilistic distributions of samples in the joint spatial-spectral spaces are characterized. Finally, numerous texture images that are visually similar to the exemplary texture image can be generated automatically. Experimental results are provided to show the superior quality of generated texture images and efficiency of the proposed NITES method in terms of both training and inference time.
Submitted 2 September, 2020;
originally announced September 2020.
-
Unsupervised Point Cloud Registration via Salient Points Analysis (SPA)
Authors:
Pranav Kadam,
Min Zhang,
Shan Liu,
C. -C. Jay Kuo
Abstract:
An unsupervised point cloud registration method, called salient points analysis (SPA), is proposed in this work. The proposed SPA method can register two point clouds effectively using only a small subset of salient points. It first applies the PointHop++ method to point clouds, finds corresponding salient points in two point clouds based on the local surface characteristics of points and performs registration by matching the corresponding salient points. The SPA method offers several advantages over the recent deep learning based solutions for registration. Deep learning methods such as PointNetLK and DCP train end-to-end networks and rely on full supervision (namely, ground truth transformation matrix and class label). In contrast, the SPA is completely unsupervised. Furthermore, SPA's training time and model size are much less. The effectiveness of the SPA method is demonstrated by experiments on seen and unseen classes and noisy point clouds from the ModelNet-40 dataset.
Submitted 2 September, 2020;
originally announced September 2020.
-
Unsupervised Feedforward Feature (UFF) Learning for Point Cloud Classification and Segmentation
Authors:
Min Zhang,
Pranav Kadam,
Shan Liu,
C. -C. Jay Kuo
Abstract:
In contrast to supervised backpropagation-based feature learning in deep neural networks (DNNs), an unsupervised feedforward feature (UFF) learning scheme for joint classification and segmentation of 3D point clouds is proposed in this work. The UFF method exploits statistical correlations of points in a point cloud set to learn shape and point features in a one-pass feedforward manner through a cascaded encoder-decoder architecture. It learns global shape features through the encoder and local point features through the concatenated encoder-decoder architecture. The extracted features of an input point cloud are fed to classifiers for shape classification and part segmentation. Experiments are conducted to evaluate the performance of the UFF method. For shape classification, the UFF is superior to existing unsupervised methods and on par with state-of-the-art DNNs. For part segmentation, the UFF outperforms semi-supervised methods and performs slightly worse than DNNs.
Submitted 2 September, 2020;
originally announced September 2020.
-
Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition
Authors:
Shuiyang Mao,
P. C. Ching,
C. -C. Jay Kuo,
Tan Lee
Abstract:
Categorical speech emotion recognition is typically performed as a sequence-to-label problem, i.e., to determine the discrete emotion label of the input utterance as a whole. One of the main challenges in practice is that most of the existing emotion corpora do not give ground truth labels for each segment; instead, we only have labels for whole utterances. To extract segment-level emotional information from such weakly labeled emotion corpora, we propose using multiple instance learning (MIL) to learn segment embeddings in a weakly supervised manner. Also, for a sufficiently long utterance, not all of the segments contain relevant emotional information. In this regard, three attention-based neural network models are then applied to the learned segment embeddings to attend the most salient part of a speech utterance. Experiments on the CASIA corpus and the IEMOCAP database show better or highly competitive results than other state-of-the-art approaches.
Submitted 15 August, 2020;
originally announced August 2020.
-
FaceHop: A Light-Weight Low-Resolution Face Gender Classification Method
Authors:
Mozhdeh Rouhsedaghat,
Yifan Wang,
Xiou Ge,
Shuowen Hu,
Suya You,
C. -C. Jay Kuo
Abstract:
A light-weight low-resolution face gender classification method, called FaceHop, is proposed in this research. We have witnessed rapid progress in face gender classification accuracy due to the adoption of deep learning (DL) technology. Yet, DL-based systems are not suitable for resource-constrained environments with limited networking and computing. FaceHop offers an interpretable non-parametric machine learning solution. It has desired characteristics such as a small model size, a small training data amount, low training complexity, and low-resolution input images. FaceHop is developed with the successive subspace learning (SSL) principle and built upon the foundation of PixelHop++. The effectiveness of the FaceHop method is demonstrated by experiments. For gray-scale face images of resolution $32 \times 32$ in the LFW and the CMU Multi-PIE datasets, FaceHop achieves correct gender classification rates of 94.63% and 95.12% with model sizes of 16.9K and 17.6K parameters, respectively. It outperforms LeNet-5 in classification accuracy while LeNet-5 has a model size of 75.8K parameters.
Submitted 12 November, 2020; v1 submitted 18 July, 2020;
originally announced July 2020.
-
Learning Color Compatibility in Fashion Outfits
Authors:
Heming Zhang,
Xuewen Yang,
Jianchao Tan,
Chi-Hao Wu,
Jue Wang,
C. -C. Jay Kuo
Abstract:
Color compatibility is important for evaluating the compatibility of a fashion outfit, yet it was neglected in previous studies. We bring this important problem to researchers' attention and present a compatibility learning framework as a solution to various fashion tasks. The framework consists of a novel way to model outfit compatibility and an innovative learning scheme. Specifically, we model the outfits as graphs and propose a novel graph construction to better utilize the power of graph neural networks. Then we utilize both ground-truth labels and pseudo labels to train the compatibility model in a weakly-supervised manner. Extensive experimental results verify the importance of color compatibility along with the effectiveness of our framework. With color information alone, our model's performance is already comparable to previous methods that use deep image features. Our full model combining the aforementioned contributions sets the new state-of-the-art in fashion compatibility prediction.
Submitted 5 July, 2020;
originally announced July 2020.
-
Novel Human-Object Interaction Detection via Adversarial Domain Generalization
Authors:
Yuhang Song,
Wenbo Li,
Lei Zhang,
Jianwei Yang,
Emre Kiciman,
Hamid Palangi,
Jianfeng Gao,
C. -C. Jay Kuo,
Pengchuan Zhang
Abstract:
We study in this paper the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios. The challenge mainly stems from the large compositional space of objects and predicates, which leads to the lack of sufficient training data for all the object-predicate combinations. As a result, most existing HOI methods heavily rely on object priors and can hardly generalize to unseen combinations. To tackle this problem, we propose a unified framework of adversarial domain generalization to learn object-invariant features for predicate prediction. To measure the performance improvement, we create a new split of the HICO-DET dataset, where the HOIs in the test set are all unseen triplet categories in the training set. Our experiments show that the proposed framework significantly increases the performance by up to 50% on the new split of HICO-DET dataset and up to 125% on the UnRel dataset for auxiliary evaluation in detecting novel HOIs.
Submitted 22 May, 2020;
originally announced May 2020.
-
Multi-View Matching (MVM): Facilitating Multi-Person 3D Pose Estimation Learning with Action-Frozen People Video
Authors:
Yeji Shen,
C. -C. Jay Kuo
Abstract:
To tackle the challenging problem of multi-person 3D pose estimation from a single image, we propose a multi-view matching (MVM) method in this work. The MVM method generates reliable 3D human poses from a large-scale video dataset, called the Mannequin dataset, that contains action-frozen people imitating mannequins. With a large amount of in-the-wild video data labeled by 3D supervisions automatically generated by MVM, we are able to train a neural network that takes a single image as the input for multi-person 3D pose estimation. The core technology of MVM lies in effective alignment of 2D poses obtained from multiple views of a static scene that has a strong geometric constraint. Our objective is to maximize mutual consistency of 2D poses estimated in multiple frames, where geometric constraints as well as appearance similarities are taken into account simultaneously. To demonstrate the effectiveness of 3D supervisions provided by the MVM method, we conduct experiments on the 3DPW and the MSCOCO datasets and show that our proposed solution offers the state-of-the-art performance.
Submitted 10 April, 2020;
originally announced April 2020.
-
Redesigning SLAM for Arbitrary Multi-Camera Systems
Authors:
Juichung Kuo,
Manasi Muglikar,
Zichao Zhang,
Davide Scaramuzza
Abstract:
Adding more cameras to SLAM systems improves robustness and accuracy but complicates the design of the visual front-end significantly. Thus, most systems in the literature are tailored for specific camera configurations. In this work, we aim at an adaptive SLAM system that works for arbitrary multi-camera setups. To this end, we revisit several common building blocks in visual SLAM. In particular, we propose an adaptive initialization scheme, a sensor-agnostic, information-theoretic keyframe selection algorithm, and a scalable voxel-based map. These techniques make little assumption about the actual camera setups and prefer theoretically grounded methods over heuristics. We adapt a state-of-the-art visual-inertial odometry with these modifications, and experimental results show that the modified pipeline can adapt to a wide range of camera setups (e.g., 2 to 6 cameras in one experiment) without the need of sensor-specific modifications or tuning.
Submitted 4 March, 2020;
originally announced March 2020.
-
Efficient Sentence Embedding via Semantic Subspace Analysis
Authors:
Bin Wang,
Fenxiao Chen,
Yuncheng Wang,
C. -C. Jay Kuo
Abstract:
A novel sentence embedding method built upon semantic subspace analysis, called semantic subspace sentence embedding (S3E), is proposed in this work. Given the fact that word embeddings can capture semantic relationship while semantically similar words tend to form semantic groups in a high-dimensional embedding space, we develop a sentence representation scheme by analyzing semantic subspaces of its constituent words. Specifically, we construct a sentence model from two aspects. First, we represent words that lie in the same semantic group using the intra-group descriptor. Second, we characterize the interaction between multiple semantic groups with the inter-group descriptor. The proposed S3E method is evaluated on both textual similarity tasks and supervised tasks. Experimental results show that it offers comparable or better performance than the state-of-the-art. The complexity of our S3E method is also much lower than other parameterized models.
Submitted 3 March, 2020; v1 submitted 21 February, 2020;
originally announced February 2020.
-
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models
Authors:
Bin Wang,
C. -C. Jay Kuo
Abstract:
Sentence embedding is an important research topic in natural language processing (NLP) since it can transfer knowledge to downstream tasks. Meanwhile, a contextualized word representation, called BERT, achieves the state-of-the-art performance in quite a few NLP tasks. Yet, it is an open problem to generate a high quality sentence representation from BERT-based word models. It was shown in a previous study that different layers of BERT capture different linguistic properties. This allows us to fuse information across layers to find a better sentence representation. In this work, we study the layer-wise pattern of the word representation of deep contextualized models. Then, we propose a new sentence embedding method by dissecting BERT-based word models through geometric analysis of the space spanned by the word representation. It is called the SBERT-WK method. No further training is required in SBERT-WK. We evaluate SBERT-WK on semantic textual similarity and downstream supervised tasks. Furthermore, ten sentence-level probing tasks are presented for detailed linguistic analysis. Experiments show that SBERT-WK achieves the state-of-the-art performance. Our codes are publicly available.
Submitted 1 June, 2020; v1 submitted 16 February, 2020;
originally announced February 2020.
-
PointHop++: A Lightweight Learning Model on Point Sets for 3D Classification
Authors:
Min Zhang,
Yifan Wang,
Pranav Kadam,
Shan Liu,
C. -C. Jay Kuo
Abstract:
The PointHop method was recently proposed by Zhang et al. for 3D point cloud classification with unsupervised feature extraction. It has an extremely low training complexity while achieving state-of-the-art classification performance. In this work, we improve the PointHop method furthermore in two aspects: 1) reducing its model complexity in terms of the model parameter number and 2) ordering discriminant features automatically based on the cross-entropy criterion. The resulting method is called PointHop++. The first improvement is essential for wearable and mobile computing while the second improvement bridges statistics-based and optimization-based machine learning methodologies. With experiments conducted on the ModelNet40 benchmark dataset, we show that the PointHop++ method performs on par with deep neural network (DNN) solutions and surpasses other unsupervised feature extraction methods.
Submitted 22 May, 2020; v1 submitted 8 February, 2020;
originally announced February 2020.
-
PixelHop++: A Small Successive-Subspace-Learning-Based (SSL-based) Model for Image Classification
Authors:
Yueru Chen,
Mozhdeh Rouhsedaghat,
Suya You,
Raghuveer Rao,
C. -C. Jay Kuo
Abstract:
The successive subspace learning (SSL) principle was developed and used to design an interpretable learning model, known as the PixelHop method, for image classification in our prior work. Here, we propose an improved PixelHop method and call it PixelHop++. First, to make the PixelHop model size smaller, we decouple a joint spatial-spectral input tensor into multiple spatial tensors (one for each spectral component) under the spatial-spectral separability assumption and perform the Saab transform in a channel-wise manner, called the channel-wise (c/w) Saab transform. Second, by performing this operation from one hop to another successively, we construct a channel-decomposed feature tree whose leaf nodes contain features of one dimension (1D). Third, these 1D features are ranked according to their cross-entropy values, which allows us to select a subset of discriminant features for image classification. In PixelHop++, one can control the learning model size at fine granularity, offering a flexible tradeoff between the model size and the classification performance. We demonstrate the flexibility of PixelHop++ on three datasets: MNIST, Fashion MNIST, and CIFAR-10.
Submitted 8 February, 2020;
originally announced February 2020.
-
Dark sector-photon interactions in proton-beam experiments
Authors:
Xiaoyong Chu,
Jui-Lin Kuo,
Josef Pradler
Abstract:
We consider electromagnetically neutral dark states that couple to the photon through higher dimensional effective operators, such as electric and magnetic dipole moment, anapole moment and charge radius operators. We investigate the possibility of probing the existence of such dark states, taking a Dirac fermion $χ$ as an example, at several representative proton-beam experiments. As no positive signal has been reported, we obtain upper limits (or projected sensitivities) on the corresponding electromagnetic form factors for dark states lighter than several GeV. We demonstrate that while the current limits from proton-beam experiments are at most comparable with those from high-energy electron colliders, future experiments, such as DUNE and SHiP, will be able to improve the sensitivities to electric and magnetic dipole moment interactions, owing to their high intensity.
Submitted 16 April, 2020; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Towards Disentangled Representations for Human Retargeting by Multi-view Learning
Authors:
Chao Yang,
Xiaofeng Liu,
Qingming Tang,
C. -C. Jay Kuo
Abstract:
We study the problem of learning disentangled representations for data across multiple domains and its applications in human retargeting. Our goal is to map an input image to an identity-invariant latent representation that captures intrinsic factors such as expressions and poses. To this end, we present a novel multi-view learning approach that leverages various data sources such as images, keypoints, and poses. Our model consists of multiple id-conditioned VAEs for different views of the data. During training, we encourage the latent embeddings to be consistent across these views. Our observation is that auxiliary data like keypoints and poses contain critical, id-agnostic semantic information, and it is easier to train a disentangling CVAE on these simpler views to separate such semantics from other id-specific attributes. We show that training multi-view CVAEs and encouraging latent consistency guides the image encoding to preserve the semantics of expressions and poses, leading to improved disentangled representations and better human retargeting results.
Submitted 12 December, 2019;
originally announced December 2019.
-
PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation
Authors:
Can Qin,
Haoxuan You,
Lichen Wang,
C. -C. Jay Kuo,
Yun Fu
Abstract:
Domain Adaptation (DA) approaches achieved significant improvements in a wide range of machine learning and computer vision tasks (i.e., classification, detection, and segmentation). However, as far as we are aware, there are few methods yet to achieve domain adaptation directly on 3D point cloud data. The unique challenge of point cloud data lies in its abundant spatial geometric information, and the semantics of the whole object is contributed by including regional geometric structures. Specifically, most general-purpose DA methods that struggle for global feature alignment and ignore local geometric information are not suitable for 3D domain alignment. In this paper, we propose a novel 3D Domain Adaptation Network for point cloud data (PointDAN). PointDAN jointly aligns the global and local features in multi-level. For local alignment, we propose Self-Adaptive (SA) node module with an adjusted receptive field to model the discriminative local structures for aligning domains. To represent hierarchically scaled features, node-attention module is further introduced to weight the relationship of SA nodes across objects and domains. For global alignment, an adversarial-training strategy is employed to learn and align global features across domains. Since there is no common evaluation benchmark for 3D point cloud DA scenario, we build a general benchmark (i.e., PointDA-10) extracted from three popular 3D object/scene datasets (i.e., ModelNet, ShapeNet and ScanNet) for cross-domain 3D objects classification fashion. Extensive experiments on PointDA-10 illustrate the superiority of our model over the state-of-the-art general-purpose DA methods.
Submitted 6 November, 2019;
originally announced November 2019.
-
Recent Advances on HEVC Inter-frame Coding: From Optimization to Implementation and Beyond
Authors:
Yongfei Zhang,
Chao Zhang,
Rui Fan,
Siwei Ma,
Zhibo Chen,
C. -C. Jay Kuo
Abstract:
High Efficiency Video Coding (HEVC) has doubled the video compression ratio with equivalent subjective quality as compared to its predecessor H.264/AVC. The significant coding efficiency improvement is attributed to many new techniques. Inter-frame coding is one of the most powerful yet complicated techniques therein, and its high computational burden is the main obstacle in HEVC-based real-time applications. Recently, plenty of research has been done to optimize inter-frame coding, either to reduce the complexity for real-time applications or to further enhance the encoding efficiency. In this paper, we provide a comprehensive review of the state-of-the-art techniques for HEVC inter-frame coding from three aspects, namely fast inter coding solutions, implementation on different hardware platforms, and advanced inter coding techniques. More specifically, the algorithms in each aspect are further subdivided into sub-categories and compared in terms of pros, cons, coding efficiency and coding complexity. To the best of our knowledge, this is the first such comprehensive review of the recent advances in inter-frame coding for HEVC, and we hope it will help the improvement, implementation and applications of HEVC as well as the ongoing development of the next generation video coding standard.
Submitted 2 December, 2019; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Continuous focal translation enhances rate of point-scan volumetric microscopy
Authors:
Courtney Johnson,
Jack Exell,
Jonathon Kuo,
Kevin Welsher
Abstract:
Two-Photon Laser-Scanning Microscopy is a powerful tool for exploring biological structure and function because of its ability to optically section through a sample with a tight focus. While it is possible to obtain 3D image stacks by moving a stage, this per-frame imaging process is time consuming. Here, we present an easy-to-implement and inexpensive modification of an existing two-photon microscope to rapidly image in 3D using an electrically tunable lens to create a tessellating scan pattern which repeats with the volume rate. Using appropriate interpolating algorithms, the volumetric imaging rate can be increased by up to a factor of four. This capability expands the two-photon microscope into the third dimension for faster volumetric imaging capable of visualizing dynamics on timescales not achievable by traditional stage stack methods.
Submitted 9 September, 2019;
originally announced September 2019.
-
PixelHop: A Successive Subspace Learning (SSL) Method for Object Classification
Authors:
Yueru Chen,
C. -C. Jay Kuo
Abstract:
A new machine learning methodology, called successive subspace learning (SSL), is introduced in this work. SSL contains four key ingredients: 1) successive near-to-far neighborhood expansion; 2) unsupervised dimension reduction via subspace approximation; 3) supervised dimension reduction via label-assisted regression (LAG); and 4) feature concatenation and decision making. An image-based object classification method, called PixelHop, is proposed to illustrate the SSL design. It is shown by experimental results that the PixelHop method outperforms the classic CNN model of similar model complexity on three benchmark datasets (MNIST, Fashion MNIST and CIFAR-10). Although SSL and deep learning (DL) share some high-level concepts, they are fundamentally different in model formulation, the training process and training complexity. An extensive comparison of SSL and DL is made to provide further insights into the potential of SSL.
Submitted 17 September, 2019;
originally announced September 2019.
-
Graph Representation Learning: A Survey
Authors:
Fenxiao Chen,
Yuncheng Wang,
Bin Wang,
C. -C. Jay Kuo
Abstract:
Research on graph representation learning has received a lot of attention in recent years since many data in real-world applications come in the form of graphs. High-dimensional graph data are often in irregular form, which makes them more difficult to analyze than image/video/audio data defined on regular lattices. Various graph embedding techniques have been developed to convert the raw graph data into a low-dimensional vector representation while preserving the intrinsic graph properties. In this review, we first explain the graph embedding task and its challenges. Next, we review a wide range of graph embedding techniques with insights. Then, we evaluate several state-of-the-art methods against small and large datasets and compare their performance. Finally, potential applications and future directions are presented.
Submitted 3 September, 2019;
originally announced September 2019.
-
On Energy Compaction of 2D Saab Image Transforms
Authors:
Na Li,
Yongfei Zhang,
Yun Zhang,
C. -C. Jay Kuo
Abstract:
The block Discrete Cosine Transform (DCT) is commonly used in image and video compression due to its good energy compaction property. The Saab transform was recently proposed as an effective signal transform for image understanding. In this work, we study the energy compaction property of the Saab transform in the context of intra-coding of the High Efficiency Video Coding (HEVC) standard. We compare the energy compaction property of the Saab transform, the DCT, and the Karhunen-Loeve transform (KLT) by applying them to different sizes of intra-predicted residual blocks in HEVC. The basis functions of the Saab transform are visualized. Extensive experimental results are given to demonstrate the energy compaction capability of the Saab transform.
Submitted 28 August, 2019;
originally announced August 2019.
-
Stellar probes of dark sector-photon interactions
Authors:
Xiaoyong Chu,
Jui-Lin Kuo,
Josef Pradler,
Lukas Semmelrock
Abstract:
Electromagnetically neutral dark sector particles may directly couple to the photon through higher dimensional effective operators. Considering electric and magnetic dipole moment, anapole moment, and charge radius interactions, we derive constraints from stellar energy loss in the Sun, horizontal branch and red giant stars, as well as from cooling of the proto-neutron star of SN1987A. We provide the exact formula for in-medium photon-mediated pair production to leading order in the dark coupling, and compute the energy loss rates explicitly for the most important processes, including a careful discussion on resonances and potential double counting between the processes. Stringent limits for dark states with masses below $3\,$keV ($40\,$MeV) arise from red giant stars (SN1987A), implying an effective lower mass-scale of approximately $10^9\,$GeV ($10^7\,$GeV) for mass-dimension five, and $100\,$GeV ($2.5\,$TeV) for mass-dimension six operators as long as dark states stream freely; for the proto-neutron star, the trapping of dark states is also evaluated. Together with direct limits previously derived by us in Chu et al. (2018), this provides the first comprehensive overview of the viability of effective electromagnetic dark-state interactions below the GeV mass-scale.
Submitted 7 October, 2019; v1 submitted 1 August, 2019;
originally announced August 2019.
-
PointHop: An Explainable Machine Learning Method for Point Cloud Classification
Authors:
Min Zhang,
Haoxuan You,
Pranav Kadam,
Shan Liu,
C. -C. Jay Kuo
Abstract:
An explainable machine learning method for point cloud classification, called the PointHop method, is proposed in this work. The PointHop method consists of two stages: 1) local-to-global attribute building through iterative one-hop information exchange, and 2) classification and ensembles. In the attribute building stage, we address the problem of unordered point cloud data using a space partitioning procedure and developing a robust descriptor that characterizes the relationship between a point and its one-hop neighbor in a PointHop unit. When we put multiple PointHop units in cascade, the attributes of a point will grow by taking its relationship with one-hop neighbor points into account iteratively. Furthermore, to control the rapid dimension growth of the attribute vector associated with a point, we use the Saab transform to reduce the attribute dimension in each PointHop unit. In the classification and ensemble stage, we feed the feature vector obtained from multiple PointHop units to a classifier. We explore ensemble methods to further improve the classification performance. It is shown by experimental results that the PointHop method offers classification performance that is comparable with state-of-the-art methods while demanding much lower training complexity.
Submitted 15 December, 2019; v1 submitted 30 July, 2019;
originally announced July 2019.
-
An Interpretable Compression and Classification System: Theory and Applications
Authors:
Tzu-Wei Tseng,
Kai-Jiun Yang,
C. -C. Jay Kuo,
Shang-Ho Tsai
Abstract:
This study proposes a low-complexity interpretable classification system. The proposed system contains three main modules: feature extraction, feature reduction, and classification. All of them are linear. Thanks to the linear property, the extracted and reduced features can be inverted back to the original data, as with a linear transform such as the Fourier transform, so that one can quantify and visualize the contribution of individual features towards the original data. Also, the reduced features and reversibility naturally endow the proposed system with the ability of data compression. This system can significantly compress data with a small percent deviation between the compressed and the original data. At the same time, when the compressed data is used for classification, it still achieves high testing accuracy. Furthermore, we observe that the extracted features of the proposed system can be approximated as uncorrelated Gaussian random variables. Hence, classical theory in estimation and detection can be applied for classification. This motivates us to propose using a MAP (maximum a posteriori) based classification method. As a result, the extracted features and the corresponding performance have statistical meaning and are mathematically interpretable. Simulation results show that the proposed classification system not only enjoys significantly reduced training and testing time but also achieves high testing accuracy compared to the conventional schemes.
Submitted 14 April, 2020; v1 submitted 21 July, 2019;
originally announced July 2019.
-
Appearance and Shape from Water Reflection
Authors:
Ryo Kawahara,
Meng-Yu Jennifer Kuo,
Shohei Nobuhara,
Ko Nishino
Abstract:
This paper introduces single-image geometric and appearance reconstruction from water reflection photography, i.e., images capturing direct and water-reflected real-world scenes. Water reflection offers an additional viewpoint to the direct sight, collectively forming a stereo pair. The water-reflected scene, however, includes internally scattered and reflected environmental illumination in addition to the scene radiance, which precludes direct stereo matching. We derive a principled iterative method that disentangles this scene radiometry and geometry for reconstructing 3D scene structure as well as its high-dynamic range appearance. In the presence of waves, we simultaneously recover the wave geometry as surface normal perturbations of the water surface. Most important, we show that the water reflection enables calibration of the camera. In other words, for the first time, we show that capturing a direct and water-reflected scene in a single exposure forms a self-calibrating HDR catadioptric stereo camera. We demonstrate our method on a number of images taken in the wild. The results demonstrate a new means for leveraging this accidental catadioptric camera.
Submitted 7 January, 2020; v1 submitted 24 June, 2019;
originally announced June 2019.
-
Deep Kinship Verification via Appearance-shape Joint Prediction and Adaptation-based Approach
Authors:
Heming Zhang,
Xiaolong Wang,
C. -C. Jay Kuo
Abstract:
Kinship verification aims to identify the kin relation between two given face images. It is a very challenging problem due to the lack of training data and facial similarity variations between kinship pairs. In this work, we build a novel appearance and shape based deep learning pipeline. First, we adopt the knowledge learned from a general face recognition network to learn general facial features. Afterwards, we learn kinship-oriented appearance and shape features from kinship pairs and combine them for the final prediction. We have evaluated the model performance on a widely used benchmark and demonstrated its superiority over the state of the art.
Submitted 15 May, 2019;
originally announced May 2019.
-
Compressed Image Quality Assessment Based on Saak Features
Authors:
Xinfeng Zhang,
Sam Kwong,
C. -C. Jay Kuo
Abstract:
Compressed image quality assessment plays an important role in image services, especially in image compression applications, where it can be used as guidance to optimize image processing algorithms. In this paper, we propose an objective image quality assessment algorithm to measure the quality of compressed images. The proposed method utilizes a data-driven transform, Saak (Subspace approximation with augmented kernels), to decompose images into a hierarchical structural feature space. We measure the distortions of Saak features and accumulate these distortions according to the feature importance to the human visual system. Compared with the state-of-the-art image quality assessment methods on widely utilized datasets, the proposed method correlates better with the subjective results. In addition, the proposed method achieves more robust results on different datasets.
Submitted 16 May, 2019; v1 submitted 6 May, 2019;
originally announced May 2019.
-
Accelerating Proposal Generation Network for Fast Face Detection on Mobile Devices
Authors:
Heming Zhang,
Xiaolong Wang,
Jingwen Zhu,
C. -C. Jay Kuo
Abstract:
Face detection has been a widely studied problem over the past few decades. Recently, significant improvements have been achieved via deep neural networks; however, it is still challenging to directly apply these techniques to mobile devices because of their limited computational power and memory. In this work, we present a proposal generation acceleration framework for real-time face detection. More specifically, we adopt a popular cascaded convolutional neural network (CNN) as the basis, then apply our acceleration approach on the basic framework to speed up the model inference time. We are motivated by the observation that the computation bottleneck of this framework arises from the proposal generation stage, where each level of the dense image pyramid has to go through the network. In this work, we reduce the number of image pyramid levels by utilizing both global and local facial characteristics (i.e., global face and facial parts). Experimental results on the public benchmarks WIDER-face and FDDB demonstrate satisfactory performance and faster speed compared to the state of the art.
Submitted 26 April, 2019;
originally announced April 2019.
-
Learning a Multi-Modal Policy via Imitating Demonstrations with Mixed Behaviors
Authors:
Fang-I Hsiao,
Jui-Hsuan Kuo,
Min Sun
Abstract:
We propose a novel approach to train a multi-modal policy from mixed demonstrations without their behavior labels. We develop a method to discover the latent factors of variation in the demonstrations. Specifically, our method is based on the variational autoencoder with a categorical latent variable. The encoder infers discrete latent factors corresponding to different behaviors from demonstratio…
▽ More
We propose a novel approach to train a multi-modal policy from mixed demonstrations without their behavior labels. We develop a method to discover the latent factors of variation in the demonstrations. Specifically, our method is based on the variational autoencoder with a categorical latent variable. The encoder infers discrete latent factors corresponding to different behaviors from demonstrations. The decoder, as a policy, performs the behaviors accordingly. Once learned, the policy is able to reproduce a specific behavior by simply conditioning on a categorical vector. We evaluate our method on three different tasks, including a challenging task with high-dimensional visual inputs. Experimental results show that our approach is better than various baseline methods and competitive with a multi-modal policy trained by ground truth behavior labels.
△ Less
Submitted 25 March, 2019;
originally announced March 2019.