-
High-Dimensional Expanders from Expanders
Authors:
Siqi Liu,
Sidhanth Mohanty,
Elizabeth Yang
Abstract:
We present an elementary way to transform an expander graph into a simplicial complex where all high order random walks have a constant spectral gap, i.e., they converge rapidly to the stationary distribution. As an upshot, we obtain new constructions, as well as a natural probabilistic model to sample constant degree high-dimensional expanders.
In particular, we show that given an expander graph $G$, adding self loops to $G$ and taking the tensor product of the modified graph with a high-dimensional expander produces a new high-dimensional expander. Our proof of rapid mixing of high order random walks is based on the decomposable Markov chains framework introduced by Jerrum et al.
Submitted 20 November, 2019; v1 submitted 24 July, 2019;
originally announced July 2019.
-
Soliton fractional charges in graphene nanoribbon and polyacetylene: similarities and differences
Authors:
S. -R. Eric Yang
Abstract:
An introductory overview of current research developments regarding solitons and fractional boundary charges in graphene nanoribbons is presented. Graphene nanoribbons and polyacetylene have chiral symmetry and share numerous similar properties, e.g., the bulk-edge correspondence between the Zak phase and the existence of edge states, along with the presence of chiral boundary states, which are important for charge fractionalization. In polyacetylene, a fermion mass potential in the Dirac equation produces an excitation gap, and a twist in this scalar potential produces a zero-energy chiral soliton. Similarly, in a gapful armchair graphene nanoribbon, a distortion in the chiral gauge field can produce soliton states. In polyacetylene, a soliton is bound to a domain wall connecting two different dimerized phases. In graphene nanoribbons, a domain-wall soliton connects two topological zigzag edges with different chiralities. However, such a soliton does not display spin-charge separation. The existence of a soliton in finite-length polyacetylene can induce the formation of fractional charges on the opposite ends. In contrast, for gapful graphene nanoribbons, the antiferromagnetic coupling between the opposite zigzag edges induces integer boundary charges. The presence of disorder in graphene nanoribbons partly mitigates the antiferromagnetic coupling effect. Hence, the average edge charge of gap states with energies within a small interval is e/2, with significant charge fluctuations. However, midgap states exhibit a well-defined charge fractionalization between the opposite zigzag edges in the weak-disorder regime. Numerous occupied soliton states in a disorder-free and doped zigzag graphene nanoribbon form a solitonic phase.
Submitted 18 June, 2019;
originally announced June 2019.
-
Reliable Estimation of Individual Treatment Effect with Causal Information Bottleneck
Authors:
Sungyub Kim,
Yongsu Baek,
Sung Ju Hwang,
Eunho Yang
Abstract:
Estimating individual level treatment effects (ITE) from observational data is a challenging and important area in causal machine learning and is commonly considered in diverse mission-critical applications. In this paper, we propose an information theoretic approach in order to find more reliable representations for estimating ITE. We leverage the Information Bottleneck (IB) principle, which addresses the trade-off between conciseness and predictive power of representation. With the introduction of an extended graphical model for causal information bottleneck, we encourage the independence between the learned representation and the treatment type. We also introduce an additional form of a regularizer from the perspective of understanding ITE in the semi-supervised learning framework to ensure more reliable representations. Experimental results show that our model achieves the state-of-the-art results and exhibits more reliable prediction performances with uncertainty information on real-world datasets.
Submitted 7 June, 2019;
originally announced June 2019.
-
Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks
Authors:
Joonyoung Yi,
Juhyuk Lee,
Kwang Joon Kim,
Sung Ju Hwang,
Eunho Yang
Abstract:
Handling missing data is one of the most fundamental problems in machine learning. Among many approaches, the simplest and most intuitive way is zero imputation, which treats the value of a missing entry simply as zero. However, many studies have experimentally confirmed that zero imputation results in suboptimal performance in training neural networks. Yet, no existing work has explained what causes this performance degradation. In this paper, we introduce the variable sparsity problem (VSP), which describes a phenomenon where the output of a predictive model largely varies with respect to the rate of missingness in the given input, and show that it adversely affects the model performance. We first theoretically analyze this phenomenon and propose a simple yet effective technique to handle missingness, which we refer to as Sparsity Normalization (SN), that directly targets and resolves the VSP. We further experimentally validate SN on diverse benchmark datasets, to show that debiasing the effect of input-level sparsity improves the performance and stabilizes the training of neural networks.
Submitted 6 February, 2020; v1 submitted 1 June, 2019;
originally announced June 2019.
-
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks
Authors:
Hae Beom Lee,
Hayeon Lee,
Donghyun Na,
Saehoon Kim,
Minseop Park,
Eunho Yang,
Sung Ju Hwang
Abstract:
While tasks can come with varying numbers of instances and classes in realistic settings, the existing meta-learning approaches for few-shot classification assume that the number of instances per task and class is fixed. Due to this restriction, they learn to equally utilize the meta-knowledge across all the tasks, even when the number of instances per task and class largely varies. Moreover, they do not consider distributional difference in unseen tasks, on which the meta-knowledge may be less useful depending on the task relatedness. To overcome these limitations, we propose a novel meta-learning model that adaptively balances the effect of the meta-learning and task-specific learning within each task. Through the learning of the balancing variables, we can decide whether to obtain a solution by relying on the meta-knowledge or task-specific learning. We formulate this objective into a Bayesian inference framework and tackle it using variational inference. We validate our Bayesian Task-Adaptive Meta-Learning (Bayesian TAML) on multiple realistic task- and class-imbalanced datasets, on which it significantly outperforms existing meta-learning approaches. Further ablation study confirms the effectiveness of each balancing component and the Bayesian learning framework.
Submitted 12 February, 2022; v1 submitted 30 May, 2019;
originally announced May 2019.
-
Meta Dropout: Learning to Perturb Features for Generalization
Authors:
Hae Beom Lee,
Taewook Nam,
Eunho Yang,
Sung Ju Hwang
Abstract:
A machine learning model that generalizes well should obtain low errors on unseen test examples. Thus, if we know how to optimally perturb training examples to account for test examples, we may achieve better generalization performance. However, obtaining such perturbation is not possible in standard machine learning frameworks as the distribution of the test data is unknown. To tackle this challenge, we propose a novel regularization method, meta-dropout, which learns to perturb the latent features of training examples for generalization in a meta-learning framework. Specifically, we meta-learn a noise generator which outputs a multiplicative noise distribution for latent features, to obtain low errors on the test instances in an input-dependent manner. Then, the learned noise generator can perturb the training examples of unseen tasks at the meta-test time for improved generalization. We validate our method on few-shot classification datasets, whose results show that it significantly improves the generalization performance of the base model, and largely outperforms existing regularization methods such as information bottleneck, manifold mixup, and information dropout.
Submitted 12 February, 2022; v1 submitted 30 May, 2019;
originally announced May 2019.
-
Stochastic Gradient Methods with Block Diagonal Matrix Adaptation
Authors:
Jihun Yun,
Aurelie C. Lozano,
Eunho Yang
Abstract:
Adaptive gradient approaches that automatically adjust the learning rate on a per-feature basis have been very popular for training deep networks. This rich class of algorithms includes Adagrad, RMSprop, Adam, and recent extensions. All these algorithms have adopted diagonal matrix adaptation, due to the prohibitive computational burden of manipulating full matrices in high dimensions. In this paper, we show that block-diagonal matrix adaptation can be a practical and powerful solution that can effectively utilize structural characteristics of deep learning architectures, and significantly improve convergence and out-of-sample generalization. We present a general framework with block-diagonal matrix updates via coordinate grouping, which includes counterparts of the aforementioned algorithms, and prove their convergence in non-convex optimization, highlighting benefits compared to diagonal versions. In addition, we propose an efficient spectrum-clipping scheme that benefits from superior generalization performance of SGD. Extensive experiments reveal that block-diagonal approaches achieve state-of-the-art results on several deep learning tasks, and can outperform adaptive diagonal methods, vanilla SGD, as well as a modified version of full-matrix adaptation proposed very recently.
Submitted 26 May, 2019;
originally announced May 2019.
-
Incorporating Sememes into Chinese Definition Modeling
Authors:
Liner Yang,
Cunliang Kong,
Yun Chen,
Yang Liu,
Qinan Fan,
Erhong Yang
Abstract:
Chinese definition modeling is a challenging task that generates a dictionary definition in Chinese for a given Chinese word. To accomplish this task, we construct the Chinese Definition Modeling Corpus (CDM), which contains triples of word, sememes and the corresponding definition. We present two novel models to improve Chinese definition modeling: the Adaptive-Attention model (AAM) and the Self- and Adaptive-Attention Model (SAAM). AAM successfully incorporates sememes for generating the definition with an adaptive attention mechanism. It has the capability to decide which sememes to focus on and when to pay attention to sememes. SAAM further replaces recurrent connections in AAM with self-attention and relies entirely on the attention mechanism, reducing the path length between word, sememes and definition. Experiments on CDM demonstrate that by incorporating sememes, our best proposed model can outperform the state-of-the-art method by +6.0 BLEU.
Submitted 15 May, 2019;
originally announced May 2019.
-
Spectral Approximate Inference
Authors:
Sejun Park,
Eunho Yang,
Se-Young Yun,
Jinwoo Shin
Abstract:
Given a graphical model (GM), computing its partition function is the most essential inference task, but it is computationally intractable in general. To address the issue, iterative approximation algorithms exploring certain local structure/consistency of GM have been investigated as popular choices in practice. However, due to their local/iterative nature, they often output poor approximations or do not even converge, e.g., in low-temperature regimes (hard instances of large parameters). To overcome the limitation, we propose a novel approach utilizing the global spectral feature of GM. Our contribution is two-fold: (a) we first propose a fully polynomial-time approximation scheme (FPTAS) for approximating the partition function of GM associated with a low-rank coupling matrix; (b) for general high-rank GMs, we design a spectral mean-field scheme utilizing (a) as a subroutine, which approximates a high-rank GM as a product of rank-1 GMs for an efficient approximation of the partition function. The proposed algorithm is more robust in its running time and accuracy than prior methods, i.e., it neither suffers from the convergence issue nor depends on hard local structures, as demonstrated in our experiments.
Submitted 13 May, 2019;
originally announced May 2019.
-
DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs
Authors:
Erkun Yang,
Tongliang Liu,
Cheng Deng,
Wei Liu,
Dacheng Tao
Abstract:
Due to its high storage and search efficiency, hashing has become prevalent for large-scale similarity search. Particularly, deep hashing methods have greatly improved the search performance under supervised scenarios. In contrast, unsupervised deep hashing models can hardly achieve satisfactory performance due to the lack of reliable supervisory similarity signals. To address this issue, we propose a novel deep unsupervised hashing model, dubbed DistillHash, which can learn a distilled data set consisting of data pairs that have confident similarity signals. Specifically, we investigate the relationship between the initial noisy similarity signals learned from local structures and the semantic similarity labels assigned by a Bayes optimal classifier. We show that under a mild assumption, some data pairs, whose labels are consistent with those assigned by the Bayes optimal classifier, can be potentially distilled. Inspired by this fact, we design a simple yet effective strategy to distill data pairs automatically and further adopt a Bayesian learning framework to learn hash functions from the distilled data set. Extensive experimental results on three widely used benchmark datasets show that the proposed DistillHash consistently achieves state-of-the-art search performance.
Submitted 9 May, 2019;
originally announced May 2019.
-
Spin-Layer- and Spin-Valley-Locking in CVD-Grown AA'- and AB-Stacked Tungsten-Disulfide Bilayers
Authors:
Lorenz Maximilian Schneider,
Jan Kuhnert,
Simon Schmitt,
Wolfram Heimbrodt,
Ulrich Huttner,
Lars Meckbach,
Tineke Stroucken,
Stephan W. Koch,
Shichen Fu,
Xiaotian Wang,
Kyungnam Kang,
Eui-Hyeok Yang,
Arash Rahimi-Iman
Abstract:
Valley-selective optical selection rules and spin-valley locking in transition-metal dichalcogenide (TMDC) monolayers are at the heart of "valleytronic physics", which exploits the valley degree of freedom and has been a major research topic in recent years. In contrast, the valleytronic properties of TMDC bilayers have so far received less attention. Here, we report on the valleytronic properties and optical characterization of bilayers of WS2 as a representative TMDC material. In particular, we study the influence of the relative layer alignment in TMDC homo-bilayer samples on their polarization-dependent optical properties. To this end, CVD-grown WS2 bilayer samples have been prepared that favor either the inversion-symmetric AA' stacking or AB stacking without inversion symmetry during synthesis. Subsequently, a detailed analysis of reflection contrast and photoluminescence spectra under different polarization conditions has been performed. We observe circular and linear dichroism of the photoluminescence that is more pronounced for the AB stacking configuration. Our experimental findings are supported by theoretical calculations showing that the observed dichroism can be linked to optical selection rules that maintain the spin-valley locking in the AB-stacked WS2 bilayer, whereas a spin-layer locking is present in the inversion-symmetric AA' bilayer instead. Furthermore, our theoretical calculations predict a small relative shift of the excitonic resonances in both stacking configurations, which is also experimentally observed.
Submitted 7 May, 2019;
originally announced May 2019.
-
Shared Predictive Cross-Modal Deep Quantization
Authors:
Erkun Yang,
Cheng Deng,
Chao Li,
Wei Liu,
Jie Li,
Dacheng Tao
Abstract:
With explosive growth of data volume and ever-increasing diversity of data modalities, cross-modal similarity search, which conducts nearest neighbor search across different modalities, has been attracting increasing interest. This paper presents a deep compact code learning solution for efficient cross-modal similarity search. Many recent studies have proven that quantization-based approaches perform generally better than hashing-based approaches on single-modal similarity search. In this paper, we propose a deep quantization approach, which is among the early attempts of leveraging deep neural networks into quantization-based cross-modal similarity search. Our approach, dubbed shared predictive deep quantization (SPDQ), explicitly formulates a shared subspace across different modalities and two private subspaces for individual modalities, and representations in the shared subspace and the private subspaces are learned simultaneously by embedding them to a reproducing kernel Hilbert space, where the mean embedding of different modality distributions can be explicitly compared. In addition, in the shared subspace, a quantizer is learned to produce the semantics preserving compact codes with the help of label alignment. Thanks to this novel network architecture in cooperation with supervised quantization training, SPDQ can preserve intramodal and intermodal similarities as much as possible and greatly reduce quantization error. Experiments on two popular benchmarks corroborate that our approach outperforms state-of-the-art methods.
Submitted 16 April, 2019;
originally announced April 2019.
-
Scalable and Order-robust Continual Learning with Additive Parameter Decomposition
Authors:
Jaehong Yoon,
Saehoon Kim,
Eunho Yang,
Sung Ju Hwang
Abstract:
While recent continual learning methods largely alleviate the catastrophic problem on toy-sized datasets, some issues remain to be tackled to apply them to real-world problem domains. First, a continual learning model should effectively handle catastrophic forgetting and be efficient to train even with a large number of tasks. Secondly, it needs to tackle the problem of order-sensitivity, where the performance of the tasks largely varies based on the order of the task arrival sequence, as it may cause serious problems where fairness plays a critical role (e.g. medical diagnosis). To tackle these practical challenges, we propose a novel continual learning method that is scalable as well as order-robust, which instead of learning a completely shared set of weights, represents the parameters for each task as a sum of task-shared and sparse task-adaptive parameters. With our Additive Parameter Decomposition (APD), the task-adaptive parameters for earlier tasks remain mostly unaffected, where we update them only to reflect the changes made to the task-shared parameters. This decomposition of parameters effectively prevents catastrophic forgetting and order-sensitivity, while being computation- and memory-efficient. Further, we can achieve even better scalability with APD using hierarchical knowledge consolidation, which clusters the task-adaptive parameters to obtain hierarchically shared parameters. We validate our network with APD, APD-Net, on multiple benchmark datasets against state-of-the-art continual learning methods, which it largely outperforms in accuracy, scalability, and order-robustness.
Submitted 15 February, 2020; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Künneth formulas for motives and additivity of traces
Authors:
Fangzhou Jin,
Enlin Yang
Abstract:
We prove several Künneth formulas in motivic homotopy categories and deduce a Verdier pairing in these categories following SGA5, which leads to the characteristic class of a constructible motive, an invariant closely related to the Euler-Poincaré characteristic. We prove an additivity property of the Verdier pairing using the language of derivators, following the approach of May and Groth-Ponto-Shulman; using such a result we show that in the presence of a Chow weight structure, the characteristic class for all constructible motives is uniquely characterized by proper covariance, additivity along distinguished triangles, refined Gysin morphisms and Euler classes. In the relative setting, we prove the relative Künneth formulas under some transversality conditions, and define the relative characteristic class.
Submitted 29 March, 2019; v1 submitted 16 December, 2018;
originally announced December 2018.
-
Evolution of A Magnetic Flux Rope toward Eruption
Authors:
Wensi Wang,
Chunming Zhu,
Jiong Qiu,
Rui Liu,
Kai E. Yang,
Qiang Hu
Abstract:
It is well accepted that a magnetic flux rope (MFR) is a critical component of many coronal mass ejections (CMEs), yet how it evolves toward eruption remains unclear. Here we investigate the continuous evolution of a pre-existing MFR, which is rooted in strong photospheric magnetic fields and electric currents. The evolution of the MFR is observed by the Solar Terrestrial Relations Observatory (STEREO) and the Solar Dynamics Observatory (SDO) from multiple viewpoints. From STEREO's perspective, the MFR starts to rise slowly above the limb five hours before it erupts as a halo CME on 2012 June 14. In SDO observations, conjugate dimmings develop on the disk, simultaneously with the gradual expansion of the MFR, suggesting that the dimmings map the MFR's feet. The evolution comprises a two-stage gradual expansion followed by another stage of rapid acceleration/eruption. Quantitative measurements indicate that the magnetic twist of the MFR increases from 1.0 +/- 0.5 to 2.0 +/- 0.5 turns during the five-hour expansion, and further increases to about 4.0 turns per AU when detected as a magnetic cloud at 1 AU two days later. In addition, each stage is preceded by flare(s), implying reconnection is actively involved in the evolution and eruption of the MFR. The implications of these measurements for the CME initiation mechanisms are discussed.
Submitted 9 December, 2018;
originally announced December 2018.
-
Soliton fractional charge of disordered graphene nanoribbon
Authors:
Y. H. Jeong,
S. -R. Eric Yang,
M. -C. Cha
Abstract:
We investigate the properties of the gap-edge states of half-filled interacting disordered zigzag graphene nanoribbons. We find that the midgap states can display the quantized fractional charge of 1/2. These gap-edge states can be represented by topological kinks with their site probability distribution divided between the opposite zigzag edges with different chiralities. In addition, there are numerous spin-split gap-edge states, similar to those in a Mott-Anderson insulator.
Submitted 24 April, 2019; v1 submitted 6 December, 2018;
originally announced December 2018.
-
Magic angle effects in a trigonal Mn(III)3 cluster: deconstruction of a single-molecule magnet
Authors:
Jonathan Marbey,
Pei-Rung Gan,
En-Che Yang,
Stephen Hill
Abstract:
We present angle-dependent high-frequency EPR studies on a single crystal of a trigonal Mn3 cluster with an unusual structure in which the local magnetic easy-axes of the constituent Mn(III) ions are tilted significantly away from the molecular C3 axis towards the magic-angle of 54.7 degrees, resulting in an almost complete cancelation of the 2nd-order axial magnetic anisotropy associated with the ferromagnetically coupled total spin ST = 6 ground state. This contrasts with the situation in many related Mn3 single-molecule magnets (SMMs) that have been studied intensively in the past, for which the local Mn(III) anisotropy tensors are reasonably parallel, resulting in substantial barriers to magnetization relaxation (Ueff = 30 to 35 cm$^{-1}$) and magnetization blocking below about 2.5 K. The suppression of the 2nd-order anisotropy in the present case results in a situation in which the zero-field splitting (ZFS) of the ST = 6 ground state is dominated by 4th- and higher-order interactions. This provides a unique opportunity to study in depth how molecular geometry influences these interactions that are responsible for quantum tunneling of magnetization in high-symmetry SMMs. Angle-dependent EPR measurements provide a full mapping of the molecular magneto-anisotropy. Meanwhile, irreducible tensor operator (ITO) methods are employed in order to obtain analytic expressions that directly relate molecular anisotropy to the microscopic physics, i.e., the ZFS tensors associated with the individual Mn(III) ions, their orientations, and the exchange coupling between the three spins. We find that the magic-angle tilting leads to a massive compression of the ST = 6 ground state energy level diagram and strong mixing between spin projection states. Although these characteristics are antagonistic to SMM behavior, they provide important insights into the physics of polynuclear molecular nanomagnets.
Submitted 27 August, 2018;
originally announced August 2018.
-
Design and performance of wide-band corrugated walls for the BICEP Array detector modules at 30/40 GHz
Authors:
A. Soliman,
P. A. R. Ade,
Z. Ahmed,
R. W. Aikin,
K. D. Alexander,
D. Barkats,
S. J. Benton,
C. A. Bischoff,
J. J. Bock,
R. Bowens-Rubin,
J. A. Brevik,
I. Buder,
E. Bullock,
V. Buza,
J. Connors,
J. Cornelison,
B. P. Crill,
M. Crumrine,
M. Dierickx,
L. Duband,
C. Dvorkin,
J. P. Filippini,
S. Fliescher,
J. Grayson,
G. Hall
, et al. (53 additional authors not shown)
Abstract:
BICEP Array is a degree-scale Cosmic Microwave Background (CMB) experiment that will search for primordial B-mode polarization while constraining Galactic foregrounds. BICEP Array will be comprised of four receivers to cover a broad frequency range with channels at 30/40, 95, 150 and 220/270 GHz. The first low-frequency receiver will map synchrotron emission at 30 and 40 GHz and will deploy to the South Pole at the end of 2019. In this paper, we give an overview of the BICEP Array science and instrument, with a focus on the detector module. We designed corrugations in the metal frame of the module to suppress unwanted interactions with the antenna-coupled detectors that would otherwise deform the beams of edge pixels. This design reduces the residual beam systematics and temperature-to-polarization leakage due to beam steering and shape mismatch between polarized beam pairs. We report on the simulated performance of single- and wide-band corrugations designed to minimize these effects. Our optimized design alleviates beam differential ellipticity caused by the metal frame to about 7% over 57% bandwidth (25 to 45 GHz), which is close to the level due to the bare antenna itself without a metal frame. Initial laboratory measurements are also presented.
Submitted 1 August, 2018;
originally announced August 2018.
-
BICEP Array: a multi-frequency degree-scale CMB polarimeter
Authors:
Howard Hui,
P. A. R. Ade,
Z. Ahmed,
R. W. Aikin,
K. D. Alexander,
D. Barkats,
S. J. Benton,
C. A. Bischoff,
J. J. Bock,
R. Bowens-Rubin,
J. A. Brevik,
I. Buder,
E. Bullock,
V. Buza,
J. Connors,
J. Cornelison,
B. P. Crill,
M. Crumrine,
M. Dierickx,
L. Duband,
C. Dvorkin,
J. P. Filippini,
S. Fliescher,
J. Grayson,
G. Hall
, et al. (53 additional authors not shown)
Abstract:
BICEP Array is the newest multi-frequency instrument in the BICEP/Keck Array program. It is comprised of four 550 mm aperture refractive telescopes observing the polarization of the cosmic microwave background (CMB) at 30/40, 95, 150 and 220/270 GHz with over 30,000 detectors. We present an overview of the receiver, detailing the optics, thermal, mechanical, and magnetic shielding design. BICEP Array follows BICEP3's modular focal plane concept, and upgrades to 6" wafer to reduce fabrication with higher detector count per module. The first receiver at 30/40 GHz is expected to start observing at the South Pole during the 2019-20 season. By the end of the planned BICEP Array program, we project $σ(r) \sim 0.003$, assuming current modeling of polarized Galactic foreground and depending on the level of delensing that can be achieved with higher resolution maps from the South Pole Telescope.
Submitted 1 August, 2018;
originally announced August 2018.
-
On the relative twist formula of $\ell$-adic sheaves
Authors:
Enlin Yang,
Yigeng Zhao
Abstract:
We propose a conjecture on the relative twist formula of $\ell$-adic sheaves, which can be viewed as a generalization of Kato-Saito's conjecture. We verify this conjecture under some transversal assumptions.
We also define a relative cohomological characteristic class and prove that its formation is compatible with proper push-forward. A conjectural relation is also given between the relative twist formula and the relative cohomological characteristic class.
Submitted 18 July, 2018;
originally announced July 2018.
-
Extremely Large Extreme-ultraviolet Late Phase Powered by Intense Early Heating in a Non-eruptive Solar Flare
Authors:
Yu Dai,
Mingde Ding,
Weiguo Zong,
Kai E. Yang
Abstract:
We analyzed and modeled an M1.2 non-eruptive solar flare on 2011 September 9. The flare exhibits a strong late-phase peak of the warm coronal emissions ($\sim$3~MK) of extreme-ultraviolet (EUV), with peak emission over 1.3 times that of the main flare peak. Multiple flare ribbons are observed, whose evolution indicates a two-stage energy release process. A non-linear force-free field (NLFFF) extrapolation reveals the existence of a magnetic null point, a fan-spine structure, and two flux ropes embedded in the fan dome. Magnetic reconnections involved in the flare are driven by the destabilization and rise of one of the flux ropes. In the first stage, the fast ascending flux rope drives reconnections at the null point and the surrounding quasi-separatrix layer (QSL), while in the second stage, reconnection mainly occurs between the two legs of the field lines stretched by the eventually stopped flux rope. The late-phase loops are mainly produced by the first-stage QSL reconnection, while the second-stage reconnection is responsible for the heating of main flaring loops. The first-stage reconnection is believed to be more powerful, leading to an extremely strong EUV late phase. We find that the delayed occurrence of the late-phase peak is mainly due to the long cooling process of the long late-phase loops. Using the model enthalpy-based thermal evolution of loops (EBTEL), we model the EUV emissions from a late-phase loop. The modeling reveals a peak heating rate of 1.1~erg~cm$^{-3}$~s$^{-1}$ for the late-phase loop, which is obviously higher than previous values.
Submitted 19 August, 2018; v1 submitted 3 July, 2018;
originally announced July 2018.
-
Deep Mixed Effect Model using Gaussian Processes: A Personalized and Reliable Prediction for Healthcare
Authors:
Ingyo Chung,
Saehoon Kim,
Juho Lee,
Kwang Joon Kim,
Sung Ju Hwang,
Eunho Yang
Abstract:
We present a personalized and reliable prediction model for healthcare, which can provide individually tailored medical services such as diagnosis, disease treatment, and prevention. Our proposed framework targets at making personalized and reliable predictions from time-series data, such as Electronic Health Records (EHR), by modeling two complementary components: i) a shared component that captures global trend across diverse patients and ii) a patient-specific component that models idiosyncratic variability for each patient. To this end, we propose a composite model of a deep neural network to learn complex global trends from the large number of patients, and Gaussian Processes (GP) to probabilistically model individual time-series given relatively small number of visits per patient. We evaluate our model on diverse and heterogeneous tasks from EHR datasets and show practical advantages over standard time-series deep models such as pure Recurrent Neural Network (RNN).
Submitted 24 November, 2019; v1 submitted 5 June, 2018;
originally announced June 2018.
-
Adaptive Network Sparsification with Dependent Variational Beta-Bernoulli Dropout
Authors:
Juho Lee,
Saehoon Kim,
Jaehong Yoon,
Hae Beom Lee,
Eunho Yang,
Sung Ju Hwang
Abstract:
While variational dropout approaches have been shown to be effective for network sparsification, they are still suboptimal in the sense that they set the dropout rate for each neuron without consideration of the input data. With such input-independent dropout, each neuron is evolved to be generic across inputs, which makes it difficult to sparsify networks without accuracy loss. To overcome this limitation, we propose adaptive variational dropout whose probabilities are drawn from sparsity-inducing beta Bernoulli prior. It allows each neuron to be evolved either to be generic or specific for certain inputs, or dropped altogether. Such input-adaptive sparsity-inducing dropout allows the resulting network to tolerate larger degree of sparsity without losing its expressive power by removing redundancies among features. We validate our dependent variational beta-Bernoulli dropout on multiple public datasets, on which it obtains significantly more compact networks than baseline methods, with consistent accuracy improvements over the base networks.
Submitted 3 March, 2019; v1 submitted 28 May, 2018;
originally announced May 2018.
-
Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning
Authors:
Yanbin Liu,
Juho Lee,
Minseop Park,
Saehoon Kim,
Eunho Yang,
Sung Ju Hwang,
Yi Yang
Abstract:
The goal of few-shot learning is to learn a classifier that generalizes well even when trained with a limited number of training instances per class. The recently introduced meta-learning approaches tackle this problem by learning a generic classifier across a large number of multiclass classification tasks and generalizing the model to a new task. Yet, even with such meta-learning, the low-data problem in the novel classification task still remains. In this paper, we propose Transductive Propagation Network (TPN), a novel meta-learning framework for transductive inference that classifies the entire test set at once to alleviate the low-data problem. Specifically, we propose to learn to propagate labels from labeled instances to unlabeled test instances, by learning a graph construction module that exploits the manifold structure in the data. TPN jointly learns both the parameters of feature embedding and the graph construction in an end-to-end manner. We validate TPN on multiple benchmark datasets, on which it largely outperforms existing few-shot learning approaches and achieves the state-of-the-art results.
Submitted 8 February, 2019; v1 submitted 25 May, 2018;
originally announced May 2018.
-
Uncertainty-Aware Attention for Reliable Interpretation and Prediction
Authors:
Jay Heo,
Hae Beom Lee,
Saehoon Kim,
Juho Lee,
Kwang Joon Kim,
Eunho Yang,
Sung Ju Hwang
Abstract:
Attention mechanism is effective in both focusing the deep learning models on relevant features and interpreting them. However, attentions may be unreliable since the networks that generate them are often trained in a weakly-supervised manner. To overcome this limitation, we introduce the notion of input-dependent uncertainty to the attention mechanism, such that it generates attention for each feature with varying degrees of noise based on the given input, to learn larger variance on instances it is uncertain about. We learn this Uncertainty-aware Attention (UA) mechanism using variational inference, and validate it on various risk prediction tasks from electronic health records on which our model significantly outperforms existing attention models. The analysis of the learned attentions shows that our model generates attentions that comply with clinicians' interpretation, and provide richer interpretation via learned variance. Further evaluation of both the accuracy of the uncertainty calibration and the prediction performance with "I don't know" decision show that UA yields networks with high reliability as well.
Submitted 24 May, 2018;
originally announced May 2018.
-
M-estimation with the Trimmed l1 Penalty
Authors:
Jihun Yun,
Peng Zheng,
Eunho Yang,
Aurelie Lozano,
Aleksandr Aravkin
Abstract:
We study high-dimensional estimators with the trimmed $\ell_1$ penalty, which leaves the $h$ largest parameter entries penalty-free. While optimization techniques for this nonconvex penalty have been studied, the statistical properties have not yet been analyzed. We present the first statistical analyses for $M$-estimation and characterize support recovery, $\ell_\infty$ and $\ell_2$ error of the trimmed $\ell_1$ estimates as a function of the trimming parameter $h$. Our results show different regimes based on how $h$ compares to the true support size. Our second contribution is a new algorithm for the trimmed regularization problem, which has the same theoretical convergence rate as the difference of convex (DC) algorithms, but in practice is faster and finds lower objective values. Empirical evaluation of $\ell_1$ trimming for sparse linear regression and graphical model estimation indicate that trimmed $\ell_1$ can outperform vanilla $\ell_1$ and non-convex alternatives. Our last contribution is to show that the trimmed penalty is beneficial beyond $M$-estimation, and yields promising results for two deep learning tasks: input structures recovery and network sparsification.
Submitted 10 May, 2019; v1 submitted 18 May, 2018;
originally announced May 2018.
-
Observationally quantified reconnection providing a viable mechanism for active region coronal heating
Authors:
Kai E. Yang,
Dana W. Longcope,
M. D. Ding,
Yang Guo
Abstract:
The heating of the Sun's corona has been explained by several different mechanisms including wave dissipation and magnetic reconnection. While both have been shown capable of supplying the requisite power, neither has been used in a quantitative model of observations fed by measured inputs. Here we show that impulsive reconnection is capable of producing an active region corona agreeing both qualitatively and quantitatively with extreme-ultraviolet observations. We calculate the heating power proportional to the velocity difference between magnetic footpoints and the photospheric plasma, called the non-ideal velocity. The length scale of flux elements reconnected in the corona is found to be around 160 km. The differential emission measure of the model corona agrees with that derived using multi-wavelength images. Synthesized extreme-ultraviolet images resemble observations both in their loop-dominated appearance and their intensity histograms. This work provides compelling evidence that impulsive reconnection events are a viable mechanism for heating the corona.
Submitted 17 February, 2018;
originally announced February 2018.
-
DropMax: Adaptive Variational Softmax
Authors:
Hae Beom Lee,
Juho Lee,
Saehoon Kim,
Eunho Yang,
Sung Ju Hwang
Abstract:
We propose DropMax, a stochastic version of softmax classifier which at each iteration drops non-target classes according to dropout probabilities adaptively decided for each instance. Specifically, we overlay binary masking variables over class output probabilities, which are input-adaptively learned via variational inference. This stochastic regularization has an effect of building an ensemble classifier out of exponentially many classifiers with different decision boundaries. Moreover, the learning of dropout rates for non-target classes on each instance allows the classifier to focus more on classification against the most confusing classes. We validate our model on multiple public datasets for classification, on which it obtains significantly improved accuracy over the regular softmax classifier and other baselines. Further analysis of the learned dropout probabilities shows that our model indeed selects confusing classes more often when it performs classification.
Submitted 2 November, 2018; v1 submitted 21 December, 2017;
originally announced December 2017.
-
A Foundry of Human Activities and Infrastructures
Authors:
Robert B. Allen,
Eunsang Yang,
Tatsawan Timakum
Abstract:
Direct representation knowledgebases can enhance and even provide an alternative to document-centered digital libraries. Here we consider realist semantic modeling of everyday activities and infrastructures in such knowledgebases. Because we want to integrate a wide variety of topics, a collection of ontologies (a foundry) and a range of other knowledge resources are needed. We first consider modeling the routine procedures that support human activities and technologies. Next, we examine the interactions of technologies with aspects of social organization. Then, we consider approaches and issues for developing and validating explanations of the relationships among various entities.
Submitted 31 October, 2017;
originally announced November 2017.
-
Multitask Learning using Task Clustering with Applications to Predictive Modeling and GWAS of Plant Varieties
Authors:
Ming Yu,
Addie M. Thompson,
Karthikeyan Natesan Ramamurthy,
Eunho Yang,
Aurélie C. Lozano
Abstract:
Inferring predictive maps between multiple input and multiple output variables or tasks has innumerable applications in data science. Multi-task learning attempts to learn the maps to several output tasks simultaneously with information sharing between them. We propose a novel multi-task learning framework for sparse linear regression, where a full task hierarchy is automatically inferred from the data, with the assumption that the task parameters follow a hierarchical tree structure. The leaves of the tree are the parameters for individual tasks, and the root is the global model that approximates all the tasks. We apply the proposed approach to develop and evaluate: (a) predictive models of plant traits using large-scale and automated remote sensing data, and (b) GWAS methodologies mapping such derived phenotypes in lieu of hand-measured traits. We demonstrate the superior performance of our approach compared to other methods, as well as the usefulness of discovering hierarchical groupings between tasks. Our results suggest that richer genetic mapping can indeed be obtained from the remote sensing data. In addition, our discovered groupings reveal interesting insights from a plant science perspective.
Submitted 4 October, 2017;
originally announced October 2017.
-
Gap states and edge properties of rectangular graphene quantum dot in staggered potential
Authors:
Y. H. Jeong,
S. -R. Eric Yang
Abstract:
We investigate edge properties of a gapful rectangular graphene quantum dot in a staggered potential. In such a system gap states with discrete and closely spaced energy levels exist that are spatially located on the left or right zigzag edge. We find that, although the bulk states outside the energy gap are nearly unaffected, spin degeneracy of each gap state is lifted by the staggered potential. We have computed the occupation numbers of spin-up and -down gap states at various values of the strength of the staggered potential. The electronic and magnetic properties of the zigzag edges depend sensitively on these numbers. We discuss the possibility of applying this system as a single electron spintronic device.
Submitted 27 September, 2017;
originally announced September 2017.
-
Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification
Authors:
Hajin Shim,
Sung Ju Hwang,
Eunho Yang
Abstract:
We consider the problem of active feature acquisition, where we sequentially select the subset of features in order to achieve the maximum prediction performance in the most cost-effective way. In this work, we formulate this active feature acquisition problem as a reinforcement learning problem, and provide a novel framework for jointly learning both the RL agent and the classifier (environment). We also introduce a more systematic way of encoding subsets of features that can properly handle innate challenge with missing entries in active feature acquisition problems, that uses the orderless LSTM-based set encoding mechanism that readily fits in the joint learning framework. We evaluate our model on a carefully designed synthetic dataset for the active feature acquisition as well as several real datasets such as electronic health record (EHR) datasets, on which it outperforms all baselines in terms of prediction performance as well as feature acquisition cost.
Submitted 18 September, 2017;
originally announced September 2017.
-
Topological end states and Zak phase of rectangular armchair ribbon
Authors:
Y. H. Jeong,
S. -R. Eric Yang
Abstract:
We consider the end states of a half-filled rectangular armchair graphene ribbon (RAGR) in a staggered potential. Taking electron-electron interactions into account we find that, as the strength of the staggered potential varies, three types of couplings between the end states can occur: antiferromagnetic without or with spin splitting, and paramagnetic without spin-splitting. We find that a spin-splitting is present only in the staggered potential region $0<Δ<Δ_c$. The transition from the antiferromagnetic state at $Δ=0$ to the paramagnetic state goes through an intermediate spin-split antiferromagnetic state, and this spin-splitting disappears suddenly at $Δ_c$. For small and large values of $Δ$ the end charge of a RAGR can be connected to the Zak phase of the periodic armchair graphene ribbon (PARG) with the same width, and it varies continuously as the strength of the potential changes.
Submitted 13 September, 2017;
originally announced September 2017.
-
Lifelong Learning with Dynamically Expandable Networks
Authors:
Jaehong Yoon,
Eunho Yang,
Jeongtae Lee,
Sung Ju Hwang
Abstract:
We propose a novel deep network architecture for lifelong learning which we refer to as Dynamically Expandable Network (DEN), that can dynamically decide its network capacity as it trains on a sequence of tasks, to learn a compact overlapping knowledge sharing structure among tasks. DEN is efficiently trained in an online manner by performing selective retraining, dynamically expands network capacity upon arrival of each task with only the necessary number of units, and effectively prevents semantic drift by splitting/duplicating units and timestamping them. We validate DEN on multiple public datasets under lifelong learning scenarios, on which it not only significantly outperforms existing lifelong learning methods for deep networks, but also achieves the same level of performance as the batch counterparts with substantially fewer number of parameters. Further, the obtained network fine-tuned on all tasks obtained significantly better performance over the batch models, which shows that it can be used to estimate the optimal network structure even when all tasks are available in the first place.
Submitted 11 June, 2018; v1 submitted 4 August, 2017;
originally announced August 2017.
-
Deep Asymmetric Multi-task Feature Learning
Authors:
Hae Beom Lee,
Eunho Yang,
Sung Ju Hwang
Abstract:
We propose Deep Asymmetric Multitask Feature Learning (Deep-AMTFL) which can learn deep representations shared across multiple tasks while effectively preventing negative transfer that may happen in the feature sharing process. Specifically, we introduce an asymmetric autoencoder term that allows reliable predictors for the easy tasks to have high contribution to the feature learning while suppressing the influences of unreliable predictors for more difficult tasks. This allows the learning of less noisy representations, and enables unreliable predictors to exploit knowledge from the reliable predictors via the shared latent features. Such asymmetric knowledge transfer through shared features is also more scalable and efficient than inter-task asymmetric transfer. We validate our Deep-AMTFL model on multiple benchmark datasets for multitask learning and image classification, on which it significantly outperforms existing symmetric and asymmetric multitask learning models, by effectively preventing negative transfer in deep feature learning.
Submitted 30 June, 2018; v1 submitted 1 August, 2017;
originally announced August 2017.
-
Achieving Large, Tunable Strain in Monolayer Transition-Metal Dichalcogenides
Authors:
Abdollah,
M. Dadgar,
Declan Scullion,
Kyungnam Kang,
Daniel Esposito,
Eui-Hyoek Yang,
Irving P. Herman,
Marcos A. Pimenta,
Elton-J. G. Santos,
Abhay N. Pasupathy
Abstract:
We describe a facile technique based on polymer encapsulation to apply several percent controllable strains to monolayer and few-layer Transition Metal Dichalcogenides (TMDs). We use this technique to study the lattice response to strain via polarized Raman spectroscopy in monolayer WSe2 and WS2. The application of strain causes mode-dependent redshifts, with larger shift rates observed for in-pla…
▽ More
We describe a facile technique based on polymer encapsulation to apply several percent controllable strains to monolayer and few-layer Transition Metal Dichalcogenides (TMDs). We use this technique to study the lattice response to strain via polarized Raman spectroscopy in monolayer WSe2 and WS2. The application of strain causes mode-dependent redshifts, with larger shift rates observed for in-plane modes. We observe a splitting of the degeneracy of the in-plane E' modes in both materials and measure the Gruneisen parameters. At large strain, we observe that the reduction of crystal symmetry can lead to a change in the polarization response of the A' mode in WS2. While both WSe2 and WS2 exhibit similar qualitative changes in the phonon structure with strain, we observe much larger changes in mode positions and intensities with strain in WS2. These differences can be explained simply by the degree of iconicity of the metal-chalcogen bond.
Submitted 15 May, 2017;
originally announced May 2017.
-
Learning task structure via sparsity grouped multitask learning
Authors:
Meghana Kshirsagar,
Eunho Yang,
Aurélie C. Lozano
Abstract:
Sparse mapping has been a key methodology in many high-dimensional scientific problems. When multiple tasks share the set of relevant features, learning them jointly in a group drastically improves the quality of relevant feature selection. However, in practice this technique sees limited use since such grouping information is usually hidden. In this paper, our goal is to recover the group structure on the sparsity patterns and leverage that information in the sparse learning. Toward this, we formulate a joint optimization problem in the task parameter and the group membership, by constructing an appropriate regularizer to encourage sparse learning as well as correct recovery of task groups. We further demonstrate through extensive experiments that our proposed method accurately recovers the groups and the sparsity patterns in the task parameters.
Submitted 14 September, 2017; v1 submitted 13 May, 2017;
originally announced May 2017.
-
Sequential Local Learning for Latent Graphical Models
Authors:
Sejun Park,
Eunho Yang,
Jinwoo Shin
Abstract:
Learning parameters of latent graphical models (GMs) is inherently much harder than learning those of non-latent ones, since the latent variables make the corresponding log-likelihood non-concave. Nevertheless, expectation-maximization schemes are popularly used in practice, but they typically get stuck in local optima. In recent years, the method of moments has provided a refreshing angle for resolving the non-convex issue, but it is applicable only to a quite limited class of latent GMs. In this paper, we aim to enhance its power by enlarging this class of latent GMs. To this end, we introduce two novel concepts, coined marginalization and conditioning, which can reduce the problem of learning a larger GM to that of a smaller one. More importantly, they lead to a sequential learning framework that repeatedly increases the learning portion of a given latent GM, and thus covers a significantly broader and more complicated class of loopy latent GMs, which includes convolutional and random regular models.
Submitted 15 March, 2017; v1 submitted 12 March, 2017;
originally announced March 2017.
-
Relative singular support and the semi-continuity of characteristic cycles for étale sheaves
Authors:
Haoyu Hu,
Enlin Yang
Abstract:
Recently, the singular support and the characteristic cycle of an étale sheaf on a smooth variety over a perfect field were constructed by Beilinson and Saito, respectively. In this article, we extend the singular support to a relative situation. As an application, we prove the generic constancy for singular supports and characteristic cycles of étale sheaves on a smooth fibration. Meanwhile, we show the failure of the lower semi-continuity of characteristic cycles in a higher relative dimension case, which is different from Deligne and Laumon's result in the relative curve case.
Submitted 22 February, 2017;
originally announced February 2017.
-
Characteristic class and the epsilon factor of an étale sheaf
Authors:
Naoya Umezaki,
Enlin Yang,
Yigeng Zhao
Abstract:
We prove a twist formula for the epsilon factor of a constructible sheaf on a projective smooth variety over a finite field in terms of characteristic class of the sheaf. This formula is a modified version of the formula conjectured by Kato and Saito in [Ann. Math., 168 (2008):33-96, Conjecture 4.3.11].
We give two applications of the twist formula. Firstly, we prove that the characteristic classes of constructible étale sheaves on projective smooth varieties over a finite field are compatible with proper push-forward. Secondly, we show that the two Swan classes in the literature are the same on proper smooth surfaces over a finite field.
Submitted 16 March, 2018; v1 submitted 10 January, 2017;
originally announced January 2017.
-
Influence of the Substrate Material on the Optical Properties of Tungsten Diselenide Monolayers
Authors:
Sina Lippert,
Lorenz Maximilian Schneider,
Dylan Renaud,
Kyung Nam Kang,
Obafunso Ajayi,
Marc-Uwe Halbich,
Oday M. Abdulmunem,
Xing Lin,
Jan Kuhnert,
Khaleel Hassoon,
Saeideh Edalati-Boostan,
Young Duck Kim,
Wolfram Heimbrodt,
Eui-Hyeok Yang,
James Hone,
Arash Rahimi-Iman
Abstract:
Monolayers of transition-metal dichalcogenides such as WSe2 have become increasingly attractive due to their potential in electrical and optical applications. Because the properties of these 2D systems are known to be affected by their surroundings, we report how the choice of the substrate material affects the optical properties of monolayer WSe2. To accomplish this study, pump-density-dependent micro-photoluminescence measurements are performed with time-integrating and time-resolving acquisition techniques. Spectral information and power-dependent mode intensities are compared at 290K and 10K for exfoliated WSe2 on SiO2/Si, sapphire (Al2O3), hBN/Si3N4/Si, and MgF2, indicating substrate-dependent appearance and strength of exciton, trion, and biexciton modes. Additionally, one CVD-grown WSe2 monolayer on sapphire is included in this study for direct comparison with its exfoliated counterpart. Time-resolved micro-photoluminescence shows how radiative decay times strongly differ for different substrate materials. Our data indicate exciton-exciton annihilation as a shortening mechanism at room temperature, and subtle trends in the decay rates in correlation to the dielectric environment at cryogenic temperatures. On the measurable time scales, trends are also related to the extent of the respective 2D-excitonic modes' appearance. This result highlights the importance of further detailed characterization of exciton features in 2D materials, particularly with respect to the choice of substrate.
Submitted 30 September, 2016;
originally announced October 2016.
-
A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution
Authors:
David I. Inouye,
Eunho Yang,
Genevera I. Allen,
Pradeep Ravikumar
Abstract:
The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world high-dimensional count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies, and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: 1) where the marginal distributions are Poisson, 2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and 3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent discussion section.
Submitted 27 December, 2016; v1 submitted 31 August, 2016;
originally announced September 2016.
-
Template-Assisted Direct Growth of 1Td/in$^2$ Bit Patterned Media
Authors:
En Yang,
Zuwei Liu,
Hitesh Arora,
Tsai-wei Wu,
Vipin Ayanoor-Vitikkate,
Detlef Spoddig,
Daniel Bedau,
Michael Grobis,
Bruce A. Gurney,
Thomas R. Albrecht,
Bruce Terris
Abstract:
We present a method for growing bit patterned magnetic recording media using directed growth of sputtered granular perpendicular magnetic recording media. The grain nucleation is templated using an epitaxial seed layer which contains Pt pillars separated by amorphous metal oxide. The scheme enables the creation of both templated data and servo regions suitable for high density hard disk drive operation. We illustrate the importance of using a process that is both topographically and chemically driven to achieve high quality media.
Submitted 16 June, 2016;
originally announced June 2016.
-
Two Dimensional Classification of the Swift/BAT GRBs
Authors:
E. B. Yang,
Z. B. Zhang,
X. X. Jiang
Abstract:
Using Gaussian Mixture Model and Expectation Maximization algorithm, we have performed a density estimation in the framework of $T_{90}$ versus hardness ratio for 296 Swift/BAT GRBs with known redshift. Here, Bayesian Information Criterion has been taken to compare different models. Our investigations show that two instead of three or more Gaussian components are favoured in both the observer and rest frames. Our key findings are consistent with some previous results.
Submitted 23 June, 2016; v1 submitted 5 June, 2016;
originally announced June 2016.
-
A General Family of Trimmed Estimators for Robust High-dimensional Data Analysis
Authors:
Eunho Yang,
Aurelie Lozano,
Aleksandr Aravkin
Abstract:
We consider the problem of robustifying high-dimensional structured estimation. Robust techniques are key in real-world applications which often involve outliers and data corruption. We focus on trimmed versions of structurally regularized M-estimators in the high-dimensional setting, including the popular Least Trimmed Squares estimator, as well as analogous estimators for generalized linear models and graphical models, using possibly non-convex loss functions. We present a general analysis of their statistical convergence rates and consistency, and then take a closer look at the trimmed versions of the Lasso and Graphical Lasso estimators as special cases. On the optimization side, we show how to extend algorithms for M-estimators to fit trimmed variants and provide guarantees on their numerical convergence. The generality and competitive performance of high-dimensional trimmed estimators are illustrated numerically on both simulated and real-world genomics data.
Submitted 21 August, 2017; v1 submitted 26 May, 2016;
originally announced May 2016.
-
Classifying Gamma-Ray Bursts with Gaussian Mixture Model
Authors:
En-Bo Yang,
Zhi-Bin Zhang,
Chul-Sung Choi,
Heon-Young Chang
Abstract:
Using the Gaussian Mixture Model (GMM) and the Expectation Maximization algorithm, we perform an analysis of time duration ($T_{90}$) for \textit{CGRO}/BATSE, \textit{Swift}/BAT and \textit{Fermi}/GBM Gamma-Ray Bursts. The $T_{90}$ distributions of 298 redshift-known \textit{Swift}/BAT GRBs have also been studied in both observer and rest frames. The Bayesian Information Criterion has been used to compare different GMM models. We find that two Gaussian components better describe the \textit{CGRO}/BATSE and \textit{Fermi}/GBM GRBs in the observer frame. Also, we caution that two groups are expected for the \textit{Swift}/BAT bursts in the rest frame, which is consistent with some previous results. However, \textit{Swift} GRBs in the observer frame seem to show a trimodal distribution, of which the superficial intermediate class may result from the selection effect of \textit{Swift}/BAT.
Submitted 29 July, 2016; v1 submitted 11 March, 2016;
originally announced March 2016.
-
Semi-continuity for total dimension divisors of étale sheaves
Authors:
Haoyu Hu,
Enlin Yang
Abstract:
In this article, we extend a pull-back inequality for total dimension divisors of étale sheaves due to Saito. Using this formula, we generalize Deligne and Laumon's lower semi-continuous property for Swan conductors of étale sheaves on relative curves to higher relative dimensions in a geometric situation.
Submitted 15 January, 2016; v1 submitted 16 November, 2015;
originally announced November 2015.
-
Graphene nanosystems and low-dimensional Chern-Simons topological insulators
Authors:
Y. H. Jeong,
S. -R. Eric Yang
Abstract:
A graphene nanoribbon is a good candidate for a $(1+1)$ Chern-Simons topological insulator since it obeys particle-hole symmetry. We show that in a finite semiconducting armchair ribbon, which has two zigzag edges and two armchair edges, a $(1+1)$ Chern-Simons topological insulator is indeed realized as the length of the armchair edges becomes large in comparison to that of the zigzag edges. But only a quasi-topological insulator is formed in a metallic armchair ribbon with a pseudogap. In such systems a zigzag edge acts like a domain wall, through which the polarization changes from $0$ to $e/2$, forming a fractional charge of one-half. When the lengths of the zigzag edges and the armchair edges are comparable, a rectangular graphene sheet (RGS) is realized, which also possesses particle-hole symmetry. We show that it is a $(0+1)$ Chern-Simons topological insulator. We find that the cyclic Berry phase of states of an RGS is quantized as $π$ or $0$ (mod $2π$), and that the Berry phases of the particle-hole conjugate states are equal to each other. By applying the Atiyah-Singer index theorem to a rectangular ribbon and an RGS, we find that the lower bound on the number of nearly zero energy end states is approximately proportional to the length of the zigzag edges. However, there is a correction to this index theorem due to the effects beyond the effective mass approximation.
Submitted 3 November, 2015;
originally announced November 2015.
-
Robust Gaussian Graphical Modeling with the Trimmed Graphical Lasso
Authors:
Eunho Yang,
Aurélie C. Lozano
Abstract:
Gaussian Graphical Models (GGMs) are popular tools for studying network structures. However, many modern applications such as gene network discovery and social interactions analysis often involve high-dimensional noisy data with outliers or heavier tails than the Gaussian distribution. In this paper, we propose the Trimmed Graphical Lasso for robust estimation of sparse GGMs. Our method guards against outliers by an implicit trimming mechanism akin to the popular Least Trimmed Squares method used for linear regression. We provide a rigorous statistical analysis of our estimator in the high-dimensional setting. In contrast, existing approaches for robust sparse GGMs estimation lack statistical guarantees. Our theoretical results are complemented by experiments on simulated and real gene expression data which further demonstrate the value of our approach.
Submitted 28 October, 2015;
originally announced October 2015.
-
Inclusion of Forbidden Minors in Random Representable Matroids
Authors:
Jason Altschuler,
Elizabeth Yang
Abstract:
In 1984, Kelly and Oxley introduced the model of a random representable matroid $M[A_n]$ corresponding to a random matrix $A_n \in \mathbb{F}_q^{m(n) \times n}$, whose entries are drawn independently and uniformly from $\mathbb{F}_q$. Whereas properties such as rank, connectivity, and circuit size have been well-studied, forbidden minors have not yet been analyzed. Here, we investigate the asymptotic probability as $n \to \infty$ that a fixed $\mathbb{F}_q$-representable matroid $M$ is a minor of $M[A_n]$. (We always assume $m(n) \geq \text{rank}(M)$ for all sufficiently large $n$, otherwise $M$ can never be a minor of the corresponding $M[A_n]$.) When $M$ is free, we show that $M$ is asymptotically almost surely (a.a.s.) a minor of $M[A_n]$. When $M$ is not free, we show a phase transition: $M$ is a.a.s. a minor if $n - m(n) \to \infty$, but is a.a.s. not if $m(n) - n \to \infty$. In the more general settings of $m \leq n$ and $m > n$, we give lower and upper bounds, respectively, on both the asymptotic and non-asymptotic probability that $M$ is a minor of $M[A_n]$. The tools we develop to analyze matroid operations and minors of random matroids may be of independent interest.
Our results directly imply that $M[A_n]$ is a.a.s. not contained in any proper, minor-closed class $\mathcal{M}$ of $\mathbb{F}_q$-representable matroids, provided: (i) $n - m(n) \to \infty$, and (ii) $m(n)$ is at least the minimum rank of any $\mathbb{F}_q$-representable forbidden minor of $\mathcal{M}$, for all sufficiently large $n$. As an application, this shows that graphic matroids are a vanishing subset of linear matroids, in a sense made precise in the paper. Our results provide an approach for applying the rich theory around matroid minors to the less-studied field of random matroids.
Submitted 5 February, 2017; v1 submitted 19 July, 2015;
originally announced July 2015.