-
Observing low elevation sky and the CMB Cold Spot with BICEP3 at the South Pole
Authors:
J. Kang,
P. A. R. Ade,
Z. Ahmed,
M. Amiri,
D. Barkats,
R. Basu Thakur,
C. A. Bischoff,
J. J. Bock,
H. Boenish,
E. Bullock,
V. Buza,
J. R. Cheshire,
J. Connors,
J. Cornelison,
M. Crumrine,
A. Cukierman,
E. Denison,
M. Dierickx,
L. Duband,
M. Eiben,
S. Fatigoni,
J. P. Filippini,
S. Fliescher,
N. Goeckner-Wald,
D. C. Goldfinger,
et al. (62 additional authors not shown)
Abstract:
BICEP3 is a 520 mm aperture on-axis refracting telescope at the South Pole, which observes the polarization of the cosmic microwave background (CMB) at 95 GHz to search for the B-mode signal from inflationary gravitational waves. In addition to this main target, we have developed a low-elevation observation strategy to extend coverage of the Southern sky at the South Pole, where BICEP3 can quickly achieve degree-scale E-mode measurements over a large area. One interesting application of these E-mode measurements is probing a potential polarization anomaly around the CMB Cold Spot. During the austral summer seasons of 2018-19 and 2019-20, BICEP3 observed the sky with a flat mirror that redirected the beams to various low-elevation ranges. The preliminary data analysis shows degree-scale E-modes measured with a high signal-to-noise ratio.
Submitted 17 December, 2020; v1 submitted 16 December, 2020;
originally announced December 2020.
-
Polarization Calibration of the BICEP3 CMB polarimeter at the South Pole
Authors:
J. Cornelison,
P. A. R. Ade,
Z. Ahmed,
M. Amiri,
D. Barkats,
R. Basu Thakur,
C. A. Bischoff,
J. J. Bock,
H. Boenish,
E. Bullock,
V. Buza,
J. R. Cheshire,
J. Connors,
M. Crumrine,
A. Cukierman,
E. Denison,
M. Dierickx,
L. Duband,
M. Eiben,
S. Fatigoni,
J. P. Filippini,
S. Fliescher,
N. Goeckner-Wald,
D. C. Goldfinger,
J. A. Grayson,
et al. (62 additional authors not shown)
Abstract:
The BICEP3 CMB Polarimeter is a small-aperture refracting telescope located at the South Pole and is specifically designed to search for the possible signature of inflationary gravitational waves in the Cosmic Microwave Background (CMB). The experiment measures polarization on the sky by differencing the signal of co-located, orthogonally polarized antennas coupled to Transition Edge Sensor (TES) detectors. We present precise measurements of the absolute polarization response angles and polarization efficiencies for nearly all of BICEP3's $\sim800$ functioning polarization-sensitive detector pairs from calibration data taken in January 2018. Using a Rotating Polarized Source (RPS), we mapped the polarization response of each detector over a full 360 degrees of source rotation and at multiple telescope boresight rotations, from which per-pair polarization properties were estimated. In future work, these results will be used to constrain signals predicted by exotic physical models such as Cosmic Birefringence.
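Estimating a polarization angle from a full source rotation can be illustrated with a minimal sketch. It assumes an idealized single detector whose response to a rotating linearly polarized source at angle θ has the form A + B cos 2θ + C sin 2θ, sampled at evenly spaced angles over 360 degrees. The function name is hypothetical, and the actual BICEP3 per-pair analysis (pair differencing, multiple boresight rotations, efficiency modeling) is not reproduced here.

```python
import math


def fit_pol_angle(thetas_deg, response):
    """Polarization angle and modulation depth from one evenly sampled source rotation.

    Assumes response(theta) = A + B*cos(2*theta) + C*sin(2*theta) with at least
    five samples evenly spaced over a full 360-degree rotation, so the Fourier
    sums below equal the least-squares coefficients.
    """
    n = len(thetas_deg)
    two_theta = [2.0 * math.radians(t) for t in thetas_deg]
    a = sum(response) / n
    b = 2.0 / n * sum(d * math.cos(x) for x, d in zip(two_theta, response))
    c = 2.0 / n * sum(d * math.sin(x) for x, d in zip(two_theta, response))
    # Angle of peak response, defined modulo 180 degrees for a linear polarizer.
    psi_deg = (0.5 * math.degrees(math.atan2(c, b))) % 180.0
    # Peak-to-mean modulation; turning this into a polarization efficiency
    # requires a cross-polar leakage model that is not specified here.
    depth = math.hypot(b, c) / a
    return psi_deg, depth
```

Because only the second harmonic enters the fit, the recovered angle is ambiguous by 180 degrees, which matches the symmetry of a linear polarizer.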
Submitted 10 December, 2020;
originally announced December 2020.
-
Receiver development for BICEP Array, a next-generation CMB polarimeter at the South Pole
Authors:
L. Moncelsi,
P. A. R. Ade,
Z. Ahmed,
M. Amiri,
D. Barkats,
R. Basu Thakur,
C. A. Bischoff,
J. J. Bock,
V. Buza,
J. Cheshire,
J. Connors,
J. Cornelison,
M. Crumrine,
A. Cukierman,
E. V. Denison,
M. Dierickx,
L. Duband,
M. Eiben,
S. Fatigoni,
J. P. Filippini,
N. Goeckner-Wald,
D. C. Goldfinger,
J. Grayson,
P. Grimes,
G. Hall,
et al. (50 additional authors not shown)
Abstract:
A detection of curl-type ($B$-mode) polarization of the primary CMB would be direct evidence for the inflationary paradigm of the origin of the Universe. The BICEP/Keck Array (BK) program targets the degree angular scales, where the power from primordial $B$-mode polarization is expected to peak, with ever-increasing sensitivity and has published the most stringent constraints on inflation to date. BICEP Array (BA) is the Stage-3 instrument of the BK program and will comprise four BICEP3-class receivers observing at 30/40, 95, 150 and 220/270 GHz with a combined 32,000+ detectors; such wide frequency coverage is necessary for control of the Galactic foregrounds, which also produce degree-scale $B$-mode signal. The 30/40 GHz receiver is designed to constrain the synchrotron foreground and began observing at the South Pole in early 2020. By the end of a 3-year observing campaign, the full BICEP Array instrument is projected to reach $\sigma_r$ between 0.002 and 0.004, depending on foreground complexity and the degree of removal of $B$-modes due to gravitational lensing (delensing). This paper presents an overview of the design, measured on-sky performance and calibration of the first BA receiver. We also give a preview of the added complexity in the time-domain multiplexed readout of the 7,776-detector 150 GHz receiver.
Submitted 7 December, 2020;
originally announced December 2020.
-
EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference
Authors:
Thierry Tambe,
Coleman Hooper,
Lillian Pentecost,
Tianyu Jia,
En-Yu Yang,
Marco Donato,
Victor Sanh,
Paul N. Whatmough,
Alexander M. Rush,
David Brooks,
Gu-Yeon Wei
Abstract:
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. However, their hefty computational and memory demands make them challenging to deploy to resource-constrained edge platforms with strict latency requirements. We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP. EdgeBERT employs entropy-based early exit predication in order to perform dynamic voltage-frequency scaling (DVFS), at a sentence granularity, for minimal energy consumption while adhering to a prescribed target latency. Computation and memory footprint overheads are further alleviated by employing a calibrated combination of adaptive attention span, selective network pruning, and floating-point quantization. Furthermore, in order to maximize the synergistic benefits of these algorithms in always-on and intermediate edge computing settings, we specialize a 12nm scalable hardware accelerator system, integrating a fast-switching low-dropout voltage regulator (LDO), an all-digital phase-locked loop (ADPLL), as well as high-density embedded non-volatile memories (eNVMs) wherein the sparse floating-point bit encodings of the shared multi-task parameters are carefully stored. Altogether, latency-aware multi-task NLP inference acceleration on the EdgeBERT hardware system achieves up to 7x, 2.5x, and 53x lower energy compared to conventional inference without early stopping, the latency-unbounded early exit approach, and CUDA adaptations on an Nvidia Jetson Tegra X2 mobile GPU, respectively.
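The combination of entropy-based early exit and latency-aware frequency selection can be sketched in a few lines. This is a minimal illustration, not EdgeBERT's implementation: it assumes each transformer layer has an attached classifier producing logits, exits at the first layer whose softmax entropy falls below a threshold, and, given a predicted exit layer, picks the lowest clock frequency that still finishes within the latency target. The helper names, `cycles_per_layer`, and the frequency table are hypothetical.

```python
import math


def softmax_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0)


def early_exit_layer(per_layer_logits, threshold):
    """1-based index of the first layer whose off-ramp classifier is confident enough.

    Falls back to the final layer if no entropy drops below the threshold.
    """
    for i, logits in enumerate(per_layer_logits, start=1):
        if softmax_entropy(logits) < threshold:
            return i
    return len(per_layer_logits)


def pick_frequency(predicted_layers, cycles_per_layer, target_latency_s, freqs_hz):
    """Lowest available clock that finishes the predicted layers within the target latency.

    Returns the fastest clock if no setting can meet the target.
    """
    needed_hz = predicted_layers * cycles_per_layer / target_latency_s
    feasible = [f for f in sorted(freqs_hz) if f >= needed_hz]
    return feasible[0] if feasible else max(freqs_hz)
```

Running slower when fewer layers are expected is what saves energy: the sentence still meets its deadline, but at a lower voltage-frequency point.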
Submitted 5 September, 2021; v1 submitted 28 November, 2020;
originally announced November 2020.
-
Topological entanglement entropy of interacting disordered zigzag graphene ribbons
Authors:
Young Heon Kim,
Hye Jeong Lee,
S. -R. Eric Yang
Abstract:
Interacting disordered zigzag graphene nanoribbons have fractional charges, are quasi-one-dimensional, and display an exponentially small gap. Our numerical computations showed that the topological entanglement entropy of these systems has a small, finite, but universal value, independent of the strength of the interaction and the disorder. This result for the topological entanglement entropy shows that the disorder-free phase is critical and becomes unstable in the presence of disorder.
Submitted 21 March, 2021; v1 submitted 27 November, 2020;
originally announced November 2020.
-
Thermal Conductivities and Interfacial Thermal Conductance of 1- to 3-Layer WSe$_2$
Authors:
Elham Easy,
Yuan Gao,
Yingtao Wang,
Dingkai Yan,
Seyed M. Goushehgir,
Eui-Hyeok Yang,
Baoxing Xu,
Xian Zhang
Abstract:
Atomically thin materials such as graphene and semiconducting transition metal dichalcogenides (TMDCs) have attracted extensive interest in recent years, motivating investigation into multiple properties. In this work, we used the optothermal Raman technique to measure the thermal transport properties of a popular TMDC material, WSe$_2$, in single-layer, bilayer, and trilayer forms.
Submitted 7 March, 2021; v1 submitted 30 October, 2020;
originally announced November 2020.
-
GloFlow: Global Image Alignment for Creation of Whole Slide Images for Pathology from Video
Authors:
Viswesh Krishna,
Anirudh Joshi,
Philip L. Bulterys,
Eric Yang,
Andrew Y. Ng,
Pranav Rajpurkar
Abstract:
The application of deep learning to pathology assumes the existence of digital whole slide images of pathology slides. However, slide digitization is bottlenecked by the high cost of the precise motor stages in slide scanners that are needed for the position information used in slide stitching. We propose GloFlow, a two-stage method for creating a whole slide image (WSI) using optical flow-based image registration with global alignment using a computationally tractable graph-pruning approach. In the first stage, we train an optical flow predictor to predict pairwise translations between successive video frames to approximate a stitch. In the second stage, this approximate stitch is used to create a neighborhood graph to produce a corrected stitch. On a simulated dataset of video scans of WSIs, we find that our method outperforms known approaches to slide stitching and produces WSIs resembling those generated by slide scanners.
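The first-stage approximate stitch and the neighborhood graph derived from it can be sketched as follows. This is a minimal illustration assuming the flow predictor has already produced one (dx, dy) shift per pair of successive frames; the distance-radius rule for graph edges is a hypothetical choice, and the paper's graph-pruning global alignment is not reproduced.

```python
from itertools import combinations


def approximate_positions(pairwise_shifts):
    """Chain predicted (dx, dy) shifts between successive frames into absolute positions."""
    positions = [(0.0, 0.0)]  # the first frame anchors the coordinate system
    for dx, dy in pairwise_shifts:
        x, y = positions[-1]
        positions.append((x + dx, y + dy))
    return positions


def neighborhood_graph(positions, radius):
    """Edges between frames whose approximate positions lie within `radius` (hypothetical rule)."""
    edges = []
    for i, j in combinations(range(len(positions)), 2):
        (xi, yi), (xj, yj) = positions[i], positions[j]
        if (xi - xj) ** 2 + (yi - yj) ** 2 <= radius ** 2:
            edges.append((i, j))
    return edges
```

For a square scan path with shifts (10, 0), (0, 10), (-10, 0) and a radius of 10, the graph links frame 0 to frame 3 even though they are not consecutive; such loop-closing edges are what let a global alignment step correct drift accumulated along the chain of pairwise estimates.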
Submitted 12 November, 2020; v1 submitted 28 October, 2020;
originally announced October 2020.
-
Attribution Preservation in Network Compression for Reliable Network Interpretation
Authors:
Geondo Park,
June Yong Yang,
Sung Ju Hwang,
Eunho Yang
Abstract:
Neural networks embedded in safety-sensitive applications such as self-driving cars and wearable health monitors rely on two important techniques: input attribution for hindsight analysis and network compression to reduce their size for edge computing. In this paper, we show that these seemingly unrelated techniques conflict with each other, as network compression deforms the produced attributions, which could lead to dire consequences for mission-critical applications. This phenomenon arises because conventional network compression methods preserve only the predictions of the network while ignoring the quality of the attributions. To combat this attribution inconsistency problem, we present a framework that can preserve the attributions while compressing a network. By employing the Weighted Collapsed Attribution Matching regularizer, we match the attribution maps of the network being compressed to those of its pre-compression self. We demonstrate the effectiveness of our algorithm both quantitatively and qualitatively on diverse compression methods.
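A generic attribution-matching penalty of the kind described can be written as follows. The abstract does not define the Weighted Collapsed Attribution Matching regularizer, so this sketch substitutes a plain squared distance between L2-normalized attribution maps (flattened to lists); the weighting and collapsing steps of the actual method are omitted, and all names are hypothetical.

```python
import math


def _l2_normalize(attr):
    """Scale a flattened attribution map to unit L2 norm (left unchanged if all zeros)."""
    norm = math.sqrt(sum(a * a for a in attr))
    return [a / norm for a in attr] if norm > 0 else list(attr)


def attribution_matching_loss(compressed_attr, original_attr):
    """Squared distance between normalized attribution maps (hypothetical stand-in)."""
    s = _l2_normalize(compressed_attr)
    t = _l2_normalize(original_attr)
    return sum((a - b) ** 2 for a, b in zip(s, t))


def total_loss(task_loss, compressed_attr, original_attr, lam=1.0):
    """Task loss plus a weighted attribution-preservation penalty."""
    return task_loss + lam * attribution_matching_loss(compressed_attr, original_attr)
```

Normalizing before comparing makes the penalty insensitive to an overall rescaling of the attributions, so it targets where the compressed network looks rather than how strongly it responds.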
Submitted 28 October, 2020;
originally announced October 2020.
-
The NVIDIA PilotNet Experiments
Authors:
Mariusz Bojarski,
Chenyi Chen,
Joyjit Daw,
Alperen Değirmenci,
Joya Deri,
Bernhard Firner,
Beat Flepp,
Sachin Gogri,
Jesse Hong,
Lawrence Jackel,
Zhenhua Jia,
BJ Lee,
Bo Liu,
Fei Liu,
Urs Muller,
Samuel Payne,
Nischal Kota Nagendra Prasad,
Artem Provodin,
John Roach,
Timur Rvachov,
Neha Tadimeti,
Jesper van Engelen,
Haiguang Wen,
Eric Yang,
Zongyi Yang
Abstract:
Four years ago, an experimental system known as PilotNet became the first NVIDIA system to steer an autonomous car along a roadway. This system represents a departure from the classical approach for self-driving in which the process is manually decomposed into a series of modules, each performing a different task. In PilotNet, on the other hand, a single deep neural network (DNN) takes pixels as input and produces a desired vehicle trajectory as output; there are no distinct internal modules connected by human-designed interfaces. We believe that handcrafted interfaces ultimately limit performance by restricting information flow through the system and that a learned approach, in combination with other artificial intelligence systems that add redundancy, will lead to better overall performing systems. We continue to conduct research toward that goal.
This document describes the PilotNet lane-keeping effort, carried out over the past five years by our NVIDIA PilotNet group in Holmdel, New Jersey. Here we present a snapshot of system status in mid-2020 and highlight some of the work done by the PilotNet group.
Submitted 17 October, 2020;
originally announced October 2020.
-
Toward Cross-Lingual Definition Generation for Language Learners
Authors:
Cunliang Kong,
Liner Yang,
Tianzuo Zhang,
Qinan Fan,
Zhenghao Liu,
Yun Chen,
Erhong Yang
Abstract:
Generating dictionary definitions automatically can prove useful for language learners. However, cross-lingual definition generation remains a challenging task. In this work, we propose to generate definitions in English for words in various languages. To achieve this, we present a simple yet effective approach based on publicly available pretrained language models. In this approach, models can be directly applied to other languages after being trained on the English dataset. We demonstrate the effectiveness of this approach on zero-shot definition generation. Experiments and manual analyses on newly constructed datasets show that our models have a strong cross-lingual transfer ability and can generate fluent English definitions for Chinese words. We further measure the lexical complexity of generated and reference definitions. The results show that the generated definitions are much simpler, making them more suitable for language learners.
Submitted 12 October, 2020;
originally announced October 2020.
-
Symbolic Techniques for Deep Learning: Challenges and Opportunities
Authors:
Belinda Fang,
Elaine Yang,
Fei Xie
Abstract:
As the number of deep learning frameworks increases and certain ones gain popularity, questions arise about which methodologies these frameworks employ and the reasoning behind them. The goal of this survey is to study how symbolic techniques are utilized in deep learning. To do this, we look at some of the most popular deep learning frameworks in use today, including TensorFlow, Keras, PyTorch, and MXNet. While these frameworks differ greatly from one another, many of them use symbolic techniques, whether symbolic execution, graphs, or programming. We focus this paper on symbolic techniques because they influence not only how neural networks are built but also how they are executed.
Limitations of symbolic techniques have led to efforts in integrating symbolic and nonsymbolic aspects in deep learning, opening up new possibilities for symbolic techniques. For example, the Gluon API by Apache MXNet bridges the gap between imperative programming and symbolic execution through hybridization. Frameworks such as JANUS attempt to translate imperative programs into symbolic graphs, while approaches like DeepCheck attempt to use symbolic execution to analyze and validate imperative neural network programs. Symbolic analysis has also been paired with concrete execution in a technique called concolic testing in order to better test deep neural networks. Our study of these developments exemplifies just a few of the many ways the symbolic techniques employed by popular frameworks have the opportunity to be altered and utilized to achieve better performance.
Submitted 1 October, 2020;
originally announced October 2020.
-
CMB-S4: Forecasting Constraints on Primordial Gravitational Waves
Authors:
CMB-S4 Collaboration,
Kevork Abazajian,
Graeme E. Addison,
Peter Adshead,
Zeeshan Ahmed,
Daniel Akerib,
Aamir Ali,
Steven W. Allen,
David Alonso,
Marcelo Alvarez,
Mustafa A. Amin,
Adam Anderson,
Kam S. Arnold,
Peter Ashton,
Carlo Baccigalupi,
Debbie Bard,
Denis Barkats,
Darcy Barron,
Peter S. Barry,
James G. Bartlett,
Ritoban Basu Thakur,
Nicholas Battaglia,
Rachel Bean,
Chris Bebek,
et al. (212 additional authors not shown)
Abstract:
CMB-S4---the next-generation ground-based cosmic microwave background (CMB) experiment---is set to significantly advance the sensitivity of CMB measurements and enhance our understanding of the origin and evolution of the Universe, from the highest energies at the dawn of time through the growth of structure to the present day. Among the science cases pursued with CMB-S4, the quest for detecting primordial gravitational waves is a central driver of the experimental design. This work details the development of a forecasting framework that includes a power-spectrum-based semi-analytic projection tool, targeted explicitly towards optimizing constraints on the tensor-to-scalar ratio, $r$, in the presence of Galactic foregrounds and gravitational lensing of the CMB. This framework is unique in its direct use of information from the achieved performance of current Stage 2--3 CMB experiments to robustly forecast the science reach of upcoming CMB-polarization endeavors. The methodology allows for rapid iteration over experimental configurations and offers a flexible way to optimize the design of future experiments given a desired scientific goal. To form a closed-loop process, we couple this semi-analytic tool with map-based validation studies, which allow for the injection of additional complexity and verification of our forecasts with several independent analysis methods. We document multiple rounds of forecasts for CMB-S4 using this process and the resulting establishment of the current reference design of the primordial gravitational-wave component of the Stage-4 experiment, optimized to achieve our science goals of detecting primordial gravitational waves for $r > 0.003$ at greater than $5\sigma$, or, in the absence of a detection, of reaching an upper limit of $r < 0.001$ at $95\%$ CL.
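As a rough back-of-the-envelope check (a simplification assuming a Gaussian likelihood for $r$, ignoring the extra sample variance contributed by a nonzero $r$, and taking a 95% upper limit as roughly $2\sigma_r$), the two stated goals translate into similar requirements on the uncertainty $\sigma_r$:

$$
\begin{aligned}
\text{detection at } 5\sigma \text{ for } r = 0.003:&\quad \frac{0.003}{\sigma_r} \ge 5 \;\Rightarrow\; \sigma_r \le 6\times10^{-4},\\
\text{95\% upper limit for } r = 0:&\quad r_{95} \approx 2\,\sigma_r \le 0.001 \;\Rightarrow\; \sigma_r \lesssim 5\times10^{-4}.
\end{aligned}
$$

Both goals therefore point to $\sigma_r$ of order $5\times10^{-4}$; the full forecasts account for foreground and lensing residuals that this estimate ignores.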
Submitted 27 August, 2020;
originally announced August 2020.
-
Reconstructing Highly-twisted Magnetic Fields
Authors:
Victor M. Demcsak,
Michael S. Wheatland,
Alpha Mastrano,
Kai E. Yang
Abstract:
We investigate the ability of a nonlinear force-free code to calculate highly-twisted magnetic field configurations using the Titov and Démoulin (1999) equilibrium field as a test case. The code calculates a force-free field using boundary conditions on the normal component of the field in the lower boundary, and the normal component of the current density over one polarity of the field in the lower boundary. The code can also use the current density over both polarities of the field in the lower boundary as a boundary condition. We investigate the accuracy of the reconstructions with increasing flux-rope surface twist number $N_{\textrm{t}}$, achieved by decreasing the sub-surface line current in the model. We find that the code can approximately reconstruct the Titov-Démoulin field for surface twist numbers up to $N_{\textrm{t}} \approx 8.8$. This includes configurations with bald patches. We investigate the ability to recover bald patches, and more generally identify the limitations of our method for highly-twisted fields. The results have implications for our ability to reconstruct coronal magnetic fields from observational data.
Submitted 7 August, 2020;
originally announced August 2020.
-
Bootstrapping Neural Processes
Authors:
Juho Lee,
Yoonho Lee,
Jungtaek Kim,
Eunho Yang,
Sung Ju Hwang,
Yee Whye Teh
Abstract:
Unlike traditional statistical modeling, in which a user typically hand-specifies a prior, Neural Processes (NPs) implicitly define a broad class of stochastic processes with neural networks. Given a data stream, an NP learns a stochastic process that best describes the data. While this "data-driven" way of learning stochastic processes has proven capable of handling various types of data, NPs still rely on the assumption that uncertainty in stochastic processes is modeled by a single latent variable, which potentially limits flexibility. To this end, we propose the Bootstrapping Neural Process (BNP), a novel extension of the NP family using the bootstrap. The bootstrap is a classical data-driven technique for estimating uncertainty, which allows BNP to learn the stochasticity in NPs without assuming a particular form. We demonstrate the efficacy of BNP on various types of data and its robustness in the presence of model-data mismatch.
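The classical bootstrap that BNP builds on can be illustrated as follows. The sketch shows only the generic part: resampling the context set with replacement and aggregating predictions across resamples to obtain a spread. `predict` is a hypothetical stand-in for a neural process decoder, and the way BNP actually combines bootstrapped contexts with its model is not reproduced.

```python
import random
import statistics


def bootstrap_contexts(context, num_samples, seed=0):
    """Draw num_samples resamples of the context set, each with replacement and of the same size."""
    rng = random.Random(seed)
    n = len(context)
    return [[context[rng.randrange(n)] for _ in range(n)] for _ in range(num_samples)]


def bootstrap_predict(predict, context, x_target, num_samples=50, seed=0):
    """Mean and spread of predictions across bootstrap resamples of the context.

    `predict(context, x)` is a hypothetical stand-in for a neural process decoder.
    """
    preds = [predict(ctx, x_target) for ctx in bootstrap_contexts(context, num_samples, seed)]
    return statistics.fmean(preds), statistics.pstdev(preds)
```

The spread comes entirely from which context points each resample happens to include, so no particular distributional form for the uncertainty has to be assumed.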
Submitted 27 October, 2020; v1 submitted 6 August, 2020;
originally announced August 2020.
-
Neural Complexity Measures
Authors:
Yoonho Lee,
Juho Lee,
Sung Ju Hwang,
Eunho Yang,
Seungjin Choi
Abstract:
While various complexity measures for deep neural networks exist, specifying an appropriate measure capable of predicting and explaining generalization in deep networks has proven challenging. We propose Neural Complexity (NC), a meta-learning framework for predicting generalization. Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way. The trained NC model can be added to the standard training loss to regularize any task learner in a standard supervised learning scenario. We contrast NC's approach against existing manually-designed complexity measures and other meta-learning models, and we validate NC's performance on multiple regression and classification tasks.
Submitted 23 October, 2020; v1 submitted 6 August, 2020;
originally announced August 2020.
-
GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for Offensive Language Detection
Authors:
Sajad Sotudeh,
Tong Xiang,
Hao-Ren Yao,
Sean MacAvaney,
Eugene Yang,
Nazli Goharian,
Ophir Frieder
Abstract:
Offensive language detection is an important and challenging task in natural language processing. We present our submissions to the OffensEval 2020 shared task, which includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of a target in offensive language (Sub-task B), and identifying the categories of the target (Sub-task C). Our experiments explore using a domain-tuned contextualized language model (namely, BERT) for this task. We also experiment with different components and configurations (e.g., a multi-view SVM) stacked upon BERT models for specific sub-tasks. Our submissions achieve F1 scores of 91.7% in Sub-task A, 66.5% in Sub-task B, and 63.2% in Sub-task C. We perform an ablation study which reveals that domain tuning considerably improves the classification performance. Furthermore, error analysis shows common misclassification errors made by our model and outlines directions for future research.
Submitted 28 July, 2020;
originally announced July 2020.
-
Few-shot Visual Reasoning with Meta-analogical Contrastive Learning
Authors:
Youngsung Kim,
Jinwoo Shin,
Eunho Yang,
Sung Ju Hwang
Abstract:
While humans can solve a visual puzzle that requires logical reasoning by observing only a few samples, state-of-the-art deep reasoning models require training on large amounts of data to obtain similar performance on the same task. In this work, we propose to solve such a few-shot (or low-shot) visual reasoning problem by resorting to analogical reasoning, a unique human ability to identify structural or relational similarity between two sets. Specifically, given training and test sets that contain the same type of visual reasoning problems, we extract the structural relationships between elements in both domains and enforce them to be as similar as possible with analogical learning. We repeatedly apply this process with slightly modified queries of the same problem, under the assumption that this does not affect the relationship between a training and a test sample. This allows the model to learn the relational similarity between the two samples effectively, even with a single pair of samples. We validate our method on the RAVEN dataset, on which it outperforms state-of-the-art methods, with larger gains when the training data is scarce. We further meta-learn our analogical contrastive learning model over the same tasks with diverse attributes, and show that it generalizes to the same visual reasoning problem with unseen attributes.
Submitted 23 July, 2020;
originally announced July 2020.
-
Time-Reversal Symmetric ODE Network
Authors:
In Huh,
Eunho Yang,
Sung Ju Hwang,
Jinwoo Shin
Abstract:
Time-reversal symmetry, which requires that the dynamics of a system should not change with the reversal of the time axis, is a fundamental property that frequently holds in classical and quantum mechanics. In this paper, we propose a novel loss function that measures how well our ordinary differential equation (ODE) networks comply with this time-reversal symmetry; it is formally defined by the discrepancy in the time evolutions of ODE networks between forward and backward dynamics. Then, we design a new framework, which we name Time-Reversal Symmetric ODE Networks (TRS-ODENs), that can learn the dynamics of physical systems more sample-efficiently by learning with the proposed loss function. We evaluate TRS-ODENs on several classical dynamics, and find they can learn the desired time evolution from observed noisy and complex trajectories. We also show that, even for systems that do not possess full time-reversal symmetry, TRS-ODENs can achieve better predictive performance than baselines.
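One way to write such a forward/backward discrepancy is sketched below. It assumes a 1-D mechanical state (q, p) with reversal operator R(q, p) = (q, -p) and explicit Euler integration; `f` stands in for the learned ODE network, and the exact form of the TRS-ODEN loss may differ. For a reversible vector field, integrating backward in time from x0 must agree with R applied to the forward trajectory started at R(x0), so the loss measures how far `f` is from satisfying that identity.

```python
def euler_trajectory(f, x0, dt, steps):
    """Explicit-Euler rollout of dx/dt = f(x); a negative dt integrates backward in time."""
    xs = [tuple(x0)]
    for _ in range(steps):
        x = xs[-1]
        dx = f(x)
        xs.append(tuple(xi + dt * di for xi, di in zip(x, dx)))
    return xs


def reverse(x):
    """Reversal operator R(q, p) = (q, -p): keep position, flip momentum."""
    q, p = x
    return (q, -p)


def trs_loss(f, x0, dt, steps):
    """Mean squared gap between the backward rollout from x0 and R(forward rollout from R(x0))."""
    backward = euler_trajectory(f, x0, -dt, steps)
    mirrored = [reverse(y) for y in euler_trajectory(f, reverse(x0), dt, steps)]
    total = sum(
        sum((a - b) ** 2 for a, b in zip(u, v)) for u, v in zip(backward, mirrored)
    )
    return total / len(backward)
```

For the undamped oscillator f(q, p) = (p, -q) the penalty vanishes to floating-point precision, while adding friction, f(q, p) = (p, -q - 0.5 p), breaks the symmetry and yields a positive value. In training, such a term would be added to the usual trajectory-fitting loss.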
Submitted 6 January, 2021; v1 submitted 22 July, 2020;
originally announced July 2020.
-
Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning
Authors:
Jaehyung Kim,
Youngbum Hur,
Sejun Park,
Eunho Yang,
Sung Ju Hwang,
Jinwoo Shin
Abstract:
While semi-supervised learning (SSL) has proven to be a promising way of leveraging unlabeled data when labeled data is scarce, existing SSL algorithms typically assume that the training class distribution is balanced. When trained under imbalanced class distributions, however, these algorithms can suffer severely when evaluated against a balanced test criterion, since the pseudo-labels they assign to unlabeled data are biased toward the majority classes. To alleviate this issue, we formulate a convex optimization problem that softly refines the pseudo-labels generated by the biased model, and we develop a simple algorithm, named Distribution Aligning Refinery of Pseudo-label (DARP), that solves it provably and efficiently. Under various class-imbalanced semi-supervised scenarios, we demonstrate the effectiveness of DARP and its compatibility with state-of-the-art SSL schemes.
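As an illustration of "softly refining" biased pseudo-labels toward a target class distribution, the sketch below uses iterative proportional fitting (Sinkhorn-style row/column scaling). This is a swapped-in, standard technique for projecting a matrix onto "rows sum to 1, class totals match targets"; it is not claimed to be DARP's solver, and the function name and target counts are assumptions.

```python
def align_pseudo_labels(probs, target_counts, iters=500):
    """Rescale per-sample class distributions so class totals match target_counts.

    probs: list of per-sample class-probability rows (all entries > 0).
    target_counts: desired total mass per class; must sum to len(probs).
    """
    x = [row[:] for row in probs]  # work on a copy; leave the input untouched
    n_cls = len(target_counts)
    for _ in range(iters):
        # Scale each class column so its total equals the target count.
        col = [sum(row[c] for row in x) for c in range(n_cls)]
        for row in x:
            for c in range(n_cls):
                row[c] *= target_counts[c] / col[c]
        # Renormalize each sample so it is a probability distribution again.
        for row in x:
            s = sum(row)
            for c in range(n_cls):
                row[c] /= s
    return x


biased = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]  # skewed toward class 0
refined = align_pseudo_labels(biased, [2.0, 2.0])           # assume a balanced target
print(refined)
```

Because every row is multiplied by the same per-class factors before renormalization, the relative ranking of samples within a class is preserved; only the overall class balance shifts.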
Submitted 13 September, 2021; v1 submitted 17 July, 2020;
originally announced July 2020.
-
A General Family of Stochastic Proximal Gradient Methods for Deep Learning
Authors:
Jihun Yun,
Aurelie C. Lozano,
Eunho Yang
Abstract:
We study the training of regularized neural networks where the regularizer can be non-smooth and non-convex. We propose a unified framework for stochastic proximal gradient descent, which we term ProxGen, that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. Our framework encompasses standard stochastic proximal gradient methods without preconditioners as special cases, which have been extensively studied in various settings. In addition, as a byproduct of our approach, we present two important update rules beyond the well-known standard methods: (i) the first closed-form proximal mappings of $\ell_q$ regularization ($0 \leq q \leq 1$) for adaptive stochastic gradient methods, and (ii) a revised version of ProxQuant that fixes a caveat of the original approach for quantization-specific regularizers. We analyze the convergence of ProxGen and show that the whole ProxGen family enjoys the same convergence rate as stochastic proximal gradient descent without preconditioners. We also show empirically, through extensive experiments, that proximal methods outperform subgradient-based approaches. Interestingly, our results indicate that proximal methods with non-convex regularizers are more effective than those with convex regularizers.
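For the two easiest members of the $\ell_q$ family, a preconditioned proximal step has a familiar closed form: with a diagonal preconditioner entry d, each coordinate solves min_x lam*|x|_q + d/(2*lr) * (x - z)^2, giving soft thresholding for q = 1 and hard thresholding for q = 0, with thresholds that shrink as d grows. The sketch below covers only these two cases; the general 0 < q < 1 mappings from the paper are not shown, and `prox_step` and its parameters are hypothetical names.

```python
import math


def prox_step(theta, grad, precond, lr, lam, q):
    """One preconditioned proximal-gradient step with an l_q penalty, q in {0, 1}.

    Per coordinate: z = theta - lr * g / d, then
    x = argmin_x lam * |x|_q + d / (2 * lr) * (x - z)^2.
    precond holds positive diagonal entries d (e.g. an Adam-style
    square-root second-moment estimate; an assumption here).
    """
    out = []
    for t, g, d in zip(theta, grad, precond):
        z = t - lr * g / d   # preconditioned gradient step
        tau = lr * lam / d   # effective threshold scales with 1/d
        if q == 1:
            # Soft thresholding: shrink toward zero by tau.
            out.append(math.copysign(max(abs(z) - tau, 0.0), z))
        elif q == 0:
            # Hard thresholding: keep z only if its quadratic cost
            # d/(2*lr) * z^2 exceeds the penalty lam, i.e. z^2 > 2 * tau.
            out.append(z if z * z > 2.0 * tau else 0.0)
        else:
            raise ValueError("only q in {0, 1} is sketched here")
    return out


print(prox_step([1.0, 0.05, -0.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 0.1, 1.0, q=1))
print(prox_step([1.0, 0.05, -0.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 0.1, 1.0, q=0))
```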
Submitted 15 July, 2020;
originally announced July 2020.
-
Learning to Sample with Local and Global Contexts in Experience Replay Buffer
Authors:
Youngmin Oh,
Kimin Lee,
Jinwoo Shin,
Eunho Yang,
Sung Ju Hwang
Abstract:
Experience replay, which enables agents to remember and reuse past experience, has played a significant role in the success of off-policy reinforcement learning (RL). To use experience replay efficiently, existing sampling methods select more meaningful experiences by assigning them priorities based on certain metrics (e.g., TD error). However, these methods may sample highly biased, redundant transitions, since they compute the sampling rate of each transition independently, without considering its importance relative to other transitions. In this paper, we address this issue by proposing a new learning-based sampling method that computes the relative importance of transitions. To this end, we design a novel permutation-equivariant neural architecture that takes as input contexts from not only the features of each transition (local) but also those of the others (global). We validate our framework, which we refer to as Neural Experience Replay Sampler (NERS), on multiple benchmarks for both continuous and discrete control, and show that it can significantly improve the performance of various off-policy RL methods. Further analysis confirms that the improvements in sample efficiency are indeed due to NERS sampling diverse and meaningful transitions by considering both local and global contexts.
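The key property of a local + global context sampler is permutation equivariance: reordering the buffer reorders the scores in the same way. The toy scorer below shows one way to get this, by combining a per-transition embedding with a mean-pooled (hence order-independent) buffer embedding. The features, weights, and scoring form are illustrative assumptions, not the NERS architecture.

```python
import math


def embed(x):
    """Hypothetical per-transition (local) encoder."""
    return [math.tanh(v) for v in x]


def sampling_probs(transitions, w_local, w_global):
    """Softmax sampling probabilities from local and global context."""
    h = [embed(x) for x in transitions]
    dim = len(h[0])
    # Global context: mean over the whole buffer, so it ignores ordering.
    g = [sum(hi[k] for hi in h) / len(h) for k in range(dim)]
    # Each score mixes the transition's own features with their interaction
    # with the global context, so the same transition scores differently
    # depending on what else is in the buffer.
    scores = [sum(wl * a + wg * a * b
                  for wl, wg, a, b in zip(w_local, w_global, hi, g))
              for hi in h]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Each transition is summarized by two made-up features, e.g. (TD error, reward).
buffer = [[0.5, 1.0], [2.0, -0.3], [0.1, 0.2]]
print(sampling_probs(buffer, [1.0, 0.5], [-0.8, 0.3]))
```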
Submitted 7 April, 2021; v1 submitted 14 July, 2020;
originally announced July 2020.
-
A Revision of Neural Tangent Kernel-based Approaches for Neural Networks
Authors:
Kyung-Su Kim,
Aurélie C. Lozano,
Eunho Yang
Abstract:
Recent theoretical works based on the neural tangent kernel (NTK) have shed light on the optimization and generalization of over-parameterized networks, and have partially bridged the gap between their practical success and classical learning theory. In particular, the NTK-based approach has yielded the following three representative results: (1) A training error bound was derived showing that networks can fit any finite training sample perfectly, with a tighter characterization of training speed that depends on the data complexity. (2) A generalization error bound invariant to network size was derived using a data-dependent complexity measure (CMD). It follows from this CMD bound that networks can generalize arbitrary smooth functions. (3) A simple, analytic kernel function was derived and shown to be equivalent to a fully-trained network. This kernel outperforms its corresponding network and the existing gold standard, Random Forests, in few-shot learning. For all of these results to hold, the network scaling factor $\kappa$ must decrease with the sample size $n$. In this case of decreasing $\kappa$, however, we prove that the aforementioned results are, surprisingly, erroneous. This is because the output value of the trained network decreases to zero when $\kappa$ decreases with $n$. To solve this problem, we tighten key bounds by essentially removing the $\kappa$-affected values. Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
Submitted 6 August, 2020; v1 submitted 2 July, 2020;
originally announced July 2020.
-
Compressed Sensing via Measurement-Conditional Generative Models
Authors:
Kyung-Su Kim,
Jung Hyun Lee,
Eunho Yang
Abstract:
Pre-trained generators have been frequently adopted in compressed sensing (CS) because they can effectively estimate signals using the prior learned by neural networks (NNs). To further refine this NN-based prior, we propose a framework that allows the generator to use additional information from a given measurement for prior learning, thereby yielding more accurate signal predictions. Because our framework has a simple form, it is easily applied to existing CS methods that use pre-trained generators. Extensive experiments demonstrate that our framework uniformly outperforms these methods by a large margin and can reduce the reconstruction error by up to an order of magnitude in some applications. We also explain this experimental success theoretically, by showing that our framework can slightly relax the stringent signal presence condition required to guarantee successful signal recovery.
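For context, the baseline that such frameworks build on recovers a signal by searching the generator's latent space: minimize ||A G(z) - y||^2 over z and return G(z). The sketch below assumes a toy linear "generator" G(z) = G z so that plain gradient descent is enough. The measurement-conditional part of the proposed framework is not shown, and all names are hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def recover(A, G, y, steps=2000, lr=0.05):
    """Gradient descent on 0.5 * ||A G z - y||^2, then return x = G z.

    G is an assumed linear generator (4 x 2 below); a real setup would use
    a pre-trained neural generator and autodiff instead.
    """
    AG = [[sum(A[i][k] * G[k][j] for k in range(len(G)))
           for j in range(len(G[0]))] for i in range(len(A))]
    AGt = transpose(AG)
    z = [0.0] * len(G[0])
    for _ in range(steps):
        residual = [a - b for a, b in zip(matvec(AG, z), y)]
        grad = matvec(AGt, residual)  # gradient with respect to z
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return matvec(G, z)


G = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # signal lives in a 2-D range
A = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # 3 measurements
x_true = matvec(G, [0.5, -1.0])
print(recover(A, G, matvec(A, x_true)))
```

Only three of the four signal entries are measured, yet the unmeasured one is recovered because the generator's range constrains the solution; this is the role the NN prior plays in generator-based CS.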
Submitted 2 November, 2020; v1 submitted 2 July, 2020;
originally announced July 2020.
-
Set Based Stochastic Subsampling
Authors:
Bruno Andreis,
Seanie Lee,
A. Tuan Nguyen,
Juho Lee,
Eunho Yang,
Sung Ju Hwang
Abstract:
Deep models are designed to operate on huge volumes of high-dimensional data such as images. To reduce the volume of data these models must process, we propose a set-based, two-stage, end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g., a classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables, capturing coarse-grained global information with set encoding functions. In the second stage, we perform conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables, modeling pairwise interactions with set attention networks. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks, including image classification, image reconstruction, function reconstruction, and few-shot classification. Additionally, for nonparametric models such as Neural Processes that must use the whole training set at inference time, we show that our method improves the scalability of these models.
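The two stages can be illustrated with a toy sampler: independent Bernoulli draws pick candidates, then an autoregressive loop draws from a Categorical distribution whose weights are recomputed after every pick, so each choice depends on the earlier ones. The fixed keep probabilities and the distance-based re-weighting are hypothetical stand-ins for the learned set encoder and set attention network.

```python
import random


def subsample(points, keep_probs, k, rng):
    """Toy two-stage subsampling of scalar points (illustrative only)."""
    # Stage 1: conditionally independent Bernoulli selection of candidates.
    candidates = [x for x, p in zip(points, keep_probs) if rng.random() < p]
    chosen = []
    # Stage 2: autoregressive Categorical draws without replacement.
    while candidates and len(chosen) < k:
        # Favor candidates far from those already chosen (a crude stand-in
        # for learned pairwise interactions); weights stay strictly positive.
        weights = [1.0 if not chosen else min(abs(c - s) for s in chosen) + 1e-6
                   for c in candidates]
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
    return chosen


rng = random.Random(0)
print(subsample([0.0, 0.1, 0.2, 1.0, 1.1, 2.0], [0.9] * 6, 3, rng))
```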
Submitted 30 May, 2022; v1 submitted 25 June, 2020;
originally announced June 2020.
-
Clinical Risk Prediction with Temporal Probabilistic Asymmetric Multi-Task Learning
Authors:
A. Tuan Nguyen,
Hyewon Jeong,
Eunho Yang,
Sung Ju Hwang
Abstract:
Although recent multi-task learning methods have been shown to be effective in improving the generalization of deep neural networks, they should be used with caution in safety-critical applications such as clinical risk prediction. This is because even if they improve task-average performance, they may still degrade performance on individual tasks, some of which may be critical (e.g., prediction of mortality risk). Existing asymmetric multi-task learning methods tackle this negative transfer problem by transferring knowledge from tasks with low loss to tasks with high loss. However, using loss as a measure of reliability is risky, since a low loss could be the result of overfitting. In time-series prediction tasks, knowledge learned for one task (e.g., predicting sepsis onset) at a specific timestep may be useful for learning another task (e.g., predicting mortality) at a later timestep, but the lack of a loss at each timestep makes it difficult to measure reliability at each timestep. To capture such dynamically changing asymmetric relationships between tasks in time-series data, we propose a novel temporal asymmetric multi-task learning model that transfers knowledge from certain tasks/timesteps to relevant uncertain tasks, based on feature-level uncertainty. We validate our model on multiple clinical risk prediction tasks against various deep learning models for time-series prediction; our model significantly outperforms them without any sign of negative transfer. Further qualitative analysis of the learned knowledge graphs by clinicians shows that they are helpful in analyzing the model's predictions. Our final code is available at https://github.com/anhtuan5696/TPAMTL.
Submitted 18 February, 2021; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning
Authors:
Minyoung Song,
Jaehong Yoon,
Eunho Yang,
Sung Ju Hwang
Abstract:
As deep neural networks grow in size and are increasingly deployed on resource-limited devices, there has been a recent surge of interest in network pruning methods, which aim to remove less important weights or activations of a given network. A common limitation of most existing pruning techniques is that they require pre-training the network at least once before pruning, so the reduction in memory and computation benefits only inference. However, reducing the training cost of neural networks with rapid structural pruning may be beneficial, either to minimize the monetary cost of cloud computing or to enable on-device learning on a resource-limited device. Recently introduced random-weight pruning approaches can eliminate the need for pretraining, but they often perform worse than conventional pruning techniques and do not speed up training, since they perform unstructured pruning. To overcome these limitations, we propose Set-based Task-Adaptive Meta Pruning (STAMP), which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask for it as a function of the target dataset. To ensure maximum performance improvements on the target task, we meta-learn the mask generator over different subsets of the reference dataset, so that it can generalize well to any unseen dataset within a few gradient steps of training. We validate STAMP against recent advanced pruning methods on benchmark datasets, on which it obtains not only significantly improved compression rates over the baselines at similar accuracy, but also orders-of-magnitude faster training.
Submitted 22 June, 2020;
originally announced June 2020.
-
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning
Authors:
Wonyong Jeong,
Jaehong Yoon,
Eunho Yang,
Sung Ju Hwang
Abstract:
While existing federated learning approaches mostly require that clients have fully labeled data to train on, in realistic settings the data obtained on the client side often comes without any accompanying labels. Such a lack of labels may result from either high labeling cost or the difficulty of annotation when expert knowledge is required. Thus, the private data at each client may be either partly labeled, or completely unlabeled with labeled data available only at the server, which leads us to a new practical federated learning problem, namely Federated Semi-Supervised Learning (FSSL). In this work, we study two essential scenarios of FSSL based on the location of the labeled data. The first scenario considers the conventional case where clients have both labeled and unlabeled data (labels-at-client), and the second considers a more challenging case where the labeled data is available only at the server (labels-at-server). We then propose a novel method to tackle these problems, which we refer to as Federated Matching (FedMatch). FedMatch improves upon naive combinations of federated learning and semi-supervised learning approaches with a new inter-client consistency loss and a decomposition of the parameters for disjoint learning on labeled and unlabeled data. Through extensive experimental validation of our method in the two scenarios, we show that it outperforms both local semi-supervised learning and baselines that naively combine federated learning with semi-supervised learning. The code is available at https://github.com/wyjeong/FedMatch.
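An inter-client consistency term can be pictured as a penalty that pulls the local model's prediction on an unlabeled example toward the predictions of helper models received from other clients. The sketch below uses the average KL divergence KL(helper || local); the KL direction, the plain averaging, and the function names are illustrative assumptions, and FedMatch's exact loss may differ.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def inter_client_consistency(local_probs, helper_probs_list):
    """Average disagreement between the local prediction and helper predictions."""
    return sum(kl(h, local_probs) for h in helper_probs_list) / len(helper_probs_list)


local = [0.7, 0.2, 0.1]                     # local model on one unlabeled example
helpers = [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]  # predictions from other clients' models
print(inter_client_consistency(local, helpers))
```

The term is zero when all helpers agree with the local prediction and grows as they disagree, so adding it to the unsupervised objective encourages consistent predictions across clients.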
Submitted 29 March, 2021; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Cost-effective Interactive Attention Learning with Neural Attention Processes
Authors:
Jay Heo,
Junhyeon Park,
Hyewon Jeong,
Kwang Joon Kim,
Juho Lee,
Eunho Yang,
Sung Ju Hwang
Abstract:
We propose a novel interactive learning framework, which we refer to as Interactive Attention Learning (IAL), in which human supervisors interactively manipulate the allocated attentions to correct the model's behavior by updating the attention-generating network. However, such a model is prone to overfitting due to the scarcity of human annotations, and requires costly retraining. Moreover, it is almost infeasible for human annotators to examine attentions on large numbers of instances and features. We tackle these challenges by proposing a sample-efficient attention mechanism and a cost-effective reranking algorithm for instances and features. First, we propose the Neural Attention Process (NAP), an attention generator that can update its behavior by incorporating new attention-level supervision without any retraining. Second, we propose an algorithm that prioritizes instances and features by their negative impact, so that the model can achieve large improvements with minimal human feedback. We validate IAL on various time-series datasets from multiple domains (healthcare, real estate, and computer vision), on which it significantly outperforms baselines that use conventional attention mechanisms or lack cost-effective reranking, with substantially lower retraining and human-model interaction costs.
Submitted 9 June, 2020;
originally announced June 2020.
-
Thickness dependence of electronic and crystal structures in VO$_2$ ultrathin films: suppression of the collaborative Mott-Peierls transition
Authors:
D. Shiga,
B. E. Yang,
N. Hasegawa,
T. Kanda,
R. Tokunaga,
K. Yoshimatsu,
R. Yukawa,
M. Kitamura,
K. Horiba,
H. Kumigashira
Abstract:
Through in situ photoemission spectroscopy, we investigated the changes in the electronic and crystal structures of dimensionality-controlled VO$_2$ films coherently grown on TiO$_2$(001) substrates. In these nanostructured films, the balance between the instabilities of a bandlike Peierls transition and a Mott transition is controlled as a function of thickness. The characteristic spectral change associated with the temperature-driven metal-insulator transition in thick VO$_2$ films persists down to 1.5 nm (roughly corresponding to five V atoms along the [001] direction), whereas VO$_2$ films thinner than 1.0 nm are insulating without V-V dimerization. These results suggest that the delicate balance between a Mott instability and a bandlike Peierls instability is modulated on a scale of a few nanometers by dimensional crossover and confinement effects, which consequently give rise to the complicated electronic phase diagram of ultrathin VO$_2$ films.
Submitted 1 May, 2020;
originally announced May 2020.
-
Topologically ordered zigzag nanoribbon: $e/2$ fractional edge charge, spin-charge separation, and ground state degeneracy
Authors:
S. -R. Eric Yang,
Min-Chul Cha,
Hye Jeong Lee,
Young Heon Kim
Abstract:
We numerically compute the density of states (DOS) of an interacting, disordered zigzag graphene nanoribbon (ZGNR) whose midgap states carry $e/2$ fractional edge charges. The computed Hartree-Fock DOS is linear at the critical disorder strength where the gap vanishes. This implies an $I\mbox{-}V$ curve with $I\propto V^2$. Thus, $I\mbox{-}V$ measurements may yield evidence of fractional charges in interacting, disordered ZGNRs. We show that even a weak disorder potential acts as a singular perturbation on zigzag edge electronic states, producing drastic changes in the energy spectrum. Spin-charge separation and fractional charges play a key role in the reconstruction of edge antiferromagnetism. Our results show that an interacting, disordered ZGNR is a topologically ordered Mott-Anderson insulator.
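The step from a linear DOS to $I\propto V^2$ can be spelled out under a simple assumed tunneling picture: low temperature, an energy-independent tunneling matrix element, energies measured from the Fermi level, and $\rho(E) = c\,|E|$ near $E = 0$ at the critical disorder strength. These assumptions are ours, for illustration; the abstract does not state the transport model.

```latex
% Assumed: T -> 0, constant tunneling matrix element, rho(E) = c|E| near E_F = 0.
I(V) \propto \int_{0}^{eV} \rho(E)\, dE
     = c \int_{0}^{eV} E \, dE
     = \frac{c\,(eV)^{2}}{2}
     \;\Longrightarrow\; I \propto V^{2},
\qquad
\frac{dI}{dV} \propto \rho(eV) \propto V .
```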
Submitted 22 July, 2020; v1 submitted 29 April, 2020;
originally announced April 2020.
-
Self-consistent Nonlinear Force-free Field Reconstruction from Weighted Boundary Conditions
Authors:
Alpha Mastrano,
Kai E. Yang,
Michael S. Wheatland
Abstract:
Vector magnetogram data are often used as photospheric boundary conditions for force-free coronal magnetic field extrapolations. In general, however, vector magnetogram data are not consistent with the force-free assumption. In this article, we demonstrate a way to deal with inconsistent boundary data by generalizing the "self-consistency procedure" of Wheatland & Régnier (2009). In that procedure, the inconsistency is resolved by an iterative process of constructing two solutions, based on the values of the force-free parameter $\alpha$ on the two polarities of the field in the boundary (the P and N polarities), and taking uncertainty-weighted averages of the boundary $\alpha$ values in the P and N solutions. When the $\alpha$ values in the P and N regions are very different, the self-consistent solution may lose high $\alpha$ values from the boundary conditions. We show how, by altering the weighting of the uncertainties in the P or N boundary conditions, we can preserve high $\alpha$ values in the self-consistent solution. The weighted self-consistent extrapolation method is demonstrated on an analytic bipole field and applied to vector magnetogram data taken by the Helioseismic and Magnetic Imager (HMI) for NOAA active region AR 12017 on 2014 March 29.
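The averaging step can be illustrated with an inverse-variance weighted mean of the $\alpha$ values that the P and N solutions assign to one boundary point. The `scale_p` and `scale_n` factors are a hypothetical way of "altering the weighting of the uncertainties" in one polarity; the exact weighting used in the paper may differ.

```python
def combine_alpha(alpha_p, sigma_p, alpha_n, sigma_n, scale_p=1.0, scale_n=1.0):
    """Inverse-variance weighted average of alpha from the P and N solutions.

    scale_p / scale_n (hypothetical) rescale each polarity's uncertainty:
    a scale below 1 shrinks that polarity's uncertainty and raises its weight.
    """
    wp = 1.0 / (scale_p * sigma_p) ** 2
    wn = 1.0 / (scale_n * sigma_n) ** 2
    return (wp * alpha_p + wn * alpha_n) / (wp + wn)


# Equal uncertainties: a high alpha from the P solution is averaged down.
print(combine_alpha(2.0, 1.0, 0.0, 1.0))
# Down-weighting P uncertainties keeps the result close to the P value.
print(combine_alpha(2.0, 1.0, 0.0, 1.0, scale_p=0.1))
```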
Submitted 26 April, 2020;
originally announced April 2020.
-
Relative Magnetic Helicity Based on a Periodic Potential Field
Authors:
Kai E. Yang,
Michael S. Wheatland,
Stuart A. Gilchrist
Abstract:
Magnetic helicity is conserved in ideal magnetohydrodynamics (MHD) and is approximately conserved even in resistive processes. The standard definition of magnetic helicity cannot be applied directly to an open magnetic field in a volume, because it is gauge-dependent. Instead, the relative magnetic helicity is widely used. We find that the energy of a potential magnetic field in a rectangular domain with periodic lateral boundary conditions is less than that of the field with a fixed normal component on all six boundaries. To make use of this lower-energy potential field in the analysis of relative magnetic helicity, we introduce a new definition of magnetic helicity that involves the periodic potential field. We apply this definition to a sequence of analytic solutions and to a numerical simulation. The results show that our new gauge-invariant helicity is very close to the current-carrying part of the relative magnetic helicity of the original magnetic field. We also find that the ratio of the current-carrying helicity to the relative magnetic helicity behaves differently for the original relative helicity and for our newly defined one. The new helicity appears to be more sensitive to the component of the field due to electric currents in the volume, which is the source of instabilities and solar eruptive phenomena.
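For reference, these are the standard quantities the abstract builds on, with $\mathbf{B}_p = \nabla\times\mathbf{A}_p$ a reference potential field sharing the normal component of $\mathbf{B}$ on the boundary. The relative helicity splits into a current-carrying part $H_J$ and a part $H_{PJ}$ involving the potential field. The paper's new definition substitutes the periodic potential field as the reference; its exact form is not reproduced here.

```latex
% Classical helicity (gauge-dependent for an open field) and relative helicity.
H = \int_V \mathbf{A}\cdot\mathbf{B}\, dV ,
\qquad
H_R = \int_V (\mathbf{A} + \mathbf{A}_p)\cdot(\mathbf{B} - \mathbf{B}_p)\, dV ,
\\
% Split into current-carrying and volume-threading parts: H_R = H_J + H_PJ.
H_J = \int_V (\mathbf{A} - \mathbf{A}_p)\cdot(\mathbf{B} - \mathbf{B}_p)\, dV ,
\qquad
H_{PJ} = 2 \int_V \mathbf{A}_p\cdot(\mathbf{B} - \mathbf{B}_p)\, dV .
```

The split follows from the identity $(\mathbf{A}+\mathbf{A}_p)\cdot(\mathbf{B}-\mathbf{B}_p) = (\mathbf{A}-\mathbf{A}_p)\cdot(\mathbf{B}-\mathbf{B}_p) + 2\,\mathbf{A}_p\cdot(\mathbf{B}-\mathbf{B}_p)$.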
Submitted 18 April, 2020;
originally announced April 2020.
-
Targeted Attack for Deep Hashing based Retrieval
Authors:
Jiawang Bai,
Bin Chen,
Yiming Li,
Dongxian Wu,
Weiwei Guo,
Shu-tao Xia,
En-hui Yang
Abstract:
Deep hashing-based retrieval is widely adopted for large-scale image and video retrieval. However, its security has received little investigation. In this paper, we propose a novel method, dubbed deep hashing targeted attack (DHTA), to study targeted attacks on such retrieval. Specifically, we first formulate the targeted attack as a point-to-set optimization, which minimizes the average distance between the hash code of an adversarial example and those of a set of objects with the target label. We then design a novel component-voting scheme to obtain an anchor code as the representative of the set of hash codes of objects with the target label, and we theoretically derive its optimality guarantee. To balance attack performance and perceptibility, we propose to minimize the Hamming distance between the hash code of the adversarial example and the anchor code under an $\ell^\infty$ constraint on the perturbation. Extensive experiments verify that DHTA is effective in attacking both deep hashing-based image retrieval and video retrieval.
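A bit-wise majority vote over ±1 hash codes illustrates why a component-voting anchor is a natural representative: the summed Hamming distance to the set decomposes into independent per-bit terms, so choosing each bit by majority minimizes the total. The sketch assumes ±1 codes and breaks ties toward +1 (an arbitrary choice); the brute-force check is only for small code lengths.

```python
from itertools import product


def anchor_code(codes):
    """Per-bit majority vote over a set of ±1 hash codes (ties -> +1)."""
    return [1 if sum(bits) >= 0 else -1 for bits in zip(*codes)]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def brute_force_best(codes):
    """Smallest achievable summed Hamming distance, by exhaustive search."""
    n = len(codes[0])
    return min(sum(hamming(c, code) for code in codes)
               for c in product((-1, 1), repeat=n))


target_codes = [[1, 1, -1, 1], [1, -1, -1, 1], [-1, 1, -1, -1]]
anchor = anchor_code(target_codes)
print(anchor, sum(hamming(anchor, c) for c in target_codes), brute_force_best(target_codes))
```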
Submitted 23 July, 2020; v1 submitted 15 April, 2020;
originally announced April 2020.
-
Federated Continual Learning with Weighted Inter-client Transfer
Authors:
Jaehong Yoon,
Wonyong Jeong,
Giwoong Lee,
Eunho Yang,
Sung Ju Hwang
Abstract:
There has been a surge of interest in continual learning and federated learning, both of which are important for deep neural networks in real-world scenarios. Yet little research has been done on the scenario where each client learns on a sequence of tasks from a private local data stream. This problem of federated continual learning poses new challenges to continual learning, such as utilizing knowledge from other clients while preventing interference from irrelevant knowledge. To resolve these issues, we propose a novel federated continual learning framework, Federated Weighted Inter-client Transfer (FedWeIT), which decomposes the network weights into global federated parameters and sparse task-specific parameters; each client receives selective knowledge from other clients by taking a weighted combination of their task-specific parameters. FedWeIT minimizes interference between incompatible tasks and also allows positive knowledge transfer across clients during learning. We validate FedWeIT against existing federated learning and continual learning methods under varying degrees of task similarity across clients, and our model significantly outperforms them while substantially reducing communication cost. Code is available at https://github.com/wyjeong/FedWeIT.
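The weight decomposition can be sketched as base weights gated by a task-adaptive mask, plus the client's own sparse task-specific weights, plus an attention-weighted sum of task-specific weights received from other clients. Flat lists stand in for weight tensors, and the names and shapes are illustrative assumptions; the paper gives the exact formulation.

```python
def compose_weights(base, mask, a_local, foreign, alphas):
    """theta = base * mask + a_local + sum_j alphas[j] * foreign[j] (elementwise).

    base:    shared (global federated) weights
    mask:    task-adaptive mask on the base weights
    a_local: this client's sparse task-specific weights
    foreign: task-specific weights received from other clients
    alphas:  attention weights selecting how much of each to reuse
    """
    theta = [b * m + a for b, m, a in zip(base, mask, a_local)]
    for alpha, a_j in zip(alphas, foreign):
        theta = [t + alpha * a for t, a in zip(theta, a_j)]
    return theta


print(compose_weights([1.0, 2.0], [1.0, 0.0], [0.5, 0.0],
                      [[1.0, 1.0], [0.0, 2.0]], [0.5, 0.25]))
```

Setting an attention weight to zero ignores that client's knowledge entirely, which is how selective transfer limits interference from incompatible tasks.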
Submitted 14 June, 2021; v1 submitted 6 March, 2020;
originally announced March 2020.
-
Generalized Embedding Machines for Recommender Systems
Authors:
Enneng Yang,
Xin Xin,
Li Shen,
Guibing Guo
Abstract:
The factorization machine (FM) is an effective model for feature-based recommendation that uses inner products to capture second-order feature interactions. However, a major drawback of FM is that it cannot capture complex high-order interaction signals. A common solution is to change the interaction function, for example by stacking deep neural networks on top of FM. In this work, we propose an alternative approach, named Generalized Embedding Machine (GEM), that models high-order interaction signals at the embedding level. The embedding used in GEM encodes not only information from the feature itself but also information from other correlated features, which makes the embedding high-order. GEM can then be combined with FM and even its advanced variants to perform feature interactions. More specifically, in this paper we use graph convolution networks (GCN) to generate high-order embeddings. We integrate GEM with several FM-based models and conduct extensive experiments on two real-world datasets. The results demonstrate significant improvements of GEM over the corresponding baselines.
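The FM second-order term $\sum_{i<j} \langle v_i, v_j\rangle x_i x_j$ can be computed naively in $O(kn^2)$ or with the standard identity $\tfrac{1}{2}\sum_f[(\sum_i v_{if}x_i)^2 - \sum_i v_{if}^2 x_i^2]$ in $O(kn)$. In a GEM-style model the embeddings $v_i$ would themselves be high-order; the one-step neighbor averaging below is a hypothetical stand-in for a GCN layer, used only to show that the FM interaction is unchanged in form when fed such embeddings.

```python
def fm_pairwise_naive(x, v):
    """sum_{i<j} <v_i, v_j> x_i x_j, computed directly."""
    n = len(x)
    return sum(sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))


def fm_pairwise_fast(x, v):
    """Same quantity via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    total = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s2 = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s2
    return 0.5 * total


def neighbor_smoothed(v, neighbors):
    """Hypothetical one-step propagation: average each embedding with its neighbors."""
    out = []
    for i, vi in enumerate(v):
        group = [vi] + [v[j] for j in neighbors.get(i, [])]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


x = [1.0, 0.0, 2.0, 1.0]
v = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4], [-0.2, 0.6]]
v_high = neighbor_smoothed(v, {0: [2], 2: [0, 3]})  # assumed feature graph
print(fm_pairwise_fast(x, v), fm_pairwise_fast(x, v_high))
```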
Submitted 16 February, 2020;
originally announced February 2020.
-
Optical Design and Characterization of 40-GHz Detector and Module for the BICEP Array
Authors:
A. Soliman,
P. A. R. Ade,
Z. Ahmed,
M. Amiri,
D. Barkats,
R. Basu Thakur,
C. A. Bischoff,
J. J. Bock,
H. Boenish,
E. Bullock,
V. Buza,
J. Cheshire,
J. Connors,
J. Cornelison,
M. Crumrine,
A. Cukierman,
M. Dierickx,
L. Duband,
S. Fatigoni,
J. P. Filippini,
G. Hall,
M. Halpern,
S. Harrison,
S. Henderson,
S. R. Hildebrandt
, et al. (44 additional authors not shown)
Abstract:
Families of cosmic inflation models predict a primordial gravitational-wave background that imprints a B-mode polarization pattern in the Cosmic Microwave Background (CMB). High-sensitivity instruments with wide frequency coverage and well-controlled systematic errors are needed to constrain the faint B-mode amplitude. We have developed antenna-coupled Transition Edge Sensor (TES) arrays for high-sensitivity polarized CMB observations over a wide range of millimeter-wave bands. BICEP Array, the latest phase of the BICEP/Keck experiment series, is a multi-receiver experiment designed to search for inflationary B-mode polarization to a precision $σ(r)$ between 0.002 and 0.004 after 3 full years of observations, depending on foreground complexity and the degree of lensing removal. We describe the electromagnetic design and measured performance of the BICEP Array low-frequency 40-GHz detectors, their packaging in focal plane modules, and their optical characterization, including efficiency and beam matching between polarization pairs. We summarize the design and simulated optical performance, including an approach to improve the optical efficiency, which is reduced by mismatch losses. We report measured beam maps for a new broad-band corrugation design that minimizes the differential beam ellipticity between polarization pairs caused by interactions with the module housing frame; this helps minimize the polarized beam mismatch that converts CMB temperature to polarization ($T \rightarrow P$) anisotropy in CMB maps.
Submitted 12 February, 2020;
originally announced February 2020.
-
Design and performance of the first BICEP Array receiver
Authors:
A. Schillaci,
P. A. R. Ade,
Z. Ahmed,
M. Amiri,
D. Barkats,
R. Basu Thakur,
C. A. Bischoff,
J. J. Bock,
H. Boenish,
E. Bullock,
V. Buza,
J. Cheshire,
J. Connors,
J. Cornelison,
M. Crumrine,
A. Cukierman,
M. Dierickx,
L. Duband,
S. Fatigoni,
J. P. Filippini,
G. Hall,
M. Halpern,
S. Harrison,
S. Henderson,
S. R. Hildebrandt
, et al. (44 additional authors not shown)
Abstract:
Branches of cosmic inflationary models, such as slow-roll inflation, predict a background of primordial gravitational waves that imprints a unique odd-parity B-mode pattern in the Cosmic Microwave Background (CMB) at amplitudes that are within experimental reach. The BICEP/Keck (BK) experiment targets this primordial signature, the amplitude of which is parameterized by the tensor-to-scalar ratio r, by observing the polarized microwave sky through the exceptionally clean and stable atmosphere at the South Pole. B-mode measurements require an instrument with exquisite sensitivity, tight control of systematics, and wide frequency coverage to disentangle the primordial signal from the Galactic foregrounds. BICEP Array represents the most recent stage of the BK program, and comprises four BICEP3-class receivers observing at 30/40, 95, 150 and 220/270 GHz. The 30/40 GHz receiver will be deployed at the South Pole during the 2019/2020 austral summer. After 3 full years of observations with 30,000+ detectors, BICEP Array will measure primordial gravitational waves to a precision $σ(r)$ between 0.002 and 0.004, depending on foreground complexity and the degree of lensing removal. In this paper we give an overview of the instrument, highlighting the design features in terms of cryogenics, magnetic shielding, detectors and readout architecture as well as reporting on the integration and tests that are ongoing with the first receiver at 30/40 GHz.
Submitted 12 February, 2020;
originally announced February 2020.
-
Characterizing the Sensitivity of 40 GHz TES Bolometers for BICEP Array
Authors:
C. Zhang,
P. A. R. Ade,
Z. Ahmed,
M. Amiri,
D. Barkats,
R. Basu Thakur,
C. A. Bischoff,
J. J. Bock,
H. Boenish,
E. Bullock,
V. Buza,
J. Cheshire,
J. Connors,
J. Cornelison,
M. Crumrine,
A. Cukierman,
M. Dierickx,
L. Duband,
S. Fatigoni,
J. P. Filippini,
G. Hall,
M. Halpern,
S. Harrison,
S. Henderson,
S. R. Hildebrandt
, et al. (44 additional authors not shown)
Abstract:
The BICEP/Keck (BK) experiment aims to detect the imprint of primordial gravitational waves in the Cosmic Microwave Background polarization, which would be direct evidence for inflation. While the tensor-to-scalar ratio has been constrained to $r_{0.05} < 0.06$ at 95% c.l., further improvements on this upper limit are hindered by polarized Galactic foreground emission and by the need to remove gravitational lensing polarization. The 30/40 GHz receiver of BICEP Array (BA) will deploy at the end of 2019 and will constrain the synchrotron foreground with unprecedented accuracy within the BK sky patch. We will show the design of the 30/40 GHz detectors and test results summarizing their performance. The low optical and atmospheric loading at these frequencies requires our TES detectors to have low saturation power in order to be photon-noise dominated. To realize the low thermal conductivity required from a 250 mK base temperature, we developed new bolometer leg designs. We will present the relevant measured detector parameters ($G$, $T_c$, $R_n$, $P_{sat}$), spectral bands, and noise spectra. We achieved a per-bolometer NEP, including all noise components, of $2.07 \times 10^{-17}\ \mathrm{W}/\sqrt{\mathrm{Hz}}$, which includes an anticipated photon noise level of $1.54 \times 10^{-17}\ \mathrm{W}/\sqrt{\mathrm{Hz}}$.
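The two NEP figures quoted above can be combined to estimate how much of the noise is not photon noise. This is a short worked calculation under one stated assumption: the noise sources are independent, so their NEPs add in quadrature (NEP²_total = NEP²_photon + NEP²_other). The helper name `residual_nep` is illustrative.

```python
import math


def residual_nep(total, photon):
    """Non-photon NEP, assuming independent noise terms add in quadrature."""
    if photon > total:
        raise ValueError("photon NEP cannot exceed total NEP")
    return math.sqrt(total ** 2 - photon ** 2)


total_nep = 2.07e-17   # W/sqrt(Hz), per-bolometer NEP quoted in the abstract
photon_nep = 1.54e-17  # W/sqrt(Hz), anticipated photon noise quoted in the abstract

other_nep = residual_nep(total_nep, photon_nep)
# Share of the total noise variance attributable to photon noise.
photon_fraction = photon_nep ** 2 / total_nep ** 2
```

Under this assumption, roughly 1.38e-17 W/√Hz comes from non-photon sources, and photon noise accounts for a little over half of the noise variance.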
Submitted 12 February, 2020;
originally announced February 2020.
-
Optical characterization of the Keck Array and BICEP3 CMB Polarimeters from 2016 to 2019
Authors:
The BICEP/Keck Collaboration,
T. St Germaine,
P. A. R. Ade,
Z. Ahmed,
M. Amiri,
D. Barkats,
R. Basu Thakur,
C. A. Bischoff,
J. J. Bock,
H. Boenish,
E. Bullock,
V. Buza,
J. Cheshire,
J. Connors,
J. Cornelison,
M. Crumrine,
A. Cukierman,
M. Dierickx,
L. Duband,
S. Fatigoni,
J. P. Filippini,
S. Fliescher,
J. A. Grayson,
G. Hall
, et al. (50 additional authors not shown)
Abstract:
The BICEP/Keck experiment (BK) is a series of small-aperture refracting telescopes observing degree-scale Cosmic Microwave Background (CMB) polarization from the South Pole in search of a primordial $B$-mode signature. This $B$-mode signal arises from primordial gravitational waves interacting with the CMB, and has amplitude parametrized by the tensor-to-scalar ratio $r$. Since 2016, BICEP3 and the Keck Array have been observing with 4800 total antenna-coupled transition-edge sensor detectors, with frequency bands spanning 95, 150, 220, and 270 GHz. Here we present the optical performance of these receivers from 2016 to 2019, including far-field beams measured in situ with an improved chopped thermal source and instrument spectral response measured with a field-deployable Fourier Transform Spectrometer. As a pair differencing experiment, an important systematic that must be controlled is the differential beam response between the co-located, orthogonally polarized detectors. We generate per-detector far-field beam maps and the corresponding differential beam mismatch that is used to estimate the temperature-to-polarization leakage in our CMB maps and to give feedback on detector and optics fabrication. The differential beam parameters presented here were estimated using improved low-level beam map analysis techniques, including efficient removal of non-Gaussian noise as well as improved spatial masking. These techniques help minimize systematic uncertainty in the beam analysis, with the goal of constraining the bias on $r$ induced by temperature-to-polarization leakage to be subdominant to the statistical uncertainty. This is essential as we progress to higher detector counts in the next generation of CMB experiments.
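Why differential beam response matters in a pair-differencing experiment can be seen in a one-dimensional toy model. The sketch below is a simplification, not the collaboration's analysis: it assumes detector A sees T + Q and detector B sees T − Q, and models only one component of beam mismatch, a hypothetical differential pointing `offset` between the two detectors. With no offset, the half-difference recovers Q exactly; with an offset, a temperature gradient leaks into the difference as roughly (offset / 2) × dT/dx.

```python
def _pair(temp, pol, x, offset):
    """Timestreams of an orthogonal detector pair at 1-D sky position x.

    temp, pol: functions giving the sky temperature T and polarization Q
    offset:    hypothetical differential pointing between detectors A and B
    """
    a = temp(x + offset / 2) + pol(x)
    b = temp(x - offset / 2) - pol(x)
    return a, b


def pair_sum(temp, pol, x, offset):
    """Half-sum of the pair: an estimate of temperature."""
    a, b = _pair(temp, pol, x, offset)
    return (a + b) / 2


def pair_difference(temp, pol, x, offset):
    """Half-difference of the pair: an estimate of polarization, contaminated
    by T -> P leakage whenever offset != 0 and T has a gradient."""
    a, b = _pair(temp, pol, x, offset)
    return (a - b) / 2
```

For an unpolarized sky with a temperature slope of 3 per unit position and an offset of 0.02, the difference reports a spurious polarization of about 0.03, which is the kind of T → P leakage the beam maps are used to estimate.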
Submitted 12 February, 2020;
originally announced February 2020.
-
Comprehensive understanding of water-driven graphene wrinkle life-cycle towards applications in flexible electronics: A computational study
Authors:
Jatin Kashyap,
Eui-Hyeok Yang,
Dibakar Datta
Abstract:
The presence of wrinkles in Graphene Nanoribbons (GNR) and other two-dimensional (2D) materials significantly alters their mechanical, electronic, and optical properties, which can be either beneficial or detrimental. Experimentally, it has been observed that during the commonly used growth process of GNR, water molecules sourced from ambient humidity can diffuse between the GNR and the substrate. The water diffusion causes wrinkle formation in the GNR, which influences its properties. Furthermore, the diffused water eventually dries, altering not only the geometry of Wrinkled Graphene Nanoribbons (WGNR) but also their properties. Computational analysis can provide an atomistic-level understanding of these phenomena. Therefore, in this work, Molecular Dynamics (MD) simulations are performed to model water diffusion and evaporation between GNR and its substrate, and their effect on wrinkle formation and dynamics. Additionally, Density Functional Theory (DFT)-based analysis is used to characterize the difference in the electronic structure of WGNR caused by changes in wrinkle geometry. Our study reveals that the initially distributed wrinkles tend to coalesce into a localized wrinkle whose configuration depends on the initial wrinkle geometry and the amount of diffused water. The wrinkle configuration changes upon drying, while it remains static until drying is complete. The movement of the localized wrinkle is a combination of three fundamental modes: bending, buckling, and sliding. The stress analysis reveals that the maximum stress is at the base of the wrinkle and that its magnitude is always below the plasticity limit. The DFT results provide insight into the potential of using wrinkles to control the direction of electron flow in flexible electronics applications.
Submitted 2 January, 2020;
originally announced January 2020.
-
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Authors:
Adam Paszke,
Sam Gross,
Francisco Massa,
Adam Lerer,
James Bradbury,
Gregory Chanan,
Trevor Killeen,
Zeming Lin,
Natalia Gimelshein,
Luca Antiga,
Alban Desmaison,
Andreas Köpf,
Edward Yang,
Zach DeVito,
Martin Raison,
Alykhan Tejani,
Sasank Chilamkurthy,
Benoit Steiner,
Lu Fang,
Junjie Bai,
Soumith Chintala
Abstract:
Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.
In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.
We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.
Submitted 3 December, 2019;
originally announced December 2019.
-
Semi-Relaxed Quantization with DropBits: Training Low-Bit Neural Networks via Bit-wise Regularization
Authors:
Jung Hyun Lee,
Jihun Yun,
Sung Ju Hwang,
Eunho Yang
Abstract:
Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged as one of the key ingredients for reducing the size of neural networks for deployment on resource-limited devices. To handle the discrete nature of transforming continuous activations and weights into discrete ones, a recent study called Relaxed Quantization (RQ) [Louizos et al. 2019] successfully employs the popular Gumbel-Softmax, which allows this transformation with efficient gradient-based optimization. However, RQ with this Gumbel-Softmax relaxation still suffers from a bias-variance trade-off that depends on the temperature parameter of the Gumbel-Softmax. To resolve this issue, we propose a novel method, Semi-Relaxed Quantization (SRQ), which uses a multi-class straight-through estimator to effectively reduce the bias and variance, along with a new regularization technique, DropBits, which replaces dropout regularization by randomly dropping bits instead of neurons to further reduce the bias of the multi-class straight-through estimator in SRQ. As a natural extension of DropBits, we further introduce a way of learning heterogeneous quantization levels to find a proper bit-length for each layer using DropBits. We experimentally validate our method on various benchmark datasets and network architectures, and also support the quantized lottery ticket hypothesis: learning heterogeneous quantization levels outperforms training from scratch with the same quantization levels held fixed.
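The sketch below illustrates the Gumbel-Softmax relaxation that RQ builds on, and the hard "argmax" value that a straight-through estimator passes forward. It is an assumption-laden illustration, not RQ or SRQ: the logits are taken as negative squared distances to each grid point (RQ instead derives probabilities from a noise CDF over grid bins), and neither SRQ's multi-class estimator nor DropBits is reproduced. The `noise` argument lets callers supply fixed Gumbel noise for reproducible checks.

```python
import math
import random


def gumbel_softmax(logits, tau, noise=None, rng=None):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    if noise is None:
        rng = rng or random.Random()
        noise = []
        for _ in logits:
            # Clamp U into (0, 1) so both logarithms are finite.
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            noise.append(-math.log(-math.log(u)))
    scores = [(logit + g) / tau for logit, g in zip(logits, noise)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_quantize(x, grid, tau, noise=None, rng=None):
    """Return (soft, hard) quantizations of x onto the grid.

    soft: expected grid value under the relaxed sample (differentiable surrogate)
    hard: grid point with the largest relaxed probability, i.e. the forward
          value a straight-through estimator would use
    """
    # Hypothetical logits: closer grid points receive higher scores.
    logits = [-(x - g) ** 2 for g in grid]
    probs = gumbel_softmax(logits, tau, noise, rng)
    soft = sum(p * g for p, g in zip(probs, grid))
    hard = grid[max(range(len(grid)), key=lambda i: probs[i])]
    return soft, hard
```

Lowering `tau` pushes the relaxed sample toward one-hot (less bias against the hard quantizer but noisier gradients in training), which is the temperature-dependent trade-off the abstract refers to.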
Submitted 7 September, 2021; v1 submitted 29 November, 2019;
originally announced November 2019.
-
Controlled edge dependent stacking of WS2-WS2 Homo- and WS2-WSe2 Hetero-structures: A Computational Study
Authors:
Kamalika Ghatak,
Kyung Nam Kang,
Eui-Hyeok Yang,
Dibakar Datta
Abstract:
Transition Metal Dichalcogenides (TMDs) are among the most studied two-dimensional materials of the last 5-10 years due to their extremely interesting layer-dependent properties. Despite the vast body of research on TMDs, the complex relationship between their electrochemical and physical properties makes them the subject of further research. Our main objective is to provide better insight into the electronic structure of TMDs. This will help us better understand the stability of the bilayer post-growth homo/hetero products for various edge terminations and different stackings of the two layers. To this end, two tungsten (W)-based non-periodic chalcogenide flakes (sulfides and selenides) were considered. An in-depth analysis of their different edge terminations and stacking arrangements was performed with Density Functional Theory (DFT) using the VASP software. Our findings indicate a preference for chalcogenide (c-) terminated structures over metal (m-) terminated structures for both homo- and hetero-layers, and thus strongly suggest the nonexistence of m-terminated TMD bilayer products.
Submitted 25 November, 2019;
originally announced November 2019.
-
Using natural language processing to extract health-related causality from Twitter messages
Authors:
Son Doan,
Elly W Yang,
Sameer Tilak,
Manabu Torii
Abstract:
Twitter messages (tweets) contain various types of information, including health-related information. Analysis of health-related tweets can help us understand health conditions and concerns encountered in daily life. In this work, we evaluated an approach to extracting causal relations from tweets using natural language processing (NLP) techniques. We focused on three health-related topics: "stress", "insomnia", and "headache". We proposed a set of lexico-syntactic patterns based on dependency parser outputs to extract causal information. A large dataset consisting of 24 million tweets was used. The results show that our approach achieved an average precision between 74.59% and 92.27%. Analysis of the extracted relations revealed interesting findings about health-related topics on Twitter.
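A lexico-syntactic pattern over dependency parser output can be as simple as "a causal verb with both a subject and a direct object". The sketch below is hypothetical and far smaller than the authors' pattern set: it assumes the parser output is already available as (head, relation, dependent) triples using Universal Dependencies labels (`nsubj`, `obj`, with the older `dobj` also accepted), and it keys on word strings rather than token indices, so repeated words in one tweet would collide.

```python
# Hypothetical causal trigger verbs; the paper's actual patterns are not reproduced.
CAUSAL_VERBS = {"cause", "causes", "caused", "trigger", "triggers",
                "triggered", "give", "gives", "gave"}


def extract_causal(triples):
    """Return (cause, effect) pairs from (head, relation, dependent) triples."""
    subjects, objects = {}, {}
    for head, rel, dep in triples:
        if rel == "nsubj":
            subjects[head] = dep
        elif rel in ("obj", "dobj"):
            objects[head] = dep
    pairs = []
    for verb, subj in subjects.items():
        # Pattern: <cause> --nsubj-- CAUSAL_VERB --obj--> <effect>
        if verb.lower() in CAUSAL_VERBS and verb in objects:
            pairs.append((subj, objects[verb]))
    return pairs
```

For "work gave me a headache", the indirect object `me` is ignored and the pattern yields the pair ("work", "headache").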
Submitted 15 November, 2019;
originally announced November 2019.
-
The Medium Energy (ME) X-ray telescope onboard the Insight-HXMT astronomy satellite
Authors:
Xuelei Cao,
Weichun Jiang,
Bin Meng,
Wanchang Zhang,
Tao Luo,
Sheng Yang,
Chunlei Zhang,
Yudong Gu,
Liang Sun,
Xiao**g Liu,
Jiawei Yang,
Xian Li,
Ying Tan,
Shaozhen Liu,
Yuanyuan Du,
Fangjun Lu,
Yupeng Xu,
Shuangnan Zhang,
Huanyu Wang,
Tipei Li,
Chengmo Zhang,
Xiangyang Wen,
Mingyu Ge,
Yupeng Zhou,
Shaolin Xiong
, et al. (12 additional authors not shown)
Abstract:
The Medium Energy X-ray telescope (ME) is one of the three main telescopes on board the Insight Hard X-ray Modulation Telescope (Insight-HXMT) astronomy satellite. ME contains 1728 pixels of Si-PIN detectors sensitive in 5-30 keV, with a total geometrical area of 952 cm². Application Specific Integrated Circuit (ASIC) chips (VA32TA6) are used to achieve low power consumption and low readout noise. The collimators define three kinds of fields of view (FOVs) for the telescope: 1°×4°, 4°×4°, and blocked. Combining these FOVs allows the in-orbit X-ray and particle background components to be estimated. The energy resolution of ME is ~3 keV (FWHM) at 17.8 keV, and the time resolution is 255 μs. In this paper, we introduce the design and performance of ME.
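Two quick derived figures follow from the numbers above. The first assumes, for illustration only, that the total geometrical area is shared equally among the pixels (individual pixel sizes are not stated here); the second is the fractional energy resolution at the quoted line energy.

```latex
\bar{A}_{\mathrm{pix}} = \frac{952\ \mathrm{cm}^2}{1728} \approx 0.55\ \mathrm{cm}^2,
\qquad
\left.\frac{\Delta E_{\mathrm{FWHM}}}{E}\right|_{17.8\ \mathrm{keV}}
  = \frac{3\ \mathrm{keV}}{17.8\ \mathrm{keV}} \approx 17\%
```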
Submitted 10 October, 2019;
originally announced October 2019.
-
Set-valued maps and some generalized metric spaces
Authors:
Er-Guang Yang
Abstract:
To characterize monotonically countably paracompact spaces with set-valued maps, Yamazaki [22] introduced the notion of a strictly increasing closed cover of a topological space, with which the boundedness of a set-valued map was defined. In this paper, we show that most generalized metric spaces, such as stratifiable spaces and semi-metrizable spaces, can be characterized by set-valued maps with values in the family of all nonempty closed subsets of a space that has a strictly increasing closed cover. Moreover, as an application, we use these results to give characterizations of the corresponding spaces by generalized real-valued functions.
Submitted 7 October, 2019;
originally announced October 2019.
-
Observation of non-Abelian nodal links in photonics
Authors:
Erchan Yang,
Biao Yang,
Oubo You,
Hsun-chi Chan,
Peng Mao,
Qinghua Guo,
Shaojie Ma,
Lingbo Xia,
Dianyuan Fan,
Yuanjiang Xiang,
Shuang Zhang
Abstract:
In crystals, two bands may cross each other and form degeneracies along a closed loop in three-dimensional momentum space, called a nodal line. Nodal line degeneracies can be designed to exhibit various configurations such as nodal rings, chains, links, and knots. Very recently, non-Abelian band topology was proposed in nodal link systems, where the nodal lines formed by consecutive pairs of bands exhibit interesting braiding structures and the underlying topological charges are described by quaternions. Here, we experimentally demonstrate non-Abelian nodal links in a biaxial hyperbolic metamaterial. The linked nodal lines threading through each other are formed by the crossings between three adjacent bands. Based on the non-Abelian charges, we further analyze various admissible nodal link configurations for the three-band system. At the interface between the metamaterial and air, surface bound states in the continuum (BICs) are observed, which serve as the symmetry-enforced derivative of drumhead surface states from the linked nodal lines. Our work provides a direct observation of the global topological structures of nodal links and a platform for studying non-Abelian topological charges in momentum space.
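The abstract notes that the charges are described by quaternions. The minimal sketch below is an illustration, not the paper's analysis: it implements the Hamilton product on the quaternion units and shows the property that makes such charges non-Abelian, namely that the order of composition matters (i·j = k but j·i = −k).

```python
def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def neg(q):
    """Additive inverse, used to write the units -1, -i, -j, -k."""
    return tuple(-c for c in q)


# Quaternion units.
ONE = (1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)
```

Because `qmul(I, J)` and `qmul(J, I)` differ by a sign, the outcome of combining two charges depends on their order, which is the algebraic origin of the braiding constraints on admissible nodal link configurations.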
Submitted 20 March, 2020; v1 submitted 3 October, 2019;
originally announced October 2019.
-
Controllable Data Synthesis Method for Grammatical Error Correction
Authors:
Liner Yang,
Chencheng Wang,
Yun Chen,
Yong** Du,
Erhong Yang
Abstract:
Due to the lack of parallel data for the Grammatical Error Correction (GEC) task, models based on the sequence-to-sequence framework cannot be adequately trained to reach higher performance. We propose two data synthesis methods that can control the error rate and the ratio of error types in synthetic data. The first approach corrupts each word in a monolingual corpus with a fixed probability, using replacement, insertion, and deletion. The second approach trains error generation models and then filters the decoding results of these models. Experiments on different synthetic data show that setting the error rate to 40% and keeping the ratio of error types the same improves model performance the most. Finally, we synthesize about 100 million examples and achieve performance comparable to the state of the art, which uses twice as much data as we do.
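The first synthesis approach, corrupting each word with a fixed probability, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `corrupt`, the uniform choice of replacement and insertion words from a caller-supplied `vocab`, and the use of `weights` to set the ratio of error types are all hypothetical choices.

```python
import random


def corrupt(tokens, p, vocab, rng, ops=("replace", "insert", "delete"), weights=None):
    """Corrupt each token independently with probability p.

    p:       per-word error rate (e.g. 0.4 for a 40% error rate)
    weights: optional relative frequencies of the error types in `ops`
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    out = []
    for tok in tokens:
        if rng.random() >= p:
            out.append(tok)  # leave this word untouched
            continue
        op = rng.choices(ops, weights=weights)[0]
        if op == "replace":
            out.append(rng.choice(vocab))
        elif op == "insert":
            out.extend([tok, rng.choice(vocab)])
        # "delete": drop the word entirely
    return out
```

Pairing each corrupted sentence (source) with its original (target) yields synthetic parallel data; `p=0.4` with equal `weights` would correspond to one reading of the setting reported above.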
Submitted 24 December, 2021; v1 submitted 29 September, 2019;
originally announced September 2019.
-
AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference
Authors:
Thierry Tambe,
En-Yu Yang,
Zishen Wan,
Yuntian Deng,
Vijay Janapa Reddi,
Alexander Rush,
David Brooks,
Gu-Yeon Wei
Abstract:
Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes because their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point-inspired number representation format for deep learning that dynamically maximizes and optimally clips its available dynamic range, at layer granularity, to create a faithful encoding of neural network parameters. AdaptivFloat consistently produces higher inference accuracies than block floating-point, uniform, IEEE-like float, or posit encodings at very low precision ($\leq$ 8-bit) across a diverse set of state-of-the-art neural network topologies. Notably, AdaptivFloat surpasses baseline FP32 performance by up to +0.3 in BLEU score and -0.75 in word error rate at weight bit widths of $\leq$ 8 bits. Experimental results on a deep neural network (DNN) hardware accelerator that exploits AdaptivFloat logic in its computational datapath demonstrate per-operation energy and area that are 0.9$\times$ and 1.14$\times$, respectively, those of equivalent-bit-width integer-based accelerator variants.
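The core idea of anchoring a small floating-point format's exponent range to each layer's largest weight can be sketched as below. This is a simplified illustration, not the AdaptivFloat specification: it assumes a sign bit plus `exp_bits` exponent and `man_bits` mantissa bits, sets the top exponent from floor(log2(max |w|)), has no subnormals, flushes values below half the smallest normal to zero, and clips values above the largest representable magnitude. The function name is hypothetical.

```python
import math


def adaptive_float_quantize(values, exp_bits=3, man_bits=3):
    """Quantize one layer's values with an exponent range fitted to max |v|."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0] * len(values)
    # Shift the exponent window so its top covers the largest magnitude.
    exp_max = math.floor(math.log2(max_abs))
    exp_min = exp_max - (2 ** exp_bits - 1)
    max_val = (2 - 2 ** -man_bits) * 2 ** exp_max  # largest representable magnitude
    min_val = 2 ** exp_min                         # smallest normal magnitude
    out = []
    for v in values:
        a = min(abs(v), max_val)  # clip to the representable range
        if a < min_val / 2:
            q = 0.0  # flush tiny values to zero
        else:
            a = max(a, min_val)
            e = math.floor(math.log2(a))
            step = 2 ** (e - man_bits)  # spacing of mantissa values in this binade
            q = min(round(a / step) * step, max_val)
        out.append(math.copysign(q, v))
    return out
```

Because `exp_max` is recomputed per call, calling the function once per layer adapts the dynamic range to that layer's distribution instead of using one fixed range for the whole network.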
Submitted 11 February, 2020; v1 submitted 29 September, 2019;
originally announced September 2019.
-
Immunity of nanoscale magnetic tunnel junctions to ionizing radiation
Authors:
Eric Arturo Montoya,
Jen-Ru Chen,
Randy Ngelale,
Han Kyu Lee,
Hsin-Wei Tseng,
Lei Wan,
En Yang,
Patrick Braganca,
Ozdal Boyraz,
Nader Bagherzadeh,
Mikael Nilsson,
Ilya N. Krivorotov
Abstract:
Spin transfer torque magnetic random access memory (STT-MRAM) is a promising candidate for next-generation memory, as it is non-volatile, fast, and has unlimited endurance. Another important aspect of STT-MRAM is that its core component, the nanoscale magnetic tunnel junction (MTJ), is thought to be radiation hard, making it attractive for space and nuclear technology applications. However, studies of the effects of high doses of ionizing radiation on the STT-MRAM writing process are lacking. Here we report measurements of the impact of high doses of gamma and neutron radiation on nanoscale MTJs with perpendicular magnetic anisotropy used in STT-MRAM. We characterize the tunneling magnetoresistance, the magnetic field switching, and the current-induced switching before and after irradiation. Our results demonstrate that all these key properties of nanoscale MTJs relevant to STT-MRAM applications are robust against ionizing radiation. Additionally, we perform experiments on thermally driven stochastic switching in the gamma-ray environment. These results indicate that nanoscale MTJs are promising building blocks for radiation-hard non-von Neumann computing.
Submitted 25 September, 2019;
originally announced September 2019.