-
Memory-Based Dual Gaussian Processes for Sequential Learning
Authors:
Paul E. Chang,
Prakhar Verma,
S. T. John,
Arno Solin,
Mohammad Emtiyaz Khan
Abstract:
Sequential learning with Gaussian processes (GPs) is challenging when access to past data is limited, for example, in continual and active learning. In such cases, errors can accumulate over time due to inaccuracies in the posterior, hyperparameters, and inducing points, making accurate learning challenging. Here, we present a method to keep all such errors in check using the recently proposed dual sparse variational GP. Our method enables accurate inference for generic likelihoods and improves learning by actively building and updating a memory of past data. We demonstrate its effectiveness in several applications involving Bayesian optimization, active learning, and continual learning.
Submitted 6 June, 2023;
originally announced June 2023.
-
Adaptive Attractors: A Defense Strategy against ML Adversarial Collusion Attacks
Authors:
Jiyi Zhang,
Han Fang,
Ee-Chien Chang
Abstract:
In the seller-buyer setting on machine learning models, the seller generates different copies based on the original model and distributes them to different buyers, such that adversarial samples generated on one buyer's copy would likely not work on other copies. A known approach achieves this using attractor-based rewriter which injects different attractors to different copies. This induces different adversarial regions in different copies, making adversarial samples generated on one copy not replicable on others. In this paper, we focus on a scenario where multiple malicious buyers collude to attack. We first give two formulations and conduct empirical studies to analyze effectiveness of collusion attack under different assumptions on the attacker's capabilities and properties of the attractors. We observe that existing attractor-based methods do not effectively mislead the colluders in the sense that adversarial samples found are influenced more by the original model instead of the attractors as number of colluders increases. Based on this observation, we propose using adaptive attractors whose weight is guided by a U-shape curve to cover the shortfalls. Experimentation results show that when using our approach, the attack success rate of a collusion attack converges to around 15% even when lots of copies are applied for collusion. In contrast, when using the existing attractor-based rewriter with fixed weight, the attack success rate increases linearly with the number of copies used for collusion.
Submitted 2 June, 2023;
originally announced June 2023.
-
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Authors:
Zechun Liu,
Barlas Oguz,
Changsheng Zhao,
Ernie Chang,
Pierre Stock,
Yashar Mehdad,
Yangyang Shi,
Raghuraman Krishnamoorthi,
Vikas Chandra
Abstract:
Several post-training quantization methods have been applied to large language models (LLMs), and have been shown to perform well down to 8-bits. We find that these methods break down at lower bit precision, and investigate quantization aware training for LLMs (LLM-QAT) to push quantization levels even further. We propose a data-free distillation method that leverages generations produced by the pre-trained model, which better preserves the original output distribution and allows quantizing any generative model independent of its training data, similar to post-training quantization methods. In addition to quantizing weights and activations, we also quantize the KV cache, which is critical for increasing throughput and support long sequence dependencies at current model sizes. We experiment with LLaMA models of sizes 7B, 13B, and 30B, at quantization levels down to 4-bits. We observe large improvements over training-free methods, especially in the low-bit settings.
Submitted 29 May, 2023;
originally announced May 2023.
-
Finding Meaningful Distributions of ML Black-boxes under Forensic Investigation
Authors:
Jiyi Zhang,
Han Fang,
Hwee Kuan Lee,
Ee-Chien Chang
Abstract:
Given a poorly documented neural network model, we take the perspective of a forensic investigator who wants to find out the model's data domain (e.g. whether on face images or traffic signs). Although existing methods such as membership inference and model inversion can be used to uncover some information about an unknown model, they still require knowledge of the data domain to start with. In this paper, we propose solving this problem by leveraging on comprehensive corpus such as ImageNet to select a meaningful distribution that is close to the original training distribution and leads to high performance in follow-up investigations. The corpus comprises two components, a large dataset of samples and meta information such as hierarchical structure and textual information on the samples. Our goal is to select a set of samples from the corpus for the given model. The core of our method is an objective function that considers two criteria on the selected samples: the model functional properties (derived from the dataset), and semantics (derived from the metadata). We also give an algorithm to efficiently search the large space of all possible subsets w.r.t. the objective function. Experimentation results show that the proposed method is effective. For example, cloning a given model (originally trained with CIFAR-10) by using Caltech 101 can achieve 45.5% accuracy. By using datasets selected by our method, the accuracy is improved to 72.0%.
Submitted 9 May, 2023;
originally announced May 2023.
-
Coronal Heating as Determined by the Solar Flare Frequency Distribution Obtained by Aggregating Case Studies
Authors:
James Paul Mason,
Alexandra Werth,
Colin G. West,
Allison A. Youngblood,
Donald L. Woodraska,
Courtney Peck,
Kevin Lacjak,
Florian G. Frick,
Moutamen Gabir,
Reema A. Alsinan,
Thomas Jacobsen,
Mohammad Alrubaie,
Kayla M. Chizmar,
Benjamin P. Lau,
Lizbeth Montoya Dominguez,
David Price,
Dylan R. Butler,
Connor J. Biron,
Nikita Feoktistov,
Kai Dewey,
N. E. Loomis,
Michal Bodzianowski,
Connor Kuybus,
Henry Dietrick,
Aubrey M. Wolfe
, et al. (977 additional authors not shown)
Abstract:
Flare frequency distributions represent a key approach to addressing one of the largest problems in solar and stellar physics: determining the mechanism that counter-intuitively heats coronae to temperatures that are orders of magnitude hotter than the corresponding photospheres. It is widely accepted that the magnetic field is responsible for the heating, but there are two competing mechanisms that could explain it: nanoflares or Alfvén waves. To date, neither can be directly observed. Nanoflares are, by definition, extremely small, but their aggregate energy release could represent a substantial heating mechanism, presuming they are sufficiently abundant. One way to test this presumption is via the flare frequency distribution, which describes how often flares of various energies occur. If the slope of the power law fitting the flare frequency distribution is above a critical threshold, $α=2$ as established in prior literature, then there should be a sufficient abundance of nanoflares to explain coronal heating. We performed $>$600 case studies of solar flares, made possible by an unprecedented number of data analysts via three semesters of an undergraduate physics laboratory course. This allowed us to include two crucial, but nontrivial, analysis methods: pre-flare baseline subtraction and computation of the flare energy, which requires determining flare start and stop times. We aggregated the results of these analyses into a statistical study to determine that $α= 1.63 \pm 0.03$. This is below the critical threshold, suggesting that Alfvén waves are an important driver of coronal heating.
Submitted 9 May, 2023;
originally announced May 2023.
-
CoCoMo: Computational Consciousness Modeling for Generative and Ethical AI
Authors:
Edward Y. Chang
Abstract:
The CoCoMo model proposes a computational solution to the challenge of incorporating ethical and emotional intelligence considerations into AI systems, with the aim of creating AI agents that combine knowledge with compassion. To achieve this goal, CoCoMo prioritizes fairness, beneficence, non-maleficence, empathy, adaptability, transparency, and critical and exploratory thinking abilities. The model employs consciousness modeling, reinforcement learning, and prompt template formulation to support these desired traits. By incorporating ethical and emotional intelligence considerations, a generative AI model can potentially lead to improved fairness, reduced toxicity, and increased reliability.
Submitted 8 April, 2023; v1 submitted 17 March, 2023;
originally announced April 2023.
-
The maximum refractive index of an atomic crystal – from quantum optics to quantum chemistry
Authors:
Francesco Andreoli,
Bennet Windt,
Stefano Grava,
Gian Marcello Andolina,
Michael J. Gullans,
Alexander A. High,
Darrick E. Chang
Abstract:
All known optical materials have an index of refraction of order unity. Despite the tremendous implications that an ultrahigh index could have for optical technologies, little research has been done on why the refractive index of materials is universally small, and whether this observation is fundamental. Here, we investigate the index of an ordered arrangement of atoms, as a function of atomic density. At dilute densities, this problem falls into the realm of quantum optics, where atoms do not interact with one another except via the scattering of light. On the other hand, when the lattice constant becomes comparable to the Bohr radius, the electronic orbitals begin to overlap, giving rise to quantum chemistry. We present a minimal model that allows for a unifying theory of index spanning these two regimes. A key aspect is the treatment of multiple light scattering, which can be highly non-perturbative over a large density range, and which is the reason that conventional theories of the index break down. In the quantum optics regime, we show that ideal light-matter interactions can have a single-mode nature, allowing for a purely real refractive index that grows with density as $(N/V)^{1/3}$. At the onset of quantum chemistry, we show how two physical mechanisms (excited electron tunneling dynamics and the buildup of electronic density-density correlations) can open up inelastic or spatial multi-mode light scattering processes, which ultimately reduce the index back to order unity while introducing absorption. Around the onset of chemistry, our theory predicts that ultrahigh index ($n\sim 30$), low-loss materials could in principle be allowed by the laws of nature.
Submitted 20 March, 2023;
originally announced March 2023.
-
Prompting Large Language Models With the Socratic Method
Authors:
Edward Y. Chang
Abstract:
This paper presents a systematic approach to using the Socratic method in developing prompt templates that effectively interact with large language models, including GPT-3. Various methods are examined, and those that yield precise answers and justifications while fostering creativity and imagination to enhance creative writing are identified. Techniques such as {\em definition}, {\em elenchus}, {\em dialectic}, {\em maieutics}, {\em generalization}, and {\em counterfactual reasoning} are discussed for their application in engineering prompt templates and their connections to inductive, deductive, and abductive reasoning. Through examples, the effectiveness of these dialogue and reasoning methods is demonstrated. An interesting observation is made that when the task's goal and user intent are conveyed to GPT-3 via ChatGPT before the start of a dialogue, the large language model seems to connect to the external context expressed in the intent and perform more effectively.
Submitted 15 March, 2023; v1 submitted 17 February, 2023;
originally announced March 2023.
-
Sampling-based Exploration for Reinforcement Learning of Dexterous Manipulation
Authors:
Gagan Khandate,
Siqi Shang,
Eric T. Chang,
Tristan Luca Saidi,
Yang Liu,
Seth Matthew Dennis,
Johnson Adams,
Matei Ciocarlie
Abstract:
In this paper, we present a novel method for achieving dexterous manipulation of complex objects, while simultaneously securing the object without the use of passive support surfaces. We posit that a key difficulty for training such policies in a Reinforcement Learning framework is the difficulty of exploring the problem state space, as the accessible regions of this space form a complex structure along manifolds of a high-dimensional space. To address this challenge, we use two versions of the non-holonomic Rapidly-Exploring Random Trees algorithm; one version is more general, but requires explicit use of the environment's transition function, while the second version uses manipulation-specific kinematic constraints to attain better sample efficiency. In both cases, we use states found via sampling-based exploration to generate reset distributions that enable training control policies under full dynamic constraints via model-free Reinforcement Learning. We show that these policies are effective at manipulation problems of higher difficulty than previously shown, and also transfer effectively to real robots. Videos of the real-hand demonstrations can be found on the project website: https://sbrl.cs.columbia.edu/
Submitted 23 May, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
On the Origin of Dust Structures in Protoplanetary Disks: Constraints from the Rossby Wave Instability
Authors:
Eonho Chang,
Andrew N. Youdin,
Leonardo Krapp
Abstract:
High resolution sub-mm observations of protoplanetary disks with ALMA have revealed that dust rings are common in large, bright disks. The leading explanation for these structures is dust-trapping in a local gas pressure maximum, caused by an embedded planet or other dynamical process. Independent of origin, such dust traps should be stable for many orbits to collect significant dust. However, ring-like perturbations in gas disks are also known to trigger the Rossby Wave Instability (RWI). We investigate whether axisymmetric pressure bumps can simultaneously trap dust and remain stable to the RWI. The answer depends on the thermodynamic properties of pressure bumps. For isothermal bumps, dust traps are RWI-stable for widths from ${\sim}1$ to several gas scale-heights. Adiabatic dust traps are stable over a smaller range of widths. For temperature bumps with no surface density component, however, all dust traps tend to be unstable. Smaller values of disk aspect ratio allow stable dust trapping at lower bump amplitudes and over a larger range of widths. We also report a new approximate criterion for RWI. Instability occurs when the radial oscillation frequency is $\lesssim 75\%$ of the Keplerian frequency, which differs from the well-known Lovelace necessary (but not sufficient) criterion for instability. Our results can guide ALMA observations of molecular gas by constraining the resolution and sensitivity needed to identify the pressure bumps thought to be responsible for dust rings.
Submitted 6 March, 2023;
originally announced March 2023.
-
Metasurface-enhanced mid-infrared spectrochemical imaging of tissues
Authors:
S. Rosas,
K. A. Schoeller,
E. Chang,
H. Mei,
M. A. Kats,
K. W. Eliceiri,
X. Zhao,
F. Yesilkoy
Abstract:
Label-free and nondestructive mid-infrared vibrational hyperspectral imaging is emerging as an important ex-vivo tissue analysis tool, providing spatially resolved biochemical information critical to understanding physiological and pathological processes. However, the chemically complex and spatially heterogeneous composition of tissue specimens and the inherently weak interaction of infrared light with biomolecules limit the analytical performance of infrared absorption spectroscopy. Here, we introduce an advanced mid-infrared spectrochemical tissue imaging modality using metasurfaces that support strong surface-localized electromagnetic fields to capture quantitative molecular maps of large-area murine brain-tissue sections. Our approach leverages polarization-multiplexed multi-resonance plasmonic metasurfaces to simultaneously detect many different functional biomolecules. The resulting surface-enhanced mid-infrared spectral imaging (SE-MIRSI) method eliminates the non-specific effects of bulk tissue morphology on the quantitative analysis of fingerprint spectra and improves the chemical selectivity. We show that the metasurface enhancement increases the retrieval of amide I and II absorption bands associated with secondary structures of proteins. Moreover, we demonstrate that plasmonic metasurfaces enhance the chemical contrast in infrared images and enable the detection of ultrathin tissue regions that are not otherwise visible to conventional mid-infrared spectral imaging. While we tested our approach on murine brain tissue sections, this chemical imaging method is well-suited for any tissue type, which significantly broadens the potential impacts of our method for both translational research and clinical histopathology.
Submitted 26 April, 2023; v1 submitted 14 January, 2023;
originally announced January 2023.
-
Tracing the Origin of Adversarial Attack for Forensic Investigation and Deterrence
Authors:
Han Fang,
Jiyi Zhang,
Yupeng Qiu,
Ke Xu,
Chengfang Fang,
Ee-Chien Chang
Abstract:
Deep neural networks are vulnerable to adversarial attacks. In this paper, we take the role of investigators who want to trace the attack and identify the source, that is, the particular model which the adversarial examples are generated from. Techniques derived would aid forensic investigation of attack incidents and serve as deterrence to potential attacks. We consider the buyers-seller setting where a machine learning model is to be distributed to various buyers and each buyer receives a slightly different copy with same functionality. A malicious buyer generates adversarial examples from a particular copy $\mathcal{M}_i$ and uses them to attack other copies. From these adversarial examples, the investigator wants to identify the source $\mathcal{M}_i$. To address this problem, we propose a two-stage separate-and-trace framework. The model separation stage generates multiple copies of a model for a same classification task. This process injects unique characteristics into each copy so that adversarial examples generated have distinct and traceable features. We give a parallel structure which embeds a ``tracer'' in each copy, and a noise-sensitive training loss to achieve this goal. The tracing stage takes in adversarial examples and a few candidate models, and identifies the likely source. Based on the unique features induced by the noise-sensitive loss function, we could effectively trace the potential adversarial copy by considering the output logits from each tracer. Empirical results show that it is possible to trace the origin of the adversarial example and the mechanism can be applied to a wide range of architectures and datasets.
Submitted 30 December, 2022;
originally announced January 2023.
-
Knowledge-Guided Data-Centric AI in Healthcare: Progress, Shortcomings, and Future Directions
Authors:
Edward Y. Chang
Abstract:
The success of deep learning is largely due to the availability of large amounts of training data that cover a wide range of examples of a particular concept or meaning. In the field of medicine, having a diverse set of training data on a particular disease can lead to the development of a model that is able to accurately predict the disease. However, despite the potential benefits, there have not been significant advances in image-based diagnosis due to a lack of high-quality annotated data. This article highlights the importance of using a data-centric approach to improve the quality of data representations, particularly in cases where the available data is limited. To address this "small-data" issue, we discuss four methods for generating and aggregating training data: data augmentation, transfer learning, federated learning, and GANs (generative adversarial networks). We also propose the use of knowledge-guided GANs to incorporate domain knowledge in the training data generation process. With the recent progress in large pre-trained language models, we believe it is possible to acquire high-quality knowledge that can be used to improve the effectiveness of knowledge-guided generative methods.
Submitted 30 April, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.
-
Purifier: Defending Data Inference Attacks via Transforming Confidence Scores
Authors:
Ziqi Yang,
Li** Wang,
Da Yang,
Jie Wan,
Ziming Zhao,
Ee-Chien Chang,
Fan Zhang,
Kui Ren
Abstract:
Neural networks are susceptible to data inference attacks such as the membership inference attack, the adversarial model inversion attack and the attribute inference attack, where the attacker could infer useful information such as the membership, the reconstruction or the sensitive attributes of a data sample from the confidence scores predicted by the target classifier. In this paper, we propose a method, namely PURIFIER, to defend against membership inference attacks. It transforms the confidence score vectors predicted by the target classifier and makes purified confidence scores indistinguishable in individual shape, statistical distribution and prediction label between members and non-members. The experimental results show that PURIFIER helps defend membership inference attacks with high effectiveness and efficiency, outperforming previous defense methods, and also incurs negligible utility loss. Besides, our further experiments show that PURIFIER is also effective in defending adversarial model inversion attacks and attribute inference attacks. For example, the inversion error is raised about 4+ times on the Facescrub530 classifier, and the attribute inference accuracy drops significantly when PURIFIER is deployed in our experiment.
Submitted 1 December, 2022;
originally announced December 2022.
-
Fantasizing with Dual GPs in Bayesian Optimization and Active Learning
Authors:
Paul E. Chang,
Prakhar Verma,
ST John,
Victor Picheny,
Henry Moss,
Arno Solin
Abstract:
Gaussian processes (GPs) are the main surrogate functions used for sequential modelling such as Bayesian Optimization and Active Learning. Their drawbacks are poor scaling with data and the need to run an optimization loop when using a non-Gaussian likelihood. In this paper, we focus on `fantasizing' batch acquisition functions that need the ability to condition on new fantasized data computationally efficiently. By using a sparse Dual GP parameterization, we gain linear scaling with batch size as well as one-step updates for non-Gaussian likelihoods, thus extending sparse models to greedy batch fantasizing acquisition functions.
Submitted 2 November, 2022;
originally announced November 2022.
-
Projecting Non-Fungible Token (NFT) Collections: A Contextual Generative Approach
Authors:
Wesley Joon-Wie Tann,
Akhil Vuputuri,
Ee-Chien Chang
Abstract:
Non-fungible tokens (NFTs) are digital assets stored on a blockchain representing real-world objects such as art or collectibles. An NFT collection comprises numerous tokens; each token can be transacted multiple times. It is a multibillion-dollar market where the number of collections has more than doubled in 2022. In this paper, we want to obtain a generative model that, given the early transactions history (first quarter Q1) of a newly minted collection, generates subsequent transactions (quarters Q2, Q3, Q4), where the generative model is trained using the transaction history of a few mature collections. The goal is to use the generated transactions to project the potential market value of this newly minted collection over the next few quarters. A technical challenge exists in that different collections have diverse characteristics, and the generative model should generate based on the appropriate "contexts" of the collection. Our method takes a two-step approach. First, it employs unsupervised learning on the early transactions to extract characteristics (which we call contexts) of NFT collections. Next, it generates future transactions of each token based on these contexts and the early transactions, projecting the target collection's potential market value. Comprehensive experiments demonstrate our contextual generative approach's NFT projection capabilities.
Submitted 4 February, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
MDIA: A Benchmark for Multilingual Dialogue Generation in 46 Languages
Authors:
Qingyu Zhang,
Xiaoyu Shen,
Ernie Chang,
Jidong Ge,
Pengke Chen
Abstract:
Owing to the lack of corpora for low-resource languages, current works on dialogue generation have mainly focused on English. In this paper, we present mDIA, the first large-scale multilingual benchmark for dialogue generation across low- to high-resource languages. It covers real-life conversations in 46 languages across 19 language families. We present baseline results obtained by fine-tuning the multilingual, non-dialogue-focused pre-trained model mT5 as well as English-centric, dialogue-focused pre-trained chatbot DialoGPT. The results show that mT5-based models perform better on sacreBLEU and BertScore but worse on diversity. Even though promising results are found in few-shot and zero-shot scenarios, there is a large gap between the generation quality in English and other languages. We hope that the release of mDIA could encourage more works on multilingual dialogue generation to promote language diversity.
Submitted 27 August, 2022;
originally announced August 2022.
-
Unscented Kalman filter with stable embedding for simple, accurate and computationally efficient state estimation of systems on manifolds in Euclidean space
Authors:
Jae-Hyeon Park,
Dong Eui Chang
Abstract:
This paper proposes a simple, accurate and computationally efficient method to apply the ordinary unscented Kalman filter developed in Euclidean space to systems whose dynamics evolve on manifolds. We use the mathematical theory called stable embedding to make a variant of the unscented Kalman filter that keeps state estimates in close proximity to the manifold while exhibiting excellent estimation performance. We confirm the performance of our devised filter by applying it to the satellite system model and comparing the performance with other unscented Kalman filters devised specifically for systems on manifolds. Our devised filter has a low estimation error, keeps the state estimates in close proximity to the manifold as expected, and consumes a minor amount of computation time. Also, our devised filter is simple and easy to use because it directly employs the off-the-shelf standard unscented Kalman filter devised in Euclidean space without any particular manifold-structure-preserving discretization method or coordinate transformation.
Submitted 30 November, 2022; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Mixed Fault Tolerance Protocols with Trusted Execution Environment
Authors:
Mingyuan Gao,
Hung Dang,
Ee-Chien Chang,
Jialin Li
Abstract:
Blockchain systems are designed, built and operated in the presence of failures. There are two dominant failure models, namely crash fault and Byzantine fault. Byzantine fault tolerance (BFT) protocols offer stronger security guarantees, and thus are widely used in blockchain systems. However, their security guarantees come at a dear cost to their performance and scalability. Several works have improved BFT protocols, and Trusted Execution Environment (TEE) has been shown to be an effective solution. However, existing such works typically assume that each participating node is equipped with TEE. For blockchain systems wherein participants typically have different hardware configurations, i.e., some nodes feature TEE while others do not, existing TEE-based BFT protocols are not applicable.
This work studies the setting wherein not all participating nodes feature TEE, under which we propose a new fault model called mixed fault. We explore a new approach to designing efficient distributed fault-tolerant protocols under the mixed fault model. In general, mixed fault tolerance (MFT) protocols assume a network of $n$ nodes, among which up to $f = \frac{n-2}{3}$ can be subject to mixed faults. We identify two key principles for designing efficient MFT protocols, namely, (i) prioritizing non-equivocating nodes in leading the protocol, and (ii) advocating the use of public-key cryptographic primitives that allow authenticated messages to be aggregated. We showcase these design principles by prescribing an MFT protocol, namely MRaft.
We implemented a prototype of MRaft using Intel SGX, integrated it into the CCF blockchain framework, conducted experiments, and showed that MFT protocols can obtain the same security guarantees as their BFT counterparts while still providing better performance (both transaction throughput and latency) and scalability.
Submitted 3 August, 2022;
originally announced August 2022.
-
Nonlinear quantum logic with colliding graphene plasmons
Authors:
Giuseppe Calajò,
Philipp K. Jenke,
Lee A. Rozema,
Philip Walther,
Darrick E. Chang,
Joel D. Cox
Abstract:
Graphene has emerged as a promising platform to bring nonlinear quantum optics to the nanoscale, where a large intrinsic optical nonlinearity enables long-lived and actively tunable plasmon polaritons to strongly interact. Here we theoretically study the collision between two counter-propagating plasmons in a graphene nanoribbon, where transversal subwavelength confinement endows propagating plasmons with a flat band dispersion that enhances their interaction. This scenario presents interesting possibilities towards the implementation of multi-mode polaritonic gates that circumvent limitations imposed by the Shapiro no-go theorem for photonic gates in nonlinear optical fibers. As a paradigmatic example we demonstrate the feasibility of a high-fidelity conditional $\pi$ phase shift (CZ), where the gate performance is fundamentally limited only by the single-plasmon lifetime. These results open new exciting avenues towards quantum information and many-body applications with strongly-interacting polaritons.
Submitted 18 March, 2023; v1 submitted 11 July, 2022;
originally announced July 2022.
-
De-END: Decoder-driven Watermarking Network
Authors:
Han Fang,
Zhaoyang Jia,
Yupeng Qiu,
Jiyi Zhang,
Weiming Zhang,
Ee-Chien Chang
Abstract:
With recent advances in machine learning, researchers are now able to solve traditional problems with new solutions. In the area of digital watermarking, deep-learning-based watermarking technique is being extensively studied. Most existing approaches adopt a similar encoder-driven scheme which we name END (Encoder-NoiseLayer-Decoder) architecture. In this paper, we revamp the architecture and creatively design a decoder-driven watermarking network dubbed De-END which greatly outperforms the existing END-based methods. The motivation for designing De-END originated from the potential drawback we discovered in END architecture: The encoder may embed redundant features that are not necessary for decoding, limiting the performance of the whole network. We conducted a detailed analysis and found that such limitations are caused by unsatisfactory coupling between the encoder and decoder in END. De-END addresses such drawbacks by adopting a Decoder-Encoder-Noiselayer-Decoder architecture. In De-END, the host image is firstly processed by the decoder to generate a latent feature map instead of being directly fed into the encoder. This latent feature map is concatenated to the original watermark message and then processed by the encoder. This change in design is crucial as it makes the feature of encoder and decoder directly shared thus the encoder and decoder are better coupled. We conducted extensive experiments and the results show that this framework outperforms the existing state-of-the-art (SOTA) END-based deep learning watermarking both in visual quality and robustness. On the premise of the same decoder structure, the visual quality (measured by PSNR) of De-END improves by 1.6dB (45.16dB to 46.84dB), and extraction accuracy after JPEG compression (QF=50) distortion outperforms more than 4% (94.9% to 99.1%).
Submitted 26 June, 2022;
originally announced June 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Forensic Artefact Discovery and Attribution from Android Cryptocurrency Wallet Applications
Authors:
Eugene Chang,
Paul Darcy,
Kim-Kwang Raymond Choo,
Nhien-An Le-Khac
Abstract:
Cryptocurrency has been (ab)used to purchase illicit goods and services such as drugs, weapons and child pornography (also referred to as child sexual abuse materials), and thus mobile devices (where cryptocurrency wallet applications are installed) are a potential source of evidence in a criminal investigation. Not surprisingly, there has been increased focus on the security of cryptocurrency wallets, although forensic extraction and attribution of forensic artefacts from such wallets is understudied. In this paper, we examine Bitcoin and Dogecoin. The latter is increasingly popular partly due to endorsements from celebrities and being positioned as an introductory path to cryptocurrency for newcomers. Specifically, we demonstrate how one can acquire forensic artefacts from Android Bitcoin and Dogecoin cryptocurrency wallets, such as wallet IDs, transaction IDs, timestamp information, email addresses, cookies, and OAuth tokens.
Submitted 29 May, 2022;
originally announced May 2022.
-
Low-Drift-Rate External Cavity Diode Laser
Authors:
Eddie H. Chang,
Jared Rivera,
Brian Bostwick,
Christian Schneider,
Peter Yu,
Eric R. Hudson
Abstract:
We present the design, construction, and simulation of a simple, low-cost external cavity diode laser with a measured free-running frequency drift rate of 1.4(1) MHz/h at 852 nm. This performance is achieved via a compact, nearly monolithic aluminum structure to minimize temperature gradients across the laser cavity. We present thermal finite element method simulations which quantify the effects of temperature gradients, and suggest that the drift rate is likely limited by laser-diode aging.
Submitted 25 May, 2022;
originally announced May 2022.
-
3D Segmentation Guided Style-based Generative Adversarial Networks for PET Synthesis
Authors:
Yang Zhou,
Zhiwen Yang,
Hui Zhang,
Eric I-Chao Chang,
Yubo Fan,
Yan Xu
Abstract:
Potential radioactive hazards in full-dose positron emission tomography (PET) imaging remain a concern, whereas the quality of low-dose images is never desirable for clinical use. So it is of great interest to translate low-dose PET images into full-dose. Previous studies based on deep learning methods usually directly extract hierarchical features for reconstruction. We notice that the importance of each feature is different and they should be weighted dissimilarly so that tiny information can be captured by the neural network. Furthermore, the synthesis on some regions of interest is important in some applications. Here we propose a novel segmentation guided style-based generative adversarial network (SGSGAN) for PET synthesis. (1) We put forward a style-based generator employing style modulation, which specifically controls the hierarchical features in the translation process, to generate images with more realistic textures. (2) We adopt a task-driven strategy that couples a segmentation task with a generative adversarial network (GAN) framework to improve the translation performance. Extensive experiments show the superiority of our overall framework in PET synthesis, especially on those regions of interest.
Submitted 18 May, 2022;
originally announced May 2022.
-
Transformer based multiple instance learning for weakly supervised histopathology image segmentation
Authors:
Ziniu Qian,
Kailu Li,
Maode Lai,
Eric I-Chao Chang,
Bingzheng Wei,
Yubo Fan,
Yan Xu
Abstract:
Histopathological image segmentation algorithms play a critical role in computer-aided diagnosis technology. The development of weakly supervised segmentation algorithms alleviates the time-consuming and labor-intensive problem of medical image annotation. As a subset of weakly supervised learning, Multiple Instance Learning (MIL) has been proven to be effective in segmentation. However, there is a lack of related information between instances in MIL, which limits the further improvement of segmentation performance. In this paper, we propose a novel weakly supervised method for pixel-level segmentation in histopathology images, which introduces Transformer into the MIL framework to capture global or long-range dependencies. The multi-head self-attention in the Transformer establishes the relationship between instances, which solves the shortcoming that instances are independent of each other in MIL. In addition, deep supervision is introduced to overcome the limitation of annotations in weakly supervised methods and make better use of hierarchical information. The state-of-the-art results on the colon cancer dataset demonstrate the superiority of the proposed method compared with other weakly supervised methods. We believe that our approach has potential for various applications in medical images.
Submitted 18 May, 2022;
originally announced May 2022.
-
Feedback Gradient Descent: Efficient and Stable Optimization with Orthogonality for DNNs
Authors:
Fanchen Bu,
Dong Eui Chang
Abstract:
The optimization with orthogonality has been shown useful in training deep neural networks (DNNs). To impose orthogonality on DNNs, both computational efficiency and stability are important. However, existing methods utilizing Riemannian optimization or hard constraints can only ensure stability while those using soft constraints can only improve efficiency. In this paper, we propose a novel method, named Feedback Gradient Descent (FGD), to our knowledge, the first work showing high efficiency and stability simultaneously. FGD induces orthogonality based on the simple yet indispensable Euler discretization of a continuous-time dynamical system on the tangent bundle of the Stiefel manifold. In particular, inspired by a numerical integration method on manifolds called Feedback Integrators, we propose to instantiate it on the tangent bundle of the Stiefel manifold for the first time. In the extensive image classification experiments, FGD comprehensively outperforms the existing state-of-the-art methods in terms of accuracy, efficiency, and stability.
Submitted 11 May, 2022;
originally announced May 2022.
-
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
Authors:
David Ifeoluwa Adelani,
Jesujoba Oluwadara Alabi,
Angela Fan,
Julia Kreutzer,
Xiaoyu Shen,
Machel Reid,
Dana Ruiter,
Dietrich Klakow,
Peter Nabende,
Ernie Chang,
Tajuddeen Gwadabe,
Freshia Sackey,
Bonaventure F. P. Dossou,
Chris Chinenye Emezue,
Colin Leong,
Michael Beukman,
Shamsuddeen Hassan Muhammad,
Guyo Dub Jarso,
Oreen Yousuf,
Andre Niyongabo Rubungo,
Gilles Hacheme,
Eric Peter Wairagala,
Muhammad Umair Nasir,
Benjamin Ayoade Ajibade,
Tunde Oluwaseyi Ajayi
, et al. (20 additional authors not shown)
Abstract:
Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a new African news corpus covering 16 languages, of which eight languages are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both to additional languages and to additional domains is to fine-tune large pre-trained models on small quantities of high-quality translation data.
Submitted 22 August, 2022; v1 submitted 4 May, 2022;
originally announced May 2022.
-
A model for malaria treatment evaluation in the presence of multiple species
Authors:
Camelia R. Walker,
Roslyn I. Hickson,
Edmond Chang,
Pengby Ngor,
Siv Sovannaroth,
Julie A. Simpson,
David J. Price,
James M. McCaw,
Ric N. Price,
Jennifer A. Flegg,
Angela Devine
Abstract:
Plasmodium (P.) falciparum and P. vivax are the two most common causes of malaria. While the majority of deaths and severe morbidity are due to P. falciparum, P. vivax poses a greater challenge to eliminating malaria outside of Africa due to its ability to form latent liver stage parasites (hypnozoites), which can cause relapsing episodes within an individual patient. In areas where P. falciparum and P. vivax are co-endemic, individuals can carry parasites of both species simultaneously. These mixed infections complicate dynamics in several ways; treatment of mixed infections will simultaneously affect both species, P. falciparum can mask the detection of P. vivax, and it has been hypothesised that clearing P. falciparum may trigger a relapse of dormant P. vivax. When mixed infections are treated for only blood-stage parasites, patients are at risk of relapse infections due to P. vivax hypnozoites.
We present a stochastic mathematical model that captures interactions between P. falciparum and P. vivax, and incorporates both standard schizontocidal treatment (which targets blood-stage parasites) and radical treatment (which additionally targets liver-stage parasites). We apply this model to assess the implications of different treatment coverage of radical cure for mixed and P. vivax infections and a so-called "unified radical cure" treatment strategy for P. falciparum, P. vivax and mixed infections. We find that a unified radical cure strategy, with G6PD screening, leads to a substantially lower incidence of malaria cases and deaths overall. We perform a one-way sensitivity analysis to highlight important model parameters.
Submitted 21 July, 2022; v1 submitted 3 May, 2022;
originally announced May 2022.
-
Scalable Private Decision Tree Evaluation with Sublinear Communication
Authors:
Jianli Bai,
Xiangfu Song,
Shujie Cui,
Ee-Chien Chang,
Giovanni Russello
Abstract:
Private decision tree evaluation (PDTE) allows a decision tree holder to run a secure protocol with a feature provider. By running the protocol, the feature provider will learn a classification result. Nothing more is revealed to either party. In most existing PDTE protocols, the required communication grows exponentially with the tree's depth $d$, which is highly inefficient for large trees. This shortcoming motivated us to design a sublinear PDTE protocol with $O(d)$ communication complexity. The core of our construction is a shared oblivious selection (SOS) functionality, allowing two parties to perform a secret-shared oblivious read operation from an array. We provide two SOS protocols, both of which achieve sublinear communication and propose optimizations to further improve their efficiency. Our sublinear PDTE protocol is based on the proposed SOS functionality and we prove its security under a semi-honest adversary. We compare our protocol with the state-of-the-art, in terms of communication and computation, under various network settings. The performance evaluation shows that our protocol is practical and more scalable over large trees than existing solutions.
△ Less
Submitted 2 May, 2022;
originally announced May 2022.
-
Differential Cost Analysis with Simultaneous Potentials and Anti-potentials
Authors:
Đorđe Žikelić,
Bor-Yuh Evan Chang,
Pauline Bolignano,
Franco Raimondi
Abstract:
We present a novel approach to differential cost analysis that, given a program revision, attempts to statically bound the difference in resource usage, or cost, between the two program versions. Differential cost analysis is particularly interesting because of the many compelling applications for it, such as detecting resource-use regressions at code-review time or proving the absence of certain side-channel vulnerabilities. One prior approach to differential cost analysis is to apply relational reasoning that conceptually constructs a product program on which one can over-approximate the difference in costs between the two program versions. However, a significant challenge in any relational approach is effectively aligning the program versions to get precise results. In this paper, our key insight is that we can avoid the need for and the limitations of program alignment if, instead, we bound the difference of two cost-bound summaries rather than directly bounding the concrete cost difference. In particular, our method computes a threshold value for the maximal difference in cost between two program versions simultaneously using two kinds of cost-bound summaries -- a potential function that evaluates to an upper bound for the cost incurred in the first program and an anti-potential function that evaluates to a lower bound for the cost incurred in the second. Our method has a number of desirable properties: it can be fully automated, it allows optimizing the threshold value on relative cost, it is suitable for programs that are not syntactically similar, and it supports non-determinism. We have evaluated an implementation of our approach on a number of program pairs collected from the literature, and we find that our method computes tight threshold values on relative cost in most examples.
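The bounding idea described in this abstract can be illustrated with a toy sketch. Everything below is hypothetical and not taken from the paper: the two program versions, their cost models, and the hand-written bound functions are invented for illustration. The paper derives such bounds statically, whereas this sketch only evaluates assumed bounds over a small finite input range.

```python
def run_first(n: int) -> int:
    """Hypothetical first program version: 3 units of setup, 2 per iteration."""
    cost = 3
    for _ in range(n):
        cost += 2
    return cost


def run_second(n: int) -> int:
    """Hypothetical second program version: 2 units on even steps, 1 on odd."""
    cost = 0
    for i in range(n):
        cost += 2 if i % 2 == 0 else 1
    return cost


def potential_first(n: int) -> int:
    """Assumed potential: an upper bound on the cost of run_first."""
    return 2 * n + 3


def anti_potential_second(n: int) -> int:
    """Assumed anti-potential: a lower bound on the cost of run_second."""
    return n


def threshold(inputs) -> int:
    """Bound the cost difference via the two summaries, not the concrete costs."""
    return max(potential_first(n) - anti_potential_second(n) for n in inputs)


inputs = range(50)
t = threshold(inputs)
# The concrete cost difference never exceeds the summary-based threshold.
assert all(run_first(n) - run_second(n) <= t for n in inputs)
```

Because the threshold is computed from the two summaries alone, the two versions never have to be aligned with each other; the price is that the threshold can be looser than the true maximal difference.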
△ Less
Submitted 7 April, 2022; v1 submitted 2 April, 2022;
originally announced April 2022.
-
Neural-FST Class Language Model for End-to-End Speech Recognition
Authors:
Antoine Bruguier,
Duc Le,
Rohit Prabhavalkar,
Dangna Li,
Zhe Liu,
Bo Wang,
Eun Chang,
Fuchun Peng,
Ozlem Kalinli,
Michael L. Seltzer
Abstract:
We propose Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition, a novel method that combines neural network language models (NNLMs) and finite state transducers (FSTs) in a mathematically consistent framework. Our method utilizes a background NNLM which models generic background text together with a collection of domain-specific entities modeled as individual FSTs. Each output token is generated by a mixture of these components; the mixture weights are estimated with a separately trained neural decider. We show that NFCLM significantly outperforms NNLM by 15.8% relative in terms of Word Error Rate. NFCLM achieves similar performance as traditional NNLM and FST shallow fusion while being less prone to overbiasing and 12 times more compact, making it more suitable for on-device usage.
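As a rough illustration of the mixture step mentioned in this abstract, the sketch below combines per-component token distributions using fixed mixture weights. The function name, the toy vocabularies, and the weights are assumptions made for this example; in the paper the weights come from a trained neural decider and the components are an NNLM and FSTs, none of which are modeled here.

```python
def mixture_token_probs(component_probs, weights):
    """Mix per-component token distributions: p(tok) = sum_k w_k * p_k(tok).

    component_probs: list of dicts mapping token -> probability (hypothetical
    stand-ins for the background NNLM and the entity FSTs).
    weights: mixture weights, one per component, summing to 1.
    """
    if len(component_probs) != len(weights):
        raise ValueError("need exactly one weight per component")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    mixed = {}
    for probs, w in zip(component_probs, weights):
        for tok, p in probs.items():
            mixed[tok] = mixed.get(tok, 0.0) + w * p
    return mixed


# Toy distributions: a generic background model and a contact-name entity model.
background = {"call": 0.6, "play": 0.4}
contacts = {"alice": 0.5, "bob": 0.5}
mixed = mixture_token_probs([background, contacts], [0.8, 0.2])
```

If every component is a valid distribution and the weights sum to 1, the mixed result is again a valid distribution over the union of the vocabularies.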
Submitted 31 January, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Mitigating Adversarial Attacks by Distributing Different Copies to Different Users
Authors:
Jiyi Zhang,
Han Fang,
Wesley Joon-Wie Tann,
Ke Xu,
Chengfang Fang,
Ee-Chien Chang
Abstract:
Machine learning models are vulnerable to adversarial attacks. In this paper, we consider the scenario where a model is distributed to multiple buyers, among which a malicious buyer attempts to attack another buyer. The malicious buyer probes its copy of the model to search for adversarial samples and then presents the found samples to the victim's copy of the model in order to replicate the attack. We point out that by distributing different copies of the model to different buyers, we can mitigate the attack such that adversarial samples found on one copy would not work on another copy. We observed that training a model with different randomness indeed mitigates such replication to a certain degree. However, there is no guarantee and retraining is computationally expensive. A number of works extended the retraining method to enhance the differences among models. However, a very limited number of models can be produced using such methods and the computational cost becomes even higher. Therefore, we propose a flexible parameter rewriting method that directly modifies the model's parameters. This method does not require additional training and is able to generate a large number of copies in a more controllable manner, where each copy induces different adversarial regions. Experimentation studies show that rewriting can significantly mitigate the attacks while retaining high classification accuracy. For instance, on GTSRB dataset with respect to Hop Skip Jump attack, using attractor-based rewriter can reduce the success rate of replicating the attack to 0.5% while independently training copies with different randomness can reduce the success rate to 6.5%. From this study, we believe that there are many further directions worth exploring.
Submitted 26 May, 2023; v1 submitted 30 November, 2021;
originally announced November 2021.
-
Tenodesis Grasp Emulator: Kinematic Assessment of Wrist-Driven Orthotic Control
Authors:
Erin Y. Chang,
Raghid Mardini,
Andrew I. W. McPherson,
Yuri Gloumakov,
Hannah S. Stuart
Abstract:
Wrist-driven orthotics have been designed to assist people with C6-7 spinal cord injury; however, the kinematic constraint imposed by such a control strategy can impede mobility and lead to abnormal body motion. This study characterizes body compensation using the novel Tenodesis Grasp Emulator, an adaptor orthotic that allows for the investigation of tenodesis grasping in subjects with unimpaired hand function. Subjects perform a series of grasp-and-release tasks in order to compare normal (test control) and constrained wrist-driven modes, showing significant compensation as a result of the constraint. A motor-augmented mode is also compared against traditional wrist-driven operation, to explore the potential role of hybrid human-robot control. We find that both the passive wrist-driven and motor-augmented modes fulfill different roles throughout various tasks tested. Thus, we conclude that a flexible control scheme that can alter intervention based on the task at hand holds the potential to reduce compensation in future work.
Submitted 9 November, 2023; v1 submitted 22 November, 2021;
originally announced November 2021.
-
Dual Parameterization of Sparse Variational Gaussian Processes
Authors:
Vincent Adam,
Paul E. Chang,
Mohammad Emtiyaz Khan,
Arno Solin
Abstract:
Sparse variational Gaussian process (SVGP) methods are a common choice for non-conjugate Gaussian process inference because of their computational benefits. In this paper, we improve their computational efficiency by using a dual parameterization where each data example is assigned dual parameters, similarly to site parameters used in expectation propagation. Our dual parameterization speeds up inference using natural gradient descent, and provides a tighter evidence lower bound for hyperparameter learning. The approach has the same memory cost as the current SVGP methods, but it is faster and more accurate.
Submitted 19 January, 2022; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Whole Brain Segmentation with Full Volume Neural Network
Authors:
Yeshu Li,
Jonathan Cui,
Yilun Sheng,
Xiao Liang,
Jingdong Wang,
Eric I-Chao Chang,
Yan Xu
Abstract:
Whole brain segmentation is an important neuroimaging task that segments the whole brain volume into anatomically labeled regions-of-interest. Convolutional neural networks have demonstrated good performance in this task. Existing solutions usually segment the brain image by classifying the voxels, or labeling the slices or the sub-volumes separately. Their representation learning is based on parts of the whole volume, whereas their labeling result is produced by aggregation of partial segmentation. Learning and inference with incomplete information could lead to a sub-optimal final segmentation result. To address these issues, we propose to adopt a full volume framework, which feeds the full volume brain image into the segmentation network and directly outputs the segmentation result for the whole brain volume. The framework makes use of complete information in each volume and can be implemented easily. An effective instance in this framework is given subsequently. We adopt the 3D high-resolution network (HRNet) for learning spatially fine-grained representations and the mixed precision training scheme for memory-efficient training. Extensive experimental results on a publicly available 3D MRI brain dataset show that our proposed model advances the state-of-the-art methods in terms of segmentation performance. Source code is publicly available at https://github.com/microsoft/VoxHRNet.
Submitted 29 October, 2021;
originally announced October 2021.
-
Engineering the Radiative Dynamics of Thermalized Excitons with Metal Interfaces
Authors:
Grace H. Chen,
David Z. Li,
Amy Butcher,
Alexander A. High,
Darrick E. Chang
Abstract:
As a platform for optoelectronic devices based on exciton dynamics, monolayer transition metal dichalcogenides (TMDCs) are often placed near metal interfaces or inside planar cavities. While the radiative properties of point dipoles at metal interfaces have been studied extensively, those of excitons, which are delocalized and exhibit a temperature-dependent momentum distribution, lack a thorough treatment. Here, we analyze the emission properties of excitons in TMDCs near planar metal interfaces and explore their dependence on exciton center-of-mass momentum, transition dipole orientation, and temperature. Defining a characteristic energy scale $k_B T_c = (\hbar k)^2/2m$ ($k$ being the radiative wavevector and $m$ the exciton mass), we find that at temperatures $T\gg T_c$ and low densities where the momentum distribution can be characterized by Maxwell-Boltzmann statistics, the modified emission rates (normalized to free space) behave similarly to those of point dipoles. This similarity in behavior arises due to the broad nature of wavevector components making up the exciton and point dipole emission. On the other hand, the narrow momentum distribution of excitons for $T<T_c$ can result in significantly different emission behavior as compared to point dipoles. These differences can be further amplified by considering excitons with a Bose-Einstein distribution at high phase space densities. We find suppression or enhancement of emission relative to the point dipole case by several orders of magnitude. These insights can help optimize the performance of optoelectronic devices that incorporate 2D semiconductors near metal electrodes and can inform future studies of exciton radiative dynamics at low temperatures. Additionally, these studies show that nanoscale optical cavities are a viable pathway to generating long-lifetime exciton states in TMDCs.
Submitted 11 October, 2021;
originally announced October 2021.
-
Emergence of solitons from many-body photon bound states in quantum nonlinear media
Authors:
Giuseppe Calajo,
Darrick E. Chang
Abstract:
Solitons are known to occur in the context of atom-light interaction via the well-known semi-classical phenomenon of self-induced transparency (SIT). Separately, in the regime where both light and atoms are fully treated quantum mechanically, quantum few-photon bound states are known to be a ubiquitous phenomenon that arises in different systems such as atoms coupled to chiral or bidirectional waveguides, and in Rydberg atomic media. In the specific case of two-level atoms coupled to a chiral waveguide, a recent analysis based on Bethe ansatz has established that SIT emerges from the quantum realm as a superposition of quantum many-photon bound states. Beyond this case, however, the nature of any connection between the full quantum many-body regime and semi-classical behavior has not been established. Here, we employ a general spin-model formulation of quantum atom-light interfaces to numerically investigate this problem, taking advantage of the fact that this approach readily allows for powerful many-body simulations based on matrix product states (MPS). We first analytically derive the two-photon bound state dispersion relation for a variety of atom-light interfaces, and then proceed to numerically investigate the multi-excitation bound states dynamics. Interestingly, for all the specific systems studied, we find that the large-photon number limit always coincides with the soliton phenomenon of self-induced transparency or immediate generalizations thereof.
Submitted 9 April, 2022; v1 submitted 30 September, 2021;
originally announced October 2021.
-
Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation
Authors:
An Yan,
Zexue He,
Xing Lu,
Jiang Du,
Eric Chang,
Amilcare Gentili,
Julian McAuley,
Chun-Nan Hsu
Abstract:
Radiology report generation aims at generating descriptive text from radiology images automatically, which may present an opportunity to improve radiology reporting and interpretation. A typical setting consists of training encoder-decoder models on image-report pairs with a cross entropy loss, which struggles to generate informative sentences for clinical diagnoses since normal findings dominate the datasets. To tackle this challenge and encourage more clinically-accurate text outputs, we propose a novel weakly supervised contrastive loss for medical report generation. Experimental results demonstrate that our method benefits from contrasting target reports with incorrect but semantically-close ones. It outperforms previous work on both clinical correctness and text generation metrics for two public benchmarks.
Submitted 24 September, 2021;
originally announced September 2021.
-
Graph Learning Augmented Heterogeneous Graph Neural Network for Social Recommendation
Authors:
Yiming Zhang,
Lingfei Wu,
Qi Shen,
Yitong Pang,
Zhihua Wei,
Fangli Xu,
Ethan Chang,
Bo Long
Abstract:
Social recommendation based on social networks has achieved great success in improving the performance of recommendation systems. Since social networks (user-user relations) and user-item interactions are both naturally represented as graph-structured data, Graph Neural Networks (GNNs) have been widely applied for social recommendation. In this work, we propose an end-to-end heterogeneous global graph learning framework, namely Graph Learning Augmented Heterogeneous Graph Neural Network (GL-HGNN), for social recommendation. GL-HGNN aims to learn a heterogeneous global graph that makes full use of user-user relations, user-item interactions and item-item similarities in a unified perspective. To this end, we design a Graph Learner (GL) method to learn and optimize user-user and item-item connections separately. Moreover, we employ a Heterogeneous Graph Neural Network (HGNN) to capture the high-order complex semantic relations from our learned heterogeneous global graph. To scale up the computation of graph learning, we further present the Anchor-based Graph Learner (AGL) to reduce computational complexity. Extensive experiments on four real-world datasets demonstrate the effectiveness of our model.
Submitted 24 September, 2021;
originally announced September 2021.
-
Renormalization group analysis of near-field induced dephasing of optical spin waves in an atomic medium
Authors:
Stefano Grava,
Yizun He,
Saijun Wu,
Darrick E. Chang
Abstract:
While typical theories of atom-light interactions treat the atomic medium as being smooth, it is well-known that microscopic optical effects driven by atomic granularity, dipole-dipole interactions, and multiple scattering can lead to important effects. Recently, for example, it was experimentally observed that these ingredients can lead to a fundamental, density-dependent dephasing of optical spin waves in a disordered atomic medium. Here, we go beyond the short-time and dilute limits considered previously, to develop a comprehensive theory of dephasing dynamics for arbitrary times and atomic densities. In particular, we develop a novel, non-perturbative theory based on strong disorder renormalization group, in order to quantitatively predict the dominant role that near-field optical interactions between nearby neighbors have in driving the dephasing process. This theory also enables one to capture the key features of the many-atom dephasing dynamics in terms of an effective single-atom model. These results should shed light on the limits imposed by near-field interactions on quantum optical phenomena in dense atomic media, and illustrate the promise of strong disorder renormalization group as a method of dealing with complex microscopic optical phenomena in such systems.
Submitted 20 August, 2021;
originally announced August 2021.
-
Selectively-Amortized Resource Bounding (Extended Version)
Authors:
Tianhan Lu,
Bor-Yuh Evan Chang,
Ashutosh Trivedi
Abstract:
We consider the problem of automatically proving resource bounds. That is, we study how to prove that an integer-valued resource variable is bounded by a given program expression. Automatic resource-bound analysis has recently received significant attention because of a number of important applications (e.g., detecting performance bugs, preventing algorithmic-complexity attacks, identifying side-channel vulnerabilities), where the focus has often been on developing precise amortized reasoning techniques to infer the most exact resource usage. While such innovations remain critical, we observe that fully precise amortization is not always necessary to prove a bound of interest. And in fact, by amortizing selectively, the needed supporting invariants can be simpler, making the invariant inference task more feasible and predictable. We present a framework for selectively-amortized analysis that mixes worst-case and amortized reasoning via a property decomposition and a program transformation. We show that proving bounds in any such decomposition yields a sound resource bound in the original program, and we give an algorithm for selecting a reasonable decomposition.
Submitted 13 October, 2021; v1 submitted 18 August, 2021;
originally announced August 2021.
-
The SelectGen Challenge: Finding the Best Training Samples for Few-Shot Neural Text Generation
Authors:
Ernie Chang,
Xiaoyu Shen,
Alex Marin,
Vera Demberg
Abstract:
We propose a shared task on training instance selection for few-shot neural text generation. Large-scale pretrained language models have led to dramatic improvements in few-shot text generation. Nonetheless, almost all previous work simply applies random sampling to select the few-shot training instances. Little to no attention has been paid to the selection strategies and how they would affect model performance. The study of the selection strategy can help us to (1) make the most use of our annotation budget in downstream tasks and (2) better benchmark few-shot text generative models. We welcome submissions that present their selection strategies and the effects on the generation quality.
Submitted 14 August, 2021;
originally announced August 2021.
-
Quantitative Parametric Mapping of Tissues Properties from Standard Magnetic Resonance Imaging Enabled by Deep Learning
Authors:
Yan Wu,
Yajun Ma,
Youngwook Kee,
Nataliya Kovalchuk,
Dante Capaldi,
Hongyi Ren,
Steven Hancock,
Eric Chang,
Marcus Alley,
John Pauly,
Jiang Du,
Shreyas Vasanawala,
Lei Xing
Abstract:
Magnetic resonance imaging (MRI) offers superior soft tissue contrast and is widely used in biomedicine. However, conventional MRI is not quantitative, which presents a bottleneck in image analysis and digital healthcare. Typically, additional scans are required to disentangle the effect of multiple parameters of MR and extract quantitative tissue properties. Here we investigate a data-driven strategy Q^2 MRI (Qualitative and Quantitative MRI) to derive quantitative parametric maps from standard MR images without additional data acquisition. By taking advantage of the interdependency between various MRI parametric maps buried in training data, the proposed deep learning strategy enables accurate prediction of tissue relaxation properties as well as other biophysical and biochemical characteristics from a single or a few images with conventional T_1/T_2 weighting. Superior performance has been achieved in quantitative MR imaging of the knee and liver. Q^2 MRI promises to provide a powerful tool for a variety of biomedical applications and facilitate the next generation of digital medicine.
Submitted 10 August, 2021;
originally announced August 2021.
-
Optomechanical strong coupling between a single cavity photon and a single atom
Authors:
Javier Argüello-Luengo,
Darrick E. Chang
Abstract:
Single atoms coupled to a cavity offer unique opportunities as quantum optomechanical devices because of their small mass and strong interaction with light. A particular regime of interest in optomechanics is that of "single-photon strong coupling," where motional displacements on the order of the zero-point uncertainty are sufficient to shift the cavity resonance frequency by more than its linewidth. In many cavity QED platforms, however, this is unfeasible due to the large cavity linewidth. Here, we propose an alternative route in such systems, which instead relies on the coupling of atomic motion to the much narrower cavity-dressed atomic resonance frequency. We discuss and optimize the conditions in which the scattering properties of single photons from the atom-cavity system become highly entangled with the atomic motional wave function. We also analyze the prominent observable features of this optomechanical strong coupling, which include a per-photon motional heating that is significantly larger than the single-photon recoil energy, as well as mechanically-induced oscillations in time of the second-order correlation function of the emitted light. This physics should be realizable in current experimental setups, such as trapped atoms coupled to photonic crystal cavities, and more broadly opens the door to realizing qualitatively different phenomena beyond what has been observed in optomechanical systems thus far.
Submitted 7 August, 2021;
originally announced August 2021.
-
Poisoning Online Learning Filters: DDoS Attacks and Countermeasures
Authors:
Wesley Joon-Wie Tann,
Ee-Chien Chang
Abstract:
The recent advancements in machine learning have led to a wave of interest in adopting online learning-based approaches for long-standing attack mitigation issues. In particular, DDoS attacks remain a significant threat to network service availability even after more than two decades. These attacks have been well studied under the assumption that malicious traffic originates from a single attack profile. Based on this premise, malicious traffic characteristics are assumed to be considerably different from legitimate traffic. Consequently, online filtering methods are designed to learn network traffic distributions adaptively and rank requests according to their attack likelihood. During an attack, requests rated as malicious are precipitously dropped by the filters. In this paper, we conduct the first systematic study on the effects of data poisoning attacks on online DDoS filtering, introduce one such attack method, and propose practical protective countermeasures for these attacks. We investigate an adverse scenario where the attacker is "crafty", switching profiles during attacks and generating erratic attack traffic that is ever-shifting. This elusive attacker generates malicious requests by manipulating and shifting traffic distribution to poison the training data and corrupt the filters. To this end, we present a generative model, MimicShift, capable of controlling traffic generation while retaining the originating traffic's intrinsic properties. Comprehensive experiments show that online learning filters are highly susceptible to poisoning attacks, sometimes performing much worse than a random filtering strategy in this attack scenario. At the same time, our proposed protective countermeasure diminishes the attack impact.
Submitted 19 January, 2022; v1 submitted 27 July, 2021;
originally announced July 2021.
-
Modeling coexisting GSF and shear instabilities in rotating stars
Authors:
Eonho Chang,
Pascale Garaud
Abstract:
Zahn's widely-used model for turbulent mixing induced by rotational shear has recently been validated (with some caveats) in non-rotating shear flows. It is not clear, however, whether his model remains valid in the presence of rotation, even though this was its original purpose. Furthermore, new instabilities arise in rotating fluids, such as the Goldreich-Schubert-Fricke (GSF) instability. Which instability dominates when more than one can be excited, and how they influence each other, were open questions that this paper answers. To do so, we use direct numerical simulations of diffusive stratified shear flows in a rotating triply-periodic Cartesian domain located at the equator of a star. We find that either the GSF instability or the shear instability tends to take over the other in controlling the system, suggesting that stellar evolution models only need to have a mixing prescription for each individual instability, together with a criterion to determine which one dominates. However, we also find that it is not always easy to predict which instability "wins" for given input parameters, because the diffusive shear instability is subcritical, and only takes place if there is a finite-amplitude turbulence "primer" to seed it. Interestingly, we find that the GSF instability can in some cases play the role of this primer, thereby providing a pathway to excite the subcritical shear instability. This can also drive relaxation oscillations, which may be observable. We conclude by proposing a new model for mixing in the equatorial regions of stellar radiative zones due to differential rotation.
Submitted 8 July, 2021;
originally announced July 2021.
-
Heterogeneous Global Graph Neural Networks for Personalized Session-based Recommendation
Authors:
Yitong Pang,
Lingfei Wu,
Qi Shen,
Yiming Zhang,
Zhihua Wei,
Fangli Xu,
Ethan Chang,
Bo Long,
Jian Pei
Abstract:
Predicting the next interaction of a short-term interaction session is a challenging task in session-based recommendation. Almost all existing works rely on item transition patterns, and neglect the impact of user historical sessions while modeling user preference, which often leads to non-personalized recommendation. Additionally, existing personalized session-based recommenders capture user preference only based on the sessions of the current user, but ignore the useful item-transition patterns from other users' historical sessions. To address these issues, we propose a novel Heterogeneous Global Graph Neural Network (HG-GNN) to exploit the item transitions over all sessions in a subtle manner for better inferring user preference from the current and historical sessions. To effectively exploit the item transitions over all sessions from users, we propose a novel heterogeneous global graph that contains item transitions of sessions, user-item interactions and global co-occurrence items. Moreover, to capture user preference from sessions comprehensively, we propose to learn two levels of user representations from the global graph via two graph augmented preference encoders. Specifically, we design a novel heterogeneous graph neural network (HGNN) on the heterogeneous global graph to learn the long-term user preference and item representations with rich semantics. Based on the HGNN, we propose the Current Preference Encoder and the Historical Preference Encoder to capture the different levels of user preference from the current and historical sessions, respectively. To achieve personalized recommendation, we integrate the representations of the user's current preference and historical interests to generate the final user preference representation. Extensive experimental results on three real-world datasets show that our model outperforms other state-of-the-art methods.
Submitted 26 February, 2022; v1 submitted 8 July, 2021;
originally announced July 2021.
-
Time-Aware Ancient Chinese Text Translation and Inference
Authors:
Ernie Chang,
Yow-Ting Shiue,
Hui-Syuan Yeh,
Vera Demberg
Abstract:
In this paper, we aim to address the challenges surrounding the translation of ancient Chinese text: (1) the linguistic gap due to the difference in eras results in translations that are poor in quality, and (2) most translations are missing the contextual information that is often very crucial to understanding the text. To this end, we improve upon past translation techniques by proposing the following: we reframe the task as a multi-label prediction task where the model predicts both the translation and its particular era. We observe that this helps to bridge the linguistic gap as chronological context is also used as auxiliary information. We validate our framework on a parallel corpus annotated with chronology information and show experimentally its efficacy in producing quality translation outputs. We release both the code and the data at https://github.com/orina1123/time-aware-ancient-text-translation for future research.
Submitted 7 July, 2021;
originally announced July 2021.
-
On Training Instance Selection for Few-Shot Neural Text Generation
Authors:
Ernie Chang,
Xiaoyu Shen,
Hui-Syuan Yeh,
Vera Demberg
Abstract:
Large-scale pretrained language models have led to dramatic improvements in text generation. Impressive performance can be achieved by finetuning only on a small number of instances (few-shot setting). Nonetheless, almost all previous work simply applies random sampling to select the few-shot training instances. Little to no attention has been paid to the selection strategies and how they would affect model performance. In this work, we present a study on training instance selection in few-shot neural text generation. The selection decision is made based only on the unlabeled data so as to identify the most worthwhile data points that should be annotated under some budget of labeling cost. Based on the intuition that the few-shot training instances should be diverse and representative of the entire data distribution, we propose a simple selection strategy with K-means clustering. We show that even with the naive clustering-based approach, the generation models consistently outperform random sampling on three text generation tasks: data-to-text generation, document summarization and question generation. We hope that this work will call for more attention on this largely unexplored area.
Submitted 7 July, 2021;
originally announced July 2021.