Wait, That Feels Familiar: Learning to Extrapolate Human Preferences for Preference Aligned Path Planning

Authors: Haresh Karnan, Elvin Yang, Garrett Warnell, Joydeep Biswas, Peter Stone

Abstract: Autonomous mobility tasks such as last-mile delivery require reasoning about operator-indicated preferences over terrains on which the robot should navigate to ensure both robot safety and mission success. However, coping with out-of-distribution data from novel terrains or appearance changes due to lighting variations remains a fundamental problem in visual terrain-adaptive navigation. Existing solutions either require labor-intensive manual data recollection and labeling or use hand-coded reward functions that may not align with operator preferences. In this work, we posit that operator preferences for visually novel terrains, which the robot should adhere to, can often be extrapolated from established terrain references within the inertial, proprioceptive, and tactile domain. Leveraging this insight, we introduce Preference extrApolation for Terrain-awarE Robot Navigation, PATERN, a novel framework for extrapolating operator terrain preferences for visual navigation. PATERN learns to map inertial, proprioceptive, and tactile measurements from the robot's observations to a representation space and performs nearest-neighbor search in this space to estimate operator preferences over novel terrains. Through physical robot experiments in outdoor environments, we assess PATERN's capability to extrapolate preferences and generalize to novel terrains and challenging lighting conditions. Compared to baseline approaches, our findings indicate that PATERN robustly generalizes to diverse terrains and varied lighting conditions, while navigating in a preference-aligned manner.
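The nearest-neighbor step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the learned representation is already available as fixed-length vectors; the function name `nearest_preference`, the 2-D embeddings, and the preference values are all hypothetical and are not taken from the paper.

```python
import math

def nearest_preference(query, known_embeddings, known_preferences):
    """Return the preference attached to the known embedding closest to `query`.

    Euclidean distance is an assumption made for this sketch; the abstract
    does not state which metric is used.
    """
    best_index = min(
        range(len(known_embeddings)),
        key=lambda i: math.dist(query, known_embeddings[i]),
    )
    return known_preferences[best_index]

# Hypothetical embeddings of terrains the operator has already rated.
known_embeddings = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
known_preferences = [1.0, 0.8, 0.1]  # higher means more preferred (assumed scale)

# Embedding of a visually novel terrain, here closest to the second known terrain.
novel_terrain = (0.9, 1.2)
estimated = nearest_preference(novel_terrain, known_embeddings, known_preferences)
print(estimated)
```

The sketch only covers the lookup; learning the representation itself is outside its scope.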

Submitted 18 September, 2023; originally announced September 2023.

Journal ref: Under Submission to ICRA 2024

arXiv:2309.09123 [pdf, other]

Conditional Mutual Information Constrained Deep Learning for Classification

Authors: En-Hui Yang, Shayan Mohajer Hamidi, Linfeng Ye, Renhao Tan, Beverly Yang

Abstract: The concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN) in the output probability distribution space of the DNN, where CMI and the ratio between CMI and NCMI represent the intra-class concentration and inter-class separation of the DNN, respectively. By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies on the ImageNet validation data set are more or less inversely proportional to their NCMI values. Based on this observation, the standard deep learning (DL) framework is further modified to minimize the standard cross entropy function subject to an NCMI constraint, yielding CMI constrained deep learning (CMIC-DL). A novel alternating learning algorithm is proposed to solve such a constrained optimization problem. Extensive experimental results show that DNNs trained within CMIC-DL outperform the state-of-the-art models trained within the standard DL and other loss functions in the literature in terms of both accuracy and robustness against adversarial attacks. In addition, visualizing the evolution of the learning process through the lens of CMI and NCMI is also advocated.

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2309.06412 [pdf, other]

Quantum transport in a multi-path graphene Aharonov-Bohm interferometer

Authors: Cynthia I. Osuala, Zitao Tang, Stefan Strauf, Eui-Hyeok Yang, Chunlei Qu

Abstract: We investigate the quantum transport dynamics of electrons in a multi-path Aharonov-Bohm interferometer comprising several parallel graphene nanoribbons. At low magnetic field strengths, the conductance displays a complex oscillatory behavior stemming from the interference of electron wave functions from different paths, reminiscent of the diffraction grating in optics. With increasing magnetic field strength, certain nanoribbons experience transport blockade, leading to conventional Aharonov-Bohm oscillations arising from two-path interference. We also discuss the impact of edge effects and the influence of finite temperature. Our findings offer valuable insights for experimental investigations of quantum transport in multi-path devices and their potential application for interferometry and quantum sensing.

Submitted 12 September, 2023; originally announced September 2023.

Comments: 7 pages, 8 figures

arXiv:2309.03451 [pdf, other]

Cross-domain Sound Recognition for Efficient Underwater Data Analysis

Authors: Jeongsoo Park, Dong-Gyun Han, Hyoung Sul La, Sangmin Lee, Yoonchang Han, Eun-** Yang

Abstract: This paper presents a novel deep learning approach for analyzing massive underwater acoustic data by leveraging a model trained on a broad spectrum of non-underwater (aerial) sounds. Recognizing the challenge in labeling vast amounts of underwater data, we propose a two-fold methodology to accelerate this labor-intensive procedure. The first part of our approach involves PCA and UMAP visualization of the underwater data using the feature vectors of an aerial sound recognition model. This enables us to cluster the data in a two-dimensional space and listen to points within these clusters to understand their defining characteristics. This innovative method simplifies the process of selecting candidate labels for further training. In the second part, we train a neural network model using both the selected underwater data and the non-underwater dataset. We conducted a quantitative analysis to measure the precision, recall, and F1 score of our model for recognizing airgun sounds, a common type of underwater sound. The F1 score achieved by our model exceeded 84.3%, demonstrating the effectiveness of our approach in analyzing underwater acoustic data. The methodology presented in this paper holds significant potential to reduce the amount of labor required in underwater data analysis and opens up new possibilities for further research in the field of cross-domain data analysis.
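For readers unfamiliar with the metrics named in this abstract, the following is a minimal sketch of how precision, recall, and F1 are computed for a binary detector. The labels are made-up toy data (1 standing for "airgun sound", 0 for anything else) and the function name is hypothetical; none of it reproduces the paper's evaluation.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy ground truth and predictions (assumed values for illustration only).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```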

Submitted 21 February, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: Accepted to APSIPA 2023

arXiv:2309.01325 [pdf, other]

Data-constrained Magnetohydrodynamic Simulation of an Intermediate Solar Filament Eruption

Authors: Yang Guo, **han Guo, Yiwei Ni, M. D. Ding, P. F. Chen, Chun Xia, Rony Keppens, Kai E. Yang

Abstract: Solar eruptive activities could occur in weak magnetic field environments and over large spatial scales, especially relevant to eruptions involving intermediate or quiescent solar filaments. To handle the large scales, we implement and apply a flux rope embedding method using regularized Biot-Savart laws in the spherical coordinate system. Combined with a potential field source surface model and a magneto-frictional method, a nonlinear force-free field comprising a flux rope embedded in a potential field is constructed. Using the combined nonlinear force-free field as the initial condition, we then perform a zero-$β$ data-constrained magnetohydrodynamic (MHD) simulation for an M8.7 flare at 03:38 UT on 2012 January 23. The MHD model reproduces the eruption process, flare ribbon evolution (represented by the quasi-separatrix layer evolution) and kinematics of the flux rope. This approach could potentially model global-scale eruptions from weak field regions.

Submitted 3 September, 2023; originally announced September 2023.

Comments: 23 pages, 7 figures, accepted for publication in ApJ

arXiv:2309.00023 [pdf, other]

Continual Learning From a Stream of APIs

Authors: Enneng Yang, Zhenyi Wang, Li Shen, Nan Yin, Tongliang Liu, Guibing Guo, Xingwei Wang, Dacheng Tao

Abstract: Continual learning (CL) aims to learn new tasks without forgetting previous tasks. However, existing CL methods require a large amount of raw data, which is often unavailable due to copyright considerations and privacy risks. Instead, stakeholders usually release pre-trained machine learning models as a service (MLaaS), which users can access via APIs. This paper considers two practical-yet-novel CL settings: data-efficient CL (DECL-APIs) and data-free CL (DFCL-APIs), which achieve CL from a stream of APIs with partial or no raw data. Performing CL under these two new settings faces several challenges: unavailable full raw data, unknown model parameters, heterogeneous models of arbitrary architecture and scale, and catastrophic forgetting of previous APIs. To overcome these issues, we propose a novel data-free cooperative continual distillation learning framework that distills knowledge from a stream of APIs into a CL model by generating pseudo data, just by querying APIs. Specifically, our framework includes two cooperative generators and one CL model, forming their training as an adversarial game. We first use the CL model and the current API as fixed discriminators to train generators via a derivative-free method. Generators adversarially generate hard and diverse synthetic data to maximize the response gap between the CL model and the API. Next, we train the CL model by minimizing the gap between the responses of the CL model and the black-box API on synthetic data, to transfer the API's knowledge to the CL model. Furthermore, we propose a new regularization term based on network similarity to prevent catastrophic forgetting of previous APIs. Our method performs comparably to classic CL with full raw data on MNIST and SVHN in the DFCL-APIs setting. In the DECL-APIs setting, our method achieves 0.97x, 0.75x and 0.69x performance of classic CL on CIFAR10, CIFAR100, and MiniImageNet.

Submitted 31 August, 2023; originally announced September 2023.

arXiv:2308.10604 [pdf, other]

BackTrack: Robust template update via Backward Tracking of candidate template

Authors: Dongwook Lee, Wonjun Choi, Seohyung Lee, ByungIn Yoo, Eunho Yang, Seongju Hwang

Abstract: Variations of target appearance such as deformations, illumination variance, occlusion, etc., are the major challenges of visual object tracking that negatively impact the performance of a tracker. An effective method to tackle these challenges is template update, which updates the template to reflect the change of appearance in the target object during tracking. However, with template updates, inadequate quality of new templates or inappropriate timing of updates may induce a model drift problem, which severely degrades the tracking performance. Here, we propose BackTrack, a robust and reliable method to quantify the confidence of the candidate template by backward tracking it on the past frames. Based on the confidence score of candidates from BackTrack, we can update the template with a reliable candidate at the right time while rejecting unreliable candidates. BackTrack is a generic template update scheme and is applicable to any template-based tracker. Extensive experiments on various tracking benchmarks verify the effectiveness of BackTrack over existing template update algorithms, as it achieves SOTA performance on various tracking benchmarks.

Submitted 21 August, 2023; originally announced August 2023.

Comments: 14 pages, 7 figures

arXiv:2307.11362 [pdf, ps, other]

Ordered homomorphisms and kernels of ordered BCI-algebras

Authors: Eunsuk Yang, Eun Hwan Roh, Young Bae Jun

Abstract: Recently Yang-Roh-Jun introduced the notion of ordered BCI-algebras as a generalization of BCI-algebras. They also introduced the notions of homomorphisms and kernels of ordered BCI-algebras and investigated related properties. Here we extend their investigation to ordered homomorphisms, i.e., order-preserving homomorphisms. To this end, the notions of ordered homomorphism and kernel of ordered BCI-algebras are first defined. Next, properties associated with (ordered) subalgebras, (ordered) filters and direct products of ordered BCI-algebras are addressed.

Submitted 21 July, 2023; originally announced July 2023.

MSC Class: 03B05; 03G25; 06F35

arXiv:2307.09218 [pdf, other]

A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning

Authors: Zhenyi Wang, Enneng Yang, Li Shen, Heng Huang

Abstract: Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context, we aim to present a more nuanced understanding of this phenomenon and highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing upon ideas and approaches from various fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, in future work, we hope to encourage the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications. A comprehensive list of papers about forgetting in various research fields is available at https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning.

Submitted 23 July, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

arXiv:2307.04352 [pdf, other]

Phase Diagram and Crossover Phases of Topologically Ordered Graphene Zigzag Nanoribbons: Role of Localization Effects

Authors: Hoang Anh Le, In Hwan Lee, Young Heon Kim, S. -R. Eric Yang

Abstract: We computed the phase diagram of the zigzag graphene nanoribbons as a function of on-site repulsion, doping, and disorder strength. The topologically ordered phase undergoes topological phase transitions into crossover phases, which are new disordered phases with a nonuniversal topological entanglement entropy with significant variance. The topological order is destroyed by competition between localization effects and on-site repulsion. We found that strong on-site repulsion and/or doping weakens the nonlocal correlations between the opposite zigzag edges. In one of the crossover phases, both $\frac{e^-}{2}$ fractional charges and spin-charge separation were absent; however, charge-transfer correlations between the zigzag edges were possible. Another crossover phase contains $\frac{e^-}{2}$ fractional charges, but no charge-transfer correlations. In low-doped zigzag ribbons the interplay between electron localization and on-site repulsion contributes to the spatial separation of quasi-degenerate gap-edge states and protects the charge fractionalization against quantum fluctuations. In all these effects, mixed chiral gap-edge states play an important role. The properties of nontopological strongly disordered and strongly repulsive phases are also observed. Each phase of the phase diagram has a different zigzag-edge structure. Additionally, we investigated the tunneling of solitonic fractional charges under an applied voltage between the zigzag edges of undoped topologically ordered zigzag ribbons, and found that it may lead to a zero-bias tunneling anomaly.

Submitted 5 April, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: Published version, 2024 J. Phys.: Condens. Matter 36 265604

arXiv:2307.02493 [pdf, other]

FREEDOM: Target Label & Source Data & Domain Information-Free Multi-Source Domain Adaptation for Unsupervised Personalization

Authors: Eunju Yang, Gyusang Cho, Chan-Hyun Youn

Abstract: From a service perspective, Multi-Source Domain Adaptation (MSDA) is a promising scenario to adapt a deployed model to a client's dataset. It can provide adaptation without a target label and support the case where a source dataset is constructed from multiple domains. However, it is impractical, wherein its training heavily relies on prior domain information of the multi-source dataset -- how many domains exist and the domain label of each data sample. Moreover, MSDA requires both source and target datasets simultaneously (physically), causing storage limitations on the client device or data privacy issues by transferring client data to a server. For a more practical scenario of model adaptation from a service provider's point of view, we relax these constraints and present a novel problem scenario of Three-Free Domain Adaptation, namely TFDA, where 1) target labels, 2) source dataset, and mostly 3) source domain information (domain labels + the number of domains) are unavailable. Under the problem scenario, we propose a practical adaptation framework called FREEDOM. It leverages the power of the generative model, disentangling data into class and style aspects, where the style is defined as the class-independent information from the source data and designed with a nonparametric Bayesian approach. In the adaptation stage, FREEDOM aims to match the source class distribution with the target's under the philosophy that class distribution is consistent even if the style is different; afterward, only part of the classification model is deployed as a personalized network. As a result, FREEDOM achieves state-of-the-art or comparable performance even without domain information, with reduced final model size on the target side, independent of the number of source domains.

Submitted 4 July, 2023; originally announced July 2023.

arXiv:2306.02796 [pdf, other]

MCTS: A Multi-Reference Chinese Text Simplification Dataset

Authors: Ruining Chong, Luming Lu, Liner Yang, **ran Nie, Zhenghao Liu, Shuo Wang, Shuhan Zhou, Yaoxin Li, Erhong Yang

Abstract: Text simplification aims to make the text easier to understand by applying rewriting transformations. There has been very little research on Chinese text simplification for a long time. The lack of generic evaluation data is an essential reason for this phenomenon. In this paper, we introduce MCTS, a multi-reference Chinese text simplification dataset. We describe the annotation process of the dataset and provide a detailed analysis. Furthermore, we evaluate the performance of several unsupervised methods and advanced large language models. We additionally provide Chinese text simplification parallel data that can be used for training, acquired by utilizing machine translation and English text simplification. We hope to build a basic understanding of Chinese text simplification through the foundational work and provide references for future research. All of the code and data are released at https://github.com/blcuicall/mcts/.

Submitted 5 June, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted to COLING 2024

arXiv:2306.01981 [pdf, other]

SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization

Authors: Changhun Kim, Joonhyung Park, Ha** Shim, Eunho Yang

Abstract: Automatic speech recognition (ASR) models are frequently exposed to data distribution shifts in many real-world scenarios, leading to erroneous predictions. To tackle this issue, an existing test-time adaptation (TTA) method has recently been proposed to adapt the pre-trained ASR model on unlabeled test instances without source data. Despite decent performance gain, this work relies solely on naive greedy decoding and performs adaptation across timesteps at a frame level, which may not be optimal given the sequential nature of the model output. Motivated by this, we propose a novel TTA framework, dubbed SGEM, for general ASR models. To treat the sequential output, SGEM first exploits beam search to explore candidate output logits and selects the most plausible one. Then, it utilizes generalized entropy minimization and negative sampling as unsupervised objectives to adapt the model. SGEM achieves state-of-the-art performance for three mainstream ASR models under various domain shifts.
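To give a feel for what an entropy-based unsupervised objective measures, the sketch below computes the mean Shannon entropy of per-frame softmax distributions. This is a deliberately simplified stand-in: it uses plain Shannon entropy rather than the generalized entropy the abstract refers to, it performs no beam search or model update, and the logits and function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_frame_entropy(frame_logits):
    """Average entropy over frames: the quantity an entropy-minimizing
    adaptation step would try to reduce on unlabeled test audio."""
    return sum(shannon_entropy(softmax(f)) for f in frame_logits) / len(frame_logits)

# Toy logits over a 3-token vocabulary for two frames each (assumed values).
confident = [[8.0, 0.0, 0.0], [0.0, 9.0, 0.0]]
uncertain = [[1.0, 1.0, 1.0], [0.5, 0.4, 0.6]]
print(mean_frame_entropy(confident), mean_frame_entropy(uncertain))
```

Lower entropy corresponds to more peaked, confident predictions, which is why minimizing it can serve as a training signal when no labels are available.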

Submitted 21 June, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: INTERSPEECH 2023 Oral Presentation; Code is available at https://github.com/drumpt/SGEM

arXiv:2305.17864 [pdf, ps, other]

Laguerre inequality and determinantal inequality for the broken $k$-diamond partition function

Authors: Eve Y. Y. Yang

Abstract: In 2007, Andrews and Paule introduced the broken $k$-diamond partition function $Δ_{k}(n)$, whose arithmetic properties have since been studied extensively. In this paper, we prove that the broken $k$-diamond partition function satisfies the Laguerre inequalities of order $2$ and the determinantal inequalities of order $3$ for $k=1$ or $2$. Moreover, we conjecture the thresholds for the Laguerre inequalities of order $m$ and the positivity of $m$-order determinants for $4\leq m\leq 14$ for the broken $k$-diamond partition function when $k=1$ or $2$.

Submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.13831 [pdf, other]

ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models

Authors: Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang

Abstract: Emotional Text-To-Speech (TTS) is an important task in the development of systems (e.g., human-like dialogue agents) that require natural and emotional speech. Existing approaches, however, only aim to produce emotional TTS for seen speakers during training, without consideration of the generalization to unseen speakers. In this paper, we propose ZET-Speech, a zero-shot adaptive emotion-controllable TTS model that allows users to synthesize any speaker's emotional speech using only a short, neutral speech segment and the target emotion label. Specifically, to enable a zero-shot adaptive TTS model to synthesize emotional speech, we propose domain adversarial learning and guidance methods on the diffusion model. Experimental results demonstrate that ZET-Speech successfully synthesizes natural and emotional speech with the desired emotion for both seen and unseen speakers. Samples are at https://ZET-Speech.github.io/ZET-Speech-Demo/.

Submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted by INTERSPEECH 2023

arXiv:2305.06683 [pdf, other]

Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation

Authors: Yujie Wang, Chao Huang, Liner Yang, Zhixuan Fang, Ya** Huang, Yang Liu, Erhong Yang

Abstract: This paper introduces a novel worker selection algorithm, enhancing annotation quality and reducing costs in challenging span-based sequence labeling tasks in Natural Language Processing (NLP). Unlike previous studies targeting simpler tasks, this study contends with the complexities of label interdependencies in sequence labeling tasks. The proposed algorithm utilizes a Combinatorial Multi-Armed Bandit (CMAB) approach for worker selection. The challenge of dealing with imbalanced and small-scale datasets, which hinders offline simulation of worker selection, is tackled using an innovative data augmentation method termed shifting, expanding, and shrinking (SES). The SES method is designed specifically for sequence labeling tasks. Rigorous testing on CoNLL 2003 NER and Chinese OEI datasets showcased the algorithm's efficiency, with an increase in F1 score up to 100.04% of the expert-only baseline, alongside cost savings up to 65.97%. The paper also encompasses a dataset-independent test emulating annotation evaluation through a Bernoulli distribution, which still led to an impressive 97.56% F1 score of the expert baseline and 59.88% cost savings. This research addresses and overcomes numerous obstacles in worker selection for complex NLP tasks.

Submitted 11 May, 2023; originally announced May 2023.

arXiv:2305.04468 [pdf, other]

AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation Scheme

Authors: Yungi Jeong, Eunseok Yang, Jung Hyun Ryu, Imseong Park, Myungjoo Kang

Abstract: Mechanical defects in real situations affect observation values and cause abnormalities in multivariate time series, such as sensor values or network data. To perceive abnormalities in such data, it is crucial to understand the temporal context and interrelation between variables simultaneously. The anomaly detection task for time series, especially for unlabeled data, has been a challenging problem, and we address it by applying a suitable data degradation scheme to self-supervised model training. We define four types of synthetic outliers and propose the degradation scheme in which a portion of input data is replaced with one of the synthetic outliers. Inspired by the self-attention mechanism, we design a Transformer-based architecture to recognize the temporal context and detect unnatural sequences with high efficiency. Our model converts multivariate data points into temporal representations with relative position bias and yields anomaly scores from these representations. Our method, AnomalyBERT, shows a great capability of detecting anomalies contained in complex time series and surpasses previous state-of-the-art methods on five real-world benchmarks. Our code is available at https://github.com/Jhryu30/AnomalyBERT.

Submitted 8 May, 2023; originally announced May 2023.

Comments: 11 pages, Presented at ICLR 2023 workshop on Machine Learning for IoT

arXiv:2305.00331 [pdf, other]

Synthetic Cross-language Information Retrieval Training Data

Authors: James Mayfield, Eugene Yang, Dawn Lawrie, Samuel Barham, Orion Weller, Marc Mason, Suraj Nair, Scott Miller

Abstract: A key stumbling block for neural cross-language information retrieval (CLIR) systems has been the paucity of training data. The appearance of the MS MARCO monolingual training set led to significant advances in the state of the art in neural monolingual retrieval. By translating the MS MARCO documents into other languages using machine translation, this resource has been made useful to the CLIR community. Yet such translation suffers from a number of problems. While MS MARCO is a large resource, it is of fixed size; its genre and domain of discourse are fixed; and the translated documents are not written in the language of a native speaker of the language, but rather in translationese. To address these problems, we introduce the JH-POLO CLIR training set creation methodology. The approach begins by selecting a pair of non-English passages. A generative large language model is then used to produce an English query for which the first passage is relevant and the second passage is not relevant. By repeating this process, collections of arbitrary size can be created in the style of MS MARCO but using naturally-occurring documents in any desired genre and domain of discourse. This paper describes the methodology in detail, shows its use in creating new CLIR training sets, and describes experiments using the newly created training data.

Submitted 29 April, 2023; originally announced May 2023.

Comments: 11 pages, 4 figures

arXiv:2304.12367 [pdf, other]

Overview of the TREC 2022 NeuCLIR Track

Authors: Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang

Abstract: This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval. The main task in this year's track was ad hoc ranked retrieval of Chinese, Persian, or Russian newswire documents using queries expressed in English. Topics were developed using standard TREC processes, except that topics developed by an annotator for one language were assessed by a different annotator when evaluating that topic on a different language. There were 172 total runs submitted by twelve teams.

Submitted 24 September, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 22 pages, 13 figures, 10 tables. Part of the Thirty-First Text REtrieval Conference (TREC 2022) Proceedings. Replace the misplaced Russian result table

arXiv:2304.10566 [pdf, other]

doi 10.3847/1538-4357/acd75b

Particle-in-Cell Simulations of Relativistic Magnetic Reconnection with Advanced Maxwell Solver Algorithms

Authors: Hannah Klion, Revathi Jambunathan, Michael E. Rowan, Eloise Yang, Donald Willcox, Jean-Luc Vay, Remi Lehe, Andrew Myers, Axel Huebl, Weiqun Zhang

Abstract: Relativistic magnetic reconnection is a non-ideal plasma process that is a source of non-thermal particle acceleration in many high-energy astrophysical systems. Particle-in-cell (PIC) methods are commonly used for simulating reconnection from first principles. While much progress has been made in understanding the physics of reconnection, especially in 2D, the adoption of advanced algorithms and numerical techniques for efficiently modeling such systems has been limited. With the GPU-accelerated PIC code WarpX, we explore the accuracy and potential performance benefits of two advanced Maxwell solver algorithms: a non-standard finite difference scheme (CKC) and an ultrahigh-order pseudo-spectral method (PSATD). We find that for the relativistic reconnection problem, CKC and PSATD qualitatively and quantitatively match the standard Yee-grid finite-difference method. CKC and PSATD both admit a time step that is 40% longer than Yee, resulting in a ~40% faster time to solution for CKC, but no performance benefit for PSATD when using a current deposition scheme that satisfies Gauss's law. Relaxing this constraint maintains accuracy and yields a 30% speedup. Unlike Yee and CKC, PSATD is numerically stable at any time step, allowing for a larger time step than with the finite-difference methods. We found that increasing the time step 2.4-3 times over the standard Yee step still yields accurate results, but only translates to modest performance improvements over CKC due to the current deposition scheme used with PSATD. Further optimization of this scheme will likely improve the effective performance of PSATD.

Submitted 20 April, 2023; originally announced April 2023.

Comments: 19 pages, 10 figures. Submitted to ApJ

arXiv:2303.12486 [pdf, other]

Machine learning-informed structuro-elastoplasticity predicts ductility of disordered solids

Authors: Hongyi Xiao, Ge Zhang, Entao Yang, Robert J. S. Ivancic, Sean A. Ridout, Robert Riggleman, Douglas J. Durian, Andrea J. Liu

Abstract: All solids yield under sufficiently high mechanical loads. Below yield, the mechanical responses of all disordered solids are nearly alike, but above yield every different disordered solid responds in its own way. Brittle systems can shatter without warning, like ordinary window glass, or exhibit strain localization prior to fracture, like metallic or polymeric glasses. Ductile systems, e.g. foams like shaving cream or emulsions like mayonnaise, can flow indefinitely with no strain localization. While there are empirical strategies for tuning the degree of strain localization, there is no framework that explains their effectiveness or limitations. We show that Structuro-Elastoplastic (StEP) models provide microscopic understanding of how strain localization depends on the interplay of structure, plasticity and elasticity.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.11173 [pdf]

doi 10.3390/ma16103701

2D Magnetic Semiconductors via Substitutional Doping of Transition Metal Dichalcogenides

Authors: Mengqi Fang, Eui-Hyeok Yang

Abstract: Transition metal dichalcogenides (TMDs) are two-dimensional (2D) materials with remarkable electrical, optical and chemical properties. One promising strategy to tailor the properties of TMDs is to create alloys through dopant-induced modification. Dopants can introduce additional states within the bandgap of TMDs, leading to changes in their optical, electronic, and magnetic properties. This paper overviews chemical vapor deposition (CVD) methods to introduce dopants into TMD monolayers. The advantages and limitations and their impacts on the doped TMDs' structural, electrical, optical, and magnetic properties are discussed. The dopants in TMDs modify the density and type of carriers in the material, thereby influencing the optical properties of the materials. The TMDs' magnetic moment and circular dichroism are also strongly affected by doping, which enhances the magnetic signal in the material. Finally, we highlight the different doping-induced magnetic properties of TMDs, including superexchange-induced ferromagnetism and valley Zeeman shift. Overall, this review paper provides a comprehensive summary of magnetic TMDs synthesized via CVD, which can guide future research on doped TMDs for various applications, such as spintronics, optoelectronics, and magnetic memory devices.

Submitted 20 March, 2023; originally announced March 2023.

arXiv:2302.13331 [pdf, other]

Learning Input-agnostic Manipulation Directions in StyleGAN with Text Guidance

Authors: Yoonjeon Kim, Hyunsu Kim, Junho Kim, Yunjey Choi, Eunho Yang

Abstract: With the advantages of fast inference and human-friendly flexible manipulation, image-agnostic style manipulation via text guidance enables new applications that were not previously available. The state-of-the-art text-guided image-agnostic manipulation method embeds the representation of each channel of StyleGAN independently in the Contrastive Language-Image Pre-training (CLIP) space, and provides it in the form of a Dictionary to quickly find the channel-wise manipulation direction at inference time. However, in this paper we argue that this dictionary, which is constructed by controlling each channel individually, is limited in accommodating the versatility of text guidance, since the collective and interactive relations among multiple channels are not considered. Indeed, we show that it fails to discover a large portion of the manipulation directions that can be found by existing methods, which manually manipulate the latent space without texts. To alleviate this issue, we propose a novel method that learns a Dictionary, whose entry corresponds to the representation of a single channel, by taking into account the manipulation effect coming from the interaction with multiple other channels. We demonstrate that our strategy resolves the inability of previous methods to find diverse known directions from unsupervised methods and unknown directions from random text, while maintaining the real-time inference speed and disentanglement ability.

Submitted 26 February, 2023; originally announced February 2023.

Comments: Accepted to ICLR 2023

arXiv:2302.13231 [pdf]

A Synthetic Texas Backbone Power System with Climate-Dependent Spatio-Temporal Correlated Profiles

Authors: ** Lu, Xingpeng Li, Hongyi Li, Taher Chegini, Carlos Gamarra, Y. C. Ethan Yang, Margaret Cook, Gavin Dillingham

Abstract: Most power system test cases only have electrical parameters and can be used only for studies based on a snapshot of system profiles. To facilitate more comprehensive and practical studies, a synthetic power system including spatio-temporal correlated profiles for the entire year of 2019 at one-hour resolution has been created in this work. This system, referred to as the synthetic Texas 123-bus backbone transmission (TX-123BT) system, has very similar temporal and spatial characteristics to the actual Electric Reliability Council of Texas (ERCOT) system. It has a backbone network consisting of only high-voltage transmission lines in Texas, which is obtained by the K-medoids clustering method. The climate data extracted from the North American Land Data Assimilation System (NLDAS) are used to create the climate-dependent profiles of renewable generation and transmission thermal limits. Two climate-dependent models are implemented to determine wind and solar power production profiles respectively. In addition, two sets of climate-dependent dynamic line rating (DLR) profiles are created with the actual climate information: (i) daily DLR and (ii) hourly DLR. Simulation results of security-constrained unit commitment (SCUC) conducted on each of the daily system profiles have validated the developed one-year hourly time series dataset.

Submitted 25 February, 2023; originally announced February 2023.

Comments: 10 pages, 14 figures, 12 tables

arXiv:2302.09560 [pdf, other]

Deep Selector-JPEG: Adaptive JPEG Image Compression for Computer Vision in Image classification with Human Vision Criteria

Authors: Hossam Amer, Sepideh Shaterian, En-hui Yang

Abstract: With limited storage/bandwidth resources, input images to Computer Vision (CV) applications that use Deep Neural Networks (DNNs) are often encoded with JPEG that is tailored to Human Vision (HV). This paper presents Deep Selector-JPEG, an adaptive JPEG compression method that targets image classification while satisfying HV criteria. For each image, Deep Selector-JPEG selects adaptively a Quality Factor (QF) to compress the image so that a good trade-off between the Compression Ratio (CR) and DNN classifier Accuracy (Rate-Accuracy performance) can be achieved over a set of images for a variety of DNN classifiers while the MS-SSIM of such compressed image is greater than a threshold value predetermined by HV with a high probability. Deep Selector-JPEG is designed via light-weighted or heavy-weighted selector architectures. Experimental results show that in comparison with JPEG at the same CR, Deep Selector-JPEG achieves better Rate-Accuracy performance over the ImageNet validation set for all tested DNN classifiers with gains in classification accuracy between 0.2% and 1% at the same CRs while satisfying HV constraints. Deep Selector-JPEG can also roughly provide the original classification accuracy at higher CRs.

Submitted 19 February, 2023; originally announced February 2023.

Comments: 4 pages, 2 figures

arXiv:2302.04143 [pdf, other]

Predicting Thrombectomy Recanalization from CT Imaging Using Deep Learning Models

Authors: Haoyue Zhang, Jennifer S. Polson, Eric J. Yang, Kambiz Nael, William Speier, Corey W. Arnold

Abstract: For acute ischemic stroke (AIS) patients with large vessel occlusions, clinicians must decide if the benefit of mechanical thrombectomy (MTB) outweighs the risks and potential complications following an invasive procedure. Pre-treatment computed tomography (CT) and angiography (CTA) are widely used to characterize occlusions in the brain vasculature. If a patient is deemed eligible, a modified treatment in cerebral ischemia (mTICI) score will be used to grade how well blood flow is reestablished throughout and following the MTB procedure. An estimation of the likelihood of successful recanalization can support treatment decision-making. In this study, we proposed a fully automated prediction of a patient's recanalization score using pre-treatment CT and CTA imaging. We designed a spatial cross attention network (SCANet) that utilizes vision transformers to localize to pertinent slices and brain regions. Our top model achieved an average cross-validated ROC-AUC of 77.33 $\pm$ 3.9\%. This is a promising result that supports future applications of deep learning on CT and CTA for the identification of eligible AIS patients for MTB.

Submitted 17 April, 2024; v1 submitted 8 February, 2023; originally announced February 2023.

Comments: Medical Imaging with Deep Learning 2022 accepted short paper Jun 2022

Journal ref: Medical Imaging with Deep Learning 2022

arXiv:2301.11386 [pdf]

Task formulation for Extracting Social Determinants of Health from Clinical Narratives

Authors: Manabu Torii, Ian M. Finn, Son Doan, Paul Wang, Elly W. Yang, Daniel S. Zisook

Abstract: Objective: The 2022 n2c2 NLP Challenge posed identification of social determinants of health (SDOH) in clinical narratives. We present three systems that we developed for the Challenge and discuss the distinctive task formulation used in each of the three systems. Materials and Methods: The first system identifies target pieces of information independently using machine learning classifiers. The second system uses a large language model (LLM) to extract complete structured outputs per document. The third system extracts candidate phrases using machine learning and identifies target relations with hand-crafted rules. Results: The three systems achieved F1 scores of 0.884, 0.831, and 0.663 in Subtask A of the Challenge, which are ranked third, seventh, and eighth among the 15 participating teams. The review of the extraction results from our systems reveals characteristics of each approach and those of the SDOH extraction task. Discussion: Phrases and relations annotated in the task are unique and diverse, not conforming to the conventional event extraction task. These annotations are difficult to model with limited training data. The system that extracts information independently, ignoring the annotated relations, achieves the highest F1 score. Meanwhile, the LLM with its versatile capability achieves a high F1 score while respecting the annotated relations. The rule-based system tackling relation extraction obtains the lowest F1 score, while it is the most explainable approach. Conclusion: The F1 scores of the three systems vary in this challenge setting, but each approach has advantages and disadvantages in a practical application. The selection of the approach depends not only on the F1 score but also on the requirements in the application.

Submitted 26 January, 2023; originally announced January 2023.

ACM Class: I.2.7

arXiv:2301.10352 [pdf, ps, other]

Capacity Analysis of Vector Symbolic Architectures

Authors: Kenneth L. Clarkson, Shashanka Ubaru, Elizabeth Yang

Abstract: Hyperdimensional computing (HDC) is a biologically-inspired framework which represents symbols with high-dimensional vectors, and uses vector operations to manipulate them. The ensemble of a particular vector space and a prescribed set of vector operations (including one addition-like for "bundling" and one outer-product-like for "binding") form a *vector symbolic architecture* (VSA). While VSAs have been employed in numerous applications and have been studied empirically, many theoretical questions about VSAs remain open. We analyze the *representation capacities* of four common VSAs: MAP-I, MAP-B, and two VSAs based on sparse binary vectors. "Representation capacity" here refers to bounds on the dimensions of the VSA vectors required to perform certain symbolic tasks, such as testing for set membership $i \in S$ and estimating set intersection sizes $|X \cap Y|$ for two sets of symbols $X$ and $Y$, to a given degree of accuracy. We also analyze the ability of a novel variant of a Hopfield network (a simple model of associative memory) to perform some of the same tasks that are typically asked of VSAs. In addition to providing new bounds on VSA capacities, our analyses establish and leverage connections between VSAs, "sketching" (dimensionality reduction) algorithms, and Bloom filters.

Submitted 14 February, 2023; v1 submitted 24 January, 2023; originally announced January 2023.
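
Note on the entry above: the bundling, binding, and set-membership test that the abstract mentions can be illustrated with a minimal Python sketch. It assumes random bipolar (+1/-1) symbol vectors, element-wise addition for bundling, and element-wise multiplication for binding, as in MAP-style VSAs generally; the abstract does not give the paper's exact definitions of MAP-I and MAP-B, and the function names, dimension, and 0.5 threshold are all hypothetical choices for illustration.

```python
import random

def random_symbol(dim, rng):
    """Draw a random bipolar (+1/-1) vector to stand for one symbol (assumed encoding)."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def bundle(vectors):
    """Bundle symbols by element-wise addition, keeping integer entries (no clipping)."""
    return [sum(column) for column in zip(*vectors)]

def bind(a, b):
    """Bind two bipolar vectors by element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]

def membership_score(set_vector, symbol):
    """Normalized dot product: close to 1 for bundled symbols, close to 0 otherwise."""
    return sum(s * x for s, x in zip(set_vector, symbol)) / len(symbol)

rng = random.Random(0)      # fixed seed so the run is reproducible
dim = 4096                  # illustrative dimension, not a bound from the paper
symbols = [random_symbol(dim, rng) for _ in range(12)]
members, outsiders = symbols[:8], symbols[8:]
set_vector = bundle(members)

member_scores = [membership_score(set_vector, s) for s in members]
outsider_scores = [membership_score(set_vector, s) for s in outsiders]

# Thresholding at 0.5 (a hypothetical cut-off) separates members from non-members
# when the dimension is large relative to the set size.
is_member = [score > 0.5 for score in member_scores + outsider_scores]
```

The cross-talk between random symbols shrinks as the dimension grows, which is why the required dimension for a given accuracy is the quantity of interest in capacity analyses of this kind.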

arXiv:2301.00816 [pdf]

Thermo-optic phase shifter based on hydrogen-doped indium oxide microheater

Authors: Weiyu Tong, Erqi Yang, Yu Pang, Haobo Yang, Xin Qian, Ronggui Yang, Bin Hu, Jianji Dong, Xinliang Zhang

Abstract: Thermo-optic (TO) phase shifters are very fundamental units in large-scale active silicon photonic integrated circuits (PICs). However, due to the limitation of microheater materials with a trade-off between heating efficiency and absorption loss, designs reported so far typically suffer from slow response time, high power consumption, low yields, and so on. Here, we demonstrate an energy-efficient, fast-response, and low-loss TO phase shifter by introducing hydrogen-doped indium oxide (IHO) films as microheater, and the optimized electron concentration with enhanced mobility endows the IHO high conductivity as well as high near-infrared (NIR) transparency, which allow it to directly contact the silicon waveguide without any insulating layer for efficient tuning and fast response. The TO phase shifter achieves a sub-microsecond response time (970 ns/980 ns) with a π phase shift power consumption of 9.6 mW. And the insertion loss introduced by the IHO microheater is ~ 0.5 dB. The proposed IHO-based microheaters with compatible processing technology illustrate the great potential of such material in the application of large-scale silicon PICs.

Submitted 2 January, 2023; originally announced January 2023.

Comments: 10 pages, 4 figures, journal

arXiv:2212.10448 [pdf, other]

Parameter-efficient Zero-shot Transfer for Cross-Language Dense Retrieval with Adapters

Authors: Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard

Abstract: A popular approach to creating a zero-shot cross-language retrieval model is to substitute a monolingual pretrained language model in the retrieval model with a multilingual pretrained language model such as Multilingual BERT. This multilingual model is fine-tuned to the retrieval task with monolingual data such as English MS MARCO using the same training recipe as the monolingual retrieval model used. However, such transferred models suffer from mismatches in the languages of the input text during training and inference. In this work, we propose transferring monolingual retrieval models using adapters, a parameter-efficient component for a transformer network. By adding adapters pretrained on language tasks for a specific language with task-specific adapters, prior work has shown that the adapter-enhanced models perform better than fine-tuning the entire model when transferring across languages in various NLP tasks. By constructing dense retrieval models with adapters, we show that models trained with monolingual data are more effective than fine-tuning the entire model when transferring to a Cross Language Information Retrieval (CLIR) setting. However, we found that the prior suggestion of replacing the language adapters to match the target language at inference time is suboptimal for dense retrieval models. We provide an in-depth analysis of this discrepancy between other cross-language NLP tasks and CLIR.

Submitted 20 December, 2022; originally announced December 2022.

Comments: 15 pages, 1 figure

arXiv:2212.08262 [pdf, other]

Uniform Sequence Better: Time Interval Aware Data Augmentation for Sequential Recommendation

Authors: Yizhou Dang, Enneng Yang, Guibing Guo, Linying Jiang, Xingwei Wang, Xiaoxiao Xu, Qinghui Sun, Hong Liu

Abstract: Sequential recommendation is an important task to predict the next item to access based on a sequence of interacted items. Most existing works learn user preference as the transition pattern from the previous item to the next one, ignoring the time interval between these two items. However, we observe that the time intervals in a sequence may vary significantly, and thus result in the ineffectiveness of user modeling due to the issue of \emph{preference drift}. In fact, we conducted an empirical study to validate this observation, and found that a sequence with uniformly distributed time intervals (denoted as a uniform sequence) is more beneficial for performance improvement than one with greatly varying time intervals. Therefore, we propose to augment sequence data from the perspective of time interval, which is not studied in the literature. Specifically, we design five operators (Ti-Crop, Ti-Reorder, Ti-Mask, Ti-Substitute, Ti-Insert) to transform the original non-uniform sequence into a uniform sequence with the consideration of the variance of time intervals. Then, we devise a control strategy to execute data augmentation on item sequences of different lengths. Finally, we implement these improvements on a state-of-the-art model CoSeRec and validate our approach on four real datasets. The experimental results show that our approach reaches significantly better performance than the other 11 competing methods. Our implementation is available: https://github.com/KingGugu/TiCoSeRec.

Submitted 17 December, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: 9 pages, 4 figures, AAAI-2023

arXiv:2212.02802 [pdf, other]

Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding

Authors: Gyeongman Kim, Ha** Shim, Hyunsu Kim, Yunjey Choi, Junho Kim, Eunho Yang

Abstract: Inspired by the impressive performance of recent face image editing methods, several studies have been naturally proposed to extend these methods to the face video editing task. One of the main challenges here is temporal consistency among edited frames, which is still unresolved. To this end, we propose a novel face video editing framework based on diffusion autoencoders that can successfully extract the decomposed features - for the first time as a face video editing model - of identity and motion from a given video. This modeling allows us to edit the video by simply manipulating the temporally invariant feature to the desired direction for the consistency. Another unique strength of our model is that, since our model is based on diffusion models, it can satisfy both reconstruction and edit capabilities at the same time, and is robust to corner cases in wild face videos (e.g. occluded faces) unlike the existing GAN-based methods.

Submitted 27 March, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: CVPR 2023. Our project page: https://diff-video-ae.github.io

arXiv:2211.15909 [pdf, other]

doi 10.3847/1538-4357/aca6e1

Investigating pre-eruptive magnetic properties at the footprints of erupting magnetic flux ropes

Authors: Wensi Wang, Jiong Qiu, Rui Liu, Chunming Zhu, Kai E Yang, Qiang Hu, Yuming Wang

Abstract: It is well established that solar eruptions are powered by free magnetic energy stored in current-carrying magnetic field in the corona. It has also been generally accepted that magnetic flux ropes (MFRs) are a critical component of many coronal mass ejections (CMEs). What remains controversial is whether MFRs are present well before the eruption. Our aim is to identify progenitors of MFRs, and investigate pre-eruptive magnetic properties associated with these progenitors. Here we analyze 28 MFRs erupting within 45 deg from the disk center from 2010 to 2015. All MFRs' feet are well identified by conjugate coronal dimmings. We then calculate magnetic properties at the feet of the MFRs, prior to their eruptions, using Helioseismic and Magnetic Imager (HMI) vector magnetograms. Our results show that only 8 erupting MFRs are associated with significant non-neutralized electric currents, 4 of which also exhibit pre-eruptive dimmings at the footprints. Twist and current distributions are asymmetric at the two feet of these MFRs. The presence of pre-eruption dimmings associated with non-neutralized currents suggests pre-existing MFRs. Furthermore, evolution of conjugate dimmings and electric currents within the footprints can provide clues about the internal structure of MFRs and their formation mechanism.

Submitted 28 November, 2022; originally announced November 2022.

Comments: 40 pages, 13 figures, 6 tables, Accepted for publication in ApJ

arXiv:2211.15055 [pdf, other]

doi 10.1609/aaai.v37i9.26275

AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning

Authors: Enneng Yang, Junwei Pan, Ximei Wang, Haibin Yu, Li Shen, Xihua Chen, Lei Xiao, Jie Jiang, Guibing Guo

Abstract: Multi-task learning (MTL) models have demonstrated impressive results in computer vision, natural language processing, and recommender systems. Even though many approaches have been proposed, how well these approaches balance different tasks on each parameter still remains unclear. In this paper, we propose to measure the task dominance degree of a parameter by the total updates of each task on this parameter. Specifically, we compute the total updates by the exponentially decaying Average of the squared Updates (AU) on a parameter from the corresponding task. Based on this novel metric, we observe that many parameters in existing MTL methods, especially those in the higher shared layers, are still dominated by one or several tasks. The dominance of AU is mainly due to the dominance of accumulative gradients from one or several tasks. Motivated by this, we propose a Task-wise Adaptive learning rate approach, AdaTask in short, to separate the \emph{accumulative gradients} and hence the learning rate of each task for each parameter in adaptive learning rate approaches (e.g., AdaGrad, RMSProp, and Adam). Comprehensive experiments on computer vision and recommender system MTL datasets demonstrate that AdaTask significantly improves the performance of dominated tasks, resulting in SOTA average task-wise performance. Analysis on both synthetic and real-world datasets shows that AdaTask balances parameters in every shared layer well.

Submitted 18 May, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

Comments: 14 pages, 8 figures

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2023). Vol. 37. No. 9. 2023

arXiv:2211.14540 [pdf, other]

Lexical Complexity Controlled Sentence Generation

Authors: Jinran Nie, Liner Yang, Yun Chen, Cunliang Kong, Junhui Zhu, Erhong Yang

Abstract: Text generation rarely considers the control of lexical complexity, which limits its more comprehensive practical application. We introduce a novel task of lexical complexity controlled sentence generation, which aims at keywords to sentence generation with desired complexity levels. It has enormous potential in domains such as grade reading, language teaching and acquisition. The challenge of this task is to generate fluent sentences only using the words of given complexity levels. We propose a simple but effective approach for this task based on complexity embedding. Compared with potential solutions, our approach fuses the representations of the word complexity levels into the model to get better control of lexical complexity. And we demonstrate the feasibility of the approach for both training models from scratch and fine-tuning the pre-trained models. To facilitate the research, we develop two datasets in English and Chinese respectively, on which extensive experiments are conducted. Results show that our approach better controls lexical complexity and generates higher quality sentences than baseline methods.

Submitted 26 November, 2022; originally announced November 2022.

arXiv:2211.10985 [pdf, ps, other]

The quadratic Artin conductor of a motivic spectrum

Authors: Fangzhou Jin, Enlin Yang

Abstract: Given a motivic spectrum $K$ over a smooth proper scheme which is dualizable over an open subscheme, we define its quadratic Artin conductor under some assumptions, and prove a formula relating the quadratic Euler characteristic of $K$, the rank of $K$ and the quadratic Artin conductor. As a consequence, we obtain a quadratic refinement of the classical Grothendieck-Ogg-Shafarevich formula.

Submitted 29 June, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

MSC Class: 11E81; 14C17; 14E22; 14F42

arXiv:2210.00158 [pdf, ps, other]

Local and global expansion in random geometric graphs

Authors: Siqi Liu, Sidhanth Mohanty, Tselil Schramm, Elizabeth Yang

Abstract: Consider a random geometric 2-dimensional simplicial complex $X$ sampled as follows: first, sample $n$ vectors $\boldsymbol{u_1},\ldots,\boldsymbol{u_n}$ uniformly at random on $\mathbb{S}^{d-1}$; then, for each triple $i,j,k \in [n]$, add $\{i,j,k\}$ and all of its subsets to $X$ if and only if $\langle{\boldsymbol{u_i},\boldsymbol{u_j}}\rangle \ge τ, \langle{\boldsymbol{u_i},\boldsymbol{u_k}}\rangle \ge τ$, and $\langle \boldsymbol{u_j}, \boldsymbol{u_k}\rangle \ge τ$. We prove that for every $\varepsilon > 0$, there exists a choice of $d = Θ(\log n)$ and $τ= τ(\varepsilon,d)$ so that with high probability, $X$ is a high-dimensional expander of average degree $n^\varepsilon$ in which each $1$-link has spectral gap bounded away from $\frac{1}{2}$. To our knowledge, this is the first demonstration of a natural distribution over $2$-dimensional expanders of arbitrarily small polynomial average degree and spectral link expansion better than $\frac{1}{2}$. All previously known constructions are algebraic. This distribution also furnishes an example of simplicial complexes for which the trickle-down theorem is nearly tight. En route, we prove general bounds on the spectral expansion of random induced subgraphs of arbitrary vertex transitive graphs, which may be of independent interest. For example, one consequence is an almost-sharp bound on the second eigenvalue of random $n$-vertex geometric graphs on $\mathbb{S}^{d-1}$, which was previously unknown for most $n,d$ pairs.

Submitted 30 September, 2022; originally announced October 2022.

Comments: 59 pages

arXiv:2209.15208 [pdf, other]

Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel

Authors: SungYub Kim, Sihwan Park, Kyungsu Kim, Eunho Yang

Abstract: Explaining generalizations and preventing over-confident predictions are central goals of studies on the loss landscape of neural networks. Flatness, defined as loss invariability on perturbations of a pre-trained solution, is widely accepted as a predictor of generalization in this context. However, the problem that flatness and generalization bounds can be changed arbitrarily according to the scale of a parameter was pointed out, and previous studies partially solved the problem with restrictions: Counter-intuitively, their generalization bounds were still variant for the function-preserving parameter scaling transformation or limited only to an impractical network structure. As a more fundamental solution, we propose new prior and posterior distributions invariant to scaling transformations by \textit{decomposing} the scale and connectivity of parameters, thereby allowing the resulting generalization bound to describe the generalizability of a broad class of networks with the more practical class of transformations such as weight decay with batch normalization. We also show that the above issue adversely affects the uncertainty calibration of Laplace approximation and propose a solution using our invariant posterior. We empirically demonstrate our posterior provides effective flatness and calibration measures with low complexity in such a practical parameter transformation case, supporting its practical effectiveness in line with our rationale.

Submitted 29 September, 2022; originally announced September 2022.

arXiv:2209.14614 [pdf, other]

COMPILING: A Benchmark Dataset for Chinese Complexity Controllable Definition Generation

Authors: Jiaxin Yuan, Cunliang Kong, Chenhui Xie, Liner Yang, Erhong Yang

Abstract: The definition generation task aims to generate a word's definition within a specific context automatically. However, owing to the lack of datasets for different complexities, the definitions produced by models tend to keep the same complexity level. This paper proposes a novel task of generating definitions for a word with controllable complexity levels. Correspondingly, we introduce COMPILING, a dataset given detailed information about Chinese definitions, and each definition is labeled with its complexity levels. The COMPILING dataset includes 74,303 words and 106,882 definitions. To the best of our knowledge, it is the largest dataset of the Chinese definition generation task. We select various representative generation methods as baselines for this task and conduct evaluations, which illustrates that our dataset plays an outstanding role in assisting models in generating different complexity-level definitions. We believe that the COMPILING dataset will benefit further research in complexity controllable definition generation.

Submitted 29 September, 2022; originally announced September 2022.

Comments: Accepted by CCL 2022

arXiv:2209.11086 [pdf, ps, other]

Cohomological Milnor formula and Saito's conjecture on characteristic classes

Authors: Enlin Yang, Yigeng Zhao

Abstract: We confirm the quasi-projective case of Saito's conjecture, namely that the cohomological characteristic classes defined by Abbes and Saito can be computed in terms of the characteristic cycles. We construct a cohomological characteristic class supported on the non-acyclicity locus of a separated morphism relatively to a constructible sheaf. As applications of the functorial properties of this class, we prove cohomological analogs of the Milnor formula and the conductor formula for constructible sheaves on (not necessarily smooth) varieties.

Submitted 2 October, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 53 pages; Proved the generalized fibration formula for the non-acyclicity class. Revised some proofs

arXiv:2209.01335 [pdf, other]

Neural Approaches to Multilingual Information Retrieval

Authors: Dawn Lawrie, Eugene Yang, Douglas W. Oard, James Mayfield

Abstract: Providing access to information across languages has been a goal of Information Retrieval (IR) for decades. While progress has been made on Cross Language IR (CLIR) where queries are expressed in one language and documents in another, the multilingual (MLIR) task to create a single ranked list of documents across many languages is considerably more challenging. This paper investigates whether advances in neural document translation and pretrained multilingual neural language models enable improvements in the state of the art over earlier MLIR techniques. The results show that although combining neural document translation with neural ranking yields the best Mean Average Precision (MAP), 98% of that MAP score can be achieved with an 84% reduction in indexing time by using a pretrained XLM-R multilingual language model to index documents in their native language, and that 2% difference in effectiveness is not statistically significant. Key to achieving these results for MLIR is to fine-tune XLM-R using mixed-language batches from neural translations of MS MARCO passages.

Submitted 9 February, 2023; v1 submitted 3 September, 2022; originally announced September 2022.

Comments: 17 pages, 3 figures, accepted at ECIR 2023

arXiv:2208.12409 [pdf, ps, other]

doi 10.1038/s41598-022-18731-6

New disordered anyon phase of doped graphene zigzag nanoribbon

Authors: Young Heon Kim, Hye Jeong Lee, Hyng-Yong Lee, S. -R. Eric Yang

Abstract: We investigate interacting disordered zigzag nanoribbons at low doping, using the Hubbard model to treat electron interactions within the density matrix renormalization group and Hartree-Fock method. Extra electrons that are inserted into an interacting disordered zigzag nanoribbon divide into anyons. Furthermore, the fractional charges form a new disordered anyon phase with a highly distorted edge spin density wave, containing numerous localized magnetic moments residing on the zigzag edges, thereby displaying spin-charge separation and a strong non-local correlation between the opposite zigzag edges. We make the following new predictions, which can be experimentally tested: (1) In the low doping case and weak disorder regime, the soft gap in the tunneling density of states is replaced by a sharp peak at the midgap energy with two accompanying peaks. The $e^-/2$ fractional charges that reside on the boundary of the zigzag edges are responsible for these peaks. (2) We find that the midgap peak disappears as the doping concentration increases. The presence of $e^-/2$ fractional charges will be strongly supported by the detection of these peaks. Doped zigzag ribbons may also exhibit unusual transport, magnetic, and inter-edge tunneling properties.

Submitted 25 August, 2022; originally announced August 2022.

Comments: 16 pages, 1~11 pages are main paper and 12~16 pages are supplementary material. 8 figures(for main article), 4 figures (for supplementary material)

Journal ref: Sci Rep 12, 14551 (2022)

arXiv:2208.11989 [pdf, ps, other]

The pro-Chern-Schwarz-MacPherson class in Borel-Moore motivic homology

Authors: Fangzhou Jin, Peng Sun, Enlin Yang

Abstract: We show that the zero-dimensional part of the pro-Chern-Schwarz-MacPherson class defined by Aluffi is equal to the pro-characteristic class in limit Borel-Moore motivic homology. A similar construction also produces a quadratic refinement of this class in the limit Borel-Moore Milnor-Witt homology.

Submitted 25 September, 2022; v1 submitted 25 August, 2022; originally announced August 2022.

MSC Class: 14F42; 14C17

arXiv:2207.03075 [pdf, other]

Towards the Practical Utility of Federated Learning in the Medical Domain

Authors: Seongjun Yang, Hyeonji Hwang, Daeyoung Kim, Radhika Dua, Jong-Yeup Kim, Eunho Yang, Edward Choi

Abstract: Federated learning (FL) is an active area of research. One of the most suitable areas for adopting FL is the medical domain, where patient privacy must be respected. Previous research, however, does not provide a practical guide to applying FL in the medical domain. We propose empirical benchmarks and experimental settings for three representative medical datasets with different modalities: longitudinal electronic health records, skin cancer images, and electrocardiogram signals. The likely users of FL such as medical institutions and IT companies can take these benchmarks as guides for adopting FL and minimize their trial and error. For each dataset, each client data is from a different source to preserve real-world heterogeneity. We evaluate six FL algorithms designed for addressing data heterogeneity among clients, and a hybrid algorithm combining the strengths of two representative FL algorithms. Based on experiment results from three modalities, we discover that simple FL algorithms tend to outperform more sophisticated ones, while the hybrid algorithm consistently shows good, if not the best performance. We also find that a frequent global model update leads to better performance under a fixed training iteration budget. As the number of participating clients increases, higher cost is incurred due to increased IT administrators and GPUs, but the performance consistently increases. We expect future users will refer to these empirical benchmarks to design the FL experiments in the medical domain considering their clinical tasks and obtain stronger performance with lower costs.

Submitted 19 May, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: Accepted to the Main conference of CHIL2023

arXiv:2207.00089 [pdf]

doi 10.1038/s41551-023-01057-7

Rapid and stain-free quantification of viral plaque via lens-free holography and deep learning

Authors: Tairan Liu, Yuzhu Li, Hatice Ceylan Koydemir, Yijie Zhang, Ethan Yang, Merve Eryilmaz, Hongda Wang, Jingxi Li, Bijie Bai, Guangdong Ma, Aydogan Ozcan

Abstract: We present a rapid and stain-free quantitative viral plaque assay using lensfree holographic imaging and deep learning. This cost-effective, compact, and automated device significantly reduces the incubation time needed for traditional plaque assays while preserving their advantages over other virus quantification methods. This device captures ~0.32 Giga-pixel/hour phase information of the objects per test well, covering an area of ~30x30 mm^2, in a label-free manner, eliminating staining entirely. We demonstrated the success of this computational method using vesicular stomatitis virus (VSV), herpes simplex virus (HSV-1) and encephalomyocarditis virus (EMCV). Using a neural network, this stain-free device automatically detected the first cell lysing events due to the VSV viral replication as early as 5 hours after the incubation, and achieved >90% detection rate for the VSV plaque-forming units (PFUs) with 100% specificity in <20 hours, providing major time savings compared to the traditional plaque assays that take at least 48 hours. Similarly, this stain-free device reduced the needed incubation time by ~48 hours for HSV-1 and ~20 hours for EMCV, achieving >90% detection rate with 100% specificity. We also demonstrated that this data-driven plaque assay offers the capability of quantifying the infected area of the cell monolayer, performing automated counting and quantification of PFUs and virus-infected areas over a 10-fold larger dynamic range of virus concentration than standard viral plaque assays. This compact, low-cost, automated PFU quantification device can be broadly used in virology research, vaccine development, and clinical applications.

Submitted 22 June, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

Comments: 24 Pages, 6 Figures

Journal ref: Nature Biomedical Engineering (2023)

arXiv:2206.12917 [pdf, other]

TAM: Topology-Aware Margin Loss for Class-Imbalanced Node Classification

Authors: Jaeyun Song, Joonhyung Park, Eunho Yang

Abstract: Learning unbiased node representations under class-imbalanced graph data is challenging due to interactions between adjacent nodes. Existing studies have in common that they compensate the minor class nodes `as a group' according to their overall quantity (ignoring node connections in graph), which inevitably increase the false positive cases for major nodes. We hypothesize that the increase in these false positive cases is highly affected by the label distribution around each node and confirm it experimentally. In addition, in order to handle this issue, we propose Topology-Aware Margin (TAM) to reflect local topology on the learning objective. Our method compares the connectivity pattern of each node with the class-averaged counter-part and adaptively adjusts the margin accordingly based on that. Our method consistently exhibits superiority over the baselines on various node classification benchmark datasets with representative GNN architectures.

Submitted 26 June, 2022; originally announced June 2022.

Comments: Accepted to ICML 2022; First two authors equally contributed

arXiv:2206.10779 [pdf, other]

Not Just Streaks: Towards Ground Truth for Single Image Deraining

Authors: Yunhao Ba, Howard Zhang, Ethan Yang, Akira Suzuki, Arnold Pfahnl, Chethan Chinder Chandrappa, Celso de Melo, Suya You, Stefano Soatto, Alex Wong, Achuta Kadambi

Abstract: We propose a large-scale dataset of real-world rainy and clean image pairs and a method to remove degradations, induced by rain streaks and rain accumulation, from the image. As there exists no real-world dataset for deraining, current state-of-the-art methods rely on synthetic data and thus are limited by the sim2real domain gap; moreover, rigorous evaluation remains a challenge due to the absence of a real paired dataset. We fill this gap by collecting a real paired deraining dataset through meticulous control of non-rain variations. Our dataset enables paired training and quantitative evaluation for diverse real-world rain phenomena (e.g. rain streaks and rain accumulation). To learn a representation robust to rain phenomena, we propose a deep neural network that reconstructs the underlying scene by minimizing a rain-robust loss between rainy and clean images. Extensive experiments demonstrate that our model outperforms the state-of-the-art deraining methods on real rainy images under various conditions. Project website: https://visual.ee.ucla.edu/gt_rain.htm/.

Submitted 28 August, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.01999 [pdf, other]

MSR: Making Self-supervised learning Robust to Aggressive Augmentations

Authors: Yingbin Bai, Erkun Yang, Zhaoqing Wang, Yuxuan Du, Bo Han, Cheng Deng, Dadong Wang, Tongliang Liu

Abstract: Most recent self-supervised learning methods learn visual representation by contrasting different augmented views of images. Compared with supervised learning, more aggressive augmentations have been introduced to further improve the diversity of training pairs. However, aggressive augmentations may distort images' structures leading to a severe semantic shift problem that augmented views of the same image may not share the same semantics, thus degrading the transfer performance. To address this problem, we propose a new SSL paradigm, which counteracts the impact of semantic shift by balancing the role of weak and aggressively augmented pairs. Specifically, semantically inconsistent pairs are of minority and we treat them as noisy pairs. Note that deep neural networks (DNNs) have a crucial memorization effect that DNNs tend to first memorize clean (majority) examples before overfitting to noisy (minority) examples. Therefore, we set a relatively large weight for aggressively augmented data pairs at the early learning stage. With the training going on, the model begins to overfit noisy pairs. Accordingly, we gradually reduce the weights of aggressively augmented pairs. In doing so, our method can better embrace the aggressive augmentations and neutralize the semantic shift problem. Experiments show that our model achieves 73.1% top-1 accuracy on ImageNet-1K with ResNet-50 for 200 epochs, which is a 2.5% improvement over BYOL. Moreover, experiments also demonstrate that the learned representations can transfer well for various downstream tasks.

Submitted 4 June, 2022; originally announced June 2022.

arXiv:2205.08678 [pdf, other]

Structuro-elasto-plasticity (StEP) model for plasticity in disordered solids

Authors: Ge Zhang, Hongyi Xiao, Entao Yang, Robert J. S. Ivancic, Sean A. Ridout, Robert A. Riggleman, Douglas J. Durian, Andrea J. Liu

Abstract: Elastoplastic lattice models for the response of solids to deformation typically incorporate structure only implicitly via a local yield strain that is assigned to each site. However, the local yield strain can change in response to a nearby or even distant plastic event in the system. This interplay is key to understanding phenomena such as avalanches in which one plastic event can trigger another, leading to a cascade of events, but typically is neglected in elastoplastic models. To include the interplay one could calculate the local yield strain for a given particulate system and follow its evolution, but this is expensive and requires knowledge of particle interactions, which is often hard to extract from experiments. Instead, we introduce a structural quantity, "softness," obtained using machine learning to correlate with imminent plastic rearrangements. We show that softness also correlates with local yield strain. We incorporate softness to construct a "structuro-elasto-plasticity" model that reproduces particle simulation results quantitatively for several observable quantities, confirming that we capture the influence of the interplay of local structure, plasticity, and elasticity on material response.

Submitted 17 May, 2022; originally announced May 2022.

arXiv:2205.03325 [pdf, other]

OMU: A Probabilistic 3D Occupancy Mapping Accelerator for Real-time OctoMap at the Edge

Authors: Tianyu Jia, En-Yu Yang, Yu-Shun Hsiao, Jonathan Cruz, David Brooks, Gu-Yeon Wei, Vijay Janapa Reddi

Abstract: Autonomous machines (e.g., vehicles, mobile robots, drones) require sophisticated 3D mapping to perceive the dynamic environment. However, maintaining a real-time 3D map is expensive both in terms of compute and memory requirements, especially for resource-constrained edge machines. Probabilistic OctoMap is a reliable and memory-efficient 3D dense map model to represent the full environment, with dynamic voxel node pruning and expansion capacity. This paper presents the first efficient accelerator solution, i.e. OMU, to enable real-time probabilistic 3D mapping at the edge. To improve the performance, the input map voxels are updated via parallel PE units for data parallelism. Within each PE, the voxels are stored using a specially developed data structure in parallel memory banks. In addition, a pruning address manager is designed within each PE unit to reuse the pruned memory addresses. The proposed 3D mapping accelerator is implemented and evaluated using a commercial 12 nm technology. Compared to the ARM Cortex-A57 CPU in the Nvidia Jetson TX2 platform, the proposed accelerator achieves up to 62$\times$ performance and 708$\times$ energy efficiency improvement. Furthermore, the accelerator provides 63 FPS throughput, more than 2$\times$ higher than a real-time requirement, enabling real-time perception for 3D mapping.

Submitted 6 May, 2022; originally announced May 2022.

Comments: 2022 Design Automation and Test in Europe Conference (DATE), March 14-23, 2022, Virtual

Showing 51–100 of 333 results for author: Yang, E