Search | arXiv e-print repository

Rethinking Superpixel Segmentation from Biologically Inspired Mechanisms

Authors: Tingyu Zhao, Bo Peng, Yuan Sun, Daipeng Yang, Zhenguang Zhang, Xi Wu

Abstract: Recently, advancements in deep learning-based superpixel segmentation methods have brought about improvements in both the efficiency and the performance of segmentation. However, a significant challenge remains in generating superpixels that strictly adhere to object boundaries while conveying rich visual significance, especially when cross-surface color correlations may interfere with objects. Dr… ▽ More Recently, advancements in deep learning-based superpixel segmentation methods have brought about improvements in both the efficiency and the performance of segmentation. However, a significant challenge remains in generating superpixels that strictly adhere to object boundaries while conveying rich visual significance, especially when cross-surface color correlations may interfere with objects. Drawing inspiration from neural structure and visual mechanisms, we propose a biological network architecture comprising an Enhanced Screening Module (ESM) and a novel Boundary-Aware Label (BAL) for superpixel segmentation. The ESM enhances semantic information by simulating the interactive projection mechanisms of the visual cortex. Additionally, the BAL emulates the spatial frequency characteristics of visual cortical cells to facilitate the generation of superpixels with strong boundary adherence. We demonstrate the effectiveness of our approach through evaluations on both the BSDS500 dataset and the NYUv2 dataset. △ Less

Submitted 11 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

arXiv:2309.09831 [pdf, other]

Pivotal Estimation of Linear Discriminant Analysis in High Dimensions

Authors: Ethan X. Fang, Yajun Mei, Yuyang Shi, Qunzhi Xu, Tuo Zhao

Abstract: We consider the linear discriminant analysis problem in the high-dimensional settings. In this work, we propose PANDA(PivotAl liNear Discriminant Analysis), a tuning-insensitive method in the sense that it requires very little effort to tune the parameters. Moreover, we prove that PANDA achieves the optimal convergence rate in terms of both the estimation error and misclassification rate. Our theo… ▽ More We consider the linear discriminant analysis problem in the high-dimensional settings. In this work, we propose PANDA(PivotAl liNear Discriminant Analysis), a tuning-insensitive method in the sense that it requires very little effort to tune the parameters. Moreover, we prove that PANDA achieves the optimal convergence rate in terms of both the estimation error and misclassification rate. Our theoretical results are backed up by thorough numerical studies using both simulated and real datasets. In comparison with the existing methods, we observe that our proposed PANDA yields equal or better performance, and requires substantially less effort in parameter tuning. △ Less

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.06149 [pdf, ps, other]

Two involutions on binary trees and generalizations

Authors: Yang Li, Zhicong Lin, Tongyuan Zhao

Abstract: This paper investigates two involutions on binary trees. One is the mirror symmetry of binary trees which combined with the classical bijection $\varphi$ between binary trees and plane trees answers an open problem posed by Bai and Chen. This involution can be generalized to weakly increasing trees, which admits to merge two recent equidistributions found by Bai--Chen and Chen--Fu, respectively. T… ▽ More This paper investigates two involutions on binary trees. One is the mirror symmetry of binary trees which combined with the classical bijection $\varphi$ between binary trees and plane trees answers an open problem posed by Bai and Chen. This involution can be generalized to weakly increasing trees, which admits to merge two recent equidistributions found by Bai--Chen and Chen--Fu, respectively. The other one is constructed to answer a bijective problem on di-sk trees asked by Fu--Lin--Wang and can be generalized naturally to rooted labeled trees. This second involution combined with $\varphi$ leads to a new statistic on plane trees whose distribution gives the Catalan's triangle. Moreover, a quadruple equidistribution on plane trees involving this new statistic is proved via a recursive bijection. △ Less

Submitted 12 September, 2023; originally announced September 2023.

Comments: 24 pages, 16 figures

arXiv:2309.02632 [pdf, other]

Deep Reinforcement Learning from Hierarchical Preference Design

Authors: Alexander Bukharin, Yixiao Li, Pengcheng He, Tuo Zhao

Abstract: Reward design is a fundamental, yet challenging aspect of reinforcement learning (RL). Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale and intricate dependencies of the feedback signals. This paper shows by exploiting certain structures, one can ease the reward design process. Spec… ▽ More Reward design is a fundamental, yet challenging aspect of reinforcement learning (RL). Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale and intricate dependencies of the feedback signals. This paper shows by exploiting certain structures, one can ease the reward design process. Specifically, we propose a hierarchical reward modeling framework -- HERON for scenarios: (I) The feedback signals naturally present hierarchy; (II) The reward is sparse, but with less important surrogate feedback to help policy learning. Both scenarios allow us to design a hierarchical decision tree induced by the importance ranking of the feedback signals to compare RL trajectories. With such preference data, we can then train a reward model for policy learning. We apply HERON to several RL applications, and we find that our framework can not only train high performing agents on a variety of difficult tasks, but also provide additional benefits such as improved sample efficiency and robustness. Our code is available at \url{https://github.com/abukharin3/HERON}. △ Less

Submitted 10 June, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: 28 Pages, 14 figures

arXiv:2309.02610 [pdf, other]

T-SaS: Toward Shift-aware Dynamic Adaptation for Streaming Data

Authors: Weijieying Ren, Tianxiang Zhao, Wei Qin, Kunpeng Liu

Abstract: In many real-world scenarios, distribution shifts exist in the streaming data across time steps. Many complex sequential data can be effectively divided into distinct regimes that exhibit persistent dynamics. Discovering the shifted behaviors and the evolving patterns underlying the streaming data are important to understand the dynamic system. Existing methods typically train one robust model to… ▽ More In many real-world scenarios, distribution shifts exist in the streaming data across time steps. Many complex sequential data can be effectively divided into distinct regimes that exhibit persistent dynamics. Discovering the shifted behaviors and the evolving patterns underlying the streaming data are important to understand the dynamic system. Existing methods typically train one robust model to work for the evolving data of distinct distributions or sequentially adapt the model utilizing explicitly given regime boundaries. However, there are two challenges: (1) shifts in data streams could happen drastically and abruptly without precursors. Boundaries of distribution shifts are usually unavailable, and (2) training a shared model for all domains could fail to capture varying patterns. This paper aims to solve the problem of sequential data modeling in the presence of sudden distribution shifts that occur without any precursors. Specifically, we design a Bayesian framework, dubbed as T-SaS, with a discrete distribution-modeling variable to capture abrupt shifts of data. Then, we design a model that enable adaptation with dynamic network selection conditioned on that discrete variable. The proposed method learns specific model parameters for each distribution by learning which neurons should be activated in the full network. A dynamic masking strategy is adopted here to support inter-distribution transfer through the overlap** of a set of sparse networks. Extensive experiments show that our proposed method is superior in both accurately detecting shift boundaries to get segments of varying distributions and effectively adapting to downstream forecast or classification tasks. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Comments: CIKM 2023

arXiv:2309.00738 [pdf, other]

Rethinking the Power of Graph Canonization in Graph Representation Learning with Stability

Authors: Zehao Dong, Muhan Zhang, Philip R. O. Payne, Michael A Province, Carlos Cruchaga, Tianyu Zhao, Fuhai Li, Yixin Chen

Abstract: The expressivity of Graph Neural Networks (GNNs) has been studied broadly in recent years to reveal the design principles for more powerful GNNs. Graph canonization is known as a typical approach to distinguish non-isomorphic graphs, yet rarely adopted when develo** expressive GNNs. This paper proposes to maximize the expressivity of GNNs by graph canonization, then the power of such GNNs is stu… ▽ More The expressivity of Graph Neural Networks (GNNs) has been studied broadly in recent years to reveal the design principles for more powerful GNNs. Graph canonization is known as a typical approach to distinguish non-isomorphic graphs, yet rarely adopted when develo** expressive GNNs. This paper proposes to maximize the expressivity of GNNs by graph canonization, then the power of such GNNs is studies from the perspective of model stability. A stable GNN will map similar graphs to close graph representations in the vectorial space, and the stability of GNNs is critical to generalize their performance to unseen graphs. We theoretically reveal the trade-off of expressivity and stability in graph-canonization-enhanced GNNs. Then we introduce a notion of universal graph canonization as the general solution to address the trade-off and characterize a widely applicable sufficient condition to solve the universal graph canonization. A comprehensive set of experiments demonstrates the effectiveness of the proposed method. In many popular graph benchmark datasets, graph canonization successfully enhances GNNs and provides highly competitive performance, indicating the capability and great potential of proposed method in general graph representation learning. In graph datasets where the sufficient condition holds, GNNs enhanced by universal graph canonization consistently outperform GNN baselines and successfully improve the SOTA performance up to $31\%$, providing the optimal solution to numerous challenging real-world graph analytical tasks like gene network representation learning in bioinformatics. △ Less

Submitted 9 February, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

arXiv:2308.16422 [pdf, other]

doi 10.1103/PhysRevD.109.084054

Dilated convolutional neural network for detecting extreme-mass-ratio inspirals

Authors: Tianyu Zhao, Yue Zhou, Ruijun Shi, Zhoujian Cao, Zhixiang Ren

Abstract: The detection of Extreme Mass Ratio Inspirals (EMRIs) is intricate due to their complex waveforms, extended duration, and low signal-to-noise ratio (SNR), making them more challenging to be identified compared to compact binary coalescences. While matched filtering-based techniques are known for their computational demands, existing deep learning-based methods primarily handle time-domain data and… ▽ More The detection of Extreme Mass Ratio Inspirals (EMRIs) is intricate due to their complex waveforms, extended duration, and low signal-to-noise ratio (SNR), making them more challenging to be identified compared to compact binary coalescences. While matched filtering-based techniques are known for their computational demands, existing deep learning-based methods primarily handle time-domain data and are often constrained by data duration and SNR. In addition, most existing work ignores time-delay interferometry (TDI) and applies the long-wavelength approximation in detector response calculations, thus limiting their ability to handle laser frequency noise. In this study, we introduce DECODE, an end-to-end model focusing on EMRI signal detection by sequence modeling in the frequency domain. Centered around a dilated causal convolutional neural network, trained on synthetic data considering TDI-1.5 detector response, DECODE can efficiently process a year's worth of multichannel TDI data with an SNR of around 50. We evaluate our model on 1-year data with accumulated SNR ranging from 50 to 120 and achieve a true positive rate of 96.3% at a false positive rate of 1%, kee** an inference time of less than 0.01 seconds. With the visualization of three showcased EMRI signals for interpretability and generalization, DECODE exhibits strong potential for future space-based gravitational wave data analyses. △ Less

Submitted 14 May, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: 11 pages, 5 figures, and 2 tables

Journal ref: Phys. Rev. D 109, 084054 (2024)

arXiv:2308.13917 [pdf, other]

Transfer Learning for Microstructure Segmentation with CS-UNet: A Hybrid Algorithm with Transformer and CNN Encoders

Authors: Khaled Alrfou, Tian Zhao, Amir Kordijazi

Abstract: Transfer learning improves the performance of deep learning models by initializing them with parameters pre-trained on larger datasets. Intuitively, transfer learning is more effective when pre-training is on the in-domain datasets. A recent study by NASA has demonstrated that the microstructure segmentation with encoder-decoder algorithms benefits more from CNN encoders pre-trained on microscopy… ▽ More Transfer learning improves the performance of deep learning models by initializing them with parameters pre-trained on larger datasets. Intuitively, transfer learning is more effective when pre-training is on the in-domain datasets. A recent study by NASA has demonstrated that the microstructure segmentation with encoder-decoder algorithms benefits more from CNN encoders pre-trained on microscopy images than from those pre-trained on natural images. However, CNN models only capture the local spatial relations in images. In recent years, attention networks such as Transformers are increasingly used in image analysis to capture the long-range relations between pixels. In this study, we compare the segmentation performance of Transformer and CNN models pre-trained on microscopy images with those pre-trained on natural images. Our result partially confirms the NASA study that the segmentation performance of out-of-distribution images (taken under different imaging and sample conditions) is significantly improved when pre-training on microscopy images. However, the performance gain for one-shot and few-shot learning is more modest with Transformers. We also find that for image segmentation, the combination of pre-trained Transformers and CNN encoders are consistently better than pre-trained CNN encoders alone. Our dataset (of about 50,000 images) combines the public portion of the NASA dataset with additional images we collected. Even with much less training data, our pre-trained models have significantly better performance for image segmentation. This result suggests that Transformers and CNN complement each other and when pre-trained on microscopy images, they are more beneficial to the downstream tasks. △ Less

Submitted 26 August, 2023; originally announced August 2023.

Comments: 21 pages, 8 figures, 11 tables

arXiv:2308.13513 [pdf, other]

Unveiling the Role of Message Passing in Dual-Privacy Preservation on GNNs

Authors: Tianyi Zhao, Hui Hu, Lu Cheng

Abstract: Graph Neural Networks (GNNs) are powerful tools for learning representations on graphs, such as social networks. However, their vulnerability to privacy inference attacks restricts their practicality, especially in high-stake domains. To address this issue, privacy-preserving GNNs have been proposed, focusing on preserving node and/or link privacy. This work takes a step back and investigates how… ▽ More Graph Neural Networks (GNNs) are powerful tools for learning representations on graphs, such as social networks. However, their vulnerability to privacy inference attacks restricts their practicality, especially in high-stake domains. To address this issue, privacy-preserving GNNs have been proposed, focusing on preserving node and/or link privacy. This work takes a step back and investigates how GNNs contribute to privacy leakage. Through theoretical analysis and simulations, we identify message passing under structural bias as the core component that allows GNNs to \textit{propagate} and \textit{amplify} privacy leakage. Building upon these findings, we propose a principled privacy-preserving GNN framework that effectively safeguards both node and link privacy, referred to as dual-privacy preservation. The framework comprises three major modules: a Sensitive Information Obfuscation Module that removes sensitive information from node embeddings, a Dynamic Structure Debiasing Module that dynamically corrects the structural bias, and an Adversarial Learning Module that optimizes the privacy-utility trade-off. Experimental results on four benchmark datasets validate the effectiveness of the proposed model in protecting both node and link privacy while preserving high utility for downstream tasks, such as node classification. △ Less

Submitted 25 August, 2023; originally announced August 2023.

Comments: CIKM 2023

arXiv:2308.13177 [pdf, other]

How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

Authors: Yiyang Yao, Peng Liu, Tiancheng Zhao, Qianqian Zhang, Jiajia Liao, Chunxin Fang, Kyusong Lee, Qing Wang

Abstract: Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation methods and datasets are limited to testing generalization over object types and referral expressions, which do not provide a systematic, fine-grained, and… ▽ More Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation methods and datasets are limited to testing generalization over object types and referral expressions, which do not provide a systematic, fine-grained, and accurate benchmark of OVD models' abilities. In this paper, we propose a new benchmark named OVDEval, which includes 9 sub-tasks and introduces evaluations on commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is meticulously created to provide hard negatives that challenge models' true understanding of visual and linguistic input. Additionally, we identify a problem with the popular Average Precision (AP) metric when benchmarking models on these fine-grained label datasets and propose a new metric called Non-Maximum Suppression Average Precision (NMS-AP) to address this issue. Extensive experimental results show that existing top OVD models all fail on the new tasks except for simple object types, demonstrating the value of the proposed dataset in pinpointing the weakness of current OVD models and guiding future research. Furthermore, the proposed NMS-AP metric is verified by experiments to provide a much more truthful evaluation of OVD models, whereas traditional AP metrics yield deceptive results. Data is available at \url{https://github.com/om-ai-lab/OVDEval} △ Less

Submitted 18 December, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

Comments: Long paper accepted at AAAI 2024

arXiv:2308.13159 [pdf, ps, other]

Almost sure scattering for defocusing energy critical Hartree equation on $\R^5$

Authors: Liying Tao, Tengfei Zhao

Abstract: We consider the defocusing energy-critical Hartree equation $i\pa_tu+Δu=(|\cdot|^{-4}\ast|u|^2)u$ in spatial dimension $d=5$ and prove almost sure scattering with initial data $u_0\in H^s_x(\R^5)$ for any $s\in\R$. The proof relies on the modified interaction Morawetz estimate, the stability theories, the ``Narrowed'' Wiener randomization. We are inspired to consider this problem by the work of Sh… ▽ More We consider the defocusing energy-critical Hartree equation $i\pa_tu+Δu=(|\cdot|^{-4}\ast|u|^2)u$ in spatial dimension $d=5$ and prove almost sure scattering with initial data $u_0\in H^s_x(\R^5)$ for any $s\in\R$. The proof relies on the modified interaction Morawetz estimate, the stability theories, the ``Narrowed'' Wiener randomization. We are inspired to consider this problem by the work of Shen-Soffer-Wu \cite{Shen-Soffer-Wu 1}, which treated the analogous problem for the energy-critical Schrödinger equation. The new ingredient in this paper are that we take an alternative proof to give the interaction Morawetz estimate. And the nonlocal nonlinearity term will bring some difficulties. △ Less

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.12952 [pdf, other]

BridgeData V2: A Dataset for Robot Learning at Scale

Authors: Homer Walke, Kevin Black, Abraham Lee, Moo ** Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine

Abstract: We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains,… ▽ More We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains, and institutions, making the dataset a useful resource for a broad range of researchers. Additionally, the dataset is compatible with a wide variety of open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. In our experiments, we train 6 state-of-the-art imitation learning and offline reinforcement learning methods on our dataset, and find that they succeed on a suite of tasks requiring varying amounts of generalization. We also demonstrate that the performance of these methods improves with more data and higher capacity models, and that training on a greater variety of skills leads to improved generalization. By publicly sharing BridgeData V2 and our pre-trained models, we aim to accelerate research in scalable robot learning methods. Project page at https://rail-berkeley.github.io/bridgedata △ Less

Submitted 17 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 9 pages

arXiv:2308.11627 [pdf, other]

Non-Intrusive Electric Load Monitoring Approach Based on Current Feature Visualization for Smart Energy Management

Authors: Yiwen Xu, Dengfeng Liu, Liangtao Huang, Zhiquan Lin, Tiesong Zhao, Sam Kwong

Abstract: The state-of-the-art smart city has been calling for an economic but efficient energy management over large-scale network, especially for the electric power system. It is a critical issue to monitor, analyze and control electric loads of all users in system. In this paper, we employ the popular computer vision techniques of AI to design a non-invasive load monitoring method for smart electric ener… ▽ More The state-of-the-art smart city has been calling for an economic but efficient energy management over large-scale network, especially for the electric power system. It is a critical issue to monitor, analyze and control electric loads of all users in system. In this paper, we employ the popular computer vision techniques of AI to design a non-invasive load monitoring method for smart electric energy management. First of all, we utilize both signal transforms (including wavelet transform and discrete Fourier transform) and Gramian Angular Field (GAF) methods to map one-dimensional current signals onto two-dimensional color feature images. Second, we propose to recognize all electric loads from color feature images using a U-shape deep neural network with multi-scale feature extraction and attention mechanism. Third, we design our method as a cloud-based, non-invasive monitoring of all users, thereby saving energy cost during electric power system control. Experimental results on both public and our private datasets have demonstrated our method achieves superior performances than its peers, and thus supports efficient energy management over large-scale Internet of Things (IoT). △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2308.11257 [pdf, other]

HopPG: Self-Iterative Program Generation for Multi-Hop Question Answering over Heterogeneous Knowledge

Authors: Yingyao Wang, Yongwei Zhou, Chaoqun Duan, Junwei Bao, Tiejun Zhao

Abstract: The semantic parsing-based method is an important research branch for knowledge-based question answering. It usually generates executable programs lean upon the question and then conduct them to reason answers over a knowledge base. Benefit from this inherent mechanism, it has advantages in the performance and the interpretability. However, traditional semantic parsing methods usually generate a c… ▽ More The semantic parsing-based method is an important research branch for knowledge-based question answering. It usually generates executable programs lean upon the question and then conduct them to reason answers over a knowledge base. Benefit from this inherent mechanism, it has advantages in the performance and the interpretability. However, traditional semantic parsing methods usually generate a complete program before executing it, which struggles with multi-hop question answering over heterogeneous knowledge. On one hand, generating a complete multi-hop program relies on multiple heterogeneous supporting facts, and it is difficult for generators to understand these facts simultaneously. On the other hand, this way ignores the semantic information of the intermediate answers at each hop, which is beneficial for subsequent generation. To alleviate these challenges, we propose a self-iterative framework for multi-hop program generation (HopPG) over heterogeneous knowledge, which leverages the previous execution results to retrieve supporting facts and generate subsequent programs hop by hop. We evaluate our model on MMQA-T^2, and the experimental results show that HopPG outperforms existing semantic-parsing-based baselines, especially on the multi-hop questions. △ Less

Submitted 10 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.02530 [pdf, other]

Gated Driver Attention Predictor

Authors: Tianci Zhao, Xue Bai, Jianwu Fang, Jianru Xue

Abstract: Driver attention prediction implies the intention understanding of where the driver intends to go and what object the driver concerned about, which commonly provides a driving task-guided traffic scene understanding. Some recent works explore driver attention prediction in critical or accident scenarios and find a positive role in hel** accident prediction, while the promotion ability is constra… ▽ More Driver attention prediction implies the intention understanding of where the driver intends to go and what object the driver concerned about, which commonly provides a driving task-guided traffic scene understanding. Some recent works explore driver attention prediction in critical or accident scenarios and find a positive role in hel** accident prediction, while the promotion ability is constrained by the prediction accuracy of driver attention maps. In this work, we explore the network connection gating mechanism for driver attention prediction (Gate-DAP). Gate-DAP aims to learn the importance of different spatial, temporal, and modality information in driving scenarios with various road types, occasions, and light and weather conditions. The network connection gating in Gate-DAP consists of a spatial encoding network gating, long-short-term memory network gating, and information type gating modules. Each connection gating operation is plug-and-play and can be flexibly assembled, which makes the architecture of Gate-DAP transparent for evaluating different spatial, temporal, and information types for driver attention prediction. Evaluations on DADA-2000 and BDDA datasets verify the superiority of the proposed method with the comparison with state-of-the-art approaches. The code is available on https://github.com/JWFangit/Gate-DAP. △ Less

Submitted 31 July, 2023; originally announced August 2023.

Comments: Accepted by ITSC2023

arXiv:2307.16457 [pdf, other]

A Benchmark for Understanding Dialogue Safety in Mental Health Support

Authors: Huachuan Qiu, Tong Zhao, Anqi Li, Shuai Zhang, Hongliang He, Zhenzhong Lan

Abstract: Dialogue safety remains a pervasive challenge in open-domain human-machine interaction. Existing approaches propose distinctive dialogue safety taxonomies and datasets for detecting explicitly harmful responses. However, these taxonomies may not be suitable for analyzing response safety in mental health support. In real-world interactions, a model response deemed acceptable in casual conversations… ▽ More Dialogue safety remains a pervasive challenge in open-domain human-machine interaction. Existing approaches propose distinctive dialogue safety taxonomies and datasets for detecting explicitly harmful responses. However, these taxonomies may not be suitable for analyzing response safety in mental health support. In real-world interactions, a model response deemed acceptable in casual conversations might have a negligible positive impact on users seeking mental health support. To address these limitations, this paper aims to develop a theoretically and factually grounded taxonomy that prioritizes the positive impact on help-seekers. Additionally, we create a benchmark corpus with fine-grained labels for each dialogue session to facilitate further research. We analyze the dataset using popular language models, including BERT-base, RoBERTa-large, and ChatGPT, to detect and understand unsafe responses within the context of mental health support. Our study reveals that ChatGPT struggles to detect safety categories with detailed safety definitions in a zero- and few-shot paradigm, whereas the fine-tuned model proves to be more suitable. The developed dataset and findings serve as valuable benchmarks for advancing research on dialogue safety in mental health support, with significant implications for improving the design and deployment of conversation agents in real-world applications. We release our code and data here: https://github.com/qiuhuachuan/DialogueSafety. △ Less

Submitted 31 July, 2023; originally announced July 2023.

Comments: accepted to The 12th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC2023)

arXiv:2307.16416 [pdf, other]

MRA-GNN: Minutiae Relation-Aware Model over Graph Neural Network for Fingerprint Embedding

Authors: Yapeng Su, Tong Zhao, Zicheng Zhang

Abstract: Deep learning has achieved remarkable results in fingerprint embedding, which plays a critical role in modern Automated Fingerprint Identification Systems. However, previous works including CNN-based and Transformer-based approaches fail to exploit the nonstructural data, such as topology and correlation in fingerprints, which is essential to facilitate the identifiability and robustness of embedd… ▽ More Deep learning has achieved remarkable results in fingerprint embedding, which plays a critical role in modern Automated Fingerprint Identification Systems. However, previous works including CNN-based and Transformer-based approaches fail to exploit the nonstructural data, such as topology and correlation in fingerprints, which is essential to facilitate the identifiability and robustness of embedding. To address this challenge, we propose a novel paradigm for fingerprint embedding, called Minutiae Relation-Aware model over Graph Neural Network (MRA-GNN). Our proposed approach incorporates a GNN-based framework in fingerprint embedding to encode the topology and correlation of fingerprints into descriptive features, achieving fingerprint representation in the form of graph embedding. Specifically, we reinterpret fingerprint data and their relative connections as vertices and edges respectively, and introduce a minutia graph and fingerprint graph to represent the topological relations and correlation structures of fingerprints. We equip MRA-GNN with a Topological relation Reasoning Module (TRM) and Correlation-Aware Module (CAM) to learn the fingerprint embedding from these graphs successfully. To tackle the over-smoothing problem in GNN models, we incorporate Feed-Forward Module and graph residual connections into proposed modules. The experimental results demonstrate that our proposed approach outperforms state-of-the-art methods on various fingerprint datasets, indicating the effectiveness of our approach in exploiting nonstructural information of fingerprints. △ Less

Submitted 31 July, 2023; originally announced July 2023.

Comments: 10 pages, 6 figures, accepted by IJCB 2023

arXiv:2307.14326 [pdf, other]

Waypoint-Based Imitation Learning for Robotic Manipulation

Authors: Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn

Abstract: While imitation learning methods have seen a resurgent interest for robotic manipulation, the well-known problem of compounding errors continues to afflict behavioral cloning (BC). Waypoints can help address this problem by reducing the horizon of the learning problem for BC, and thus, the errors compounded over time. However, waypoint labeling is underspecified, and requires additional human supe… ▽ More While imitation learning methods have seen a resurgent interest for robotic manipulation, the well-known problem of compounding errors continues to afflict behavioral cloning (BC). Waypoints can help address this problem by reducing the horizon of the learning problem for BC, and thus, the errors compounded over time. However, waypoint labeling is underspecified, and requires additional human supervision. Can we generate waypoints automatically without any additional human supervision? Our key insight is that if a trajectory segment can be approximated by linear motion, the endpoints can be used as waypoints. We propose Automatic Waypoint Extraction (AWE) for imitation learning, a preprocessing module to decompose a demonstration into a minimal set of waypoints which when interpolated linearly can approximate the trajectory up to a specified error threshold. AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision making horizon by up to a factor of 10. Videos and code are available at https://lucys0.github.io/awe/ △ Less

Submitted 26 July, 2023; originally announced July 2023.

Comments: The first two authors contributed equally

arXiv:2307.12975 [pdf, ps, other]

Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems

Authors: Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao, Mengdi Wang

Abstract: For a real-world decision-making problem, the reward function often needs to be engineered or learned. A popular approach is to utilize human feedback to learn a reward function for training. The most straightforward way to do so is to ask humans to provide ratings for state-action pairs on an absolute scale and take these ratings as reward samples directly. Another popular way is to ask humans to… ▽ More For a real-world decision-making problem, the reward function often needs to be engineered or learned. A popular approach is to utilize human feedback to learn a reward function for training. The most straightforward way to do so is to ask humans to provide ratings for state-action pairs on an absolute scale and take these ratings as reward samples directly. Another popular way is to ask humans to rank a small set of state-action pairs by preference and learn a reward function from these preference data. Recently, preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT. In this work, we develop a theoretical comparison between these human feedback approaches in offline contextual bandits and show how human bias and uncertainty in feedback modelings can affect the theoretical guarantees of these approaches. Through this, our results seek to provide a theoretical explanation for the empirical successes of preference-based methods from a modeling perspective. △ Less

Submitted 28 October, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.11699 [pdf, other]

Co-Design with Myself: A Brain-Computer Interface Design Tool that Predicts Live Emotion to Enhance Metacognitive Monitoring of Designers

Authors: Qi Yang, Shuo Feng, Tianlin Zhao, Saleh Kalantari

Abstract: Intuition, metacognition, and subjective uncertainty interact in complex ways to shape the creative design process. Design intuition, a designer's innate ability to generate creative ideas and solutions based on implicit knowledge and experience, is often evaluated and refined through metacognitive monitoring. This self-awareness and management of cognitive processes can be triggered by subjective… ▽ More Intuition, metacognition, and subjective uncertainty interact in complex ways to shape the creative design process. Design intuition, a designer's innate ability to generate creative ideas and solutions based on implicit knowledge and experience, is often evaluated and refined through metacognitive monitoring. This self-awareness and management of cognitive processes can be triggered by subjective uncertainty, reflecting the designer's self-assessed confidence in their decisions. Despite their significance, few creativity support tools have targeted the enhancement of these intertwined components using biofeedback, particularly the affect associated with these processes. In this study, we introduce "Multi-Self," a BCI-VR design tool designed to amplify metacognitive monitoring in architectural design. Multi-Self evaluates designers' affect (valence and arousal) to their work, providing real-time, visual biofeedback. A proof-of-concept pilot study with 24 participants assessed its feasibility. While feedback accuracy responses were mixed, most participants found the tool useful, reporting that it sparked metacognitive monitoring, encouraged exploration of the design space, and helped modulate subjective uncertainty. △ Less

Submitted 21 July, 2023; originally announced July 2023.

arXiv:2307.10636 [pdf, other]

doi 10.1145/3581783.3612831

Learning and Evaluating Human Preferences for Conversational Head Generation

Authors: Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

Abstract: A reliable and comprehensive evaluation metric that aligns with manual preference assessments is crucial for conversational head video synthesis methods development. Existing quantitative evaluations often fail to capture the full complexity of human preference, as they only consider limited evaluation dimensions. Qualitative evaluations and user studies offer a solution but are time-consuming and… ▽ More A reliable and comprehensive evaluation metric that aligns with manual preference assessments is crucial for conversational head video synthesis methods development. Existing quantitative evaluations often fail to capture the full complexity of human preference, as they only consider limited evaluation dimensions. Qualitative evaluations and user studies offer a solution but are time-consuming and labor-intensive. This limitation hinders the advancement of conversational head generation algorithms and systems. In this paper, we propose a novel learning-based evaluation metric named Preference Score (PS) for fitting human preference according to the quantitative evaluations across different dimensions. PS can serve as a quantitative evaluation without the need for human annotation. Experimental results validate the superiority of Preference Score in aligning with human perception, and also demonstrate robustness and generalizability to unseen data, making it a valuable tool for advancing conversation head generation. We expect this metric could facilitate new advances in conversational head generation. Project Page: https://https://github.com/dc3ea9f/PreferenceScore. △ Less

Submitted 2 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Accepted by ACM Multimedia 2023

arXiv:2307.08209 [pdf, other]

Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection

Authors: Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao, Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang

Abstract: Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving. However, their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles. One reason for this high resource consumption is the presence of a large number of redundant background points in Lidar point clouds, resulting in spatial redu… ▽ More Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving. However, their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles. One reason for this high resource consumption is the presence of a large number of redundant background points in Lidar point clouds, resulting in spatial redundancy in both 3D voxel and dense BEV map representations. To address this issue, we propose an adaptive inference framework called Ada3D, which focuses on exploiting the input-level spatial redundancy. Ada3D adaptively filters the redundant input, guided by a lightweight importance predictor and the unique properties of the Lidar point cloud. Additionally, we utilize the BEV features' intrinsic sparsity by introducing the Sparsity Preserving Batch Normalization. With Ada3D, we achieve 40% reduction for 3D voxels and decrease the density of 2D BEV feature maps from 100% to 20% without sacrificing accuracy. Ada3D reduces the model computational and memory cost by 5x, and achieves 1.52x/1.45x end-to-end GPU latency and 1.5x/4.5x GPU peak memory optimization for the 3D and 2D backbone respectively. △ Less

Submitted 8 August, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

Comments: Accepted at ICCV2023

arXiv:2307.06736 [pdf, other]

MPR-Net:Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting

Authors: Tianlong Zhao, Xiang Ma, Xuemei Li, Caiming Zhang

Abstract: Time series forecasting has received wide interest from existing research due to its broad applications and inherent challenging. The research challenge lies in identifying effective patterns in historical series and applying them to future forecasting. Advanced models based on point-wise connected MLP and Transformer architectures have strong fitting power, but their secondary computational compl… ▽ More Time series forecasting has received wide interest from existing research due to its broad applications and inherent challenging. The research challenge lies in identifying effective patterns in historical series and applying them to future forecasting. Advanced models based on point-wise connected MLP and Transformer architectures have strong fitting power, but their secondary computational complexity limits practicality. Additionally, those structures inherently disrupt the temporal order, reducing the information utilization and making the forecasting process uninterpretable. To solve these problems, this paper proposes a forecasting model, MPR-Net. It first adaptively decomposes multi-scale historical series patterns using convolution operation, then constructs a pattern extension forecasting method based on the prior knowledge of pattern reproduction, and finally reconstructs future patterns into future series using deconvolution operation. By leveraging the temporal dependencies present in the time series, MPR-Net not only achieves linear time complexity, but also makes the forecasting process interpretable. By carrying out sufficient experiments on more than ten real data sets of both short and long term forecasting tasks, MPR-Net achieves the state of the art forecasting performance, as well as good generalization and robustness performance. △ Less

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.05857 [pdf, other]

FAIRO: Fairness-aware Adaptation in Sequential-Decision Making for Human-in-the-Loop Systems

Authors: Tianyu Zhao, Mojtaba Taherisadr, Salma Elmalaki

Abstract: Achieving fairness in sequential-decision making systems within Human-in-the-Loop (HITL) environments is a critical concern, especially when multiple humans with different behavior and expectations are affected by the same adaptation decisions in the system. This human variability factor adds more complexity since policies deemed fair at one point in time may become discriminatory over time due to… ▽ More Achieving fairness in sequential-decision making systems within Human-in-the-Loop (HITL) environments is a critical concern, especially when multiple humans with different behavior and expectations are affected by the same adaptation decisions in the system. This human variability factor adds more complexity since policies deemed fair at one point in time may become discriminatory over time due to variations in human preferences resulting from inter- and intra-human variability. This paper addresses the fairness problem from an equity lens, considering human behavior variability, and the changes in human preferences over time. We propose FAIRO, a novel algorithm for fairness-aware sequential-decision making in HITL adaptation, which incorporates these notions into the decision-making process. In particular, FAIRO decomposes this complex fairness task into adaptive sub-tasks based on individual human preferences through leveraging the Options reinforcement learning framework. We design FAIRO to generalize to three types of HITL application setups that have the shared adaptation decision problem. Furthermore, we recognize that fairness-aware policies can sometimes conflict with the application's utility. To address this challenge, we provide a fairness-utility tradeoff in FAIRO, allowing system designers to balance the objectives of fairness and utility based on specific application requirements. Extensive evaluations of FAIRO on the three HITL applications demonstrate its generalizability and effectiveness in promoting fairness while accounting for human variability. On average, FAIRO can improve fairness compared with other methods across all three applications by 35.36%. △ Less

Submitted 6 November, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

arXiv:2307.02090 [pdf, other]

Interactive Conversational Head Generation

Authors: Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao

Abstract: We introduce a new conversation head generation benchmark for synthesizing behaviors of a single interlocutor in a face-to-face conversation. The capability to automatically synthesize interlocutors which can participate in long and multi-turn conversations is vital and offer benefits for various applications, including digital humans, virtual agents, and social robots. While existing research pri… ▽ More We introduce a new conversation head generation benchmark for synthesizing behaviors of a single interlocutor in a face-to-face conversation. The capability to automatically synthesize interlocutors which can participate in long and multi-turn conversations is vital and offer benefits for various applications, including digital humans, virtual agents, and social robots. While existing research primarily focuses on talking head generation (one-way interaction), hindering the ability to create a digital human for conversation (two-way) interaction due to the absence of listening and interaction parts. In this work, we construct two datasets to address this issue, ``ViCo'' for independent talking and listening head generation tasks at the sentence level, and ``ViCo-X'', for synthesizing interlocutors in multi-turn conversational scenarios. Based on ViCo and ViCo-X, we define three novel tasks targeting the interaction modeling during the face-to-face conversation: 1) responsive listening head generation making listeners respond actively to the speaker with non-verbal signals, 2) expressive talking head generation guiding speakers to be aware of listeners' behaviors, and 3) conversational head generation to integrate the talking/listening ability in one interlocutor. Along with the datasets, we also propose corresponding baseline solutions to the three aforementioned tasks. Experimental results show that our baseline method could generate responsive and vivid agents that can collaborate with real person to fulfil the whole conversation. Project page: https://vico.solutions/. △ Less

Submitted 5 July, 2023; originally announced July 2023.

Comments: arXiv admin note: text overlap with arXiv:2112.13548

arXiv:2307.02049 [pdf]

Graph Neural Network-based Power Flow Model

Authors: Mingjian Tuo, Xingpeng Li, Tianxia Zhao

Abstract: Power flow analysis plays a crucial role in examining the electricity flow within a power system network. By performing power flow calculations, the system's steady-state variables, including voltage magnitude, phase angle at each bus, active/reactive power flow across branches, can be determined. While the widely used DC power flow model offers speed and robustness, it may yield inaccurate line f… ▽ More Power flow analysis plays a crucial role in examining the electricity flow within a power system network. By performing power flow calculations, the system's steady-state variables, including voltage magnitude, phase angle at each bus, active/reactive power flow across branches, can be determined. While the widely used DC power flow model offers speed and robustness, it may yield inaccurate line flow results for certain transmission lines. This issue becomes more critical when dealing with renewable energy sources such as wind farms, which are often located far from the main grid. Obtaining precise line flow results for these critical lines is vital for next operations. To address these challenges, data-driven approaches leverage historical grid profiles. In this paper, a graph neural network (GNN) model is trained using historical power system data to predict power flow outcomes. The GNN model enables rapid estimation of line flows. A comprehensive performance analysis is conducted, comparing the proposed GNN-based power flow model with the traditional DC power flow model, as well as deep neural network (DNN) and convolutional neural network (CNN). The results on test systems demonstrate that the proposed GNN-based power flow model provides more accurate solutions with high efficiency comparing to benchmark models. △ Less

Submitted 5 July, 2023; originally announced July 2023.

Comments: arXiv admin note: text overlap with arXiv:2112.08418

arXiv:2307.01649 [pdf, other]

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks

Authors: Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang

Abstract: Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows f… ▽ More Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models. △ Less

Submitted 17 February, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: 20 pages, 1 figure

arXiv:2307.00863 [pdf, ps, other]

Thompson Sampling under Bernoulli Rewards with Local Differential Privacy

Authors: Bo Jiang, Tianchi Zhao, Ming Li

Abstract: This paper investigates the problem of regret minimization for multi-armed bandit (MAB) problems with local differential privacy (LDP) guarantee. Given a fixed privacy budget $ε$, we consider three privatizing mechanisms under Bernoulli scenario: linear, quadratic and exponential mechanisms. Under each mechanism, we derive stochastic regret bound for Thompson Sampling algorithm. Finally, we simula… ▽ More This paper investigates the problem of regret minimization for multi-armed bandit (MAB) problems with local differential privacy (LDP) guarantee. Given a fixed privacy budget $ε$, we consider three privatizing mechanisms under Bernoulli scenario: linear, quadratic and exponential mechanisms. Under each mechanism, we derive stochastic regret bound for Thompson Sampling algorithm. Finally, we simulate to illustrate the convergence of different mechanisms under different privacy budgets. △ Less

Submitted 3 July, 2023; originally announced July 2023.

Comments: Accepted by ICML 22 workshop

arXiv:2306.16564 [pdf, other]

Pareto Optimal Learning for Estimating Large Language Model Errors

Authors: Theodore Zhao, Mu Wei, J. Samuel Preston, Hoifung Poon

Abstract: Large Language Models (LLMs) have shown impressive abilities in many applications. When a concrete and precise answer is desired, it is important to have a quantitative estimation of the potential error rate. However, this can be challenging due to the text-in-text-out nature of generative models. We present a method based on Pareto optimization that generates a risk score to estimate the probabil… ▽ More Large Language Models (LLMs) have shown impressive abilities in many applications. When a concrete and precise answer is desired, it is important to have a quantitative estimation of the potential error rate. However, this can be challenging due to the text-in-text-out nature of generative models. We present a method based on Pareto optimization that generates a risk score to estimate the probability of error in an LLM response by integrating multiple sources of information. We prove theoretically that the error estimator optimized in our framework aligns with the LLM and the information sources in an Pareto optimal manner. Experimental results show that the risk scores estimated by our method are well correlated with the true LLM error rate, thus facilitating error correction. By dynamically combining with prompting strategies such as self-verification and information retrieval, we demonstrate the proposed method can be utilized to increase the performance of an LLM, surpassing state-of-the-art task specific models. △ Less

Submitted 22 May, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.16181 [pdf, other]

Learning to Pan-sharpening with Memories of Spatial Details

Authors: Maoxun Yuan, Tianyi Zhao, Bo Li, Xingxing Wei

Abstract: Pan-sharpening, as one of the most commonly used techniques in remote sensing systems, aims to inject spatial details from panchromatic images into multispectral images (MS) to obtain high-resolution multispectral images. Since deep learning has received widespread attention because of its powerful fitting ability and efficient feature extraction, a variety of pan-sharpening methods have been prop… ▽ More Pan-sharpening, as one of the most commonly used techniques in remote sensing systems, aims to inject spatial details from panchromatic images into multispectral images (MS) to obtain high-resolution multispectral images. Since deep learning has received widespread attention because of its powerful fitting ability and efficient feature extraction, a variety of pan-sharpening methods have been proposed to achieve remarkable performance. However, current pan-sharpening methods usually require the paired panchromatic (PAN) and MS images as input, which limits their usage in some scenarios. To address this issue, in this paper we observe that the spatial details from PAN images are mainly high-frequency cues, i.e., the edges reflect the contour of input PAN images. This motivates us to develop a PAN-agnostic representation to store some base edges, so as to compose the contour for the corresponding PAN image via them. As a result, we can perform the pan-sharpening task with only the MS image when inference. To this end, a memory-based network is adapted to extract and memorize the spatial details during the training phase and is used to replace the process of obtaining spatial information from PAN images when inference, which is called Memory-based Spatial Details Network (MSDN). Finally, we integrate the proposed MSDN module into the existing deep learning-based pan-sharpening methods to achieve an end-to-end pan-sharpening network. With extensive experiments on the Gaofen1 and WorldView-4 satellites, we verify that our method constructs good spatial details without PAN images and achieves the best performance. The code is available at https://github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.git. △ Less

Submitted 8 August, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.14859 [pdf, other]

Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories

Authors: Zixuan Zhang, Minshuo Chen, Mengdi Wang, Wen**g Liao, Tuo Zhao

Abstract: Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures. In real world applications, such an assumption of data lying exactly on a low dimensional manifold is stringent. This paper introduces a relaxed assumption that the input data are concentrated around a subset of… ▽ More Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures. In real world applications, such an assumption of data lying exactly on a low dimensional manifold is stringent. This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notation -- effective Minkowski dimension. We prove that, the sample complexity of deep nonparametric regression only depends on the effective Minkowski dimension of $\mathcal{S}$ denoted by $p$. We further illustrate our theoretical findings by considering nonparametric regression with an anisotropic Gaussian random design $N(0,Σ)$, where $Σ$ is full rank. When the eigenvalues of $Σ$ have an exponential or polynomial decay, the effective Minkowski dimension of such an Gaussian random design is $p=\mathcal{O}(\sqrt{\log n})$ or $p=\mathcal{O}(n^γ)$, respectively, where $n$ is the sample size and $γ\in(0,1)$ is a small constant depending on the polynomial decay rate. Our theory shows that, when the manifold assumption does not hold, deep neural networks can still adapt to the effective Minkowski dimension of the data, and circumvent the curse of the ambient dimensionality for moderate sample sizes. △ Less

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.12859

Reinforcement Federated Learning Method Based on Adaptive OPTICS Clustering

Authors: Tianyu Zhao, Jun** Du, Yingxia Shao, Zeli Guan

Abstract: Federated learning is a distributed machine learning technology, which realizes the balance between data privacy protection and data sharing computing. To protect data privacy, feder-ated learning learns shared models by locally executing distributed training on participating devices and aggregating local models into global models. There is a problem in federated learning, that is, the negative im… ▽ More Federated learning is a distributed machine learning technology, which realizes the balance between data privacy protection and data sharing computing. To protect data privacy, feder-ated learning learns shared models by locally executing distributed training on participating devices and aggregating local models into global models. There is a problem in federated learning, that is, the negative impact caused by the non-independent and identical distribu-tion of data across different user terminals. In order to alleviate this problem, this paper pro-poses a strengthened federation aggregation method based on adaptive OPTICS clustering. Specifically, this method perceives the clustering environment as a Markov decision process, and models the adjustment process of parameter search direction, so as to find the best clus-tering parameters to achieve the best federated aggregation method. The core contribution of this paper is to propose an adaptive OPTICS clustering algorithm for federated learning. The algorithm combines OPTICS clustering and adaptive learning technology, and can effective-ly deal with the problem of non-independent and identically distributed data across different user terminals. By perceiving the clustering environment as a Markov decision process, the goal is to find the best parameters of the OPTICS cluster without artificial assistance, so as to obtain the best federated aggregation method and achieve better performance. The reliability and practicability of this method have been verified on the experimental data, and its effec-tiveness and superiority have been proved. △ Less

Submitted 22 June, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: There is a declarative error in the method part, we want to withdraw the revision first, so as not to mislead others

arXiv:2306.12020 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095084

Visual-Aware Text-to-Speech

Authors: Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

Abstract: Dynamically synthesizing talking speech that actively responds to a listening head is critical during the face-to-face interaction. For example, the speaker could take advantage of the listener's facial expression to adjust the tones, stressed syllables, or pauses. In this work, we present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and s… ▽ More Dynamically synthesizing talking speech that actively responds to a listening head is critical during the face-to-face interaction. For example, the speaker could take advantage of the listener's facial expression to adjust the tones, stressed syllables, or pauses. In this work, we present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and sequential visual feedback (e.g., nod, smile) of the listener in face-to-face communication. Different from traditional text-to-speech, VA-TTS highlights the impact of visual modality. On this newly-minted task, we devise a baseline model to fuse phoneme linguistic information and listener visual signals for speech synthesis. Extensive experiments on multimodal conversation dataset ViCo-X verify our proposal for generating more natural audio with scenario-appropriate rhythm and prosody. △ Less

Submitted 21 June, 2023; originally announced June 2023.

Comments: accepted as oral and top 3% paper by ICASSP 2023

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, 1-5

arXiv:2306.11868 [pdf, other]

Multiverse Transformer: 1st Place Solution for Waymo Open Sim Agents Challenge 2023

Authors: Yu Wang, Tiebiao Zhao, Fan Yi

Abstract: This technical report presents our 1st place solution for the Waymo Open Sim Agents Challenge (WOSAC) 2023. Our proposed MultiVerse Transformer for Agent simulation (MVTA) effectively leverages transformer-based motion prediction approaches, and is tailored for closed-loop simulation of agents. In order to produce simulations with a high degree of realism, we design novel training and sampling met… ▽ More This technical report presents our 1st place solution for the Waymo Open Sim Agents Challenge (WOSAC) 2023. Our proposed MultiVerse Transformer for Agent simulation (MVTA) effectively leverages transformer-based motion prediction approaches, and is tailored for closed-loop simulation of agents. In order to produce simulations with a high degree of realism, we design novel training and sampling methods, and implement a receding horizon prediction mechanism. In addition, we introduce a variable-length history aggregation method to mitigate the compounding error that can arise during closed-loop autoregressive execution. On the WOSAC, our MVTA and its enhanced version MVTE reach a realism meta-metric of 0.5091 and 0.5168, respectively, outperforming all the other methods on the leaderboard. △ Less

Submitted 20 June, 2023; originally announced June 2023.

Comments: Technical report for the 1st place solution of Waymo Open Sim Agents Challenge 2023. Project page: https://multiverse-transformer.github.io/sim-agents/. CVPR 2023 workshop on Autonomous Driving: https://cvpr2023.wad.vision/

arXiv:2306.11300 [pdf, other]

RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing

Authors: Zilun Zhang, Tiancheng Zhao, Yulong Guo, Jianwei Yin

Abstract: Pre-trained Vision-Language Models (VLMs) utilizing extensive image-text paired data have demonstrated unprecedented image-text association capabilities, achieving remarkable results across various downstream tasks. A critical challenge is how to make use of existing large-scale pre-trained VLMs, which are trained on common objects, to perform the domain-specific transfer for accomplishing domain-… ▽ More Pre-trained Vision-Language Models (VLMs) utilizing extensive image-text paired data have demonstrated unprecedented image-text association capabilities, achieving remarkable results across various downstream tasks. A critical challenge is how to make use of existing large-scale pre-trained VLMs, which are trained on common objects, to perform the domain-specific transfer for accomplishing domain-related downstream tasks. A critical challenge is how to make use of existing large-scale pre-trained VLMs, which are trained on common objects, to perform the domain-specific transfer for accomplishing domain-related downstream tasks. In this paper, we propose a new framework that includes the Domain pre-trained Vision-Language Model (DVLM), bridging the gap between the General Vision-Language Model (GVLM) and domain-specific downstream tasks. Moreover, we present an image-text paired dataset in the field of remote sensing (RS), RS5M, which has 5 million RS images with English descriptions. The dataset is obtained from filtering publicly available image-text paired datasets and captioning label-only RS datasets with pre-trained VLM. These constitute the first large-scale RS image-text paired dataset. Additionally, we fine-tuned the CLIP model and tried several Parameter-Efficient Fine-Tuning methods on RS5M to implement the DVLM. Experimental results show that our proposed dataset is highly effective for various tasks, and our model GeoRSCLIP improves upon the baseline or previous state-of-the-art model by $3\%\sim20\%$ in Zero-shot Classification (ZSC), $3\%\sim6\%$ in Remote Sensing Cross-Modal Text-Image Retrieval (RSCTIR) and $4\%\sim5\%$ in Semantic Localization (SeLo) tasks. Dataset and models have been released in: \url{https://github.com/om-ai-lab/RS5M}. △ Less

Submitted 2 January, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: RS5M dataset v5

arXiv:2306.11222 [pdf, other]

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

Authors: Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao

Abstract: Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To reduce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse approximation), a novel model compression technique that approximates a weight matrix by the sum of a low-rank matrix and a s… ▽ More Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To reduce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse approximation), a novel model compression technique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. Our method combines the advantages of both low-rank approximations and pruning, while avoiding their limitations. Low-rank approximation compresses the coherent and expressive parts in neurons, while pruning removes the incoherent and non-expressive parts in neurons. Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons. We evaluate our method on natural language understanding, question answering, and natural language generation tasks. We show that it significantly outperforms existing compression methods. △ Less

Submitted 26 June, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.09841 [pdf, other]

Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond

Authors: Fangzhi Xu, Qika Lin, Jiawei Han, Tianzhe Zhao, Jun Liu, Erik Cambria

Abstract: Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge engineering and artificial intelligence. Recently, Large Language Models (LLMs) have emerged as a noteworthy innovation in natural language processing (NLP), exhibiting impressive achievements across various classic NLP tasks. However, the question of whether LLMs can effectively address the task of… ▽ More Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge engineering and artificial intelligence. Recently, Large Language Models (LLMs) have emerged as a noteworthy innovation in natural language processing (NLP), exhibiting impressive achievements across various classic NLP tasks. However, the question of whether LLMs can effectively address the task of logical reasoning, which requires gradual cognitive inference similar to human intelligence, remains unanswered. To this end, we aim to bridge this gap and provide comprehensive evaluations in this paper. Firstly, to offer systematic evaluations, we select fifteen typical logical reasoning datasets and organize them into deductive, inductive, abductive and mixed-form reasoning settings. Considering the comprehensiveness of evaluations, we include three representative LLMs (i.e., text-davinci-003, ChatGPT and BARD) and evaluate them on all selected datasets under zero-shot, one-shot and three-shot settings. Secondly, different from previous evaluations relying only on simple metrics (e.g., accuracy), we propose fine-level evaluations from objective and subjective manners, covering both answers and explanations. Additionally, to uncover the logical flaws of LLMs, problematic cases will be attributed to five error types from two dimensions, i.e., evidence selection process and reasoning process. Thirdly, to avoid the influences of knowledge bias and purely focus on benchmarking the logical reasoning capability of LLMs, we propose a new dataset with neutral content. It contains 3,000 samples and covers deductive, inductive and abductive settings. Based on the in-depth evaluations, this paper finally forms a general evaluation scheme of logical reasoning capability from six dimensions. It reflects the pros and cons of LLMs and gives guiding directions for future works. △ Less

Submitted 8 August, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 14 pages, 11 figures

arXiv:2306.09832 [pdf, ps, other]

On the generalizations of global dimensions and singularity categories

Authors: Xiaolei Zhang, Tiwei Zhao, Dingguo Wang

Abstract: For each $n\in\mathbb{N}\cup\{\infty\}$, we introduce the notion of $n$-singularity category $\mathbf{D}_{n{\rm-}sg}(R)$ of a given ring $R$, which can be seen as a generalization of the classical singularity category. Moreover, the $n$-global dimension $n$-gldim$(R)$ of $R$ is investigated. We show that $\mathbf{D}_{n{\rm-}sg}(R)=0$ if and only if $n$-gldim$(R)$ is finite. Furthermore, we charact… ▽ More For each $n\in\mathbb{N}\cup\{\infty\}$, we introduce the notion of $n$-singularity category $\mathbf{D}_{n{\rm-}sg}(R)$ of a given ring $R$, which can be seen as a generalization of the classical singularity category. Moreover, the $n$-global dimension $n$-gldim$(R)$ of $R$ is investigated. We show that $\mathbf{D}_{n{\rm-}sg}(R)=0$ if and only if $n$-gldim$(R)$ is finite. Furthermore, we characterize the vanishing property of $n$-singularity categories in terms of recollements. △ Less

Submitted 14 July, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2306.09140

arXiv:2306.09140 [pdf, ps, other]

Little finitistic dimensions and generalized derived categories

Authors: Xiaolei Zhang, Tiwei Zhao, Dingguo Wang

Abstract: In this paper, we introduced a generalization of the derived category, which is called the $n$-derived category and denoted by $\D_{n}(R)$, of a given ring $R$ for each $n\in\mathbb{N}\cup\{\infty\}$. The $n$-derived category of a ring is proved to be very closely connected with its left little finitistic dimension. We also introduce and investigate the notions of $n$-exact sequences, $n$-projecti… ▽ More In this paper, we introduced a generalization of the derived category, which is called the $n$-derived category and denoted by $\D_{n}(R)$, of a given ring $R$ for each $n\in\mathbb{N}\cup\{\infty\}$. The $n$-derived category of a ring is proved to be very closely connected with its left little finitistic dimension. We also introduce and investigate the notions of $n$-exact sequences, $n$-projective (resp., $n$-injective) modules and $n$-exact complexes. In particular, we characterize the left little finitistic dimensions in terms of all above notions. Finally, we build a connection of the classical derived categories and $n$-derived categories. △ Less

Submitted 14 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.07919 [pdf, other]

doi 10.1145/3580305.3599506

Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations

Authors: Tianxiang Zhao, Wenchao Yu, Suhang Wang, Lu Wang, Xiang Zhang, Yuncong Chen, Yanchi Liu, Wei Cheng, Haifeng Chen

Abstract: Imitation learning has achieved great success in many sequential decision-making tasks, in which a neural agent is learned by imitating collected human demonstrations. However, existing algorithms typically require a large number of high-quality demonstrations that are difficult and expensive to collect. Usually, a trade-off needs to be made between demonstration quality and quantity in practice.… ▽ More Imitation learning has achieved great success in many sequential decision-making tasks, in which a neural agent is learned by imitating collected human demonstrations. However, existing algorithms typically require a large number of high-quality demonstrations that are difficult and expensive to collect. Usually, a trade-off needs to be made between demonstration quality and quantity in practice. Targeting this problem, in this work we consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set. Some pioneering works have been proposed, but they suffer from many limitations, e.g., assuming a demonstration to be of the same optimality throughout time steps and failing to provide any interpretation w.r.t knowledge learned from the noisy set. Addressing these problems, we propose {\method} by evaluating and imitating at the sub-demonstration level, encoding action primitives of varying quality into different skills. Concretely, {\method} consists of a high-level controller to discover skills and a skill-conditioned module to capture action-taking policies, and is trained following a two-phase pipeline by first discovering skills with all demonstrations and then adapting the controller to only the clean set. A mutual-information-based regularization and a dynamic sub-demonstration optimality estimator are designed to promote disentanglement in the skill space. Extensive experiments are conducted over two gym environments and a real-world healthcare dataset to demonstrate the superiority of {\method} in learning from sub-optimal demonstrations and its improved interpretability by examining learned skills. △ Less

Submitted 13 June, 2023; originally announced June 2023.

Journal ref: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23), August 6--10, 2023, Long Beach, CA, USA

arXiv:2306.06936 [pdf, other]

doi 10.1145/3580305.3599268

CARL-G: Clustering-Accelerated Representation Learning on Graphs

Authors: William Shiao, Uday Singh Saini, Yozen Liu, Tong Zhao, Neil Shah, Evangelos E. Papalexakis

Abstract: Self-supervised learning on graphs has made large strides in achieving great performance in various downstream tasks. However, many state-of-the-art methods suffer from a number of impediments, which prevent them from realizing their full potential. For instance, contrastive methods typically require negative sampling, which is often computationally costly. While non-contrastive methods avoid this… ▽ More Self-supervised learning on graphs has made large strides in achieving great performance in various downstream tasks. However, many state-of-the-art methods suffer from a number of impediments, which prevent them from realizing their full potential. For instance, contrastive methods typically require negative sampling, which is often computationally costly. While non-contrastive methods avoid this expensive step, most existing methods either rely on overly complex architectures or dataset-specific augmentations. In this paper, we ask: Can we borrow from classical unsupervised machine learning literature in order to overcome those obstacles? Guided by our key insight that the goal of distance-based clustering closely resembles that of contrastive learning: both attempt to pull representations of similar items together and dissimilar items apart. As a result, we propose CARL-G - a novel clustering-based framework for graph representation learning that uses a loss inspired by Cluster Validation Indices (CVIs), i.e., internal measures of cluster quality (no ground truth required). CARL-G is adaptable to different clustering methods and CVIs, and we show that with the right choice of clustering method and CVI, CARL-G outperforms node classification baselines on 4/5 datasets with up to a 79x training speedup compared to the best-performing baseline. CARL-G also performs at par or better than baselines in node clustering and similarity search tasks, training up to 1,500x faster than the best-performing baseline. Finally, we also provide theoretical foundations for the use of CVI-inspired losses in graph representation learning. △ Less

Submitted 31 July, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: 14 pages. Accepted at KDD 2023

arXiv:2306.06912 [pdf, other]

Chaos with Gaussian invariant distribution by quantum-noise random phase feedback

Authors: Yanqiang Guo, Haifeng Li, Yingqi Wang, Xiangyu Meng, Tong Zhao, Xiaomin Guo

Abstract: We experimentally present a random phase feedback based on quantum noise to generate a chaotic laser with Gaussian invariant distribution. The quantum noise from vacuum fluctuations is acquired by balanced homodyne detection and injected into a phase modulator to form a random phase feedback. An optical switch using high-speed intensity modulator is employed to reset the chaotic states repeatedly… ▽ More We experimentally present a random phase feedback based on quantum noise to generate a chaotic laser with Gaussian invariant distribution. The quantum noise from vacuum fluctuations is acquired by balanced homodyne detection and injected into a phase modulator to form a random phase feedback. An optical switch using high-speed intensity modulator is employed to reset the chaotic states repeatedly and the time evolutions of intensity statistical distributions of the chaotic states stemming from the initial noise are measured. By the quantum-noise random phase feedback, the transient intensity distributions of the chaotic outputs are improved from asymmetric invariant distributions to Gaussian invariant distributions, and the Gaussian invariant distribution indicates a randomly perturbed dynamical transition from microscopic initial noise to macroscopic stochastic fluctuation. The effects of phase feedback bandwidth and modulation depth on the invariant distributions are investigated experimentally. The chaotic time-delay signature and mean permutation entropy are suppressed to 0.036 and enhanced to 0.999 using the random phase feedback, respectively. The high-quality chaotic laser with Gaussian invariant distribution can be a desired random source for ultrafast random number generation and secure communication. △ Less

Submitted 12 June, 2023; originally announced June 2023.

Comments: 11 pages, 10 figures

arXiv:2306.03109 [pdf, other]

Machine Learning Force Fields with Data Cost Aware Training

Authors: Alexander Bukharin, Tianyi Liu, Shengjie Wang, Simiao Zuo, Weihao Gao, Wen Yan, Tuo Zhao

Abstract: Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation, which finds widespread applications in chemistry and biomedical research. Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels generated by expensive quantum mechanical algorithms, which may scale as $O(n^3)$ to $O(n^7)$,… ▽ More Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation, which finds widespread applications in chemistry and biomedical research. Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels generated by expensive quantum mechanical algorithms, which may scale as $O(n^3)$ to $O(n^7)$, with $n$ proportional to the number of basis functions. To address this issue, we propose a multi-stage computational framework -- ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data. The motivation behind ASTEROID is that inaccurate data, though incurring large bias, can help capture the sophisticated structures of the underlying force field. Therefore, we first train a MLFF model on a large amount of inaccurate training data, employing a bias-aware loss function to prevent the model from overfitting tahe potential bias of this data. We then fine-tune the obtained model using a small amount of accurate training data, which preserves the knowledge learned from the inaccurate training data while significantly improving the model's accuracy. Moreover, we propose a variant of ASTEROID based on score matching for the setting where the inaccurate training data are unlabeled. Extensive experiments on MD datasets and downstream tasks validate the efficacy of ASTEROID. Our code and data are available at https://github.com/abukharin3/asteroid. △ Less

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2306.01323 [pdf, other]

Demystifying Structural Disparity in Graph Neural Networks: Can One Size Fit All?

Authors: Haitao Mao, Zhikai Chen, Wei **, Haoyu Han, Yao Ma, Tong Zhao, Neil Shah, Jiliang Tang

Abstract: Recent studies on Graph Neural Networks(GNNs) provide both empirical and theoretical evidence supporting their effectiveness in capturing structural patterns on both homophilic and certain heterophilic graphs. Notably, most real-world homophilic and heterophilic graphs are comprised of a mixture of nodes in both homophilic and heterophilic structural patterns, exhibiting a structural disparity. Ho… ▽ More Recent studies on Graph Neural Networks(GNNs) provide both empirical and theoretical evidence supporting their effectiveness in capturing structural patterns on both homophilic and certain heterophilic graphs. Notably, most real-world homophilic and heterophilic graphs are comprised of a mixture of nodes in both homophilic and heterophilic structural patterns, exhibiting a structural disparity. However, the analysis of GNN performance with respect to nodes exhibiting different structural patterns, e.g., homophilic nodes in heterophilic graphs, remains rather limited. In the present study, we provide evidence that Graph Neural Networks(GNNs) on node classification typically perform admirably on homophilic nodes within homophilic graphs and heterophilic nodes within heterophilic graphs while struggling on the opposite node set, exhibiting a performance disparity. We theoretically and empirically identify effects of GNNs on testing nodes exhibiting distinct structural patterns. We then propose a rigorous, non-i.i.d PAC-Bayesian generalization bound for GNNs, revealing reasons for the performance disparity, namely the aggregated feature distance and homophily ratio difference between training and testing nodes. Furthermore, we demonstrate the practical implications of our new findings via (1) elucidating the effectiveness of deeper GNNs; and (2) revealing an over-looked distribution shift factor on graph out-of-distribution problem and proposing a new scenario accordingly. △ Less

Submitted 14 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: 55 pages, 24 figures. arXiv admin note: text overlap with arXiv:2106.15535 by other authors

arXiv:2306.00369 [pdf, other]

Focused Prefix Tuning for Controllable Text Generation

Authors: Congda Ma, Tianyu Zhao, Makoto Shing, Kei Sawada, Manabu Okumura

Abstract: In a controllable text generation dataset, there exist unannotated attributes that could provide irrelevant learning signals to models that use it for training and thus degrade their performance. We propose focused prefix tuning(FPT) to mitigate the problem and to enable the control to focus on the desired attribute. Experimental results show that FPT can achieve better control accuracy and text f… ▽ More In a controllable text generation dataset, there exist unannotated attributes that could provide irrelevant learning signals to models that use it for training and thus degrade their performance. We propose focused prefix tuning(FPT) to mitigate the problem and to enable the control to focus on the desired attribute. Experimental results show that FPT can achieve better control accuracy and text fluency than baseline models in single-attribute control tasks. In multi-attribute control tasks, FPT achieves comparable control accuracy with the state-of-the-art approach while kee** the flexibility to control new attributes without retraining existing models. △ Less

Submitted 10 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted to the ACL 2023

arXiv:2305.18713 [pdf, other]

How Thermal Effect Regulates Cyclic Voltammetry of Supercapacitors

Authors: Teng Zhao, Shuangliang Zhao, Shenggao Zhou, Zhenli Xu

Abstract: Cyclic voltammetry (CV) is a powerful technique for characterizing electrochemical properties of electrochemical devices. During charging-discharging cycles, thermal effect has profound impact on its performance, but existing theoretical models cannot clarify such intrinsic mechanism and often give poor prediction. Herein, we propose an interfacial model for the electro-thermal coupling, based on… ▽ More Cyclic voltammetry (CV) is a powerful technique for characterizing electrochemical properties of electrochemical devices. During charging-discharging cycles, thermal effect has profound impact on its performance, but existing theoretical models cannot clarify such intrinsic mechanism and often give poor prediction. Herein, we propose an interfacial model for the electro-thermal coupling, based on fundamentals in non-equilibrium statistical mechanics. By incorporating molecular interactions, our model shows a quantitative agreement with experimental measurements. The integral capacitance shows a first enhanced then decayed trend against the applied heat bath temperature. Such a relation is attributed to the competition between electrical attraction and Born repulsion via dielectric inhomogeneity, which is rarely understood in previous models. In addition, as evidenced in recent experimental CV tests, our model predicts the non-monotonic dependence of the capacitance on the bulk electrolyte density, further demonstrating its high accuracy. This work demonstrates a potential pathway towards next-generation thermal regulation of electrochemical devices. △ Less

Submitted 2 July, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.18703 [pdf, other]

Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey

Authors: Chen Ling, Xujiang Zhao, Jiaying Lu, Chengyuan Deng, Can Zheng, Junxiang Wang, Tanmoy Chowdhury, Yun Li, Hejie Cui, Xuchao Zhang, Tianjiao Zhao, Amit Panalkar, Dhagash Mehta, Stefano Pasquali, Wei Cheng, Haoyu Wang, Yanchi Liu, Zhengzhang Chen, Haifeng Chen, Chris White, Quanquan Gu, Jian Pei, Carl Yang, Liang Zhao

Abstract: Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task-agnostic foundation for a wide range of applications. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of dom… ▽ More Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task-agnostic foundation for a wide range of applications. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g., various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications). Domain specification techniques are key to make large language models disruptive in many applications. Specifically, to solve these hurdles, there has been a notable increase in research and practices conducted in recent years on the domain specialization of LLMs. This emerging field of study, with its substantial potential for impact, necessitates a comprehensive and systematic review to better summarize and guide ongoing work in this area. In this article, we present a comprehensive survey on domain specification techniques for large language models, an emerging direction critical for large language model applications. First, we propose a systematic taxonomy that categorizes the LLM domain-specialization techniques based on the accessibility to LLMs and summarizes the framework for all the subcategories as well as their relations and differences to each other. Second, we present an extensive taxonomy of critical application domains that can benefit dramatically from specialized LLMs, discussing their practical significance and open challenges. Last, we offer our insights into the current research status and future trends in this area. △ Less

Submitted 29 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.14913 [pdf, other]

CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual Named Entity Recognition

Authors: Tingting Ma, Qianhui Wu, Huiqiang Jiang, Börje F. Karlsson, Tiejun Zhao, Chin-Yew Lin

Abstract: Cross-lingual named entity recognition (NER) aims to train an NER system that generalizes well to a target language by leveraging labeled data in a given source language. Previous work alleviates the data scarcity problem by translating source-language labeled data or performing knowledge distillation on target-language unlabeled data. However, these methods may suffer from label noise due to the… ▽ More Cross-lingual named entity recognition (NER) aims to train an NER system that generalizes well to a target language by leveraging labeled data in a given source language. Previous work alleviates the data scarcity problem by translating source-language labeled data or performing knowledge distillation on target-language unlabeled data. However, these methods may suffer from label noise due to the automatic labeling process. In this paper, we propose CoLaDa, a Collaborative Label Denoising Framework, to address this problem. Specifically, we first explore a model-collaboration-based denoising scheme that enables models trained on different data sources to collaboratively denoise pseudo labels used by each other. We then present an instance-collaboration-based strategy that considers the label consistency of each token's neighborhood in the representation space for denoising. Experiments on different benchmark datasets show that the proposed CoLaDa achieves superior results compared to previous methods, especially when generalizing to distant languages. △ Less

Submitted 24 May, 2023; originally announced May 2023.

Comments: ACL 2023. Our code is available at https://github.com/microsoft/vert-papers/tree/master/papers/CoLaDa

arXiv:2305.12087 [pdf, other]

Semi-Supervised Graph Imbalanced Regression

Authors: Gang Liu, Tong Zhao, Eric Inae, Tengfei Luo, Meng Jiang

Abstract: Data imbalance is easily found in annotated data when the observations of certain continuous label values are difficult to collect for regression tasks. When they come to molecule and polymer property predictions, the annotated graph datasets are often small because labeling them requires expensive equipment and effort. To address the lack of examples of rare label values in graph regression tasks… ▽ More Data imbalance is easily found in annotated data when the observations of certain continuous label values are difficult to collect for regression tasks. When they come to molecule and polymer property predictions, the annotated graph datasets are often small because labeling them requires expensive equipment and effort. To address the lack of examples of rare label values in graph regression tasks, we propose a semi-supervised framework to progressively balance training data and reduce model bias via self-training. The training data balance is achieved by (1) pseudo-labeling more graphs for under-represented labels with a novel regression confidence measurement and (2) augmenting graph examples in latent space for remaining rare labels after data balancing with pseudo-labels. The former is to identify quality examples from unlabeled data whose labels are confidently predicted and sample a subset of them with a reverse distribution from the imbalanced annotated data. The latter collaborates with the former to target a perfect balance using a novel label-anchored mixup algorithm. We perform experiments in seven regression tasks on graph datasets. Results demonstrate that the proposed framework significantly reduces the error of predicted graph properties, especially in under-represented label areas. △ Less

Submitted 20 May, 2023; originally announced May 2023.

Comments: Accepted by KDD 2023. 17 pages, 5 figures, 10 tables

arXiv:2305.05687 [pdf, other]

doi 10.3847/1538-4357/accc89

Coronal Heating as Determined by the Solar Flare Frequency Distribution Obtained by Aggregating Case Studies

Authors: James Paul Mason, Alexandra Werth, Colin G. West, Allison A. Youngblood, Donald L. Woodraska, Courtney Peck, Kevin Lacjak, Florian G. Frick, Moutamen Gabir, Reema A. Alsinan, Thomas Jacobsen, Mohammad Alrubaie, Kayla M. Chizmar, Benjamin P. Lau, Lizbeth Montoya Dominguez, David Price, Dylan R. Butler, Connor J. Biron, Nikita Feoktistov, Kai Dewey, N. E. Loomis, Michal Bodzianowski, Connor Kuybus, Henry Dietrick, Aubrey M. Wolfe , et al. (977 additional authors not shown)

Abstract: Flare frequency distributions represent a key approach to addressing one of the largest problems in solar and stellar physics: determining the mechanism that counter-intuitively heats coronae to temperatures that are orders of magnitude hotter than the corresponding photospheres. It is widely accepted that the magnetic field is responsible for the heating, but there are two competing mechanisms th… ▽ More Flare frequency distributions represent a key approach to addressing one of the largest problems in solar and stellar physics: determining the mechanism that counter-intuitively heats coronae to temperatures that are orders of magnitude hotter than the corresponding photospheres. It is widely accepted that the magnetic field is responsible for the heating, but there are two competing mechanisms that could explain it: nanoflares or Alfvén waves. To date, neither can be directly observed. Nanoflares are, by definition, extremely small, but their aggregate energy release could represent a substantial heating mechanism, presuming they are sufficiently abundant. One way to test this presumption is via the flare frequency distribution, which describes how often flares of various energies occur. If the slope of the power law fitting the flare frequency distribution is above a critical threshold, $α=2$ as established in prior literature, then there should be a sufficient abundance of nanoflares to explain coronal heating. We performed $>$600 case studies of solar flares, made possible by an unprecedented number of data analysts via three semesters of an undergraduate physics laboratory course. This allowed us to include two crucial, but nontrivial, analysis methods: pre-flare baseline subtraction and computation of the flare energy, which requires determining flare start and stop times. We aggregated the results of these analyses into a statistical study to determine that $α= 1.63 \pm 0.03$. This is below the critical threshold, suggesting that Alfvén waves are an important driver of coronal heating. △ Less

Submitted 9 May, 2023; originally announced May 2023.

Comments: 1,002 authors, 14 pages, 4 figures, 3 tables, published by The Astrophysical Journal on 2023-05-09, volume 948, page 71

Showing 151–200 of 1,070 results for author: Zhao, T