Search | arXiv e-print repository

doi 10.1016/j.metrad.2023.100017

Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models

Authors: Yiheng Liu, Tianle Han, Siyuan Ma, Jiayue Zhang, Yuanyuan Yang, Jiaming Tian, Hao He, Antong Li, Mengshen He, Zhengliang Liu, Zihao Wu, Lin Zhao, Dajiang Zhu, Xiang Li, Ning Qiang, Dingang Shen, Tianming Liu, Bao Ge

Abstract: This paper presents a comprehensive survey of ChatGPT-related (GPT-3.5 and GPT-4) research, state-of-the-art large language models (LLM) from the GPT series, and their prospective applications across diverse domains. Indeed, key innovations such as large-scale pre-training that captures knowledge across the entire world wide web, instruction fine-tuning and Reinforcement Learning from Human Feedback (RLHF) have played significant roles in enhancing LLMs' adaptability and performance. We performed an in-depth analysis of 194 relevant papers on arXiv, encompassing trend analysis, word cloud representation, and distribution analysis across various application domains. The findings reveal a significant and increasing interest in ChatGPT-related research, predominantly centered on direct natural language processing applications, while also demonstrating considerable potential in areas ranging from education and history to mathematics, medicine, and physics. This study endeavors to furnish insights into ChatGPT's capabilities, potential implications, ethical concerns, and offer direction for future advancements in this field.

Submitted 21 August, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: 21 pages, 4 figures, accepted by Meta-Radiology

Journal ref: Meta-Radiology (2023)100017

arXiv:2303.15935 [pdf, other]

When Brain-inspired AI Meets AGI

Authors: Lin Zhao, Lu Zhang, Zihao Wu, Yuzhong Chen, Haixing Dai, Xiaowei Yu, Zhengliang Liu, Tuo Zhang, Xintao Hu, Xi Jiang, Xiang Li, Dajiang Zhu, Dinggang Shen, Tianming Liu

Abstract: Artificial General Intelligence (AGI) has been a long-standing goal of humanity, with the aim of creating machines capable of performing any intellectual task that humans can do. To achieve this, AGI researchers draw inspiration from the human brain and seek to replicate its principles in intelligent machines. Brain-inspired artificial intelligence is a field that has emerged from this endeavor, combining insights from neuroscience, psychology, and computer science to develop more efficient and powerful AI systems. In this article, we provide a comprehensive overview of brain-inspired AI from the perspective of AGI. We begin with the current progress in brain-inspired AI and its extensive connection with AGI. We then cover the important characteristics for both human intelligence and AGI (e.g., scaling, multimodality, and reasoning). We discuss important technologies toward achieving AGI in current AI systems, such as in-context learning and prompt tuning. We also investigate the evolution of AGI systems from both algorithmic and infrastructural perspectives. Finally, we explore the limitations and future of AGI.

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.15790 [pdf, other]

doi 10.1007/s11467-023-1333-z

STCF Conceptual Design Report: Volume 1 -- Physics & Detector

Authors: M. Achasov, X. C. Ai, R. Aliberti, L. P. An, Q. An, X. Z. Bai, Y. Bai, O. Bakina, A. Barnyakov, V. Blinov, V. Bobrovnikov, D. Bodrov, A. Bogomyagkov, A. Bondar, I. Boyko, Z. H. Bu, F. M. Cai, H. Cai, J. J. Cao, Q. H. Cao, Z. Cao, Q. Chang, K. T. Chao, D. Y. Chen, H. Chen , et al. (413 additional authors not shown)

Abstract: The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that by the present $τ$-Charm factory -- the BEPCII, providing a unique platform for exploring the asymmetry of matter-antimatter (charge-parity violation), in-depth studies of the internal structure of hadrons and the nature of non-perturbative strong interactions, as well as searching for exotic hadrons and physics beyond the Standard Model. The STCF project in China is under development with an extensive R\&D program. This document presents the physics opportunities at the STCF, describes conceptual designs of the STCF detector system, and discusses future plans for detector R\&D and physics case studies.

Submitted 5 October, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Journal ref: Front. Phys. 19(1), 14701 (2024)

arXiv:2303.15569 [pdf, ps, other]

Core-Periphery Principle Guided Redesign of Self-Attention in Transformers

Authors: Xiaowei Yu, Lu Zhang, Haixing Dai, Yanjun Lyu, Lin Zhao, Zihao Wu, David Liu, Tianming Liu, Dajiang Zhu

Abstract: Designing more efficient, reliable, and explainable neural network architectures is critical to studies that are based on artificial intelligence (AI) techniques. Previous studies, by post-hoc analysis, have found that the best-performing ANNs surprisingly resemble biological neural networks (BNN), which indicates that ANNs and BNNs may share some common principles to achieve optimal performance in either machine learning or cognitive/behavior tasks. Inspired by this phenomenon, we proactively instill organizational principles of BNNs to guide the redesign of ANNs. We leverage the Core-Periphery (CP) organization, which is widely found in human brain networks, to guide the information communication mechanism in the self-attention of vision transformer (ViT) and name this novel framework as CP-ViT. In CP-ViT, the attention operation between nodes is defined by a sparse graph with a Core-Periphery structure (CP graph), where the core nodes are redesigned and reorganized to play an integrative role and serve as a center for other periphery nodes to exchange information. We evaluated the proposed CP-ViT on multiple public datasets, including medical image datasets (INbreast) and natural image datasets. Interestingly, by incorporating the BNN-derived principle (CP structure) into the redesign of ViT, our CP-ViT outperforms other state-of-the-art ANNs.
In general, our work advances the state of the art in three aspects: 1) This work provides novel insights for brain-inspired AI: we can utilize the principles found in BNNs to guide and improve our ANN architecture design; 2) We show that there exist sweet spots of CP graphs that lead to CP-ViTs with significantly improved performance; and 3) The core nodes in CP-ViT correspond to task-related meaningful and important image patches, which can significantly enhance the interpretability of the trained deep model.

Submitted 27 March, 2023; originally announced March 2023.

Comments: Core-periphery, functional brain networks, ViT

arXiv:2303.13637 [pdf, other]

Efficient and Direct Inference of Heart Rate Variability using Both Signal Processing and Machine Learning

Authors: Yuntong Zhang, Jingye Xu, Mimi Xie, Dakai Zhu, Houbing Song, Wei Wang

Abstract: Heart Rate Variability (HRV) measures the variation of the time between consecutive heartbeats and is a major indicator of physical and mental health. Recent research has demonstrated that photoplethysmography (PPG) sensors can be used to infer HRV. However, many prior studies had high errors because they employed only signal processing or only machine learning (ML), because they inferred HRV indirectly, or because large training datasets were lacking. Many prior studies may also require large ML models. The low accuracy and large model sizes limit their applications to small embedded devices and potential future use in healthcare. To address the above issues, we first collected a large dataset of PPG signals and HRV ground truth. With this dataset, we developed HRV models that combine signal processing and ML to directly infer HRV. Evaluation results show that our method had errors between 3.5% and 25.7% and outperformed signal-processing-only and ML-only methods. We also explored different ML models, which showed that Decision Trees and Multi-level Perceptrons have 13.0% and 9.1% errors on average with models at most hundreds of KB and inference time less than 1ms. Hence, they are more suitable for small embedded devices and potentially enable the future use of PPG-based HRV monitoring in healthcare.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.13636 [pdf, other]

PPG-based Heart Rate Estimation with Efficient Sensor Sampling and Learning Models

Authors: Yuntong Zhang, Jingye Xu, Mimi Xie, Wei Wang, Keying Ye, Jing Wang, Dakai Zhu

Abstract: Recent studies showed that Photoplethysmography (PPG) sensors embedded in wearable devices can estimate heart rate (HR) with high accuracy. However, despite prior research efforts, applying PPG sensor based HR estimation to embedded devices still faces challenges due to the energy-intensive high-frequency PPG sampling and the resource-intensive machine-learning models. In this work, we aim to explore HR estimation techniques that are more suitable for lower-power and resource-constrained embedded devices. More specifically, we seek to design techniques that could provide high-accuracy HR estimation with low-frequency PPG sampling, small model size, and fast inference time. First, we show that by combining signal processing and ML, it is possible to reduce the PPG sampling frequency from 125 Hz to only 25 Hz while providing higher HR estimation accuracy. This combination also helps to reduce the ML model feature size, leading to smaller models. Additionally, we present a comprehensive analysis on different ML models and feature sizes to compare their accuracy, model size, and inference time. The models explored include Decision Tree (DT), Random Forest (RF), K-nearest neighbor (KNN), Support vector machines (SVM), and Multi-layer perceptron (MLP). Experiments were conducted using both a widely-utilized dataset and our self-collected dataset. The experimental results show that our method, which combines signal processing and ML, had only 5% error for HR estimation using low-frequency PPG data.
Moreover, our analysis showed that DT models with 10 to 20 input features usually have good accuracy, while being several orders of magnitude smaller in model size and faster in inference time.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.11141 [pdf, other]

DocRED-FE: A Document-Level Fine-Grained Entity And Relation Extraction Dataset

Authors: Hongbo Wang, Weimin Xiong, Yifan Song, Dawei Zhu, Yu Xia, Sujian Li

Abstract: Joint entity and relation extraction (JERE) is one of the most important tasks in information extraction. However, most existing works focus on sentence-level coarse-grained JERE, which have limitations in real-world scenarios. In this paper, we construct a large-scale document-level fine-grained JERE dataset DocRED-FE, which improves DocRED with Fine-Grained Entity Type. Specifically, we redesign a hierarchical entity type schema including 11 coarse-grained types and 119 fine-grained types, and then re-annotate DocRED manually according to this schema. Through comprehensive experiments we find that: (1) DocRED-FE is challenging to existing JERE models; (2) Our fine-grained entity types promote relation classification. We make DocRED-FE with instruction and the code for our baselines publicly available at https://github.com/PKU-TANGENT/DOCRED-FE.

Submitted 21 March, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE ICASSP 2023. The first two authors contribute equally

arXiv:2303.11032 [pdf, other]

DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4

Authors: Zhengliang Liu, Yue Huang, Xiaowei Yu, Lu Zhang, Zihao Wu, Chao Cao, Haixing Dai, Lin Zhao, Yiwei Li, Peng Shu, Fang Zeng, Lichao Sun, Wei Liu, Dinggang Shen, Quanzheng Li, Tianming Liu, Dajiang Zhu, Xiang Li

Abstract: The digitization of healthcare has facilitated the sharing and re-using of medical data but has also raised concerns about confidentiality and privacy. HIPAA (Health Insurance Portability and Accountability Act) mandates removing re-identifying information before the dissemination of medical records. Thus, effective and efficient solutions for de-identifying medical data, especially those in free-text forms, are highly needed. While various computer-assisted de-identification methods, including both rule-based and learning-based, have been developed and used in prior practice, such solutions still lack generalizability or need to be fine-tuned according to different scenarios, significantly restricting their wider use. The advancement of large language models (LLM), such as ChatGPT and GPT-4, has shown great potential in processing text data in the medical domain with zero-shot in-context learning, especially in the task of privacy protection, as these models can identify confidential information by their powerful named entity recognition (NER) capability. In this work, we developed a novel GPT4-enabled de-identification framework ("DeID-GPT") to automatically identify and remove the identifying information. Compared to existing commonly used medical text data de-identification methods, our developed DeID-GPT showed the highest accuracy and remarkable reliability in masking private information from the unstructured medical text while preserving the original structure and meaning of the text.
This study is one of the earliest to utilize ChatGPT and GPT-4 for medical text data processing and de-identification, which provides insights for further research and solution development on the use of LLMs such as ChatGPT/GPT-4 in healthcare. Codes and benchmarking data information are available at https://github.com/yhydhx/ChatGPT-API.

Submitted 21 December, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

arXiv:2303.06594 [pdf, other]

ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

Authors: Deyao Zhu, Jun Chen, Kilichbek Haydarov, Xiaoqian Shen, Wenxuan Zhang, Mohamed Elhoseiny

Abstract: Asking insightful questions is crucial for acquiring knowledge and expanding our understanding of the world. However, the importance of questioning has been largely overlooked in AI research, where models have been primarily developed to answer questions. With the recent advancements of large language models (LLMs) like ChatGPT, we discover their capability to ask high-quality questions when provided with a suitable prompt. This discovery presents a new opportunity to develop an automatic questioning system. In this paper, we introduce ChatCaptioner, a novel automatic-questioning method deployed in image captioning. Here, ChatGPT is prompted to ask a series of informative questions about images to BLIP-2, a strong vision question-answering model. By keeping acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate more enriched image descriptions. We conduct human-subject evaluations on common image caption datasets such as COCO, Conceptual Caption, and WikiArt, and compare ChatCaptioner with BLIP-2 as well as ground truth. Our results demonstrate that ChatCaptioner's captions are significantly more informative, receiving three times as many votes from human evaluators for providing the most image information. Besides, ChatCaptioner identifies 53% more objects within the image than BLIP-2 alone measured by WordNet synset matching. Code is available at https://github.com/Vision-CAIR/ChatCaptioner

Submitted 12 March, 2023; originally announced March 2023.

arXiv:2303.04994 [pdf, ps, other]

Distributional Vector Autoregression: Eliciting Macro and Financial Dependence

Authors: Yunyun Wang, Tatsushi Oka, Dan Zhu

Abstract: Vector autoregression is an essential tool in empirical macroeconomics and finance for understanding the dynamic interdependencies among multivariate time series. In this study, we expand the scope of vector autoregression by incorporating a multivariate distributional regression framework and introducing a distributional impulse response function, providing a comprehensive view of dynamic heterogeneity. We propose a straightforward yet flexible estimation method and establish its asymptotic properties under weak dependence assumptions. Our empirical analysis examines the conditional joint distribution of GDP growth and financial conditions in the United States, with a focus on the global financial crisis. Our results show that tight financial conditions lead to a multimodal conditional joint distribution of GDP growth and financial conditions, and easing financial conditions significantly impacts long-term GDP growth, while improving the GDP growth during the global financial crisis has limited effects on financial conditions.

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2303.03447 [pdf, other]

doi 10.1103/PhysRevB.107.184431

Magnetic domain depinning as possible evidence for two ferromagnetic phases in LaCrGe$_3$

Authors: R. R. Ullah, P. Klavins, X. D. Zhu, and V. Taufour

Abstract: Two ferromagnetic phases, FM1 and FM2, were first proposed to exist in LaCrGe$_3$ based on a broad maximum in the temperature derivative of resistivity resembling that of the superconducting ferromagnet UGe$_2$ where FM1 and FM2 are well-established. While evidence for two FM phases can be found in certain additional probes, corresponding anomalies in magnetization have not been recognized until now. Our spatially-resolved images of the magnetic domains show a substantial change in the domain structure between the higher temperature FM1 phase and the lower temperature FM2 phase. Furthermore, our measurements of the coercive field and virgin magnetization curves reveal an unconventional magnetic domain pinning region in the FM1 phase, followed by a depinning region at lower temperatures where the system is reported to crossover into the FM2 phase. We incorporate this discovery into a simple domain magnetization model that demystifies the magnetization curve seen in all previous studies. Finally, we find that the unusual domain behavior can be explained by a change in the ferromagnetic exchange interaction and magnetic moment, both of which are consistent with the existence of two FM phases. This revelation may help explain a range of anomalous behaviors observed in LaCrGe$_3$ and rekindles the discussion about the prevalence of multiple FM phases in fragile FM systems.

Submitted 29 April, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: 7+8 pages, 4+8 figures. Revised with suggestions from referee

arXiv:2303.00575 [pdf, other]

IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

Authors: Dekai Zhu, Guangyao Zhai, Yan Di, Fabian Manhardt, Hendrik Berkemeyer, Tuan Tran, Nassir Navab, Federico Tombari, Benjamin Busam

Abstract: Reliable multi-agent trajectory prediction is crucial for the safe planning and control of autonomous systems. Compared with single-agent cases, the major challenge in simultaneously processing multiple agents lies in modeling complex social interactions caused by various driving intentions and road conditions. Previous methods typically leverage graph-based message propagation or attention mechanism to encapsulate such interactions in the format of marginal probabilistic distributions. However, it is inherently sub-optimal. In this paper, we propose IPCC-TP, a novel relevance-aware module based on Incremental Pearson Correlation Coefficient to improve multi-agent interaction modeling. IPCC-TP learns pairwise joint Gaussian Distributions through the tightly-coupled estimation of the means and covariances according to interactive incremental movements. Our module can be conveniently embedded into existing multi-agent prediction methods to extend original motion distribution decoders. Extensive experiments on nuScenes and Argoverse 2 datasets demonstrate that IPCC-TP improves the performance of baselines by a large margin.

Submitted 30 April, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: CVPR 2023 accepted

arXiv:2303.00520 [pdf, other]

Valid Information Guidance Network for Compressed Video Quality Enhancement

Authors: Xuan Sun, Ziyue Zhang, Guannan Chen, Dan Zhu

Abstract: In recent years deep learning methods have shown great superiority in compressed video quality enhancement tasks. Existing methods generally take the raw video as the ground truth and extract practical information from consecutive frames containing various artifacts. However, they do not fully exploit the valid information of compressed and raw videos to guide the quality enhancement for compressed videos. In this paper, we propose a unique Valid Information Guidance scheme (VIG) to enhance the quality of compressed videos by mining valid information from both compressed videos and raw videos. Specifically, we propose an efficient framework, Compressed Redundancy Filtering (CRF) network, to balance speed and enhancement. After removing the redundancy by filtering the information, CRF can use the valid information of the compressed video to reconstruct the texture. Furthermore, we propose a progressive Truth Guidance Distillation (TGD) strategy, which does not need to design additional teacher models and distillation loss functions. By only using the ground truth as input to guide the model to aggregate the correct spatio-temporal correspondence across the raw frames, TGD can significantly improve the enhancement effect without increasing the extra training cost. Extensive experiments show that our method achieves the state-of-the-art performance of compressed video quality enhancement in terms of accuracy and efficiency.

Submitted 28 February, 2023; originally announced March 2023.

arXiv:2302.13929 [pdf, other]

Efficient Informed Proposals for Discrete Distributions via Newton's Series Approximation

Authors: Yue Xiang, Dongyao Zhu, Bowen Lei, Dongkuan Xu, Ruqi Zhang

Abstract: Gradients have been exploited in proposal distributions to accelerate the convergence of Markov chain Monte Carlo algorithms on discrete distributions. However, these methods require a natural differentiable extension of the target discrete distribution, which often does not exist or does not provide effective gradient guidance. In this paper, we develop a gradient-like proposal for any discrete distribution without this strong requirement. Built upon a locally-balanced proposal, our method efficiently approximates the discrete likelihood ratio via Newton's series expansion to enable a large and efficient exploration in discrete spaces. We show that our method can also be viewed as a multilinear extension, thus inheriting its desired properties. We prove that our method has a guaranteed convergence rate with or without the Metropolis-Hastings step. Furthermore, our method outperforms a number of popular alternatives in several different experiments, including the facility location problem, extractive text summarization, and image retrieval.

Submitted 27 February, 2023; originally announced February 2023.

Comments: Published at AISTATS 2023

arXiv:2302.13007 [pdf, other]

AugGPT: Leveraging ChatGPT for Text Data Augmentation

Authors: Haixing Dai, Zhengliang Liu, Wenxiong Liao, Xiaoke Huang, Yihan Cao, Zihao Wu, Lin Zhao, Shaochen Xu, Wei Liu, Ninghao Liu, Sheng Li, Dajiang Zhu, Hongmin Cai, Lichao Sun, Quanzheng Li, Dinggang Shen, Tianming Liu, Xiang Li

Abstract: Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning scenario, where the data in the target domain is generally much scarcer and of lowered quality. A natural and widely-used strategy to mitigate such challenges is to perform data augmentation to better capture the data invariance and increase the sample size. However, current text data augmentation methods either can't ensure the correct labeling of the generated data (lacking faithfulness) or can't ensure sufficient diversity in the generated data (lacking compactness), or both. Inspired by the recent success of large language models, especially the development of ChatGPT, which demonstrated improved language comprehension abilities, in this work, we propose a text data augmentation approach based on ChatGPT (named AugGPT). AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples. The augmented samples can then be used in downstream model training. Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach over state-of-the-art text data augmentation methods in terms of testing accuracy and distribution of the augmented samples.

Submitted 20 March, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

arXiv:2302.10447 [pdf, other]

Mask-guided BERT for Few Shot Text Classification

Authors: Wenxiong Liao, Zhengliang Liu, Haixing Dai, Zihao Wu, Yiyang Zhang, Xiaoke Huang, Yuzhong Chen, Xi Jiang, Wei Liu, Dajiang Zhu, Tianming Liu, Sheng Li, Xiang Li, Hongmin Cai

Abstract: Transformer-based language models have achieved significant success in various domains. However, the data-intensive nature of the transformer architecture requires much labeled data, which is challenging in low-resource scenarios (i.e., few-shot learning (FSL)). The main challenge of FSL is the difficulty of training robust models on small amounts of samples, which frequently leads to overfitting. Here we present Mask-BERT, a simple and modular framework to help BERT-based architectures tackle FSL. The proposed approach fundamentally differs from existing FSL strategies such as prompt tuning and meta-learning. The core idea is to selectively apply masks on text inputs and filter out irrelevant information, which guides the model to focus on discriminative tokens that influence prediction results. In addition, to make the text representations from different categories more separable and the text representations from the same category more compact, we introduce a contrastive learning loss function. Experimental results on public-domain benchmark datasets demonstrate the effectiveness of Mask-BERT.

Submitted 8 March, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

arXiv:2302.07801 [pdf, other]

Data Forensics in Diffusion Models: A Systematic Analysis of Membership Privacy

Authors: Derui Zhu, Dingfan Chen, Jens Grossklags, Mario Fritz

Abstract: In recent years, diffusion models have achieved tremendous success in the field of image generation, becoming the state-of-the-art technology for AI-based image processing applications. Despite the numerous benefits brought by recent advances in diffusion models, there are also concerns about their potential misuse, specifically in terms of privacy breaches and intellectual property infringement. In particular, some of their unique characteristics open up new attack surfaces when considering the real-world deployment of such models. With a thorough investigation of the attack vectors, we develop a systematic analysis of membership inference attacks on diffusion models and propose novel attack methods tailored to each attack scenario specifically relevant to diffusion models. Our approach exploits easily obtainable quantities and is highly effective, achieving near-perfect attack performance (>0.9 AUCROC) in realistic scenarios. Our extensive experiments demonstrate the effectiveness of our method, highlighting the importance of considering privacy and intellectual property risks when using diffusion models in image generation tasks.

Submitted 5 August, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2302.03797 [pdf, other]

Men Can't Always be Transformed into Mice: Decision Algorithms and Complexity for Sorting by Symmetric Reversals

Authors: Xin Tong, Yixiao Yu, Ziyi Fang, Haitao Jiang, Lusheng Wang, Binhai Zhu, Daming Zhu

Abstract: Sorting a permutation by reversals is a famous problem in genome rearrangements. Since 1997, considerable biological evidence has been found that in many genomes the reversed regions are usually flanked by a pair of inverted repeats. Reversals of this type are called symmetric reversals, which, unfortunately, were largely ignored until recently. In this paper, we investigate the problem of sorting by symmetric reversals, which requires a series of symmetric reversals to transform one chromosome $A$ into another chromosome $B$. The decision problem of sorting by symmetric reversals is referred to as SSR (when the input chromosomes $A$ and $B$ are given, we use SSR(A,B)) and the corresponding optimization version (i.e., when the answer for SSR(A,B) is yes, using the minimum number of symmetric reversals to convert $A$ to $B$) is referred to as SMSR(A,B). The main results of this paper are summarized as follows, where the input is a pair of chromosomes $A$ and $B$ with $n$ repeats. (1) We present an $O(n^2)$ time algorithm to solve the decision problem SSR(A,B), i.e., determine whether a chromosome $A$ can be transformed into $B$ by a series of symmetric reversals. (2) We design an $O(n^2)$ time algorithm for a special 2-balanced case of SMSR(A,B), where chromosomes $A$ and $B$ both have duplication number 2 and every repeat appears twice in different orientations in $A$ and $B$. (3) We show that SMSR is NP-hard even if the duplication number of the input chromosomes is at most 2, hence showing that the above positive optimization result is the best possible. As a by-product, we show that the minimum Steiner tree problem on circle graphs is NP-hard, settling the complexity status of a 38-year old open problem.

Submitted 7 February, 2023; originally announced February 2023.

Comments: 35 pages, 12 figures

MSC Class: 68Qxx; 68Wxx ACM Class: G.2.2; G.2.3; J.3

arXiv:2302.03172 [pdf, ps, other]

High-Dimensional Conditionally Gaussian State Space Models with Missing Data

Authors: Joshua C. C. Chan, Aubrey Poon, Dan Zhu

Abstract: We develop an efficient sampling approach for handling complex missing data patterns and a large number of missing observations in conditionally Gaussian state space models. Two important examples are dynamic factor models with unbalanced datasets and large Bayesian VARs with variables in multiple frequencies. A key insight underlying the proposed approach is that the joint distribution of the missing data conditional on the observed data is Gaussian. Moreover, the inverse covariance or precision matrix of this conditional distribution is sparse, and this special structure can be exploited to substantially speed up computations. We illustrate the methodology using two empirical applications. The first application combines quarterly, monthly and weekly data using a large Bayesian VAR to produce weekly GDP estimates. In the second application, we extract latent factors from unbalanced datasets involving over a hundred monthly variables via a dynamic factor model with stochastic volatility.

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2302.02074 [pdf, other]

Quantum computation: Efficient network partitioning for large scale critical infrastructures

Authors: Saikat Ray Majumder, Annarita Giani, Weiwei Shen, Bogdan Neculaes, Daiwei Zhu, Sonika Johri

Abstract: Quantum computers are emerging as a viable alternative to tackle certain computational problems that are challenging for classical computers. With the rapid development of quantum hardware such as those based on trapped ions, there is practical motivation for identifying risk management problems that are efficiently solvable with these systems. Here we focus on network partitioning as a means for analyzing risk in critical infrastructures and present a quantum approach for its implementation. It is based on the potential speedup quantum computers can provide in the identification of eigenvalues and eigenvectors of sparse graph Laplacians, a procedure which is constrained by time and memory on classical computers.

Submitted 3 February, 2023; originally announced February 2023.
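
The entry above rests on eigenvectors of a graph Laplacian. As a point of reference only, here is a minimal classical sketch of spectral bipartitioning with NumPy. It is not the authors' quantum method; the function name `fiedler_partition` and the six-node example graph are hypothetical, chosen purely for illustration. The sketch splits a connected, undirected graph by the sign of the eigenvector belonging to the second-smallest Laplacian eigenvalue (the Fiedler vector).

```python
import numpy as np


def fiedler_partition(adjacency):
    """Split a connected, undirected graph into two parts.

    Hypothetical helper: builds the combinatorial Laplacian L = D - A and
    labels each node by the sign of its Fiedler-vector component.
    """
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so column 1 is the eigenvector of the second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvectors[:, 1] >= 0


# Illustrative graph (an assumption): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

labels = fiedler_partition(adjacency)
print(labels)  # which side is True depends on the eigenvector's arbitrary sign
```

The dense eigendecomposition used here scales poorly with graph size, which is the classical time and memory constraint the abstract refers to.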

arXiv:2302.00146 [pdf, other]

Gyri vs. Sulci: Disentangling Brain Core-Periphery Functional Networks via Twin-Transformer

Authors: Xiaowei Yu, Lu Zhang, Haixing Dai, Lin Zhao, Yanjun Lyu, Zihao Wu, Tianming Liu, Dajiang Zhu

Abstract: The human cerebral cortex is highly convoluted into convex gyri and concave sulci. It has been demonstrated that gyri and sulci are significantly different in their anatomy, connectivity, and function. Besides exhibiting opposite shape patterns, long-distance axonal fibers connected to gyri are much denser than those connected to sulci, and neural signals on gyri are more complex in low-frequency while sulci are more complex in high-frequency. Although accumulating evidence shows significant differences between gyri and sulci, their primary roles in brain function have not been elucidated yet. To solve this fundamental problem, we design a novel Twin-Transformer framework to unveil the unique functional roles of gyri and sulci as well as their relationship in the whole brain function. Our Twin-Transformer framework adopts two structure-identical (twin) Transformers to disentangle spatial-temporal patterns of gyri and sulci, one focusing on the information of gyri and the other on sulci. The Gyro-Sulcal interactions, along with the tremendous but widely existing variability across subjects, are characterized in the loss design. We validated our Twin-Transformer on the HCP task-fMRI dataset, for the first time, to elucidate the different roles of gyri and sulci in brain function. Our results suggest that gyri and sulci could work together in a core-periphery network manner, that is, gyri could serve as core networks for information gathering and distributing, while sulci could serve as periphery networks for specific local information processing. These findings have shed new light on our fundamental understanding of the brain's basic structural and functional mechanisms.

Submitted 31 January, 2023; originally announced February 2023.

Comments: 13 pages, 4 figures

arXiv:2301.13803 [pdf, other]

Fairness-aware Vision Transformer via Debiased Self-Attention

Authors: Yao Qiang, Chengyin Li, Prashant Khanduri, Dongxiao Zhu

Abstract: Vision Transformer (ViT) has recently gained significant interest in solving computer vision (CV) problems due to its capability of extracting informative features and modeling long-range dependencies through the self-attention mechanism. To fully realize the advantages of ViT in real-world applications, recent works have explored the trustworthiness of ViT, including its robustness and explainability. However, another desideratum, fairness, has not yet been adequately addressed in the literature. We establish that the existing fairness-aware algorithms (primarily designed for CNNs) do not perform well on ViT. This necessitates developing our novel framework via Debiased Self-Attention (DSA). DSA is a fairness-through-blindness approach that enforces ViT to eliminate spurious features correlated with the sensitive attributes for bias mitigation. Notably, adversarial examples are leveraged to locate and mask the spurious features in the input image patches. In addition, DSA utilizes an attention weights alignment regularizer in the training objective to encourage learning informative features for target prediction. Importantly, our DSA framework leads to improved fairness guarantees over prior works on multiple prediction tasks without compromising target prediction performance.

Submitted 29 August, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

arXiv:2301.12876 [pdf, other]

Guiding Online Reinforcement Learning with Action-Free Offline Pretraining

Authors: Deyao Zhu, Yuhui Wang, Jürgen Schmidhuber, Mohamed Elhoseiny

Abstract: Offline RL methods have been shown to reduce the need for environment interaction by training agents using offline collected episodes. However, these methods typically require action information to be logged during data collection, which can be difficult or even impossible in some practical cases. In this paper, we investigate the potential of using action-free offline datasets to improve online reinforcement learning, and name this problem Reinforcement Learning with Action-Free Offline Pretraining (AFP-RL). We introduce Action-Free Guide (AF-Guide), a method that guides online training by extracting knowledge from action-free offline datasets. AF-Guide consists of an Action-Free Decision Transformer (AFDT), which implements a variant of Upside-Down Reinforcement Learning and learns to plan the next states from the offline dataset, and a Guided Soft Actor-Critic (Guided SAC) that learns online with guidance from AFDT. Experimental results show that AF-Guide can improve sample efficiency and performance in online training thanks to the knowledge from the action-free offline dataset. Code is available at https://github.com/Vision-CAIR/AF-Guide.

Submitted 22 March, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2301.11490 [pdf, other]

Neural Episodic Control with State Abstraction

Authors: Zhuo Li, Derui Zhu, Yujing Hu, Xiaofei Xie, Lei Ma, Yan Zheng, Yan Song, Yingfeng Chen, Jianjun Zhao

Abstract: Existing Deep Reinforcement Learning (DRL) algorithms suffer from sample inefficiency. Generally, episodic control-based approaches are solutions that leverage highly-rewarded past experiences to improve sample efficiency of DRL algorithms. However, previous episodic control-based approaches fail to utilize the latent information from the historical behaviors (e.g., state transitions, topological similarities, etc.) and lack scalability during DRL training. This work introduces Neural Episodic Control with State Abstraction (NECSA), a simple but effective state abstraction-based episodic control containing a more comprehensive episodic memory, a novel state evaluation, and a multi-step state analysis. We evaluate our approach on the MuJoCo and Atari tasks in OpenAI gym domains. The experimental results indicate that NECSA achieves higher sample efficiency than the state-of-the-art episodic control-based approaches. Our data and code are available at the project website (https://sites.google.com/view/drl-necsa).

Submitted 20 February, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

arXiv:2301.08955 [pdf, other]

Determination of nonthermal bonding origin of a novel photoexcited lattice instability in SnSe

Authors: Yijing Huang, Samuel Teitelbaum, Shan Yang, Gilberto De la Peña, Takahiro Sato, Matthieu Chollet, Diling Zhu, Jennifer L. Niedziela, Dipanshu Bansal, Andrew F. May, Aaron M. Lindenberg, Olivier Delaire, Mariano Trigo, David A. Reis

Abstract: Interatomic forces that bind materials are largely determined by an often complex interplay between the electronic band-structure and the atomic arrangements to form its equilibrium structure and dynamics. As these forces also determine the phonon dispersion, lattice dynamics measurements are often crucial tools for understanding how materials transform between different structures. This is the case for the mono-chalcogenides, which feature a number of lattice instabilities associated with their network of resonant bonds and a large tunability in their functional properties. SnSe hosts a novel lattice instability upon above-bandgap photoexcitation that is distinct from the distortions associated with its high temperature phase transition, demonstrating that photoexcitation can alter the interatomic forces significantly differently than thermal excitation. Here we report decisive time-resolved X-ray scattering-based measurements of the nonequilibrium lattice dynamics in SnSe. By fitting interatomic force models to the excited-state dispersion, we determine this instability as being primarily due to changes in the fourth-nearest neighbor bonds that connect bilayers, with relatively little change to the intralayer resonant bonds. In addition to providing critical insight into the nonthermal bonding origin of the instability in SnSe, such measurements will be crucial for understanding and controlling materials properties under non-equilibrium conditions.

Submitted 21 January, 2023; originally announced January 2023.

arXiv:2301.06989 [pdf, other]

Negative Flux Aggregation to Estimate Feature Attributions

Authors: Xin Li, Deng Pan, Chengyin Li, Yao Qiang, Dongxiao Zhu

Abstract: There are increasing demands for understanding deep neural networks' (DNNs) behavior spurred by growing security and/or transparency concerns. Due to multi-layer nonlinearity of the deep neural network architectures, explaining DNN predictions remains an open problem, preventing us from gaining a deeper understanding of the mechanisms. To enhance the explainability of DNNs, we estimate the input feature's attributions to the prediction task using divergence and flux. Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate the attribution map. Unlike the previous techniques, ours neither relies on fitting a surrogate model nor needs any path integration of gradients. Both qualitative and quantitative experiments demonstrate a superior performance of NeFLAG in generating more faithful attribution maps than the competing methods. Our code is available at https://github.com/xinli0928/NeFLAG.

Submitted 13 May, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: 14 pages, 4 figures, 2 tables

arXiv:2301.02561 [pdf, other]

Multi-Vehicle Trajectory Prediction at Intersections using State and Intention Information

Authors: Dekai Zhu, Qadeer Khan, Daniel Cremers

Abstract: Traditional approaches to prediction of future trajectory of road agents rely on knowing information about their past trajectory. This work rather relies only on having knowledge of the current state and intended direction to make predictions for multiple vehicles at intersections. Furthermore, message passing of this information between the vehicles provides each one of them a more holistic overview of the environment allowing for a more informed prediction. This is done by training a neural network which takes the state and intent of the multiple vehicles to predict their future trajectory. Using the intention as an input allows our approach to be extended to additionally control the multiple vehicles to drive towards desired paths. Experimental results demonstrate the robustness of our approach both in terms of trajectory prediction and vehicle control at intersections. The complete training and evaluation code for this work is available here: https://github.com/Dekai21/Multi_Agent_Intersection.

Submitted 6 January, 2023; originally announced January 2023.

arXiv:2301.01827 [pdf, other]

doi 10.1109/TASE.2022.3230951

A GOA-Based Fault-Tolerant Trajectory Tracking Control for an Underwater Vehicle of Multi-Thruster System without Actuator Saturation

Authors: Danjie Zhu, Lei Wang, Hua Zhang, Simon X. Yang

Abstract: This paper proposes an intelligent fault-tolerant control (FTC) strategy to tackle the trajectory tracking problem of an underwater vehicle (UV) under thruster damage (power loss) cases and meanwhile resolve the actuator saturation brought by the vehicle's physical constraints. In the proposed control strategy, the trajectory tracking component is formed by a refined backstepping algorithm that controls the velocity variation and a sliding mode control that deduces the torque/force outputs; the fault-tolerant component is established based on a Grasshopper Optimization Algorithm (GOA), which provides fast convergence speed as well as satisfactory accuracy in deducing the optimized reallocation of the thruster forces to compensate for the power loss in different fault cases. Simulations with or without environmental perturbations under different fault cases and comparisons to other traditional FTCs are presented, thus verifying the effectiveness and robustness of the proposed GOA-based fault-tolerant trajectory tracking design.

Submitted 4 January, 2023; originally announced January 2023.

Comments: arXiv admin note: text overlap with arXiv:2210.01706

arXiv:2212.09892 [pdf, other]

doi 10.1103/PhysRevB.107.014305

Influence of local symmetry on lattice dynamics coupled to topological surface states

Authors: Jonathan A. Sobota, Samuel W. Teitelbaum, Yijing Huang, José D. Querales-Flores, Robert Power, Meabh Allen, Costel R. Rotundu, Trevor P. Bailey, Ctirad Uher, Tom Henighan, Mason Jiang, Diling Zhu, Matthieu Chollet, Takahiro Sato, Mariano Trigo, Éamonn D. Murray, Ivana Savić, Patrick S. Kirchmann, Stephen Fahy, David A. Reis, Zhi-Xun Shen

Abstract: We investigate coupled electron-lattice dynamics in the topological insulator Bi2Te3 with time-resolved photoemission and time-resolved x-ray diffraction. It is well established that coherent phonons can be launched by optical excitation, but selection rules generally restrict these modes to zone-center wavevectors and Raman-active branches. We find that the topological surface state couples to additional modes, including a continuum of surface-projected bulk modes from both Raman- and infrared-branches, with possible contributions from surface-localized modes when they exist. Our calculations show that this surface vibrational spectrum occurs naturally as a consequence of the translational and inversion symmetries broken at the surface, without requiring the splitting-off of surface-localized phonon modes. The generality of this result suggests that coherent phonon spectra are useful by providing unique fingerprints for identifying surface states in more controversial materials. These effects may also expand the phase space for tailoring surface state wavefunctions via ultrafast optical excitation.

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2212.02084 [pdf, other]

End-to-end Recording Device Identification Based on Deep Representation Learning

Authors: Chunyan Zeng, Dongliang Zhu, Zhifeng Wang, Minghu Wu, Wei Xiong, Nan Zhao

Abstract: Deep learning techniques have achieved specific results in recording device source identification. The recording device source features include spatial information and certain temporal information. However, most recording device source identification methods based on deep learning only use spatial representation learning from recording device source features, which cannot make full use of recording device source information. Therefore, in this paper, to fully explore the spatial information and temporal information of recording device source, we propose a new method for recording device source identification based on the fusion of spatial feature information and temporal feature information by using an end-to-end framework. From a feature perspective, we designed two kinds of networks to extract recording device source spatial and temporal information. Afterward, we use the attention mechanism to adaptively assign the weight of spatial information and temporal information to obtain fusion features. From a model perspective, our model uses an end-to-end framework to learn the deep representation from spatial and temporal features and is trained with deep and shallow losses to jointly optimize the network. This method is compared with our previous work and baseline system. The results show that the proposed method is better than our previous work and baseline system under general conditions.

Submitted 5 December, 2022; originally announced December 2022.

Comments: 20 pages, 5 figures, recording device identification

arXiv:2211.13332 [pdf, other]

Learning Compact Features via In-Training Representation Alignment

Authors: Xin Li, Xiangrui Li, Deng Pan, Yao Qiang, Dongxiao Zhu

Abstract: Deep neural networks (DNNs) for supervised learning can be viewed as a pipeline of the feature extractor (i.e., last hidden layer) and a linear classifier (i.e., output layer) that are trained jointly with stochastic gradient descent (SGD) on the loss function (e.g., cross-entropy). In each epoch, the true gradient of the loss function is estimated using a mini-batch sampled from the training set and model parameters are then updated with the mini-batch gradients. Although the latter provides an unbiased estimation of the former, they are subject to substantial variances derived from the size and number of sampled mini-batches, leading to noisy and jumpy updates. To stabilize such undesirable variance in estimating the true gradients, we propose In-Training Representation Alignment (ITRA) that explicitly aligns feature distributions of two different mini-batches with a matching loss in the SGD training process. We also provide a rigorous analysis of the desirable effects of the matching loss on feature representation learning: (1) extracting compact feature representation; (2) reducing over-adaption on mini-batches via an adaptive weighting mechanism; and (3) accommodating to multi-modalities. Finally, we conduct large-scale experiments on both image and text classifications to demonstrate its superior performance to the strong baselines.

Submitted 23 November, 2022; originally announced November 2022.

Comments: 11 pages, 4 figures, 6 tables. Accepted for publication by AAAI-23. arXiv admin note: text overlap with arXiv:2002.09917

arXiv:2211.11992 [pdf, other]

Electroweak Monopole-antimonopole Pair in the Standard Model

Authors: Dan Zhu, Khai-Ming Wong, Guo-Quan Wong

Abstract: We present the first numerical solution that corresponds to a pair of Cho-Maison monopole and antimonopole (MAP) in the SU(2)$\times$U(1) Weinberg-Salam (WS) theory. The monopoles are finitely separated, while each pole carries magnetic charge $\pm 4\pi/e$. The positive pole is situated in the upper hemisphere, whereas the negative pole is in the lower hemisphere. The Cho-Maison MAP was investigated for a range of Weinberg angle, $0.4675\leq\tan\theta_W\leq10$, and Higgs self-coupling, $0\leq\beta\leq1.7704$. Magnetic dipole moment ($\mu_m$) and pole separation ($d_z$) of the numerical solutions are calculated and analyzed. Total energy of the system, however, is infinite due to point singularities at the locations of monopoles.

Submitted 11 December, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.10905 [pdf, other]

doi 10.1103/PhysRevB.107.085407

Sublattice-enriched tunability of bound states in second-order topological insulators and superconductors

Authors: Di Zhu, Majid Kheirkhah, Zhongbo Yan

Abstract: Bound states at sharp corners have been widely viewed as the hallmark of two-dimensional second-order topological insulators and superconductors. In this work, we show that the existence of sublattice degrees of freedom can enrich the tunability of bound states on the boundary and hence lift the constraint on their locations. We take the Kane-Mele model with honeycomb-lattice structure to illustrate the underlying physics. With the introduction of an in-plane exchange field to the model, we find that the boundary Dirac mass induced by the exchange field has a sensitive dependence on the boundary sublattice termination. We find that the sensitive sublattice dependence can lead bound states to emerge at a specific type of boundary defect, termed sublattice domain walls, if the exchange field is of ferromagnetic nature, even in the absence of any sharp corner on the boundary. Remarkably, this sensitive dependence of the boundary Dirac mass on the boundary sublattice termination allows the positions of bound states to be manipulated to any place on the boundary for an appropriately-designed sample. With a further introduction of conventional s-wave superconductivity to the model, we find that, no matter whether the exchange field is ferromagnetic, antiferromagnetic, or ferrimagnetic, highly controllable Majorana zero modes can be achieved at the sublattice domain walls. Our work reshapes the understanding of boundary physics in second-order topological phases, and meanwhile opens potential avenues to realize highly controllable bound states for potential applications.

Submitted 16 February, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

Comments: 15 pages, 9 figures

Journal ref: Phys. Rev. B 107, 085407 (2023)

arXiv:2211.06701 [pdf, other]

Structure-Preserving 3D Garment Modeling with Neural Sewing Machines

Authors: Xipeng Chen, Guangrun Wang, Dizhong Zhu, Xiaodan Liang, Philip H. S. Torr, Liang Lin

Abstract: 3D Garment modeling is a critical and challenging topic in the area of computer vision and graphics, with increasing attention focused on garment representation learning, garment reconstruction, and controllable garment manipulation, whereas existing methods were constrained to model garments under specific categories or with relatively simple topologies. In this paper, we propose a novel Neural Sewing Machine (NSM), a learning-based framework for structure-preserving 3D garment modeling, which is capable of learning representations for garments with diverse shapes and topologies and is successfully applied to 3D garment reconstruction and controllable manipulation. To model generic garments, we first obtain sewing pattern embedding via a unified sewing pattern encoding module, as the sewing pattern can accurately describe the intrinsic structure and the topology of the 3D garment. Then we use a 3D garment decoder to decode the sewing pattern embedding into a 3D garment using the UV-position maps with masks. To preserve the intrinsic structure of the predicted 3D garment, we introduce an inner-panel structure-preserving loss, an inter-panel structure-preserving loss, and a surface-normal loss in the learning process of our framework. We evaluate NSM on the public 3D garment dataset with sewing patterns with diverse garment shapes and categories. Extensive experiments demonstrate that the proposed NSM is capable of representing 3D garments under diverse garment shapes and topologies, realistically reconstructing 3D garments from 2D images with the preserved structure, and accurately manipulating the 3D garment categories, shapes, and topologies, outperforming the state-of-the-art methods by a clear margin.

Submitted 12 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022

arXiv:2211.05910 [pdf, other]

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

arXiv:2211.05780 [pdf, ps, other]

Quasi-linear relation between partition and analytic rank

Authors: Guy Moshkovitz, Daniel G. Zhu

Abstract: An important conjecture in additive combinatorics posits that the partition rank and analytic rank of tensors are equal up to a constant, over any finite field. We prove the conjecture up to logarithmic factors. Our proof is largely independent of previous work, utilizing recursively constructed polynomial identities and random walks on zero sets of polynomials. We also introduce a new, vector-valued notion of tensor rank ("local rank"), which serves as a bridge between partition and analytic rank and which may be of independent interest.

Submitted 10 November, 2022; originally announced November 2022.

MSC Class: 11B30; 15A69; 68R05

arXiv:2211.05256 [pdf, other]

Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, et al. (29 additional authors not shown)

Abstract: Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and propose the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.08826, arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.03885

arXiv:2211.03298 [pdf, other]

doi 10.1103/PhysRevLett.129.213901

Femtosecond-Terawatt Hard X Ray Pulse Generation with Chirped Pulse Amplification on a Free Electron Laser

Authors: Haoyuan Li, James MacArthur, Sean Littleton, Mike Dunne, Zhirong Huang, Diling Zhu

Abstract: Advances of high intensity lasers have opened up the field of strong field physics and led to a broad range of technological applications. Recent x ray laser sources and optics development makes it possible to obtain extremely high intensity and brightness at x ray wavelengths. In this paper, we present a system design that implements chirped pulse amplification for hard x ray free electron lasers. Numerical modeling with realistic experimental parameters shows that near-transform-limit single-femtosecond hard x ray laser pulses with peak power exceeding 1 TW and brightness exceeding $4\times10^{35}$ s$^{-1}$ mm$^{-2}$ mrad$^{-2}$ (0.1\% bandwidth)$^{-1}$ can be consistently generated. Realization of such beam qualities is essential for establishing systematic and quantitative understanding of strong field x-ray physics and nonlinear x ray optics phenomena.

Submitted 6 November, 2022; originally announced November 2022.

Comments: 23 pages, 7 figures, Accepted by PRL

arXiv:2210.17483 [pdf, other]

Ultrafast x-ray scattering reveals composite amplitude collective mode in the Weyl charge density wave material (TaSe$_4$)$_2$I

Authors: Quynh L. Nguyen, Ryan A. Duncan, Gal Orenstein, Yijing Huang, Viktor Krapivin, Gilberto de la Pena, Chance Ornelas-Skarin, David A. Reis, Peter Abbamonte, Simon Bettler, Matthieu Chollet, Matthias C. Hoffmann, Matthew Hurley, Soyeun Kim, Patrick S. Kirchmann, Yuya Kubota, Fahad Mahmood, Alexander Miller, Taito Osaka, Kejian Qu, Takahiro Sato, Daniel P. Shoemaker, Nicholas Sirica, Sanghoon Song, Jade Stanton, et al. (5 additional authors not shown)

Abstract: We report ultrafast x-ray scattering experiments of the quasi-1D charge density wave (CDW) material (TaSe$_4$)$_2$I following photoexcitation with femtosecond infrared laser pulses. From the time-dependent diffraction signal at the CDW sidebands we identify an amplitude mode derived primarily from the transverse acoustic component of the CDW static distortion. The dynamics of this acoustic amplitude mode are described well by a model of a displacive excitation, which we interpret as mediated through a coupling to the optical phonon component associated with the tetramerization of the Ta chains.

Submitted 23 December, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

arXiv:2210.15790 [pdf, other]

BI AVAN: Brain inspired Adversarial Visual Attention Network

Authors: Heng Huang, Lin Zhao, Xintao Hu, Haixing Dai, Lu Zhang, Dajiang Zhu, Tianming Liu

Abstract: Visual attention is a fundamental mechanism in the human brain, and it inspires the design of attention mechanisms in deep neural networks. However, most of the visual attention studies adopted eye-tracking data rather than the direct measurement of brain activity to characterize human visual attention. In addition, the adversarial relationship between the attention-related objects and attention-neglected background in the human visual system was not fully exploited. To bridge these gaps, we propose a novel brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity. Our BI-AVAN model imitates the biased competition process between attention-related/neglected objects to identify and locate the visual objects in a movie frame the human brain focuses on in an unsupervised manner. We use independent eye-tracking data as ground truth for validation and experimental results show that our model achieves robust and promising results when inferring meaningful human visual attention and mapping the relationship between brain activities and visual stimuli. Our BI-AVAN model contributes to the emerging field of leveraging the brain's functional architecture to inspire and guide the model design in artificial intelligence (AI), e.g., deep neural networks.

Submitted 27 October, 2022; originally announced October 2022.

arXiv:2210.13521 [pdf, other]

doi 10.1038/s41467-023-36870-w

Sub-1 Volt and High-Bandwidth Visible to Near-Infrared Electro-Optic Modulators

Authors: Dylan Renaud, Daniel Rimoli Assumpcao, Graham Joe, Amirhassan Shams-Ansari, Di Zhu, Yaowen Hu, Neil Sinclair, Marko Loncar

Abstract: Integrated electro-optic (EO) modulators are fundamental photonics components with utility in domains ranging from digital communications to quantum information processing. At telecommunication wavelengths, thin-film lithium niobate modulators exhibit state-of-the-art performance in voltage-length product ($V_π$L), optical loss, and EO bandwidth. However, applications in optical imaging, optogenetics, and quantum science generally require devices operating in the visible-to-near-infrared (VNIR) wavelength range. In this work, we realize VNIR amplitude and phase modulators featuring $V_π$L's of sub-1 V$\cdot\,$cm, low optical loss, and high bandwidth EO response. Our Mach-Zehnder modulators exhibit a $V_π$L as low as 0.55 V$\cdot\,$cm at 738 nm, and EO bandwidths in excess of 35 GHz. Furthermore, we highlight the new opportunities these high-performance modulators offer by demonstrating the first integrated EO frequency combs at VNIR wavelengths, with over 50 lines and tunable spacing, and the first frequency shifting of pulsed light beyond its intrinsic bandwidth (up to 7x Fourier limit) by an EO shearing method.

Submitted 8 February, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: 11 pages, 4 figures

arXiv:2210.12693 [pdf, other]

Coupling User Preference with External Rewards to Enable Driver-centered and Resource-aware EV Charging Recommendation

Authors: Chengyin Li, Zheng Dong, Nathan Fisher, Dongxiao Zhu

Abstract: Electric Vehicle (EV) charging recommendation that both accommodates user preference and adapts to the ever-changing external environment arises as a cost-effective strategy to alleviate the range anxiety of private EV drivers. Previous studies focus on centralized strategies to achieve optimized resource allocation, particularly useful for privacy-indifferent taxi fleets and fixed-route public transits. However, private EV drivers seek a more personalized and resource-aware charging recommendation that is tailor-made to accommodate the user preference (when and where to charge) yet sufficiently adaptive to the spatiotemporal mismatch between charging supply and demand. Here we propose a novel Regularized Actor-Critic (RAC) charging recommendation approach that would allow each EV driver to strike an optimal balance between the user preference (historical charging pattern) and the external reward (driving distance and wait time). Experimental results on two real-world datasets demonstrate the unique features and superior performance of our approach to the competing methods.

Submitted 23 October, 2022; originally announced October 2022.

Comments: 16 pages, 5 figures. To appear in the Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022)

arXiv:2210.12440 [pdf, other]

Spectrum-BERT: Pre-training of Deep Bidirectional Transformers for Spectral Classification of Chinese Liquors

Authors: Yansong Wang, Yundong Sun, Yansheng Fu, Dongjie Zhu, Zhaoshuo Tian

Abstract: Spectral detection technology, as a non-invasive method for rapid detection of substances, combined with deep learning algorithms, has been widely used in food detection. However, in real scenarios, acquiring and labeling spectral data is an extremely labor-intensive task, which makes it impossible to provide enough high-quality data for training efficient supervised deep learning models. To better leverage limited samples, we apply pre-training & fine-tuning paradigm to the field of spectral detection for the first time and propose a pre-training method of deep bidirectional transformers for spectral classification of Chinese liquors, abbreviated as Spectrum-BERT. Specifically, first, to retain the model's sensitivity to the characteristic peak position and local information of the spectral curve, we innovatively partition the curve into multiple blocks and obtain the embeddings of different blocks, as the feature input for the next calculation. Second, in the pre-training stage, we elaborately design two pre-training tasks, Next Curve Prediction (NCP) and Masked Curve Model (MCM), so that the model can effectively utilize unlabeled samples to capture the potential knowledge of spectral data, breaking the restrictions of the insufficient labeled samples, and improving the applicability and performance of the model in practical scenarios. Finally, we conduct a large number of experiments on the real liquor spectral dataset. In the comparative experiments, the proposed Spectrum-BERT significantly outperforms the baselines in multiple metrics and this advantage is more significant on the imbalanced dataset. Moreover, in the parameter sensitivity experiment, we also analyze the model performance under different parameter settings, to provide a reference for subsequent research.

Submitted 22 October, 2022; originally announced October 2022.

Comments: 12 pages, 8 figures

arXiv:2210.11711 [pdf, ps, other]

Modelling Multi-relations for Convolutional-based Knowledge Graph Embedding

Authors: Sirui Li, Kok Wai Wong, Dengya Zhu, Chun Che Fung

Abstract: Representation learning of knowledge graphs aims to embed entities and relations into low-dimensional vectors. Most existing works only consider the direct relations or paths between an entity pair. It is considered that such approaches disconnect the semantic connection of multi-relations between an entity pair, and we propose a convolutional and multi-relational representation learning model, ConvMR. The proposed ConvMR model addresses the multi-relation issue in two aspects: (1) Encoding the multi-relations between an entity pair into a unified vector that maintains the semantic connection. (2) Since not all relations are necessary while joining multi-relations, we propose an attention-based relation encoder to automatically assign weights to different relations based on semantic hierarchy. Experimental results on two popular datasets, FB15k-237 and WN18RR, achieved consistent improvements on the mean rank. We also found that ConvMR is efficient to deal with less frequent entities.

Submitted 20 October, 2022; originally announced October 2022.

Comments: Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 26th International Conference KES2022

arXiv:2210.08218 [pdf]

Massive MIMO Evolution Towards 3GPP Release 18

Authors: Huangping Jin, Kunpeng Liu, Gilwon Lee, Emad J. Farag, Min Zhang, Dalin Zhu, Leiming Zhang, Eko Onggosanusi, Mansoor Shafi, Harsh Tataria

Abstract: Since the introduction of fifth-generation new radio (5G-NR) in Third Generation Partnership Project (3GPP) Release 15, swift progress has been made to evolve 5G with 3GPP Release 18 emerging. A critical aspect is the design of massive multiple-input multiple-output (MIMO) technology. In this line, this paper makes several important contributions: We provide a comprehensive overview of the evolution of standardized massive MIMO features from 3GPP Release 15 to 17 for both time/frequency-division duplex operation across bands FR-1 and FR-2. We analyze the progress on channel state information (CSI) frameworks, beam management frameworks and present enhancements for uplink CSI. We shed light on emerging 3GPP Release 18 problems requiring imminent attention. These include advanced codebook design and sounding reference signal design for coherent joint transmission (CJT) with multiple transmission/reception points (multi-TRPs). We discuss advancements in uplink demodulation reference signal design, enhancements for mobility to provide accurate CSI estimates, and unified transmission configuration indicator framework tailored for FR-2 bands. For each concept, we provide system level simulation results to highlight their performance benefits. Via field trials in an outdoor environment at Shanghai Jiaotong University, we demonstrate the gains of multi-TRP CJT relative to single TRP at 3.7 GHz.

Submitted 15 October, 2022; originally announced October 2022.

Comments: 23 pages, 37 Figures, one fig in the annex

arXiv:2210.03963 [pdf, other]

SDA: Simple Discrete Augmentation for Contrastive Sentence Representation Learning

Authors: Dongsheng Zhu, Zhenyu Mao, Jinghui Lu, Rui Zhao, Fei Tan

Abstract: Contrastive learning has recently achieved compelling performance in unsupervised sentence representation. As an essential element, data augmentation protocols, however, have not been well explored. The pioneering work SimCSE resorting to a simple dropout mechanism (viewed as continuous augmentation) surprisingly dominates discrete augmentations such as cropping, word deletion, and synonym replacement as reported. To understand the underlying rationales, we revisit existing approaches and attempt to hypothesize the desiderata of reasonable data augmentation methods: balance of semantic consistency and expression diversity. We then develop three simple yet effective discrete sentence augmentation schemes: punctuation insertion, modal verbs, and double negation. They act as minimal noises at lexical level to produce diverse forms of sentences. Furthermore, standard negation is capitalized on to generate negative samples for alleviating feature suppression involved in contrastive learning. We experimented extensively with semantic textual similarity on diverse datasets. The results support the superiority of the proposed methods consistently. Our key code is available at https://github.com/Zhudongsheng75/SDA

Submitted 13 June, 2024; v1 submitted 8 October, 2022; originally announced October 2022.

Comments: Accepted by LREC-COLING 2024

arXiv:2210.03258 [pdf, other]

Interpreting County Level COVID-19 Infection and Feature Sensitivity using Deep Learning Time Series Models

Authors: Md Khairul Islam, Di Zhu, Yingzheng Liu, Andrej Erkelens, Nick Daniello, Judy Fox

Abstract: Interpretable machine learning plays a key role in healthcare because it is challenging in understanding feature importance in deep learning model predictions. We propose a novel framework that uses deep learning to study feature sensitivity for model predictions. This work combines sensitivity analysis with heterogeneous time-series deep learning model prediction, which corresponds to the interpretations of spatio-temporal features. We forecast county-level COVID-19 infection using the Temporal Fusion Transformer. We then use the sensitivity analysis extending Morris Method to see how sensitive the outputs are with respect to perturbation to our static and dynamic input features. The significance of the work is grounded in a real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves high prediction performance compared to a PyTorch baseline. 2) By analyzing the Morris sensitivity indices and attention patterns, we decipher the meaning of feature importance with observational population and dynamic model changes. 3) We have collected 2.5 years of socioeconomic and health features over 3142 US counties, such as observed cases and deaths, and a number of static (age distribution, health disparity, and industry) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we conduct extensive experiments and show our model can learn complex interactions and perform predictions for daily infection at the county level. Being able to model the disease infection with a hybrid prediction and description accuracy measurement with Morris index at the county level is a central idea that sheds light on individual feature interpretation via sensitivity analysis.

Submitted 6 October, 2022; originally announced October 2022.

arXiv:2210.03189 [pdf, other]

FocalUNETR: A Focal Transformer for Boundary-aware Segmentation of CT Images

Authors: Chengyin Li, Yao Qiang, Rafi Ibn Sultan, Hassan Bagher-Ebadian, Prashant Khanduri, Indrin J. Chetty, Dongxiao Zhu

Abstract: Computed Tomography (CT) based precise prostate segmentation for treatment planning is challenging due to (1) the unclear boundary of the prostate derived from CT's poor soft tissue contrast and (2) the limitation of convolutional neural network-based models in capturing long-range global context. Here we propose a novel focal transformer-based image segmentation architecture to effectively and efficiently extract local visual features and global context from CT images. Additionally, we design an auxiliary boundary-induced label regression task coupled with the main prostate segmentation task to address the unclear boundary issue in CT images. We demonstrate that this design significantly improves the quality of the CT-based prostate segmentation task over other competing methods, resulting in substantially improved performance, i.e., higher Dice Similarity Coefficient, lower Hausdorff Distance, and Average Symmetric Surface Distance, on both private and public CT image datasets. Our code is available at https://github.com/ChengyinLee/FocalUNETR.git.

Submitted 18 July, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: 13 pages, 3 figures, 2 tables

arXiv:2210.02206 [pdf, other]

Improving Visual-Semantic Embedding with Adaptive Pooling and Optimization Objective

Authors: Zijian Zhang, Chang Shu, Ya Xiao, Yuan Shen, Di Zhu, Jing Xiao, Youxin Chen, Jey Han Lau, Qian Zhang, Zheng Lu

Abstract: Visual-Semantic Embedding (VSE) aims to learn an embedding space where related visual and semantic instances are close to each other. Recent VSE models tend to design complex structures to pool visual and semantic features into fixed-length vectors and use hard triplet loss for optimization. However, we find that: (1) combining simple pooling methods is no worse than these sophisticated methods; and (2) only considering the most difficult-to-distinguish negative sample leads to slow convergence and poor Recall@K improvement. To this end, we propose an adaptive pooling strategy that allows the model to learn how to aggregate features through a combination of simple pooling methods. We also introduce a strategy to dynamically select a group of negative samples to make the optimization converge faster and perform better. Experimental results on Flickr30K and MS-COCO demonstrate that a standard VSE using our pooling and optimization strategies outperforms current state-of-the-art systems (at least 1.0% on the metrics of recall) in image-to-text and text-to-image retrieval. Source code of our experiments is available at https://github.com/96-Zachary/vse_2ad.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2210.01706 [pdf, other]

doi 10.1007/s10846-022-01742-w

A Fuzzy Logic-based Cascade Control without Actuator Saturation for the Unmanned Underwater Vehicle Trajectory Tracking

Authors: Danjie Zhu, Simon X. Yang, Mohammad Biglarbegian

Abstract: An intelligent control strategy is proposed to eliminate the actuator saturation problem that exists in the trajectory tracking process of unmanned underwater vehicles (UUV). The control strategy consists of two parts: for the kinematic modeling part, a fuzzy logic-refined backstepping control is developed to achieve control velocities within acceptable ranges and errors of small fluctuations; on the basis of the velocities deducted by the improved kinematic control, the sliding mode control (SMC) is introduced in the dynamic modeling to obtain corresponding torques and forces that should be applied to the vehicle body. With the control velocities computed by the kinematic model and applied forces derived by the dynamic model, the robustness and accuracy of the UUV trajectory without actuator saturation can be achieved.

Submitted 4 October, 2022; originally announced October 2022.

Showing 201–250 of 648 results for author: Zhu, D