-
BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
Authors:
Zhen Xiang,
Fengqing Jiang,
Zidi Xiong,
Bhaskar Ramasubramanian,
Radha Poovendran,
Bo Li
Abstract:
Large language models (LLMs) are shown to benefit from chain-of-thought (COT) prompting, particularly when tackling tasks that require systematic reasoning processes. On the other hand, COT prompting also poses new vulnerabilities in the form of backdoor attacks, wherein the model will output unintended malicious content under specific backdoor-triggered conditions during inference. Traditional methods for launching backdoor attacks involve either contaminating the training dataset with backdoored instances or directly manipulating the model parameters during deployment. However, these approaches are not practical for commercial LLMs that typically operate via API access. In this paper, we propose BadChain, the first backdoor attack against LLMs employing COT prompting, which does not require access to the training dataset or model parameters and imposes low computational overhead. BadChain leverages the inherent reasoning capabilities of LLMs by inserting a backdoor reasoning step into the sequence of reasoning steps of the model output, thereby altering the final response when a backdoor trigger exists in the query prompt. Empirically, we show the effectiveness of BadChain for two COT strategies across four LLMs (Llama2, GPT-3.5, PaLM2, and GPT-4) and six complex benchmark tasks encompassing arithmetic, commonsense, and symbolic reasoning. Moreover, we show that LLMs endowed with stronger reasoning capabilities exhibit higher susceptibility to BadChain, exemplified by a high average attack success rate of 97.0% across the six benchmark tasks on GPT-4. Finally, we propose two defenses based on shuffling and demonstrate their overall ineffectiveness against BadChain. Therefore, BadChain remains a severe threat to LLMs, underscoring the urgency for the development of robust and effective future defenses.
Submitted 19 January, 2024;
originally announced January 2024.
-
Removal and Selection: Improving RGB-Infrared Object Detection via Coarse-to-Fine Fusion
Authors:
Tianyi Zhao,
Maoxun Yuan,
Feng Jiang,
Nan Wang,
Xingxing Wei
Abstract:
Object detection in visible (RGB) and infrared (IR) images has been widely applied in recent years. Leveraging the complementary characteristics of RGB and IR images, the object detector provides reliable and robust object localization from day to night. Most existing fusion strategies directly input RGB and IR images into deep neural networks, leading to inferior detection performance. However, because the RGB and IR features contain modality-specific noise, these strategies worsen the fused features as the noise propagates. Inspired by the mechanism of the human brain processing multimodal information, in this paper, we introduce a new coarse-to-fine perspective to purify and fuse two modality features. Specifically, following this perspective, we design a Redundant Spectrum Removal module to coarsely remove interfering information within each modality and a Dynamic Feature Selection module to finely select the desired features for feature fusion. To verify the effectiveness of the coarse-to-fine fusion strategy, we construct a new object detector called the Removal and Selection Detector (RSDet). Extensive experiments on three RGB-IR object detection datasets verify the superior performance of our method.
Submitted 7 May, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System
Authors:
Feng Jiang,
Kuang Wang,
Haizhou Li
Abstract:
In the contemporary information era, significantly accelerated by the advent of Large-scale Language Models, the proliferation of scientific literature is reaching unprecedented levels. Researchers urgently require efficient tools for reading and summarizing academic papers, uncovering significant scientific literature, and employing diverse interpretative methodologies. To address this burgeoning demand, the role of automated scientific literature interpretation systems has become paramount. However, prevailing models, both commercial and open-source, confront notable challenges: they often overlook multimodal data, grapple with summarizing over-length texts, and lack diverse user interfaces. In response, we introduce an open-source multi-modal automated academic paper interpretation system (MMAPIS) with a three-stage process, incorporating LLMs to augment its functionality. Our system first employs the hybrid modality preprocessing and alignment module to extract plain text and tables or figures from documents separately. It then aligns this information based on the section names they belong to, ensuring that data with identical section names are categorized under the same section. Following this, we introduce a hierarchical discourse-aware summarization method. It utilizes the extracted section names to divide the article into shorter text segments, facilitating specific summarizations both within and between sections via LLMs with specific prompts. Finally, we have designed four types of diversified user interfaces, including paper recommendation, multimodal Q&A, audio broadcasting, and interpretation blog, which can be widely applied across various scenarios. Our qualitative and quantitative evaluations underscore the system's superiority, especially in scientific summarization, where it outperforms solutions relying solely on GPT-4.
Submitted 17 January, 2024;
originally announced January 2024.
-
UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer
Authors:
Ji Liu,
Dehua Tang,
Yuanxian Huang,
Li Zhang,
Xiaocheng Zeng,
Dong Li,
Mingjie Lu,
Jinzhang Peng,
Yu Wang,
Fan Jiang,
Lu Tian,
Ashish Sirasao
Abstract:
Traditional channel-wise pruning methods, which reduce network channels, struggle to effectively prune efficient CNN models with depth-wise convolutional layers and certain efficient modules, such as popular inverted residual blocks. Prior depth pruning methods, which reduce network depth, are not suitable for pruning some efficient models due to the existence of some normalization layers. Moreover, fine-tuning a subnet after directly removing activation layers would corrupt the original model weights, hindering the pruned model from achieving high performance. To address these issues, we propose a novel depth pruning method for efficient models. Our approach introduces a novel block pruning strategy and a progressive training method for the subnet. Additionally, we extend our pruning method to vision transformer models. Experimental results demonstrate that our method consistently outperforms existing depth pruning methods across various pruning configurations. Applying our method to ConvNeXtV1, we obtained three pruned models that surpass most SOTA efficient models with comparable inference performance. Our method also achieves state-of-the-art pruning performance on the vision transformer model.
Submitted 12 January, 2024;
originally announced January 2024.
-
Brave: Byzantine-Resilient and Privacy-Preserving Peer-to-Peer Federated Learning
Authors:
Zhangchen Xu,
Fengqing Jiang,
Luyao Niu,
Jinyuan Jia,
Radha Poovendran
Abstract:
Federated learning (FL) enables multiple participants to train a global machine learning model without sharing their private training data. Peer-to-peer (P2P) FL advances existing centralized FL paradigms by eliminating the server that aggregates local models from participants and then updates the global model. However, P2P FL is vulnerable to (i) honest-but-curious participants whose objective is to infer private training data of other participants, and (ii) Byzantine participants who can transmit arbitrarily manipulated local models to corrupt the learning process. P2P FL schemes that simultaneously guarantee Byzantine resilience and preserve privacy have been less studied. In this paper, we develop Brave, a protocol that ensures Byzantine Resilience And privacy-preserving property for P2P FL in the presence of both types of adversaries. We show that Brave preserves privacy by establishing that any honest-but-curious adversary cannot infer other participants' private data by observing their models. We further prove that Brave is Byzantine-resilient, which guarantees that all benign participants converge to an identical model that deviates from a global model trained without Byzantine adversaries by a bounded distance. We evaluate Brave against three state-of-the-art adversaries on a P2P FL for image classification tasks on benchmark datasets CIFAR10 and MNIST. Our results show that the global model learned with Brave in the presence of adversaries achieves comparable classification accuracy to a global model trained in the absence of any adversary.
Submitted 10 January, 2024;
originally announced January 2024.
-
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
Authors:
Dongdi Zhao,
Jianbo Ma,
Lu Lu,
Jinke Li,
Xuan Ji,
Lei Zhu,
Fuming Fang,
Ming Liu,
Feijun Jiang
Abstract:
Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problems. However, its performance is usually limited by a heavy reliance on environmental assumptions. In this paper, we propose a unified multichannel far-field speech recognition system that combines neural beamforming with a transformer-based Listen, Attend and Spell (LAS) speech recognition system, which extends the end-to-end speech recognition system further to include speech enhancement. This framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) has been adopted to form the neural beamforming. Several pooling strategies to combine look directions are then compared in order to find the optimal approach. Moreover, information about the source direction is also integrated into the beamforming to explore the usefulness of source direction as a prior, which is usually available, especially in multi-modality scenarios. Experiments on different microphone array geometries are conducted to evaluate the robustness against spacing variance of the microphone array. Large in-house databases are used to evaluate the effectiveness of the proposed framework, and the proposed method achieves a 19.26% improvement when compared with a strong baseline.
Submitted 5 January, 2024;
originally announced January 2024.
-
Robust bilinear factor analysis based on the matrix-variate $t$ distribution
Authors:
Xuan Ma,
Jianhua Zhao,
Changchun Shang,
Fen Jiang,
Philip L. H. Yu
Abstract:
Factor Analysis based on the multivariate $t$ distribution ($t$fa) is a useful robust tool for extracting common factors on heavy-tailed or contaminated data. However, $t$fa is only applicable to vector data. When $t$fa is applied to matrix data, it is common to first vectorize the matrix observations. This introduces two challenges for $t$fa: (i) the inherent matrix structure of the data is broken, and (ii) robustness may be lost, as vectorized matrix data typically results in a high data dimension, which could easily lead to the breakdown of $t$fa. To address these issues, starting from the intrinsic matrix structure of matrix data, a novel robust factor analysis model, namely bilinear factor analysis built on the matrix-variate $t$ distribution ($t$bfa), is proposed in this paper. The novelty is that it can simultaneously extract common factors for both row and column variables of interest on heavy-tailed or contaminated matrix data. Two efficient algorithms for maximum likelihood estimation of $t$bfa are developed. A closed-form expression for the Fisher information matrix, used to calculate the accuracy of parameter estimates, is derived. Empirical studies are conducted to understand the proposed $t$bfa model and compare it with related competitors. The results demonstrate the superiority and practicality of $t$bfa. Importantly, $t$bfa exhibits a significantly higher breakdown point than $t$fa, making it more suitable for matrix data.
Submitted 4 January, 2024;
originally announced January 2024.
-
Large Language Model Enhanced Multi-Agent Systems for 6G Communications
Authors:
Feibo Jiang,
Li Dong,
Yubo Peng,
Kezhi Wang,
Kun Yang,
Cunhua Pan,
Dusit Niyato,
Octavia A. Dobre
Abstract:
The rapid development of the Large Language Model (LLM) presents huge opportunities for 6G communications, e.g., network optimization and management by allowing users to input task requirements to LLMs in natural language. However, directly applying native LLMs in 6G encounters various challenges, such as a lack of private communication data and knowledge, and limited logical reasoning, evaluation, and refinement abilities. Integrating LLMs with the capabilities of retrieval, planning, memory, evaluation and reflection in agents can greatly enhance the potential of LLMs for 6G communications. To this end, we propose a multi-agent system with customized communication knowledge and tools for solving communication related tasks using natural language, comprising three components: (1) Multi-agent Data Retrieval (MDR), which employs the condensate and inference agents to refine and summarize communication knowledge from the knowledge base, expanding the knowledge boundaries of LLMs in 6G communications; (2) Multi-agent Collaborative Planning (MCP), which utilizes multiple planning agents to generate feasible solutions for the communication related task from different perspectives based on the retrieved knowledge; (3) Multi-agent Evaluation and Reflexion (MER), which utilizes the evaluation agent to assess the solutions, and applies the reflexion agent and refinement agent to provide improvement suggestions for current solutions. Finally, we validate the effectiveness of the proposed multi-agent system by designing a semantic communication system, as a case study of 6G communications.
Submitted 12 December, 2023;
originally announced December 2023.
-
Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Authors:
Fengqing Jiang,
Zhangchen Xu,
Luyao Niu,
Boxin Wang,
Jinyuan Jia,
Bo Li,
Radha Poovendran
Abstract:
Large language models (LLMs) are increasingly deployed as the service backend for LLM-integrated applications such as code completion and AI-powered search. LLM-integrated applications serve as middleware to refine users' queries with domain-specific knowledge to better inform LLMs and enhance the responses. Despite numerous opportunities and benefits, LLM-integrated applications also introduce new attack surfaces. Understanding, minimizing, and eliminating these emerging attack surfaces is a new area of research. In this work, we consider a setup where the user and LLM interact via an LLM-integrated application in the middle. We focus on the communication rounds that begin with the user's queries and end with the LLM-integrated application returning responses to the queries, powered by LLMs at the service backend. For this query-response protocol, we identify potential vulnerabilities that can originate from a malicious application developer or from an outsider threat initiator who is able to control database access and to manipulate and poison data that are high-risk for the user. Successful exploits of the identified vulnerabilities result in the users receiving responses tailored to the intent of a threat initiator. We assess such threats against LLM-integrated applications empowered by OpenAI GPT-3.5 and GPT-4. Our empirical results show that the threats can effectively bypass the restrictions and moderation policies of OpenAI, resulting in users receiving responses that contain bias, toxic content, privacy risk, and disinformation. To mitigate those threats, we identify and define four key properties, namely integrity, source identification, attack detectability, and utility preservation, that need to be satisfied by a safe LLM-integrated application. Based on these properties, we develop a lightweight, threat-agnostic defense that mitigates both insider and outsider threats.
Submitted 28 November, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Boot and Switch: Alternating Distillation for Zero-Shot Dense Retrieval
Authors:
Fan Jiang,
Qiongkai Xu,
Tom Drummond,
Trevor Cohn
Abstract:
Neural 'dense' retrieval models are state of the art for many datasets; however, these models often exhibit limited domain transfer ability. Existing approaches to adaptation are unwieldy, such as requiring explicit supervision, complex model architectures, or massive external models. We present $\texttt{ABEL}$, a simple but effective unsupervised method to enhance passage retrieval in zero-shot settings. Our technique follows a straightforward loop: a dense retriever learns from supervision signals provided by a reranker, and subsequently, the reranker is updated based on feedback from the improved retriever. By iterating this loop, the two components mutually enhance one another's performance. Experimental results demonstrate that our unsupervised $\texttt{ABEL}$ model outperforms both leading supervised and unsupervised retrievers on the BEIR benchmark. Meanwhile, it exhibits strong adaptation abilities to tasks and domains that were unseen during training. By either fine-tuning $\texttt{ABEL}$ on labelled data or integrating it with existing supervised dense retrievers, we achieve state-of-the-art results. Source code is available at https://github.com/Fantabulous-J/BootSwitch.
Submitted 27 November, 2023;
originally announced November 2023.
-
Noisy Self-Training with Synthetic Queries for Dense Retrieval
Authors:
Fan Jiang,
Tom Drummond,
Trevor Cohn
Abstract:
Although existing neural retrieval models reveal promising results when training data is abundant and the performance keeps improving as training data increases, collecting high-quality annotated data is prohibitively costly. To this end, we introduce a novel noisy self-training framework combined with synthetic queries, showing that neural retrievers can be improved in a self-evolution manner with no reliance on any external models. Experimental results show that our method improves consistently over existing methods on both general-domain (e.g., MS-MARCO) and out-of-domain (i.e., BEIR) retrieval benchmarks. Extra analysis on low-resource settings reveals that our method is data efficient and outperforms competitive baselines, with as little as 30% of labelled training data. Further extending the framework for reranker training demonstrates that the proposed method is general and yields additional gains on tasks of diverse domains. Source code is available at https://github.com/Fantabulous-J/Self-Training-DPR.
Submitted 27 November, 2023;
originally announced November 2023.
-
How AI-driven Digital Twins Can Empower Mobile Networks
Authors:
Tong Li,
Fenyu Jiang,
Qiaohong Yu,
Wenzhen Huang,
Tao Jiang,
Depeng Jin
Abstract:
The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which serves as validation for the optimizer's decision outcomes, is used explicitly to train artificial intelligence (AI) empowered optimizers iteratively. In practice, we develop a network digital twin prototype system leveraging data-driven technology to accurately model the behaviors of mobile network elements (e.g., mobile users and base stations), wireless environments, and network performance. An AI-powered network optimizer has been developed based on the deployed MNDT prototype system for providing reliable and optimized network configurations. The results of the experiments demonstrate that the proposed MNDT infrastructure can provide practical network optimization solutions while adapting to the more complex environment.
Submitted 20 November, 2023;
originally announced November 2023.
-
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Authors:
Junying Chen,
Xidong Wang,
Anningzhe Gao,
Feng Jiang,
Shunian Chen,
Hongbo Zhang,
Dingjie Song,
Wenya Xie,
Chuyi Kong,
Jianquan Li,
Xiang Wan,
Haizhou Li,
Benyou Wang
Abstract:
Adapting a language model to a specific domain, a.k.a. 'domain adaption', is a common practice when specialized knowledge, e.g. medicine, is not encapsulated in a general language model like Llama2. The challenge lies in the heterogeneity of data across the two training stages, as it varies in languages, genres, or formats. To tackle this and simplify the learning protocol, we propose to transform heterogeneous data, from both the pre-training and supervised stages, into a unified, simple input-output pair format. We validate the new protocol in the domains where proprietary LLMs like ChatGPT perform relatively poorly, such as Traditional Chinese Medicine. The developed model, HuatuoGPT-II, has shown state-of-the-art performance in the Chinese medicine domain on a number of benchmarks, e.g. medical licensing exams. It even outperforms proprietary models like ChatGPT and GPT-4 in some aspects, especially in Traditional Chinese Medicine. Expert manual evaluations further validate HuatuoGPT-II's advantages over existing LLMs. Notably, HuatuoGPT-II was benchmarked on a fresh Chinese National Medical Licensing Examination where it achieved the best performance, showcasing not only its effectiveness but also its generalization capabilities.
Submitted 16 November, 2023;
originally announced November 2023.
-
Probabilistic Inference of the Structure and Orbit of Milky Way Satellites with Semi-Analytic Modeling
Authors:
Dylan Folsom,
Oren Slone,
Mariangela Lisanti,
Fangzhou Jiang,
Manoj Kaplinghat
Abstract:
Semi-analytic modeling furnishes an efficient avenue for characterizing the properties of dark matter halos associated with satellites of Milky Way-like systems, as it easily accounts for uncertainties arising from halo-to-halo variance, the orbital disruption of satellites, baryonic feedback, and the stellar-to-halo mass (SMHM) relation. We use the SatGen semi-analytic satellite generator -- which incorporates both empirical models of the galaxy-halo connection in the field as well as analytic prescriptions for the orbital evolution of these satellites after they enter a host galaxy -- to create large samples of Milky Way-like systems and their satellites. By selecting satellites in the sample that match the observed properties of a particular dwarf galaxy, we can then infer arbitrary properties of the satellite galaxy within the Cold Dark Matter paradigm. For the Milky Way's classical dwarfs, we provide inferred values (with associated uncertainties) for the maximum circular velocity $v_{max}$ and the radius $r_{max}$ at which it occurs, varying over two choices of feedback model and two prescriptions for the SMHM relation that populate dark matter halos with physically distinct galaxies. While simple empirical scaling relations can recover the median inferred value for $v_{max}$ and $r_{max}$, this approach provides realistic correlated uncertainties and aids interpretability through variation of the model. For these different models, we also demonstrate how the internal properties of a satellite's dark matter profile correlate with its orbit, and we show that it is difficult to reproduce observations of the Fornax dwarf without strong baryonic feedback. The technique developed in this work is flexible in its application of observational data and can leverage arbitrary information about the satellite galaxies to make inferences about their dark matter halos and population statistics.
Submitted 9 November, 2023;
originally announced November 2023.
-
Quantifying Self-diagnostic Atomic Knowledge in Chinese Medical Foundation Model: A Computational Analysis
Authors:
Yaxin Fan,
Feng Jiang,
Benyou Wang,
Peifeng Li,
Haizhou Li
Abstract:
Foundation Models (FMs) have the potential to revolutionize the way users self-diagnose through search engines by offering direct and efficient suggestions. Recent studies have primarily focused on the quality of FMs as evaluated by GPT-4 or on their ability to pass medical exams; no studies have quantified the extent of self-diagnostic atomic knowledge stored in FMs' memory, which is the basis on which foundation models provide factual and reliable suggestions. In this paper, we first constructed a benchmark of Self-diagnostic Atomic Knowledge (SdAK), including the most common types of atomic knowledge involved in self-diagnostic queries, with 17 atomic types and a total of 14,048 pieces of atomic knowledge. Then, we evaluated both generic and open-source Chinese medical FMs on the benchmark. The experimental results showcase that generic FMs perform better than medical FMs in terms of self-diagnostic atomic knowledge. Error analysis revealed that both generic and medical FMs are sycophantic, e.g., always catering to users' claims when it comes to unknown knowledge. We further explored different types of data commonly adopted for fine-tuning medical FMs, i.e., real-world, semi-distilled, and distilled data, and found that distilled data can benefit FMs most. The code and data are available at https://github.com/FreedomIntelligence/SDAK.
Submitted 1 April, 2024; v1 submitted 18 October, 2023;
originally announced October 2023.
-
Till the core collapses: the evolution and properties of self-interacting dark matter subhalos
Authors:
Zhichao Carton Zeng,
Annika H. G. Peter,
Xiaolong Du,
Shengqi Yang,
Andrew Benson,
Francis-Yan Cyr-Racine,
Fangzhou Jiang,
Charlie Mace,
R. Benton Metcalf
Abstract:
One of the hottest questions in the cosmology of self-interacting dark matter (SIDM) is whether scatterings can induce detectable core-collapse in halos by the present day. Because gravitational tides can accelerate core-collapse, the most promising targets to observe core-collapse are satellite galaxies and subhalo systems. However, simulating small subhalos is computationally intensive, especially when subhalos start to core-collapse. In this work, we present a hierarchical framework for simulating a population of SIDM subhalos, which reduces the computation time to linear order in the total number of subhalos. With this method, we simulate substructure lensing systems with multiple velocity-dependent SIDM models, and show how subhalo evolution depends on the SIDM model, subhalo mass and orbits. We find that an SIDM cross section of $\gtrsim 200$ cm$^2$/g at velocity scales relevant for subhalos' internal heat transfer is needed for a significant fraction of subhalos to core-collapse in a typical lens system at redshift $z=0.5$, and that core-collapse has unique observable features in lensing. We show quantitatively that core-collapse in subhalos is typically accelerated compared to field halos, except when the SIDM cross section is non-negligible ($\gtrsim \mathcal{O}(1)$ cm$^2$/g) at subhalos' orbital velocities, in which case evaporation by the host can delay core-collapse. This suggests that substructure lensing can be used to probe velocity-dependent SIDM models, especially if line-of-sight structures (field halos) can be distinguished from lens-plane subhalos. Intriguingly, we find that core-collapse in subhalos can explain the recently reported ultra-steep density profiles of substructures found by lensing with the \emph{Hubble Space Telescope}.
Submitted 4 November, 2023; v1 submitted 15 October, 2023;
originally announced October 2023.
-
Revisiting Multi-modal 3D Semantic Segmentation in Real-world Autonomous Driving
Authors:
Feng Jiang,
Chaoping Tu,
Gang Zhang,
Jun Li,
Hanqing Huang,
Junyu Lin,
Di Feng,
Jian Pu
Abstract:
LiDAR and camera are two critical sensors for multi-modal 3D semantic segmentation and are supposed to be fused efficiently and robustly to promise safety in various real-world scenarios. However, existing multi-modal methods face two key challenges: 1) difficulty with efficient deployment and real-time execution; and 2) drastic performance degradation under weak calibration between LiDAR and cameras. To address these challenges, we propose CPGNet-LCF, a new multi-modal fusion framework extending the LiDAR-only CPGNet. CPGNet-LCF solves the first challenge by inheriting the easy deployment and real-time capabilities of CPGNet. For the second challenge, we introduce a novel weak calibration knowledge distillation strategy during training to improve the robustness against the weak calibration. CPGNet-LCF achieves state-of-the-art performance on the nuScenes and SemanticKITTI benchmarks. Remarkably, it can be easily deployed to run in 20ms per frame on a single Tesla V100 GPU using TensorRT TF16 mode. Furthermore, we benchmark performance over four weak calibration levels, demonstrating the robustness of our proposed approach.
Submitted 12 October, 2023;
originally announced October 2023.
-
Rayleigh-Taylor Instability in Stratified Compressible Fluids with/without the Interfacial Surface Tension
Authors:
Fei Jiang,
Han Jiang,
Song Jiang
Abstract:
Guo--Tice formally established in 2011 that the Rayleigh--Taylor instability inevitably occurs within stratified compressible viscous fluids in a slab domain $\mathbb{R}^2\times (h_-,h_+)$, irrespective of the presence of interfacial surface tension, where the instability solutions are non-periodic with respect to both horizontal spatial variables $x_1$ and $x_2$, by applying a so-called "normal mode" method and a modified variational method to the linearized (motion) equations. It is a long-standing open problem, however, whether Guo--Tice's conclusion can be rigorously verified by the (original) nonlinear equations. This challenge arises due to the failure of constructing a growing mode solution, which is non-periodic with respect to both horizontal spatial variables, to the linearized equations defined on a slab domain. In the present work, we circumvent the difficulty related to growing mode solutions by developing an alternative approximate scheme. In essence, our approach hinges on constructing the horizontally periodic growing mode solution of the linearized equations to approximate the {\it nonlinear} Rayleigh--Taylor instability solutions, which do not exhibit horizontal periodicity. Thanks to this new approximate scheme, we can apply Guo--Hallstrom--Spirn's bootstrap instability method to the nonlinear equations in Lagrangian coordinates, and thus prove Guo--Tice's conclusion. In particular, our approximate method could also be applied to other instability solutions characterized by non-periodic motion in a slab domain, such as the Parker instability and thermal instability.
Submitted 23 September, 2023;
originally announced September 2023.
-
BDEC: Brain Deep Embedded Clustering model
Authors:
Xiaoxiao Ma,
Chunzhi Yi,
Zhicai Zhong,
Hui Zhou,
Baichun Wei,
Haiqi Zhu,
Feng Jiang
Abstract:
An essential premise for neuroscience brain network analysis is the successful segmentation of the cerebral cortex into functionally homogeneous regions. Resting-state functional magnetic resonance imaging (rs-fMRI), capturing the spontaneous activities of the brain, provides the potential for cortical parcellation. Previous parcellation methods can be roughly categorized into three groups, mainly employing either local gradient, global similarity, or a combination of both. The traditional clustering algorithms, such as "K-means" and "Spectral clustering", may affect the reproducibility or the biological interpretation of parcellations; the region growing-based methods influence the expression of functional homogeneity in the brain at a large scale; and the parcellation methods based on probabilistic graph models inevitably introduce model assumption biases. In this work, we develop an assumption-free model called BDEC, which leverages the robust data fitting capability of deep learning. To the best of our knowledge, this is the first study that uses a deep learning algorithm for rs-fMRI-based parcellation. In a comparison with nine commonly used brain parcellation methods, the BDEC model demonstrates significantly superior performance in various functional homogeneity indicators. Furthermore, it exhibits favorable results in terms of validity, network analysis, task homogeneity, and generalization capability. These results suggest that the BDEC parcellation captures the functional characteristics of the brain and holds promise for future voxel-wise brain network analysis in the dimensionality reduction of fMRI data.
Submitted 11 September, 2023;
originally announced September 2023.
-
Robust Indoor Localization with Ranging-IMU Fusion
Authors:
Fan Jiang,
David Caruso,
Ashutosh Dhekne,
Qi Qu,
Jakob Julian Engel,
Jing Dong
Abstract:
Indoor wireless ranging localization is a promising approach for low-power and high-accuracy localization of wearable devices. A primary challenge in this domain stems from non-line of sight propagation of radio waves. This study tackles a fundamental issue in wireless ranging: the unpredictability of real-time multipath determination, especially in challenging conditions such as when there is no direct line of sight. We achieve this by fusing range measurements with inertial measurements obtained from a low cost Inertial Measurement Unit (IMU). For this purpose, we introduce a novel asymmetric noise model crafted specifically for non-Gaussian multipath disturbances. Additionally, we present a novel Levenberg-Marquardt (LM)-family trust-region adaptation of the iSAM2 fusion algorithm, which is optimized for robust performance for our ranging-IMU fusion problem. We evaluate our solution in a densely occupied real office environment. Our proposed solution can achieve temporally consistent localization with an average absolute accuracy of $\sim$0.3m in real-world settings. Furthermore, our results indicate that we can achieve comparable accuracy even with infrequent (1Hz) range measurements.
Submitted 15 September, 2023;
originally announced September 2023.
-
Kinetic Suppression of Photoinduced Halide Migration in Wide Bandgap Perovskites via Surface Passivation
Authors:
Farhad Akrami,
Fangyuan Jiang,
Rajiv Giridharagopal,
David S. Ginger
Abstract:
In this work, we study the kinetics of photoinduced halide migration in FA$_{0.8}$Cs$_{0.2}$Pb(I$_{0.8}$Br$_{0.2}$)$_3$ wide (~1.69 eV) bandgap perovskites and show halide migration slows down following surface passivation with (3-aminopropyl) trimethoxysilane (APTMS). We use scanning Kelvin probe microscopy (SKPM) to probe the contact potential difference (CPD) shift under illumination, and the kinetics of surface potential relaxation in the dark. Our results show APTMS-passivated perovskites exhibit a smaller CPD shift under illumination, and a slower surface potential relaxation in the dark. We compare the evolution of the photoluminescence spectra of APTMS-passivated and unpassivated perovskites under illumination. We find that APTMS-passivated perovskites exhibit more than 5 times slower photoluminescence redshift, consistent with the slower surface potential relaxation as observed by SKPM. These observations provide evidence for kinetic suppression of photoinduced halide migration in APTMS-passivated samples, likely due to reduced halide vacancy densities, opening avenues to more efficient and stable devices.
Submitted 11 September, 2023;
originally announced September 2023.
-
A Novel Catastrophic Condition for Periodically Time-varying Convolutional Encoders Based on Time-varying Equivalent Convolutional Encoders
Authors:
Fan Jiang
Abstract:
A convolutional encoder is said to be catastrophic if it maps an information sequence of infinite weight into a code sequence of finite weight. As a consequence of this mapping, a finite number of channel errors may cause an infinite number of information bit errors when decoding. This situation should be avoided. A catastrophic condition to determine if a time-invariant convolutional encoder is catastrophic or not is stated in \cite{Massey:LSC}. Palazzo developed this condition for periodically time-varying convolutional encoders in \cite{Palazzo:Analysis}. Since Palazzo's condition is based on the state transition table of the constituent encoders, its complexity increases exponentially with the number of memory elements in the encoders. A novel catastrophic condition making use of time-varying equivalent convolutional encoders is presented in this letter. A technique to convert a catastrophic periodically time-varying convolutional encoder into a non-catastrophic one can also be developed based on these encoders. Since they do not involve the state transitions of the convolutional encoder, the time complexity of these methods grows linearly with the encoder memory.
Submitted 11 September, 2023;
originally announced September 2023.
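For orientation, the time-invariant condition that the abstract attributes to Massey can be sketched in a few lines. This is only an illustration of the classical Massey-Sain test for rate-1/n feedforward encoders (non-catastrophic exactly when the GCD of the generator polynomials over GF(2) is a pure delay D^l); it is not the time-varying condition proposed in the letter. The function names and the bit-mask encoding of polynomials (bit i holds the coefficient of D^i) are assumptions made for this sketch.

```python
from functools import reduce


def gf2_mod(a: int, b: int) -> int:
    """Remainder of a divided by b, both polynomials over GF(2) as bit masks."""
    if b == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    deg_b = b.bit_length()
    # Cancel the leading term of a with a shifted copy of b until deg(a) < deg(b).
    while a.bit_length() >= deg_b:
        a ^= b << (a.bit_length() - deg_b)
    return a


def gf2_gcd(a: int, b: int) -> int:
    """Euclidean algorithm for polynomials over GF(2)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def is_catastrophic(generators: list[int]) -> bool:
    """Massey-Sain test for a rate-1/n feedforward, time-invariant encoder."""
    g = reduce(gf2_gcd, generators)
    if g == 0:
        raise ValueError("at least one generator polynomial must be nonzero")
    # A pure delay D^l has exactly one set bit; anything else is catastrophic.
    return g & (g - 1) != 0


# Generators 1 + D and 1 + D^2 share the factor 1 + D, so the encoder is catastrophic.
print(is_catastrophic([0b11, 0b101]))
# Generators 1 + D + D^2 and 1 + D^2 are coprime, so the encoder is not catastrophic.
print(is_catastrophic([0b111, 0b101]))
```

The GCD here is computed directly on the polynomials, so no state transition table is needed for the time-invariant case; how the letter extends this to periodically time-varying encoders is not shown in the abstract and is not reproduced here.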
-
Large AI Model Empowered Multimodal Semantic Communications
Authors:
Feibo Jiang,
Yubo Peng,
Li Dong,
Kezhi Wang,
Kun Yang,
Cunhua Pan,
Xiaohu You
Abstract:
Multimodal signals, including text, audio, image and video, can be integrated into Semantic Communication (SC) to provide an immersive experience with low latency and high quality at the semantic level. However, multimodal SC faces several challenges, including data heterogeneity, semantic ambiguity, and signal fading. Recent advancements in large AI models, particularly in the Multimodal Language Model (MLM) and the Large Language Model (LLM), offer potential solutions for these issues. To this end, we propose a Large AI Model-based Multimodal SC (LAM-MSC) framework, in which we first present the MLM-based Multimodal Alignment (MMA) that utilizes the MLM to enable the transformation between multimodal and unimodal data while preserving semantic consistency. Then, a personalized LLM-based Knowledge Base (LKB) is proposed, which allows users to perform personalized semantic extraction or recovery through the LLM. This effectively addresses the semantic ambiguity. Next, we apply the Conditional Generative adversarial networks-based channel Estimation (CGE) to obtain Channel State Information (CSI). This approach effectively mitigates the impact of fading channels in SC. Finally, we conduct simulations that demonstrate the superior performance of the LAM-MSC framework.
Submitted 3 September, 2023;
originally announced September 2023.
-
MDTD: A Multi Domain Trojan Detector for Deep Neural Networks
Authors:
Arezoo Rajabi,
Surudhi Asokraj,
Fengqing Jiang,
Luyao Niu,
Bhaskar Ramasubramanian,
Jim Ritcey,
Radha Poovendran
Abstract:
Machine learning models that use deep neural networks (DNNs) are vulnerable to backdoor attacks. An adversary carrying out a backdoor attack embeds a predefined perturbation called a trigger into a small subset of input samples and trains the DNN such that the presence of the trigger in the input results in an adversary-desired output class. Such adversarial retraining however needs to ensure that outputs for inputs without the trigger remain unaffected and provide high classification accuracy on clean samples. In this paper, we propose MDTD, a Multi-Domain Trojan Detector for DNNs, which detects inputs containing a Trojan trigger at testing time. MDTD does not require knowledge of trigger-embedding strategy of the attacker and can be applied to a pre-trained DNN model with image, audio, or graph-based inputs. MDTD leverages an insight that input samples containing a Trojan trigger are located relatively farther away from a decision boundary than clean samples. MDTD estimates the distance to a decision boundary using adversarial learning methods and uses this distance to infer whether a test-time input sample is Trojaned or not. We evaluate MDTD against state-of-the-art Trojan detection methods across five widely used image-based datasets: CIFAR100, CIFAR10, GTSRB, SVHN, and Flowers102; four graph-based datasets: AIDS, WinMal, Toxicant, and COLLAB; and the SpeechCommand audio dataset. MDTD effectively identifies samples that contain different types of Trojan triggers. We evaluate MDTD against adaptive attacks where an adversary trains a robust DNN to increase (decrease) distance of benign (Trojan) inputs from a decision boundary.
Submitted 2 September, 2023; v1 submitted 29 August, 2023;
originally announced August 2023.
-
LAMBO: Large Language Model Empowered Edge Intelligence
Authors:
Li Dong,
Feibo Jiang,
Yubo Peng,
Kezhi Wang,
Kun Yang,
Cunhua Pan,
Robert Schober
Abstract:
Next-generation edge intelligence is anticipated to bring huge benefits to various applications, e.g., offloading systems. However, traditional deep offloading architectures face several issues, including heterogeneous constraints, partial perception, uncertain generalization, and lack of tractability. In this context, the integration of offloading with large language models (LLMs) presents numerous advantages. Therefore, we propose an LLM-Based Offloading (LAMBO) framework for mobile edge computing (MEC), which comprises four components: (i) Input embedding (IE), which is used to represent the information of the offloading system with constraints and prompts through learnable vectors with high quality; (ii) Asymmetric encoder-decoder (AED) model, which is a decision-making module with a deep encoder and a shallow decoder and can achieve high performance based on multi-head self-attention schemes; (iii) Actor-critic reinforcement learning (ACRL) module, which is employed to pre-train the whole AED for different optimization tasks under corresponding prompts; and (iv) Active learning from expert feedback (ALEF), which can be used to finetune the decoder part of the AED while adapting to dynamic environmental changes. Our simulation results corroborate the advantages of the proposed LAMBO framework.
Submitted 29 August, 2023;
originally announced August 2023.
-
PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator
Authors:
Chuyi Kong,
Yaxin Fan,
Xiang Wan,
Feng Jiang,
Benyou Wang
Abstract:
The unparalleled performance of closed-sourced ChatGPT has sparked efforts towards its democratization, with notable strides made by leveraging real user and ChatGPT dialogues, as evidenced by Vicuna. However, due to challenges in gathering dialogues involving human participation, current endeavors like Baize and UltraChat rely on ChatGPT conducting roleplay to simulate humans based on instructions, resulting in overdependence on seeds, diminished human-likeness, limited topic diversity, and an absence of genuine multi-round conversational dynamics. To address the above issues, we propose a paradigm to simulate human behavior better and explore the benefits of incorporating more human-like questions in multi-turn conversations. Specifically, we directly target human questions extracted from genuine human-machine conversations as a learning goal and provide a novel user simulator called `Socratic'. The experimental results show our response model, `PlatoLM', achieves SoTA performance among LLaMA-based 7B models in MT-Bench. Our findings further demonstrate that our method introduces highly human-like questioning patterns and rich topic structures, which can teach the response model better than previous works in multi-round conversations.
Submitted 27 May, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Data-Driven Reachability Analysis of Pedestrians Using Behavior Modes
Authors:
August Söderlund,
Frank J. Jiang,
Vandana Narri,
Amr Alanwar,
Karl H. Johansson
Abstract:
In this paper, we present a data-driven approach for safely predicting the future state sets of pedestrians. Previous approaches to predicting the future state sets of pedestrians either do not provide safety guarantees or are overly conservative. Moreover, an additional challenge is the selection or identification of a model that sufficiently captures the motion of pedestrians. To address these issues, this paper introduces the idea of splitting previously collected, historical pedestrian trajectories into different behavior modes for performing data-driven reachability analysis. Through this proposed approach, we are able to use data-driven reachability analysis to capture the future state sets of pedestrians, while being less conservative and still maintaining safety guarantees. Furthermore, this approach is modular and can support different approaches for behavior splitting. To illustrate the efficacy of the approach, we implement our method with a basic behavior-splitting module and evaluate the implementation on an open-source data set of real pedestrian trajectories. In this evaluation, we find that the modal reachable sets are less conservative and more descriptive of the future state sets of the pedestrian.
Submitted 21 August, 2023;
originally announced August 2023.
-
CMB: A Comprehensive Medical Benchmark in Chinese
Authors:
Xidong Wang,
Guiming Hardy Chen,
Dingjie Song,
Zhiyi Zhang,
Zhihong Chen,
Qingying Xiao,
Feng Jiang,
Jianquan Li,
Xiang Wan,
Benyou Wang,
Haizhou Li
Abstract:
Large Language Models (LLMs) provide a possibility to make a great breakthrough in medicine. The establishment of a standardized medical benchmark becomes a fundamental cornerstone to measure progression. However, medical environments in different regions have their local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Therefore, merely translating English-based medical evaluation may result in \textit{contextual incongruities} to a local region. To solve the issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. We hope this benchmark provides first-hand experience in existing LLMs for medicine and also facilitates the widespread adoption and enhancement of medical LLMs within China. Our data and code are publicly available at https://github.com/FreedomIntelligence/CMB.
Submitted 4 April, 2024; v1 submitted 17 August, 2023;
originally announced August 2023.
-
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
Authors:
Yusheng Dai,
Hang Chen,
Jun Du,
Xiaofei Ding,
Ning Ding,
Feijun Jiang,
Chin-Hui Lee
Abstract:
In recent research, slight performance improvement is observed from automatic speech recognition systems to audio-visual speech recognition systems in the end-to-end framework with low-quality videos. Unmatching convergence rates and specialized input representations between audio and visual modalities are considered to cause the problem. In this paper, we propose two novel techniques to improve audio-visual speech recognition (AVSR) under a pre-training and fine-tuning training framework. First, we explore the correlation between lip shapes and syllable-level subword units in Mandarin to establish good frame-level syllable boundaries from lip shapes. This enables accurate alignment of video and audio streams during visual model pre-training and cross-modal fusion. Next, we propose an audio-guided cross-modal fusion encoder (CMFE) neural network to utilize main training parameters for multiple cross-modal attention layers to make full use of modality complementarity. Experiments on the MISP2021-AVSR data set show the effectiveness of the two proposed techniques. Together, using only a relatively small amount of training data, the final system achieves better performances than state-of-the-art systems with more complex front-ends and back-ends.
Submitted 8 March, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
-
Architecture Optimization Dramatically Improves Reverse Bias Stability in Perovskite Solar Cells: A Role of Polymer Hole Transport Layers
Authors:
Fangyuan Jiang,
Yangwei Shi,
Tanka R. Rana,
Daniel Morales,
Isaac Gould,
Declan P. McCarthy,
Joel Smith,
Grey Christoforo,
Hannah Contreras,
Stephen Barlow,
Aditya D. Mohite,
Henry Snaith,
Seth R. Marder,
J. Devin MacKenzie,
Michael D. McGehee,
David S. Ginger
Abstract:
We report that device architecture engineering has a substantial impact on the reverse bias instability that has been reported as a critical issue in commercializing perovskite solar cells. We demonstrate breakdown voltages exceeding -15 V in typical pin structured perovskite solar cells via two steps: i) using polymer hole transporting materials; ii) using a more electrochemically stable gold electrode. While device degradation can be exacerbated by higher reverse bias and prolonged exposure, our as-fabricated perovskite solar cells completely recover their performance even after stressing at -7 V for 9 hours both in the dark and under partial illumination. Following these observations, we systematically discuss and compare the reverse bias driven degradation pathways in perovskite solar cells with different device architectures. Our model highlights the role of electrochemical reaction rates and species in dictating the reverse bias stability of perovskite solar cells.
Submitted 15 August, 2023;
originally announced August 2023.
-
Current Status and Trends of Engineering Entrepreneurship Education in Australian Universities
Authors:
Jianhua Li,
Sophie Mckenzie,
Richard Dazeley,
Frank Jiang,
Keshav Sood
Abstract:
This research sheds light on the present and future landscape of Engineering Entrepreneurship Education (EEE) by exploring varied approaches and models adopted in Australian universities, evaluating program effectiveness, and offering recommendations for curriculum enhancement. While EEE programs have been in existence for over two decades, their efficacy remains underexplored. Using a multi-method approach encompassing self-reflection, scoping review, surveys, and interviews, this study addresses key research questions regarding the state, challenges, trends, and effectiveness of EEE. Findings reveal challenges like resource limitations and propose solutions such as experiential learning and industry partnerships. These insights underscore the importance of tailored EEE and inform teaching strategies and curriculum development, benefiting educators and policymakers worldwide.
Submitted 14 August, 2023;
originally announced August 2023.
-
Effects From Extra Die Rolls and Choosing the Highest or Lowest
Authors:
Fan Jiang,
Elvin Jiang
Abstract:
This paper looks into the gain or loss from rolling a fair die multiple times and choosing the highest or lowest number as the outcome over rolling the die just once. Specifically, this paper gives a general formula for the expected value of choosing the highest or lowest value of any number of die rolls and sides. It also shows how, for a fixed number of rolls, the ratio between this expected value and the number of sides converges as the number of sides increases asymptotically. The converging behavior of this ratio helps formulate the aforementioned gain or loss.
Submitted 6 August, 2023;
originally announced August 2023.
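The quantity described in this abstract can be illustrated with a short sketch. It does not reproduce the paper's own formula, which is not shown in this listing. Instead it uses the standard tail-sum identity E[max] = sum over k from 0 to s-1 of (1 - (k/s)^n) for n independent rolls of a fair s-sided die. The function names `expected_highest` and `expected_lowest` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def expected_highest(sides: int, rolls: int) -> Fraction:
    """Expected value of the highest of `rolls` fair `sides`-sided die rolls.

    Uses P(max <= k) = (k / sides) ** rolls and the tail-sum identity
    E[max] = sum_{k=0}^{sides-1} P(max > k). This is an assumed, standard
    derivation, not the formula taken from the paper.
    """
    return sum(1 - Fraction(k, sides) ** rolls for k in range(sides))


def expected_lowest(sides: int, rolls: int) -> Fraction:
    """Expected value of the lowest roll, by the symmetry face -> sides + 1 - face."""
    return sides + 1 - expected_highest(sides, rolls)


if __name__ == "__main__":
    # Two rolls of a six-sided die versus a single roll.
    print(expected_highest(6, 1), expected_highest(6, 2), expected_lowest(6, 2))
    # Ratio of the expected highest value to the number of sides, for two rolls.
    for s in (6, 100, 1000):
        print(s, float(expected_highest(s, 2) / s))
```

For a fixed number of rolls n, the printed ratio approaches n / (n + 1) as the number of sides grows, which is consistent with the convergence the abstract mentions. Exact rational arithmetic is used so that small cases can be checked by hand.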
-
GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning
Authors:
Yaxin Fan,
Feng Jiang,
Peifeng Li,
Haizhou Li
Abstract:
Grammatical error correction aims to correct ungrammatical sentences automatically. Recently, some work has demonstrated the excellent capabilities of closed-source Large Language Models (LLMs, e.g., ChatGPT) in grammatical error correction. However, the potential of open-source LLMs remains unexplored. In this paper, we introduced GrammarGPT, an open-source LLM, to preliminarily explore its potential for native Chinese grammatical error correction (CGEC). The core recipe of GrammarGPT is to leverage a hybrid dataset of ChatGPT-generated and human-annotated data. For grammatical errors with clues, we proposed a heuristic method to guide ChatGPT to generate ungrammatical sentences by providing those clues. For grammatical errors without clues, we collected ungrammatical sentences from publicly available websites and manually corrected them. In addition, we employed an error-invariant augmentation method to enhance the ability of the model to correct native Chinese grammatical errors. We ultimately constructed about 1k parallel samples and used them to fine-tune open-source LLMs (e.g., Phoenix, released by The Chinese University of Hong Kong, Shenzhen) with instruction tuning. The experimental results show that GrammarGPT outperforms the existing SOTA system significantly. Although the model has 20x more parameters than the SOTA baseline, the required amount of data for instruction tuning is 1200x smaller, illustrating the potential of open-source LLMs on native CGEC. Our GrammarGPT ranks $3^{rd}$ on NLPCC2023 SharedTask1, demonstrating our approach's effectiveness. The code and data are available at https://github.com/FreedomIntelligence/GrammarGPT.
Submitted 17 August, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Pro-PRIME: A general Temperature-Guided Language model to engineer enhanced Stability and Activity in Proteins
Authors:
Pan Tan,
Mingchen Li,
Yuanxi Yu,
Fan Jiang,
Lirong Zheng,
Banghao Wu,
Xinyu Sun,
Liqi Kang,
Jie Song,
Liang Zhang,
Yi Xiong,
Wanli Ouyang,
Zhiqiang Hu,
Guisheng Fan,
Yufeng Pei,
Liang Hong
Abstract:
Designing protein mutants of both high stability and activity is a critical yet challenging task in protein engineering. Here, we introduce Pro-PRIME, a deep learning zero-shot model, which can suggest protein mutants of improved stability and activity without any prior experimental mutagenesis data. By leveraging temperature-guided language modelling, Pro-PRIME demonstrated superior predictive power compared to current state-of-the-art models on the public mutagenesis dataset over 33 proteins. Furthermore, we carried out wet experiments to test Pro-PRIME on five distinct proteins to engineer certain physicochemical properties, including thermal stability, rates of RNA polymerization and DNA cleavage, hydrolase activity, antigen-antibody binding affinity, or even nonnatural properties, e.g., the ability to polymerize non-natural nucleic acid or resilience to extreme alkaline conditions. Surprisingly, about 40% of the AI-designed mutants show better performance than the one before mutation for all five proteins studied and for all properties targeted for engineering. Hence, Pro-PRIME demonstrates general applicability in protein engineering.
Submitted 13 May, 2024; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text
Authors:
Lingyi Yang,
Feng Jiang,
Haizhou Li
Abstract:
The remarkable capabilities of large-scale language models, such as ChatGPT, in text generation have impressed readers and spurred researchers to devise detectors to mitigate potential risks, including misinformation, phishing, and academic dishonesty. Despite this, most previous studies have been predominantly geared towards creating detectors that differentiate between purely ChatGPT-generated texts and human-authored texts. This approach, however, fails to work on discerning texts generated through human-machine collaboration, such as ChatGPT-polished texts. Addressing this gap, we introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts), facilitating the construction of more robust detectors. It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts. Additionally, we propose the "Polish Ratio" method, an innovative measure of the degree of modification made by ChatGPT compared to the original human-written text. It provides a mechanism to measure the degree of ChatGPT influence in the resulting text. Our experimental results show our proposed model has better robustness on the HPPT dataset and two existing datasets (HC3 and CDB). Furthermore, the "Polish Ratio" we proposed offers a more comprehensive explanation by quantifying the degree of ChatGPT involvement.
Submitted 30 December, 2023; v1 submitted 21 July, 2023;
originally announced July 2023.
-
Two-Sample and Change-Point Inference for Non-Euclidean Valued Time Series
Authors:
Feiyu Jiang,
Changbo Zhu,
Xiaofeng Shao
Abstract:
Data objects taking value in a general metric space have become increasingly common in modern data analysis. In this paper, we study two important statistical inference problems, namely, two-sample testing and change-point detection, for such non-Euclidean data under temporal dependence. Typical examples of non-Euclidean valued time series include yearly mortality distributions, time-varying networks, and covariance matrix time series. To accommodate unknown temporal dependence, we advance the self-normalization (SN) technique (Shao, 2010) to the inference of non-Euclidean time series, which is substantially different from the existing SN-based inference for functional time series that reside in Hilbert space (Zhang et al., 2011). Theoretically, we propose new regularity conditions that could be easier to check than those in the recent literature, and derive the limiting distributions of the proposed test statistics under both null and local alternatives. For change-point detection problem, we also derive the consistency for the change-point location estimator, and combine our proposed change-point test with wild binary segmentation to perform multiple change-point estimation. Numerical simulations demonstrate the effectiveness and robustness of our proposed tests compared with existing methods in the literature. Finally, we apply our tests to two-sample inference in mortality data and change-point detection in cryptocurrency data.
Submitted 9 July, 2023;
originally announced July 2023.
-
Large AI Model-Based Semantic Communications
Authors:
Feibo Jiang,
Yubo Peng,
Li Dong,
Kezhi Wang,
Kun Yang,
Cunhua Pan,
Xiaohu You
Abstract:
Semantic communication (SC) is an emerging intelligent paradigm, offering solutions for various future applications like the metaverse, mixed-reality, and the Internet of everything. However, in current SC systems, the construction of the knowledge base (KB) faces several issues, including limited knowledge representation, frequent knowledge updates, and insecure knowledge sharing. Fortunately, the development of the large AI model provides new solutions to overcome the above issues. Here, we propose a large AI model-based SC framework (LAM-SC) specifically designed for image data, where we first design the segment anything model (SAM)-based KB (SKB) that can split the original image into different semantic segments by universal semantic knowledge. Then, we present an attention-based semantic integration (ASI) to weigh the semantic segments generated by SKB without human participation and integrate them as the semantic-aware image. Additionally, we propose an adaptive semantic compression (ASC) encoding to remove redundant information in semantic features, thereby reducing communication overhead. Finally, through simulations, we demonstrate the effectiveness of the LAM-SC framework and the significance of the large AI model-based KB development in future SC paradigms.
Submitted 7 July, 2023;
originally announced July 2023.
-
Roots, trace, and extendability of flat nonnegative smooth functions
Authors:
Fushuai Jiang
Abstract:
Building on the univariate techniques developed by Ray and Schmidt-Hieber, we study the class $\mathcal{F}^s(\mathbb{R}^n)$ of multivariate nonnegative smooth functions that are sufficiently flat near their zeroes, which guarantees that $\varphi^r$ has Hölder differentiability $rs$ whenever $\varphi \in \mathcal{F}^s$. We then construct a continuous Whitney extension map that recovers an $\mathcal{F}^s$ function from prescribed jets. Finally, we prove a Brudnyi-Shvartsman Finiteness Principle for the class $\mathcal{F}^s$, thereby providing a necessary and sufficient condition for a nonnegative function defined on an arbitrary subset of $\mathbb{R}^n$ to be $\mathcal{F}^s$-extendable to all of $\mathbb{R}^n$.
Submitted 9 January, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Polynomial Logical Zonotopes: A Set Representation for Reachability Analysis of Logical Systems
Authors:
Amr Alanwar,
Frank J. Jiang,
Karl H. Johansson
Abstract:
In this paper, we introduce a set representation called polynomial logical zonotopes for performing exact and computationally efficient reachability analysis on logical systems. Polynomial logical zonotopes are a generalization of logical zonotopes, which are able to represent up to 2^n binary vectors using only n generators. Due to their construction, logical zonotopes are only able to support exact computations of some logical operations (XOR, NOT, XNOR), while other operations (AND, NAND, OR, NOR) result in over-approximations in the reduced space, i.e., generator space. In order to perform all fundamental logical operations exactly, we formulate a generalization of logical zonotopes that is constructed by dependent generators and exponent matrices. We prove that through this polynomial-like construction, we are able to perform all of the fundamental logical operations (XOR, NOT, XNOR, AND, NAND, OR, NOR) exactly in the generator space. While we are able to perform all of the logical operations exactly, this comes with a slight increase in computational complexity compared to logical zonotopes. We show that we can use polynomial logical zonotopes to perform exact reachability analysis while retaining a low computational complexity. To illustrate and showcase the computational benefits of polynomial logical zonotopes, we present the results of performing reachability analysis on two use cases: (1) safety verification of an intersection crossing protocol and (2) reachability analysis on a high-dimensional Boolean function. Moreover, to highlight the extensibility of logical zonotopes, we include an additional use case where we perform a computationally tractable exhaustive search for the key of a linear feedback shift register.
Submitted 1 March, 2024; v1 submitted 21 June, 2023;
originally announced June 2023.
-
OpenDataVal: a Unified Benchmark for Data Valuation
Authors:
Kevin Fu Jiang,
Weixin Liang,
James Zou,
Yongchan Kwon
Abstract:
Assessing the quality and impact of individual data points is critical for improving model performance and mitigating undesirable biases within the training dataset. Several data valuation algorithms have been proposed to quantify data quality; however, a systemic and standardized benchmarking system for data valuation is lacking. In this paper, we introduce OpenDataVal, an easy-to-use and unified benchmark framework that empowers researchers and practitioners to apply and compare various data valuation algorithms. OpenDataVal provides an integrated environment that includes (i) a diverse collection of image, natural language, and tabular datasets, (ii) implementations of eleven different state-of-the-art data valuation algorithms, and (iii) a prediction model API that can import any models in scikit-learn. Furthermore, we propose four downstream machine learning tasks for evaluating the quality of data values. We perform benchmarking analysis using OpenDataVal, quantifying and comparing the efficacy of state-of-the-art data valuation approaches. We find that no single algorithm performs uniformly best across all tasks, and an appropriate algorithm should be employed for a user's downstream task. OpenDataVal is publicly available at https://opendataval.github.io with comprehensive documentation. Furthermore, we provide a leaderboard where researchers can evaluate the effectiveness of their own data valuation algorithms.
Submitted 13 October, 2023; v1 submitted 18 June, 2023;
originally announced June 2023.
-
Multi-Scale Simulation of Complex Systems: A Perspective of Integrating Knowledge and Data
Authors:
Huandong Wang,
Huan Yan,
Can Rong,
Yuan Yuan,
Fenyu Jiang,
Zhenyu Han,
Hongjie Sui,
Depeng Jin,
Yong Li
Abstract:
Complex system simulation has been playing an irreplaceable role in understanding, predicting, and controlling diverse complex systems. In the past few decades, the multi-scale simulation technique has drawn increasing attention for its remarkable ability to overcome the challenges of complex system simulation with unknown mechanisms and expensive computational costs. In this survey, we will systematically review the literature on multi-scale simulation of complex systems from the perspective of knowledge and data. Firstly, we will present background knowledge about complex system simulation and the scales in complex systems. Then, we divide the main objectives of multi-scale modeling and simulation into five categories by considering scenarios with clear scale and scenarios with unclear scale, respectively. After summarizing the general methods for multi-scale simulation based on the clues of knowledge and data, we introduce the adopted methods to achieve different objectives. Finally, we introduce the applications of multi-scale simulation in typical matter systems and social systems.
Submitted 17 June, 2023;
originally announced June 2023.
-
Matrix GARCH Model: Inference and Application
Authors:
Cheng Yu,
Dong Li,
Feiyu Jiang,
Ke Zhu
Abstract:
Matrix-variate time series data are largely available in applications. However, no attempt has been made to study their conditional heteroskedasticity that is often observed in economic and financial data. To address this gap, we propose a novel matrix generalized autoregressive conditional heteroskedasticity (GARCH) model to capture the dynamics of conditional row and column covariance matrices of matrix time series. The key innovation of the matrix GARCH model is the use of a univariate GARCH specification for the trace of conditional row or column covariance matrix, which allows for the identification of conditional row and column covariance matrices. Moreover, we introduce a quasi maximum likelihood estimator (QMLE) for model estimation and develop a portmanteau test for model diagnostic checking. Simulation studies are conducted to assess the finite-sample performance of the QMLE and portmanteau test. To handle large dimensional matrix time series, we also propose a matrix factor GARCH model. Finally, we demonstrate the superiority of the matrix GARCH and matrix factor GARCH models over existing multivariate GARCH-type models in volatility forecasting and portfolio allocations using three applications on credit default swap prices, global stock sector indices, and future prices.
Submitted 8 June, 2023;
originally announced June 2023.
-
HuatuoGPT, towards Taming Language Model to Be a Doctor
Authors:
Hongbo Zhang,
Junying Chen,
Feng Jiang,
Fei Yu,
Zhihong Chen,
Jianquan Li,
Guiming Chen,
Xiangbo Wu,
Zhiyi Zhang,
Qingying Xiao,
Xiang Wan,
Benyou Wang,
Haizhou Li
Abstract:
In this paper, we present HuatuoGPT, a large language model (LLM) for medical consultation. The core recipe of HuatuoGPT is to leverage both distilled data from ChatGPT and real-world data from doctors in the supervised fine-tuning stage. The responses of ChatGPT are usually detailed, well-presented and informative, but it cannot perform like a doctor in many aspects, e.g. for integrative diagnosis. We argue that real-world data from doctors would be complementary to distilled data in the sense that the former could tame a distilled language model to perform like doctors. To better leverage the strengths of both kinds of data, we train a reward model to align the language model with the merits that both bring, following an RLAIF (reinforced learning from AI feedback) fashion. To evaluate and benchmark the models, we propose a comprehensive evaluation scheme (including automatic and manual metrics). Experimental results demonstrate that HuatuoGPT achieves state-of-the-art results in performing medical consultation among open-source LLMs in GPT-4 evaluation, human evaluation, and medical benchmark datasets. It is worth noting that by using additional real-world data and RLAIF, the distilled language model (i.e., HuatuoGPT) outperforms its teacher model ChatGPT in most cases. Our code, data, and models are publicly available at https://github.com/FreedomIntelligence/HuatuoGPT. The online demo is available at https://www.HuatuoGPT.cn/.
Submitted 24 May, 2023;
originally announced May 2023.
-
Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark
Authors:
Feng Jiang,
Weihao Liu,
Xiaomin Chu,
Peifeng Li,
Qiaoming Zhu,
Haizhou Li
Abstract:
Topic segmentation and outline generation strive to divide a document into coherent topic sections and generate corresponding subheadings, unveiling the discourse topic structure of a document. Compared with sentence-level topic structure, the paragraph-level topic structure can quickly grasp and understand the overall context of the document from a higher level, benefitting many downstream tasks such as summarization, discourse parsing, and information retrieval. However, the lack of large-scale, high-quality Chinese paragraph-level topic structure corpora restrained relative research and applications. To fill this gap, we build the Chinese paragraph-level topic representation, corpus, and benchmark in this paper. Firstly, we propose a hierarchical paragraph-level topic structure representation with three layers to guide the corpus construction. Then, we employ a two-stage man-machine collaborative annotation method to construct the largest Chinese Paragraph-level Topic Structure corpus (CPTS), achieving high quality. We also build several strong baselines, including ChatGPT, to validate the computability of CPTS on two fundamental tasks (topic segmentation and outline generation) and preliminarily verified its usefulness for the downstream task (discourse parsing).
Submitted 26 March, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Topic-driven Distant Supervision Framework for Macro-level Discourse Parsing
Authors:
Feng Jiang,
Longwang He,
Peifeng Li,
Qiaoming Zhu,
Haizhou Li
Abstract:
Discourse parsing, the task of analyzing the internal rhetorical structure of texts, is a challenging problem in natural language processing. Despite the recent advances in neural models, the lack of large-scale, high-quality corpora for training remains a major obstacle. Recent studies have attempted to overcome this limitation by using distant supervision, which utilizes results from other NLP tasks (e.g., sentiment polarity, attention matrix, and segmentation probability) to parse discourse trees. However, these methods do not take into account the differences between in-domain and out-of-domain tasks, resulting in lower performance and inability to leverage the high-quality in-domain data for further improvement. To address these issues, we propose a distant supervision framework that leverages the relations between topic structure and rhetorical structure. Specifically, we propose two distantly supervised methods, based on transfer learning and the teacher-student model, that narrow the gap between in-domain and out-of-domain tasks through label mapping and oracle annotation. Experimental results on the MCDTB and RST-DT datasets show that our methods achieve the best performance in both distant-supervised and supervised scenarios.
Submitted 23 May, 2023;
originally announced May 2023.
-
Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study
Authors:
Yaxin Fan,
Feng Jiang,
Peifeng Li,
Haizhou Li
Abstract:
Large language models, like ChatGPT, have shown remarkable capability in many downstream tasks, yet their ability to understand discourse structures of dialogues remains less explored, as it requires higher-level capabilities of understanding and reasoning. In this paper, we aim to systematically inspect ChatGPT's performance in two discourse analysis tasks: topic segmentation and discourse parsing, focusing on its deep semantic understanding of linear and hierarchical discourse structures underlying dialogue. To instruct ChatGPT to complete these tasks, we initially craft a prompt template consisting of the task description, output format, and structured input. Then, we conduct experiments on four popular topic segmentation datasets and two discourse parsing datasets. The experimental results showcase that ChatGPT demonstrates proficiency in identifying topic structures in general-domain conversations yet struggles considerably in specific-domain conversations. We also found that ChatGPT hardly understands rhetorical structures that are more complex than topic structures. Our deeper investigation indicates that ChatGPT can give more reasonable topic structures than human annotations but only linearly parses the hierarchical rhetorical structures. In addition, we delve into the impact of in-context learning (e.g., chain-of-thought) on ChatGPT and conduct the ablation study on various prompt components, which can provide a research foundation for future work. The code is available at https://github.com/yxfanSuda/GPTforDDA.
Submitted 5 March, 2024; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Dual Intent Enhanced Graph Neural Network for Session-based New Item Recommendation
Authors:
Di Jin,
Luzhi Wang,
Yizhen Zheng,
Guojie Song,
Fei Jiang,
Xiang Li,
Wei Lin,
Shirui Pan
Abstract:
Recommender systems are essential to various fields, e.g., e-commerce, e-learning, and streaming media. At present, graph neural networks (GNNs) for session-based recommendations normally can only recommend items existing in users' historical sessions. As a result, these GNNs have difficulty recommending items that users have never interacted with (new items), which leads to an information cocoon phenomenon. Therefore, it is necessary to recommend new items to users. As there is no interaction between new items and users, we cannot include new items when building session graphs for GNN session-based recommender systems. Thus, it is challenging to recommend new items for users when using GNN-based methods. We regard this challenge as 'GNN Session-based New Item Recommendation (GSNIR)'. To solve this problem, we propose a dual-intent enhanced graph neural network for it. Because new items are not tied to historical sessions, the users' intent is difficult to predict. We design a dual-intent network to learn user intent from an attention mechanism and the distribution of historical data respectively, which can simulate users' decision-making process in interacting with a new item. To solve the challenge that new items cannot be learned by GNNs, inspired by zero-shot learning (ZSL), we infer the new item representation in GNN space by using their attributes. By outputting new item probabilities, which contain recommendation scores of the corresponding items, the new items with higher scores are recommended to users. Experiments on two representative real-world datasets show the superiority of our proposed method. The real-world case study verifies the interpretability benefits brought by the dual-intent module and the new item reasoning module. The code is available on GitHub: https://github.com/Ee1s/NirGNN
Submitted 9 May, 2023;
originally announced May 2023.
-
A quantitative comparison between velocity dependent SIDM cross sections constrained by the gravothermal and isothermal models
Authors:
Shengqi Yang,
Fangzhou Jiang,
Andrew Benson,
Yi-Ming Zhong,
Charlie Mace,
Xiaolong Du,
Zhichao Carton Zeng,
Annika H. G. Peter,
Moritz S. Fischer
Abstract:
One necessary step for probing the nature of self-interacting dark matter (SIDM) particles with astrophysical observations is to pin down any possible velocity dependence in the SIDM cross section. Major challenges for achieving this goal include eliminating, or mitigating, the impact of the baryonic components and tidal effects within the dark matter halos of interest, since the effects of these processes can be highly degenerate with those of dark matter self-interactions at small scales. In this work we select 9 isolated galaxies and brightest cluster galaxies (BCGs) with baryonic components small enough that the baryonic gravitational potentials do not significantly influence the halo gravothermal evolution processes. We then constrain the parameters of Rutherford and Møller scattering cross section models with the measured rotation curves and stellar kinematics through the gravothermal fluid formalism and the isothermal method. Cross sections constrained by the two methods are consistent at the $1\sigma$ confidence level, but the isothermal method prefers cross sections larger than those from the gravothermal approach by a factor of $\sim 3$.
Submitted 26 April, 2024; v1 submitted 8 May, 2023;
originally announced May 2023.
-
Topic Shift Detection in Chinese Dialogues: Corpus and Benchmark
Authors:
Jiangyi Lin,
Yaxin Fan,
Feng Jiang,
Xiaomin Chu,
Peifeng Li
Abstract:
Dialogue topic shift detection aims to detect whether an ongoing topic has shifted or should shift in a dialogue. It can be divided into two categories: the response-known task and the response-unknown task. Currently, only a few studies have investigated the latter, because predicting a topic shift without the response information remains a challenge. In this paper, we first annotate a Chinese Natural Topic Dialogue (CNTD) corpus consisting of 1308 dialogues to fill the gap in Chinese natural-conversation topic corpora. We then focus on the response-unknown task and propose a teacher-student framework based on hierarchical contrastive learning to predict the topic shift without the response. Specifically, the response is introduced at the high-level teacher-student to build contrastive learning between the response and the context, while label contrastive learning is constructed at the low-level student. The experimental results on our Chinese CNTD and the English TIAGE show the effectiveness of our proposed model.
Submitted 2 May, 2023;
originally announced May 2023.
-
Learned Focused Plenoptic Image Compression with Microimage Preprocessing and Global Attention
Authors:
Kedeng Tong,
Xin Jin,
Yuqing Yang,
Chen Wang,
Jinshi Kang,
Fan Jiang
Abstract:
Focused plenoptic cameras can record spatial and angular information of the light field (LF) simultaneously with higher spatial resolution than traditional plenoptic cameras, which facilitates various applications in computer vision. However, existing plenoptic image compression methods are ineffective on the captured images because of the complex micro-textures generated by the microlens relay imaging and the long-distance correlations among the microimages. In this paper, a lossy end-to-end learning architecture is proposed to compress focused plenoptic images efficiently. First, a data preprocessing scheme is designed according to the imaging principle to remove the pixels in the recorded light field that are ineffective for sub-aperture images and to align the microimages to a rectangular grid. Then, a global attention module with a large receptive field is proposed to capture the global correlation among the feature maps using pixel-wise vector attention computed in the resampling process. In addition, a new image dataset consisting of 1910 focused plenoptic images with content and depth diversity is built for training and testing. Extensive experimental evaluations demonstrate the effectiveness of the proposed approach. It outperforms intra coding of HEVC and VVC by an average of 62.57% and 51.67% bitrate reduction, respectively, on the 20 preprocessed focused plenoptic images. It also achieves 18.73% bitrate saving and generates perceptually pleasant reconstructions compared to state-of-the-art end-to-end image compression methods, which greatly benefits applications of focused plenoptic cameras. The dataset and code are publicly available at https://github.com/VincentChandelier/GACN.
Submitted 30 April, 2023;
originally announced May 2023.