-
MUSE-Net: Missingness-aware mUlti-branching Self-attention Encoder for Irregular Longitudinal Electronic Health Records
Authors:
Zekai Wang,
Tieming Liu,
Bing Yao
Abstract:
The era of big data has made vast amounts of clinical data readily available, particularly in the form of electronic health records (EHRs), which provide unprecedented opportunities for developing data-driven diagnostic tools to enhance clinical decision making. However, the application of EHRs in data-driven modeling faces challenges such as irregularly spaced multi-variate time series, issues of incompleteness, and data imbalance. Realizing the full data potential of EHRs hinges on the development of advanced analytical models. In this paper, we propose a novel Missingness-aware mUlti-branching Self-attention Encoder (MUSE-Net) to cope with the challenges in modeling longitudinal EHRs for data-driven disease prediction. The MUSE-Net leverages a multi-task Gaussian process (MGP) with missing value masks for data imputation, a multi-branching architecture to address the data imbalance problem, and a time-aware self-attention encoder to account for the irregularly spaced time intervals in longitudinal EHRs. We evaluate the proposed MUSE-Net using both synthetic and real-world datasets. Experimental results show that our MUSE-Net outperforms existing methods that are widely used to investigate longitudinal signals.
Submitted 30 June, 2024;
originally announced July 2024.
-
SPIRONet: Spatial-Frequency Learning and Topological Channel Interaction Network for Vessel Segmentation
Authors:
De-Xing Huang,
Xiao-Hu Zhou,
Xiao-Liang Xie,
Shi-Qi Liu,
Shuang-Yi Wang,
Zhen-Qiu Feng,
Mei-Jiang Gui,
Hao Li,
Tian-Yu Xiang,
Bo-Xian Yao,
Zeng-Guang Hou
Abstract:
Automatic vessel segmentation is paramount for developing next-generation interventional navigation systems. However, current approaches suffer from suboptimal segmentation performances due to significant challenges in intraoperative images (i.e., low signal-to-noise ratio, small or slender vessels, and strong interference). In this paper, a novel spatial-frequency learning and topological channel interaction network (SPIRONet) is proposed to address the above issues. Specifically, dual encoders are utilized to comprehensively capture local spatial and global frequency vessel features. Then, a cross-attention fusion module is introduced to effectively fuse spatial and frequency features, thereby enhancing feature discriminability. Furthermore, a topological channel interaction module is designed to filter out task-irrelevant responses based on graph neural networks. Extensive experimental results on several challenging datasets (CADSA, CAXF, DCA1, and XCAD) demonstrate state-of-the-art performances of our method. Moreover, the inference speed of SPIRONet is 21 FPS with a 512x512 input size, surpassing clinical real-time requirements (6~12 FPS). These promising outcomes indicate SPIRONet's potential for integration into vascular interventional navigation systems. Code is available at https://github.com/Dxhuang-CASIA/SPIRONet.
Submitted 28 June, 2024;
originally announced June 2024.
-
Coordinated RSMA for Integrated Sensing and Communication in Emergency UAV Systems
Authors:
Binghan Yao,
Ruoguang Li,
Yingyang Chen,
Li Wang
Abstract:
Recently, unmanned aerial vehicle (UAV)-enabled integrated sensing and communication (ISAC) is emerging as a promising technique for achieving robust and rapid emergency response capabilities. Such a novel framework offers high-quality and cost-efficient C&S services due to the intrinsic flexibility and mobility of UAVs. In parallel, rate-splitting multiple access (RSMA) is able to achieve a tailor-made communication by splitting the messages into private and common parts with adjustable rates, making it suitable for on-demand data transmission in disaster scenarios. In this paper, we propose a coordinated RSMA for integrated sensing and communication (CoRSMA-ISAC) scheme in emergency UAV systems to facilitate search and rescue operations, where a number of ISAC UAVs simultaneously communicate with multiple communication survivors (CSs) and detect a potentially trapped survivor (TS) in a coordinated manner. Towards this end, an optimization problem is formulated to maximize the weighted sum rate (WSR) of the system, subject to the sensing signal-to-noise ratio (SNR) requirement. In order to solve the formulated non-convex problem, we first decompose it into three subproblems, i.e., UAV-CS association, UAV deployment, as well as beamforming optimization and rate allocation. Subsequently, we introduce an iterative optimization approach leveraging K-Means, successive convex approximation (SCA), and semi-definite relaxation (SDR) algorithms to reframe the subproblems into a more tractable form and efficiently solve them. Simulation results demonstrate that the proposed CoRSMA-ISAC scheme is superior to conventional space division multiple access (SDMA), non-orthogonal multiple access (NOMA), and orthogonal multiple access (OMA) in terms of both communication and sensing performance.
Submitted 27 June, 2024;
originally announced June 2024.
-
Sparse-view Signal-domain Photoacoustic Tomography Reconstruction Method Based on Neural Representation
Authors:
Bowei Yao,
Yi Zeng,
Haizhao Dai,
Qing Wu,
Youshen Xiao,
Fei Gao,
Yuyao Zhang,
Jingyi Yu,
Xiran Cai
Abstract:
Photoacoustic tomography is a hybrid biomedical technology, which combines the advantages of acoustic and optical imaging. However, for the conventional image reconstruction method, the image quality is noticeably degraded by artifacts under the condition of sparse sampling. In this paper, a novel model-based sparse reconstruction method via implicit neural representation was proposed for improving the image quality reconstructed from sparse data. Specifically, the initial acoustic pressure distribution was modeled as a continuous function of spatial coordinates, and parameterized by a multi-layer perceptron. The weights of the multi-layer perceptron were determined by training the network in a self-supervised manner. The total variation regularization term was used to offer the prior knowledge. We compared our result with some ablation studies, and the results show that our method outperforms existing methods on simulation and experimental data. Under the sparse sampling condition, our method can suppress the artifacts and avoid the ill-posed problem effectively, reconstructing images with higher signal-to-noise ratio and contrast-to-noise ratio than traditional methods. The high-quality results for sparse data give the proposed method the potential to further decrease the hardware cost of photoacoustic tomography systems.
Submitted 25 June, 2024;
originally announced June 2024.
-
Sunnie: An Anthropomorphic LLM-Based Conversational Agent for Mental Well-Being Activity Recommendation
Authors:
Siyi Wu,
Feixue Han,
Bingsheng Yao,
Tianyi Xie,
Xuan Zhao,
Dakuo Wang
Abstract:
A longstanding challenge in mental well-being support is the reluctance of people to adopt psychologically beneficial activities, often due to lack of motivation, low perceived trustworthiness, and limited personalization of recommendations. Chatbots have shown promise in promoting positive mental health practices, yet their rigid interaction flows and less human-like conversational experiences present significant limitations. In this work, we explore whether the anthropomorphic design (both LLM's persona design and conversational experience design) can enhance users' perception of the system and their willingness to adopt mental well-being activity recommendations. To this end, we introduce Sunnie, an anthropomorphic LLM-based conversational agent designed to offer personalized well-being support through multi-turn conversation and recommend practical actions grounded in positive psychology and social psychology. An empirical user study comparing the user experience with Sunnie and with a traditional survey-based activity recommendation system suggests that the anthropomorphic characteristics of Sunnie significantly enhance users' perception of the system and the overall usability; nevertheless, users' willingness to adopt activity recommendations did not change significantly.
Submitted 13 June, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
"I Wish There Were an AI": Challenges and AI Potential in Cancer Patient-Provider Communication
Authors:
Ziqi Yang,
Xuhai Xu,
Bingsheng Yao,
Jiachen Li,
Jennifer Bagdasarian,
Guodong Gao,
Dakuo Wang
Abstract:
Patient-provider communication has been crucial to cancer patients' survival after their cancer treatments. However, the research community and patients themselves often overlook the communication challenges after cancer treatments as they are overshadowed by the severity of the patient's illness and the variety and rarity of the cancer disease itself. Meanwhile, the recent technical advances in AI, especially in Large Language Models (LLMs) with versatile natural language interpretation and generation ability, demonstrate great potential to support communication in complex real-world medical situations. By interviewing six healthcare providers and eight cancer patients, our goal is to explore the providers' and patients' communication barriers in the post-cancer treatment recovery period, their expectations for future communication technologies, and the potential of AI technologies in this context. Our findings reveal several challenges in current patient-provider communication, including the knowledge and timing gaps between cancer patients and providers, their collaboration obstacles, and resource limitations. Moreover, based on providers' and patients' needs and expectations, we summarize a set of design implications for intelligent communication systems, especially with the power of LLMs. Our work sheds light on the design of future AI-powered systems for patient-provider communication under high-stake and high-uncertainty situations.
Submitted 20 April, 2024;
originally announced April 2024.
-
Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats
Authors:
Kunyao Lan,
Cong Ming,
Binwei Yao,
Lu Chen,
Mengyue Wu
Abstract:
Chatbots can serve as a viable tool for preliminary depression diagnosis via interactive conversations with potential patients. Nevertheless, the blend of task-oriented and chit-chat in diagnosis-related dialogues necessitates professional expertise and empathy. Such unique requirements challenge traditional dialogue frameworks geared towards single optimization goals. To address this, we propose an innovative ontology definition and generation framework tailored explicitly for depression diagnosis dialogues, combining the reliability of task-oriented conversations with the appeal of empathy-related chit-chat. We further apply the framework to D$^4$, the only existing public dialogue dataset on depression diagnosis-oriented chats. Exhaustive experimental results indicate significant improvements in task completion and emotional support generation in depression diagnosis, fostering a more comprehensive approach to task-oriented chat dialogue system development and its applications in digital mental health.
Submitted 7 April, 2024;
originally announced April 2024.
-
Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data
Authors:
Xinting Liao,
Weiming Liu,
Chaochao Chen,
Pengyang Zhou,
Fengyuan Yu,
Huabin Zhu,
Binhui Yao,
Tao Wang,
Xiaolin Zheng,
Yanchao Tan
Abstract:
Federated learning achieves effective performance in modeling decentralized data. In practice, client data are not well-labeled, which creates potential for federated unsupervised learning (FUSL) with non-IID data. However, the performance of existing FUSL methods suffers from insufficient representations, i.e., (1) representation collapse entanglement among local and global models, and (2) inconsistent representation spaces among local models. The former indicates that representation collapse in a local model will subsequently impact the global model and other local models. The latter means that clients model data representation with inconsistent parameters due to the deficiency of supervision signals. In this work, we propose FedU2, which enhances the generation of uniform and unified representations in FUSL with non-IID data. Specifically, FedU2 consists of a flexible uniform regularizer (FUR) and an efficient unified aggregator (EUA). FUR in each client avoids representation collapse by dispersing samples uniformly, and EUA on the server promotes unified representation by constraining client model updates to be consistent. To extensively validate the performance of FedU2, we conduct both cross-device and cross-silo evaluation experiments on two benchmark datasets, i.e., CIFAR10 and CIFAR100.
Submitted 24 March, 2024;
originally announced March 2024.
-
VidLA: Video-Language Alignment at Scale
Authors:
Mamshad Nayeem Rizve,
Fan Fei,
Jayakrishnan Unnikrishnan,
Son Tran,
Benjamin Z. Yao,
Belinda Zeng,
Mubarak Shah,
Trishul Chilimbi
Abstract:
In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and typically employ complex hierarchical deep network architectures that are hard to integrate with existing pretrained image-text foundation models. To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos. By employing a simple two-tower architecture, we are able to initialize our video-language model with pretrained image-text foundation models, thereby boosting the final performance. Second, existing video-language alignment works struggle due to the lack of semantically aligned large-scale training data. To overcome it, we leverage recent LLMs to curate the largest video-language dataset to date with better visual grounding. Furthermore, unlike existing video-text datasets which only contain short clips, our dataset is enriched with video clips of varying durations to aid our temporally hierarchical data tokens in extracting better representations at varying temporal scales. Overall, empirical results show that our proposed approach surpasses state-of-the-art methods on multiple retrieval benchmarks, especially on longer videos, and performs competitively on classification benchmarks.
Submitted 21 March, 2024;
originally announced March 2024.
-
The Effect of Different Optimization Strategies to Physics-Constrained Deep Learning for Soil Moisture Estimation
Authors:
Jianxin Xie,
Bing Yao,
Zheyu Jiang
Abstract:
Soil moisture is a key hydrological parameter that has significant importance to human society and the environment. Accurate modeling and monitoring of soil moisture in crop fields, especially in the root zone (top 100 cm of soil), is essential for improving agricultural production and crop yield with the help of precision irrigation and farming tools. Realizing the full sensor data potential depends greatly on advanced analytical and predictive domain-aware models. In this work, we propose a physics-constrained deep learning (P-DL) framework to integrate physics-based principles on water transport and water sensing signals for effective reconstruction of the soil moisture dynamics. We adopt three different optimizers, namely Adam, RMSprop, and GD, to minimize the loss function of P-DL during the training process. In the illustrative case study, we demonstrate that the empirical convergence of the Adam optimizer outperforms that of the other optimization methods in both mini-batch and full-batch training.
Submitted 12 March, 2024;
originally announced March 2024.
-
Physics-constrained Active Learning for Soil Moisture Estimation and Optimal Sensor Placement
Authors:
Jianxin Xie,
Bing Yao,
Zheyu Jiang
Abstract:
Soil moisture is a crucial hydrological state variable that has significant importance to the global environment and agriculture. Precise monitoring of soil moisture in crop fields is critical to reducing agricultural drought and improving crop yield. In-situ soil moisture sensors, which are buried at pre-determined depths and distributed across the field, are promising solutions for monitoring soil moisture. However, high-density sensor deployment is neither economically feasible nor practical. Thus, to achieve a higher spatial resolution of soil moisture dynamics using a limited number of sensors, we integrate a physics-based agro-hydrological model based on Richards' equation in a physics-constrained deep learning framework to accurately predict soil moisture dynamics in the soil's root zone. This approach ensures that soil moisture estimates align well with sensor observations while obeying physical laws at the same time. Furthermore, to strategically identify the locations for sensor placement, we introduce a novel active learning framework that combines space-filling design and physics residual-based sampling to maximize data acquisition potential with limited sensors. Our numerical results demonstrate that integrating Physics-constrained Deep Learning (P-DL) with an active learning strategy within a unified framework--named the Physics-constrained Active Learning (P-DAL) framework--significantly improves the predictive accuracy and effectiveness of field-scale soil moisture monitoring using in-situ sensors.
Submitted 11 March, 2024;
originally announced March 2024.
-
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
Authors:
Tianyi Zhang,
Jonah Wonkyu Yi,
Bowen Yao,
Zhaozhuo Xu,
Anshumali Shrivastava
Abstract:
Large language model inference on Central Processing Units (CPU) is challenging due to the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention computations. In this paper, we argue that there is a rare gem in modern CPUs, Single-Instruction-Multiple-Data (SIMD) registers, which allow for ultra-low-latency lookups in batch. We leverage this unique capability of CPUs to propose NoMAD-Attention, an efficient attention algorithm that replaces MAD operations with in-register lookups. Through hardware-aware algorithmic designs, NoMAD-Attention achieves the computation of attention scores using repeated fast accesses to SIMD registers despite their highly limited sizes. Moreover, NoMAD-Attention works with pre-trained attention-based LLMs without model finetuning. Empirical evaluations demonstrate that NoMAD-Attention maintains the quality of the original LLMs well, and speeds up the 4-bit quantized LLaMA-7B-based model by up to 2$\times$ at 16k context length. Our results are reproducible at https://github.com/tonyzhang617/nomad-dist.
Submitted 2 March, 2024;
originally announced March 2024.
-
Human-Centered Privacy Research in the Age of Large Language Models
Authors:
Tianshi Li,
Sauvik Das,
Hao-Ping Lee,
Dakuo Wang,
Bingsheng Yao,
Zhiping Zhang
Abstract:
The emergence of large language models (LLMs), and their increased use in user-facing systems, has led to substantial privacy concerns. To date, research on these privacy concerns has been model-centered: exploring how LLMs lead to privacy risks like memorization, or can be used to infer personal characteristics about people from their content. We argue that there is a need for more research focusing on the human aspect of these privacy issues: e.g., research on how design paradigms for LLMs affect users' disclosure behaviors, users' mental models and preferences for privacy controls, and the design of tools, systems, and artifacts that empower end-users to reclaim ownership over their personal data. To build usable, efficient, and privacy-friendly systems powered by these models with imperfect privacy properties, our goal is to initiate discussions to outline an agenda for conducting human-centered research on privacy issues in LLM-powered systems. This Special Interest Group (SIG) aims to bring together researchers with backgrounds in usable security and privacy, human-AI collaboration, NLP, or any other related domains to share their perspectives and experiences on this problem, to help our community establish a collective understanding of the challenges, research opportunities, research methods, and strategies to collaborate with researchers outside of HCI.
Submitted 2 February, 2024;
originally announced February 2024.
-
Exploring Parent's Needs for Children-Centered AI to Support Preschoolers' Storytelling and Reading Activities
Authors:
Yuling Sun,
Jiali Liu,
Bingsheng Yao,
Jiaju Chen,
Dakuo Wang,
Xiaojuan Ma,
Yuxuan Lu,
Ying Xu,
Liang He
Abstract:
Interactive storytelling is vital for preschooler development. While children's interactive partners have traditionally been their parents and teachers, recent advances in artificial intelligence (AI) have sparked a surge of AI-based storytelling technologies. As these technologies become increasingly ubiquitous in preschoolers' lives, questions arise regarding how they function in practical storytelling scenarios and, in particular, how parents, the most critical stakeholders, experience and perceive these technologies. This paper investigates these questions through a qualitative study with 17 parents of children aged 3-6. Our findings suggest that even though AI-based storytelling technologies provide more immersive and engaging interaction, they still cannot meet parents' expectations due to a series of interactive, functional, and algorithmic challenges. We elaborate on these challenges and discuss the possible implications of future AI-based storytelling technologies for preschoolers. We conclude by highlighting the design implications for future AI-based storytelling technologies.
Submitted 24 January, 2024;
originally announced January 2024.
-
Who Changed the Destiny of Rural Students, and How?: Unpacking ICT-Mediated Remote Education in Rural China
Authors:
Yuling Sun,
Xiuqi Zhu,
Xiaomu Zhou,
Bingsheng Yao,
Kai Zhang,
Dakuo Wang,
Jiaju Chen,
Liang He
Abstract:
The proliferation of Information and Communication Technologies (ICTs) has shown great promise in addressing educational challenges facing rural areas. However, the complex rural context poses significant challenges to the effective utilization of these technologies. This paper examines the empirical integration of live-streaming-based remote classrooms (LSRC) through a qualitative study in rural China. Our findings suggest that while LSRC enables rural students equal access to high-quality educational resources, its practical integration faces numerous challenges. In particular, we emphasize the crucial role of local teachers in addressing these challenges, ultimately achieving the desired improvement of students' learning outcomes. We also examine the impact of LSRC on the original rural education ecosystem. Building upon our findings, we call for a reconsideration of interaction paradigms and evaluation systems of ICT-mediated rural education, emphasizing the significance of rural teachers. We conclude by discussing the implications for future ICT-mediated technology interventions in rural settings.
Submitted 24 January, 2024;
originally announced January 2024.
-
First-principles Based 3D Virtual Simulation Testing for Discovering SOTIF Corner Cases of Autonomous Driving
Authors:
Lehang Li,
Haokuan Wu,
Botao Yao,
Tianyu He,
Shuohan Huang,
Chuanyi Liu
Abstract:
3D virtual simulation, which generates diversified test scenarios and tests the full stack of Autonomous Driving Systems (ADSes) modules dynamically as a whole, is a promising approach for Safety of The Intended Functionality (SOTIF) ADS testing. However, different configurations of a test scenario affect the sensor perceptions and environment interaction (e.g., light pulses emitted by the LiDAR sensor undergo backscattering and attenuation), which is usually overlooked by existing works, leading to false positives or wrong results. Moreover, the input space of an ADS is extremely large, with an infinite number of possible initial scenarios and mutations, along both temporal and spatial domains.
This paper proposes a first-principles based sensor modeling and environment interaction scheme, and integrates it into the CARLA simulator. With this scheme, a long-overlooked category of adverse weather related corner cases is discovered, along with their root causes. Moreover, a meta-heuristic algorithm is designed based on several empirical insights, which guide both seed scenarios and mutations, significantly reducing the search dimensions of scenarios and enhancing the efficiency of corner case identification. Experimental results show that under identical simulation setups, our algorithm discovers about four times as many corner cases as state-of-the-art work.
Submitted 22 January, 2024;
originally announced January 2024.
-
Organ-Level Radiation Doses from CT Scans for 10,000 Chinese Subjects Undergoing Physical Examinations: The Feasibility of AI-Based Multi-organ CT Image Segmentation and Near Real-time Monte Carlo Dose Computing
Authors:
Zirui Ye,
Bei Yao,
Haoran Zheng,
Li Tao,
Ripeng Wang,
Yang Lu,
Yankui Chang,
Xi Pei,
Zhi Chen,
Xie George Xu
Abstract:
Considering the increasing trend of physical examinations in China, the escalating frequency of Computed Tomography (CT) scans has amplified concerns regarding population radiation exposure and its consequent risks. The challenges mainly manifest in two aspects: one is the rapid construction of patient-specific human phantoms, and the other is the fast Monte Carlo (MC) simulation of radiation dose. Hence, this study aims to demonstrate a near real-time MC patient-specific organ dose computation method, for the first time, involving automatic segmentation across a large dataset of 11,482 subjects undergoing chest CT scans. We developed a preliminary software platform, integrating the automatic segmentation software DeepViewer and the GPU-accelerated MC engine ARCHER-CT. Comparisons with traditional dosimetry methods revealed up to 100% discrepancies for a few subjects in organ-level dose, underscoring the patient-specific method's superior accuracy. This study paves the way for more accurate radiation risk assessment, crucial in the era of patient-specific radiation dosimetry.
Submitted 21 January, 2024;
originally announced January 2024.
-
Federated Learning via Input-Output Collaborative Distillation
Authors:
Xuan Gong,
Shanglin Li,
Yuxiang Bao,
Barry Yao,
Yawen Huang,
Ziyan Wu,
Baochang Zhang,
Yefeng Zheng,
David Doermann
Abstract:
Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data. Existing FL methods either iteratively share local model parameters or deploy co-distillation. However, the former is highly susceptible to private data leakage, and the latter design relies on the prerequisites of task-relevant real data. Instead, we propose a data-free FL framework based on local-to-central collaborative distillation with direct input and output space exploitation. Our design eliminates any requirement of recursive local parameter exchange or auxiliary task-relevant data to transfer knowledge, thereby giving direct privacy control to local users. In particular, to cope with the inherent data heterogeneity across locals, our technique learns to distill input on which each local model produces consensual yet unique results to represent each expertise. Our proposed FL framework achieves notable privacy-utility trade-offs with extensive experiments on image classification and segmentation tasks under various real-world heterogeneous federated learning settings on both natural and medical images.
Submitted 22 December, 2023;
originally announced December 2023.
-
Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment Framework
Authors:
Matthew Pisano,
Peter Ly,
Abraham Sanders,
Bingsheng Yao,
Dakuo Wang,
Tomek Strzalkowski,
Mei Si
Abstract:
Research into AI alignment has grown considerably since the recent introduction of increasingly capable Large Language Models (LLMs). Unfortunately, modern methods of alignment still fail to fully prevent harmful responses when models are deliberately attacked. These attacks can trick seemingly aligned models into giving manufacturing instructions for dangerous materials, inciting violence, or recommending other immoral acts. To help mitigate this issue, we introduce Bergeron: a framework designed to improve the robustness of LLMs against attacks without any additional parameter fine-tuning. Bergeron is organized into two tiers, with a secondary LLM emulating the conscience of a protected, primary LLM. This framework better safeguards the primary model against incoming attacks while monitoring its output for any harmful content. Empirical analysis shows that, by using Bergeron to complement models with existing alignment training, we can improve the robustness and safety of multiple commonly used commercial and open-source LLMs.
Submitted 15 March, 2024; v1 submitted 16 November, 2023;
originally announced December 2023.
-
Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks
Authors:
Yuxuan Lu,
Bingsheng Yao,
Shao Zhang,
Yun Wang,
Peng Zhang,
Tun Lu,
Toby Jia-Jun Li,
Dakuo Wang
Abstract:
Large Language Models (LLMs) have demonstrated considerable advances, and several claims have been made about their exceeding human performance. However, in real-world tasks, domain knowledge is often required. Low-resource learning methods like Active Learning (AL) have been proposed to tackle the cost of domain expert annotation, raising this question: Can LLMs surpass compact models trained with expert annotations in domain-specific tasks? In this work, we conduct an empirical experiment on four datasets from three different domains, comparing SOTA LLMs with small models trained on expert annotations with AL. We found that small models can outperform GPT-3.5 with a few hundred labeled samples, and that they achieve higher or similar performance compared with GPT-4 despite being hundreds of times smaller. Based on these findings, we posit that LLM predictions can be used as a warmup method in real-world applications and that human experts remain indispensable in tasks involving data annotation driven by domain-specific knowledge.
Submitted 16 November, 2023;
originally announced November 2023.
-
More Samples or More Prompts? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering
Authors:
Bingsheng Yao,
Guiming Chen,
Ruishi Zou,
Yuxuan Lu,
Jiachen Li,
Shao Zhang,
Yisi Sang,
Sijia Liu,
James Hendler,
Dakuo Wang
Abstract:
While most existing works on LLM prompting techniques focus only on how to select a better set of data samples inside one single prompt input (In-Context Learning or ICL), why can we not design and leverage multiple prompts together to further improve the LLM's performance? In this work, we propose In-Context Sampling (ICS), a low-resource LLM prompting technique to produce confident predictions by optimizing the construction of multiple ICL prompt inputs. Extensive experiments with three open-source LLMs (FlanT5-XL, Mistral-7B, and Mixtral-8x7B) on four NLI datasets (e-SNLI, Multi-NLI, ANLI, and Contract-NLI) and one QA dataset (CommonsenseQA) illustrate that ICS can consistently enhance LLMs' performance. An in-depth evaluation with three data similarity-based ICS strategies suggests that these strategies can further elevate LLMs' performance, which sheds light on a new yet promising future research direction.
Submitted 2 April, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
FairytaleCQA: Integrating a Commonsense Knowledge Graph into Children's Storybook Narratives
Authors:
Jiaju Chen,
Yuxuan Lu,
Shao Zhang,
Bingsheng Yao,
Yuanzhe Dong,
Ying Xu,
Yunyao Li,
Qianwen Wang,
Dakuo Wang,
Yuling Sun
Abstract:
AI models (including LLMs) often rely on narrative question-answering (QA) datasets to provide customized QA functionalities to support downstream children's education applications; however, existing datasets only include QA pairs that are grounded within the given storybook content, whereas children can learn more when teachers relate the storybook content to real-world knowledge (e.g., commonsense knowledge). We introduce the FairytaleCQA dataset, which is annotated by children's education experts, to supplement 278 storybook narratives with educationally appropriate commonsense knowledge. The dataset has 5,868 QA pairs that not only originate from the storybook narrative but also contain the commonsense knowledge grounded by an external knowledge graph (i.e., ConceptNet). A follow-up experiment shows that a smaller model (T5-large) fine-tuned with FairytaleCQA reliably outperforms much larger prompt-engineered LLMs (e.g., GPT-4) in this new QA-pair generation task (QAG). This result suggests that: 1) our dataset brings novel challenges to existing LLMs, and 2) human experts' data annotation is still critical, as experts have much nuanced knowledge of the children's education domain that LLMs lack.
Submitted 16 November, 2023;
originally announced November 2023.
-
CROP: Conservative Reward for Model-based Offline Policy Optimization
Authors:
Hao Li,
Xiao-Hu Zhou,
Xiao-Liang Xie,
Shi-Qi Liu,
Zhen-Qiu Feng,
Xiao-Yin Liu,
Mei-Jiang Gui,
Tian-Yu Xiang,
De-Xing Huang,
Bo-Xian Yao,
Zeng-Guang Hou
Abstract:
Offline reinforcement learning (RL) aims to optimize a policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges due to their capability to mitigate the limitations of offline data through data generation using models. Prior research has demonstrated that introducing conservatism into the model or Q-function during policy optimization can effectively alleviate the prevalent distribution drift problem in offline RL. However, the investigation into the impacts of conservatism in reward estimation is still lacking. This paper proposes a novel model-based offline RL algorithm, Conservative Reward for model-based Offline Policy optimization (CROP), which conservatively estimates the reward in model training. To achieve a conservative reward estimation, CROP simultaneously minimizes the estimation error and the reward of random actions. Theoretical analysis shows that this conservative reward mechanism leads to a conservative policy evaluation and helps mitigate distribution drift. Experiments on D4RL benchmarks show that the performance of CROP is comparable to the state-of-the-art baselines. Notably, CROP establishes an innovative connection between offline and online RL, highlighting that offline RL problems can be tackled by applying online RL techniques to the empirical Markov decision process trained with a conservative reward. The source code is available at https://github.com/G0K0URURI/CROP.git.
Submitted 26 October, 2023;
originally announced October 2023.
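The CROP abstract above describes a reward objective that simultaneously minimizes the estimation error and the reward of random actions. A minimal sketch of such an objective follows. It is an illustration under stated assumptions, not the authors' implementation: the function name `conservative_reward_loss`, the weight `beta`, the squared-error form, and the uniform action sampling range are all hypothetical choices.

```python
import random


def conservative_reward_loss(reward_model, batch, beta=0.1,
                             action_low=-1.0, action_high=1.0, rng=None):
    """Illustrative conservative reward objective (hypothetical names).

    reward_model: callable (state, action) -> predicted reward.
    batch: list of (state, action, reward) tuples from the offline dataset.
    beta: assumed weight on the random-action penalty.
    """
    if rng is None:
        rng = random.Random(0)
    squared_error = 0.0
    random_action_reward = 0.0
    for state, action, reward in batch:
        # Term 1: estimation error on actions that appear in the dataset.
        squared_error += (reward_model(state, action) - reward) ** 2
        # Term 2: predicted reward of a uniformly sampled action (assumed sampler).
        random_action = [rng.uniform(action_low, action_high) for _ in action]
        random_action_reward += reward_model(state, random_action)
    n = len(batch)
    return squared_error / n + beta * random_action_reward / n
```

Minimizing the second term pushes predicted rewards down for actions the dataset does not cover, which is one way to read the conservatism the abstract describes; with `beta=0` the objective reduces to an ordinary mean squared error.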
-
'Don't Get Too Technical with Me': A Discourse Structure-Based Framework for Science Journalism
Authors:
Ronald Cardenas,
Bingsheng Yao,
Dakuo Wang,
Yufang Hou
Abstract:
Science journalism refers to the task of reporting technical findings of a scientific paper as a less technical news article to the general public audience. We aim to design an automated system to support this real-world task (i.e., automatic science journalism) by 1) introducing a newly-constructed and real-world dataset (SciTechNews), with tuples of a publicly-available scientific paper, its corresponding news article, and an expert-written short summary snippet; 2) proposing a novel technical framework that integrates a paper's discourse structure with its metadata to guide generation; and, 3) demonstrating with extensive automatic and human experiments that our framework outperforms other baseline methods (e.g. Alpaca and ChatGPT) in elaborating a content plan meaningful for the target audience, simplifying the information selected, and producing a coherent final report in a layman's style.
Submitted 23 October, 2023;
originally announced October 2023.
-
"Mango Mango, How to Let The Lettuce Dry Without A Spinner?'': Exploring User Perceptions of Using An LLM-Based Conversational Assistant Toward Cooking Partner
Authors:
Szeyi Chan,
Jiachen Li,
Bingsheng Yao,
Amama Mahmood,
Chien-Ming Huang,
Holly Jimison,
Elizabeth D Mynatt,
Dakuo Wang
Abstract:
The rapid advancement of the Large Language Model (LLM) has created numerous potentials for integration with conversational assistants (CAs) assisting people in their daily tasks, particularly due to their extensive flexibility. However, users' real-world experiences interacting with these assistants remain unexplored. In this research, we chose cooking, a complex daily task, as a scenario to investigate people's successful and unsatisfactory experiences while receiving assistance from an LLM-based CA, Mango Mango. We discovered that participants value the system's ability to provide extensive information beyond the recipe, offer customized instructions based on context, and assist them in dynamically planning the task. However, they expect the system to be more adaptive to oral conversation and provide more suggestive responses to keep users actively involved. Recognizing that users began treating our LLM-CA as a personal assistant or even a partner rather than just a recipe-reading tool, we propose several design considerations for future development.
Submitted 9 October, 2023;
originally announced October 2023.
-
LLM-Powered Conversational Voice Assistants: Interaction Patterns, Opportunities, Challenges, and Design Guidelines
Authors:
Amama Mahmood,
Junxiang Wang,
Bingsheng Yao,
Dakuo Wang,
Chien-Ming Huang
Abstract:
Conventional Voice Assistants (VAs) rely on traditional language models to discern user intent and respond to their queries, leading to interactions that often lack a broader contextual understanding, an area in which Large Language Models (LLMs) excel. However, current LLMs are largely designed for text-based interactions, thus making it unclear how user interactions will evolve if their modality is changed to voice. In this work, we investigate whether LLMs can enrich VA interactions via an exploratory study with participants (N=20) using a ChatGPT-powered VA for three scenarios (medical self-diagnosis, creative planning, and debate) with varied constraints, stakes, and objectivity. We observe that LLM-powered VA elicits richer interaction patterns that vary across tasks, showing its versatility. Notably, LLMs absorb the majority of VA intent recognition failures. We additionally discuss the potential of harnessing LLMs for more resilient and fluid user-VA interactions and provide design guidelines for tailoring LLMs for voice assistance.
Submitted 25 September, 2023;
originally announced September 2023.
-
Rethinking Human-AI Collaboration in Complex Medical Decision Making: A Case Study in Sepsis Diagnosis
Authors:
Shao Zhang,
Jianing Yu,
Xuhai Xu,
Changchang Yin,
Yuxuan Lu,
Bingsheng Yao,
Melanie Tory,
Lace M. Padilla,
Jeffrey Caterino,
** Zhang,
Dakuo Wang
Abstract:
Today's AI systems for medical decision support often succeed on benchmark datasets in research papers but fail in real-world deployment. This work focuses on decision making for sepsis, an acute life-threatening systemic infection that requires an early diagnosis with high uncertainty from the clinician. Our aim is to explore the design requirements for AI systems that can support clinical experts in making better decisions for the early diagnosis of sepsis. The study begins with a formative study investigating why clinical experts abandon an existing AI-powered sepsis prediction module in their electronic health record (EHR) system. We argue that a human-centered AI system needs to support human experts in the intermediate stages of a medical decision-making process (e.g., generating hypotheses or gathering data), instead of focusing only on the final decision. Therefore, we build SepsisLab based on a state-of-the-art AI algorithm and extend it to predict the future projection of sepsis development, visualize the prediction uncertainty, and propose actionable suggestions (i.e., which additional laboratory tests can be collected) to reduce such uncertainty. Through heuristic evaluation with six clinicians using our prototype system, we demonstrate that SepsisLab enables a promising human-AI collaboration paradigm for the future of AI-assisted sepsis diagnosis and other high-stakes medical decision making.
Submitted 26 February, 2024; v1 submitted 17 September, 2023;
originally announced September 2023.
-
"It's a Fair Game", or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents
Authors:
Zhi** Zhang,
Michelle Jia,
Hao-** Lee,
Bingsheng Yao,
Sauvik Das,
Ada Lerner,
Dakuo Wang,
Tianshi Li
Abstract:
The widespread use of Large Language Model (LLM)-based conversational agents (CAs), especially in high-stakes domains, raises many privacy concerns. Building ethical LLM-based CAs that respect user privacy requires an in-depth understanding of the privacy risks that concern users the most. However, existing research, primarily model-centered, does not provide insight into users' perspectives. To bridge this gap, we analyzed sensitive disclosures in real-world ChatGPT conversations and conducted semi-structured interviews with 19 LLM-based CA users. We found that users are constantly faced with trade-offs between privacy, utility, and convenience when using LLM-based CAs. However, users' erroneous mental models and the dark patterns in system design limited their awareness and comprehension of the privacy risks. Additionally, the human-like interactions encouraged more sensitive disclosures, which complicated users' ability to navigate the trade-offs. We discuss practical design guidelines and the needs for paradigm shifts to protect the privacy of LLM-based CA users.
Submitted 1 April, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Talk2Care: Facilitating Asynchronous Patient-Provider Communication with Large-Language-Model
Authors:
Ziqi Yang,
Xuhai Xu,
Bingsheng Yao,
Shao Zhang,
Ethan Rogers,
Stephen Intille,
Nawar Shara,
Guodong Gordon Gao,
Dakuo Wang
Abstract:
Despite the plethora of telehealth applications to assist home-based older adults and healthcare providers, basic messaging and phone calls are still the most common communication methods, which suffer from limited availability, information loss, and process inefficiencies. One promising solution to facilitate patient-provider communication is to leverage large language models (LLMs) with their powerful natural conversation and summarization capability. However, there is a limited understanding of LLMs' role during the communication. We first conducted two interview studies with both older adults (N=10) and healthcare providers (N=9) to understand their needs and opportunities for LLMs in patient-provider asynchronous communication. Based on the insights, we built an LLM-powered communication system, Talk2Care, and designed interactive components for both groups: (1) For older adults, we leveraged the convenience and accessibility of voice assistants (VAs) and built an LLM-powered VA interface for effective information collection. (2) For health providers, we built an LLM-based dashboard to summarize and present important health information based on older adults' conversations with the VA. We further conducted two user studies with older adults and providers to evaluate the usability of the system. The results showed that Talk2Care could facilitate the communication process, enrich the health information collected from older adults, and considerably save providers' efforts and time. We envision our work as an initial exploration of LLMs' capability in the intersection of healthcare and interpersonal communication.
Submitted 3 February, 2024; v1 submitted 17 September, 2023;
originally announced September 2023.
-
A multinode quantum network over a metropolitan area
Authors:
Jian-Long Liu,
Xi-Yu Luo,
Yong Yu,
Chao-Yang Wang,
Bin Wang,
Yi Hu,
Jun Li,
Ming-Yang Zheng,
Bo Yao,
Zi Yan,
Da Teng,
**-Wei Jiang,
Xiao-Bing Liu,
Xiu-** Xie,
Jun Zhang,
Qing-He Mao,
Xiao Jiang,
Qiang Zhang,
Xiao-Hui Bao,
Jian-Wei Pan
Abstract:
Towards realizing the future quantum internet, a pivotal milestone entails the transition from two-node proof-of-principle experiments conducted in laboratories to comprehensive, multi-node setups on large scales. Here, we report on the debut implementation of a multi-node entanglement-based quantum network over a metropolitan area. We equipped three quantum nodes with atomic quantum memories and their telecom interfaces, and combined them into a scalable phase-stabilized architecture through a server node. We demonstrated heralded entanglement generation between two quantum nodes situated 12.5 km apart, and the storage of entanglement exceeding the round-trip communication time. We also showed the concurrent entanglement generation on three links. Our work provides a metropolitan-scale testbed for the evaluation and exploration of multi-node quantum network protocols and starts a new stage of quantum internet research.
Submitted 31 August, 2023;
originally announced September 2023.
-
Faster Stochastic Algorithms for Minimax Optimization under Polyak--Łojasiewicz Conditions
Authors:
Lesi Chen,
Boyuan Yao,
Luo Luo
Abstract:
This paper considers stochastic first-order algorithms for minimax optimization under Polyak--Łojasiewicz (PL) conditions. We propose SPIDER-GDA for solving the finite-sum problem of the form $\min_x \max_y f(x,y)\triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$, where the objective function $f(x,y)$ is $μ_x$-PL in $x$ and $μ_y$-PL in $y$; and each $f_i(x,y)$ is $L$-smooth. We prove SPIDER-GDA could find an $ε$-optimal solution within ${\mathcal O}\left((n + \sqrt{n}\,κ_xκ_y^2)\log (1/ε)\right)$ stochastic first-order oracle (SFO) complexity, which is better than the state-of-the-art method whose SFO upper bound is ${\mathcal O}\big((n + n^{2/3}κ_xκ_y^2)\log (1/ε)\big)$, where $κ_x\triangleq L/μ_x$ and $κ_y\triangleq L/μ_y$. For the ill-conditioned case, we provide an accelerated algorithm to reduce the computational cost further. It achieves $\tilde{\mathcal O}\big((n+\sqrt{n}\,κ_xκ_y)\log^2 (1/ε)\big)$ SFO upper bound when $κ_y \gtrsim \sqrt{n}$. Our ideas also can be applied to the more general setting that the objective function only satisfies PL condition for one variable. Numerical experiments validate the superiority of proposed methods.
Submitted 28 July, 2023;
originally announced July 2023.
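To make the claimed improvement concrete, the condition-number-dependent terms of the two SFO bounds quoted in the abstract above can be compared directly. The sample size $n = 10^6$ below is an illustrative assumption, not a figure from the paper.

```latex
\[
\frac{n^{2/3}\,\kappa_x \kappa_y^2}{\sqrt{n}\,\kappa_x \kappa_y^2}
  = n^{2/3 - 1/2}
  = n^{1/6},
\qquad
n = 10^{6} \;\Rightarrow\; n^{1/6} = 10 .
\]
```

Under that assumption, the dominant term of the SPIDER-GDA bound is smaller by a factor of about ten whenever the $\kappa_x \kappa_y^2$ term outweighs the additive $n$ term.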
-
Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data
Authors:
Xuhai Xu,
Bingsheng Yao,
Yuanzhe Dong,
Saadia Gabriel,
Hong Yu,
James Hendler,
Marzyeh Ghassemi,
Anind K. Dey,
Dakuo Wang
Abstract:
Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate a promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction finetuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best-finetuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. Meanwhile, we also emphasize the important limitations before achieving deployability in real-world mental health settings, such as known racial and gender bias. We highlight the important ethical risks accompanying this line of research.
Submitted 28 January, 2024; v1 submitted 26 July, 2023;
originally announced July 2023.
-
Automated Identification of Atrial Fibrillation from Single-lead ECGs Using Multi-branching ResNet
Authors:
Jianxin Xie,
Stavros Stavrakis,
Bing Yao
Abstract:
Atrial fibrillation (AF) is the most common cardiac arrhythmia, which is clinically identified with irregular and rapid heartbeat rhythm. AF puts a patient at risk of forming blood clots, which can eventually lead to heart failure, stroke, or even sudden death. It is of critical importance to develop an advanced analytical model that can effectively interpret the electrocardiography (ECG) signals and provide decision support for accurate AF diagnostics. In this paper, we propose an innovative deep-learning method for automated AF identification from single-lead ECGs. We first engage the continuous wavelet transform (CWT) to extract time-frequency features from ECG signals. Then, we develop a convolutional neural network (CNN) structure that incorporates ResNet for effective network training and multi-branching architectures for addressing the imbalanced data issue to process the 2D time-frequency features for AF classification. We evaluate the proposed methodology using two real-world ECG databases. The experimental results show a superior performance of our method compared with traditional deep learning models.
Submitted 26 June, 2023;
originally announced June 2023.
-
PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer
Authors:
Xu Han,
Bin Guo,
Yoon Jung,
Benjamin Yao,
Yu Zhang,
Xiaohu Liu,
Chenlei Guo
Abstract:
Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency. However, such descriptions may not always be available or may pose privacy concerns. To tackle this bottleneck, we introduce PersonaPKT, a lightweight transfer learning approach that can build persona-consistent dialogue models without explicit persona descriptions. By representing each persona as a continuous vector, PersonaPKT learns implicit persona-specific features directly from a small number of dialogue samples produced by the same persona, adding less than 0.1% trainable parameters for each persona on top of the PLM backbone. Empirical results demonstrate that PersonaPKT effectively builds personalized DAs with high storage efficiency, outperforming various baselines in terms of persona consistency while maintaining good response generation quality. In addition, it enhances privacy protection by avoiding explicit persona descriptions. Overall, PersonaPKT is an effective solution for creating personalized DAs that respect user privacy.
Submitted 13 June, 2023;
originally announced June 2023.
-
Giant Enhancement of Magnonic Frequency Combs by Exceptional Points
Authors:
Congyi Wang,
Jinwei Rao,
Zhijian Chen,
Kaixin Zhao,
Liaoxin Sun,
Bimu Yao,
Tao Yu,
Yi-Pu Wang,
Wei Lu
Abstract:
With their incomparable time-frequency accuracy, frequency combs have significantly advanced precision spectroscopy, ultra-sensitive detection, and atomic clocks. Traditional methods to create photonic, phononic, and magnonic frequency combs hinge on material nonlinearities which are often weak, necessitating high power densities to surpass their initiation thresholds, which subsequently limits their applications. Here, we introduce a novel nonlinear process to efficiently generate magnonic frequency combs (MFCs) by exploiting exceptional points (EPs) in a coupled system comprising a pump-induced magnon mode and a Kittel mode. Even without any cavity, our method greatly improves the efficiency of nonlinear frequency conversion and achieves optimal MFCs at low pump power. Additionally, our novel nonlinear process enables excellent tunability of EPs using the polarization and power of the pump, simplifying MFC generation and manipulation. Our work establishes a synergistic relationship between non-Hermitian physics and MFCs, which is advantageous for coherent/quantum information processing and ultra-sensitive detection.
Submitted 3 June, 2023;
originally announced June 2023.
-
Reducing Communication for Split Learning by Randomized Top-k Sparsification
Authors:
Fei Zheng,
Chaochao Chen,
Lingjuan Lyu,
Binhui Yao
Abstract:
Split learning is a simple solution for Vertical Federated Learning (VFL), which has drawn substantial attention in both research and application due to its simplicity and efficiency. However, communication efficiency is still a crucial issue for split learning. In this paper, we investigate multiple communication reduction methods for split learning, including cut layer size reduction, top-k sparsification, quantization, and L1 regularization. Through analysis of the cut layer size reduction and top-k sparsification, we further propose randomized top-k sparsification, to make the model generalize and converge better. This is done by selecting top-k elements with a large probability while also having a small probability to select non-top-k elements. Empirical results show that compared with other communication-reduction methods, our proposed randomized top-k sparsification achieves a better model performance under the same compression level.
Submitted 29 May, 2023;
originally announced May 2023.
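The randomized top-k idea in the abstract above (keep top-k elements with large probability, occasionally keep a non-top-k element) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `randomized_top_k`, the parameter `alpha` (total probability mass assigned to non-top-k elements), and the sequential weighted sampling without replacement are all assumptions made for illustration.

```python
import random


def randomized_top_k(values, k, alpha=0.1, rng=None):
    """Keep k entries of `values`, zeroing the rest (hypothetical sketch).

    Top-k entries (by absolute value) share sampling weight 1 - alpha,
    the remaining entries share weight alpha. alpha is an assumed knob,
    not a parameter taken from the paper.
    """
    rng = rng or random.Random()
    d = len(values)
    if not 0 < k < d:
        raise ValueError("k must satisfy 0 < k < len(values)")

    # Indices of the k largest-magnitude entries.
    order = sorted(range(d), key=lambda i: abs(values[i]), reverse=True)
    top = set(order[:k])

    # Per-index sampling weights: large for top-k, small for the others.
    weights = [(1.0 - alpha) / k if i in top else alpha / (d - k)
               for i in range(d)]

    # Draw k distinct indices by weighted sampling without replacement.
    candidates = list(range(d))
    kept = []
    for _ in range(k):
        pick = rng.choices(range(len(candidates)), weights=weights)[0]
        kept.append(candidates.pop(pick))
        weights.pop(pick)

    # Build the sparsified vector: only the kept positions are non-zero.
    sparse = [0.0] * d
    for i in kept:
        sparse[i] = values[i]
    return sparse, sorted(kept)


if __name__ == "__main__":
    activations = [0.1, -5.0, 3.0, 0.2, 4.0]
    print(randomized_top_k(activations, k=2, alpha=0.2, rng=random.Random(0)))
```

With `alpha=0` the sketch reduces to plain deterministic top-k sparsification; larger `alpha` values let more non-top-k entries through.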
-
PPGenCDR: A Stable and Robust Framework for Privacy-Preserving Cross-Domain Recommendation
Authors:
Xinting Liao,
Weiming Liu,
Xiaolin Zheng,
Binhui Yao,
Chaochao Chen
Abstract:
Privacy-preserving cross-domain recommendation (PPCDR) refers to preserving the privacy of users when transferring the knowledge from source domain to target domain for better performance, which is vital for the long-term development of recommender systems. Existing work on cross-domain recommendation (CDR) reaches advanced and satisfying recommendation performance, but mostly neglects preserving privacy. To fill this gap, we propose a privacy-preserving generative cross-domain recommendation (PPGenCDR) framework for PPCDR. PPGenCDR includes two main modules, i.e., stable privacy-preserving generator module, and robust cross-domain recommendation module. Specifically, the former isolates data from different domains with a generative adversarial network (GAN) based model, which stably estimates the distribution of private data in the source domain with Renyi differential privacy (RDP) technique. Then the latter aims to robustly leverage the perturbed but effective knowledge from the source domain with the raw data in target domain to improve recommendation performance. Three key modules, i.e., (1) selective privacy preserver, (2) GAN stabilizer, and (3) robustness conductor, guarantee the cost-effective trade-off between utility and privacy, the stability of GAN when using RDP, and the robustness of leveraging transferable knowledge accordingly. The extensive empirical studies on Douban and Amazon datasets demonstrate that PPGenCDR significantly outperforms the state-of-the-art recommendation models while preserving privacy.
Submitted 11 May, 2023;
originally announced May 2023.
-
AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
Authors:
Barry Menglong Yao,
Yu Chen,
Qifan Wang,
Sijia Wang,
Minqian Liu,
Zhiyang Xu,
Licheng Yu,
Lifu Huang
Abstract:
We propose attribute-aware multimodal entity linking, where the input is a mention described with a text and image, and the goal is to predict the corresponding target entity from a multimodal knowledge base (KB) where each entity is also described with a text description, a visual image and a set of attributes and values. To support this research, we construct AMELI, a large-scale dataset consisting of 18,472 reviews and 35,598 products. To establish baseline performance on AMELI, we experiment with the current state-of-the-art multimodal entity linking approaches and our enhanced attribute-aware model and demonstrate the importance of incorporating the attribute information into the entity linking process. To the best of our knowledge, we are the first to build a benchmark dataset and solutions for the attribute-aware multimodal entity linking task. Datasets and code will be made publicly available.
Submitted 24 May, 2023;
originally announced May 2023.
-
Benchmarking LLM-based Machine Translation on Cultural Awareness
Authors:
Binwei Yao,
Ming Jiang,
Diyi Yang,
Junjie Hu
Abstract:
Translating cultural-specific content is crucial for effective cross-cultural communication. However, many MT systems still struggle to translate sentences containing cultural-specific entities accurately and understandably. Recent advancements in in-context learning utilize lightweight prompts to guide large language models (LLMs) in machine translation tasks. Nevertheless, the effectiveness of this approach in enhancing machine translation with cultural awareness remains uncertain. To address this gap, we introduce a new data curation pipeline to construct a culturally relevant parallel corpus, enriched with annotations of cultural-specific items. Furthermore, we devise a novel evaluation metric to assess the understandability of translations in a reference-free manner by GPT-4. We evaluate a variety of neural machine translation (NMT) and LLM-based MT systems using our dataset. Additionally, we propose several prompting strategies for LLMs to incorporate external and internal cultural knowledge into the translation process. Our results demonstrate that eliciting explanations can significantly enhance the understandability of cultural-specific entities, especially those without well-known translations.
Submitted 22 March, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Beyond Labels: Empowering Human Annotators with Natural Language Explanations through a Novel Active-Learning Architecture
Authors:
Bingsheng Yao,
Ishan Jindal,
Lucian Popa,
Yannis Katsis,
Sayan Ghosh,
Lihong He,
Yuxuan Lu,
Shashank Srivastava,
Yunyao Li,
James Hendler,
Dakuo Wang
Abstract:
Real-world domain experts (e.g., doctors) rarely annotate only a decision label in their day-to-day workflow without providing explanations. Yet, existing low-resource learning techniques, such as Active Learning (AL), that aim to support human annotators mostly focus on the label while neglecting the natural language explanation of a data point. This work proposes a novel AL architecture to support experts' real-world need for label and explanation annotations in low-resource scenarios. Our AL architecture leverages an explanation-generation model to produce explanations guided by human explanations, a prediction model that utilizes generated explanations toward prediction faithfully, and a novel data diversity-based AL sampling strategy that benefits from the explanation annotations. Automated and human evaluations demonstrate the effectiveness of incorporating explanations into AL sampling and the improved human annotation efficiency and trustworthiness with our AL architecture. Additional ablation studies illustrate the potential of our AL architecture for transfer learning, generalizability, and integration with large language models (LLMs). While LLMs exhibit exceptional explanation-generation capabilities for relatively simple tasks, their effectiveness in complex real-world tasks warrants further in-depth study.
Submitted 23 October, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Are Human Explanations Always Helpful? Towards Objective Evaluation of Human Natural Language Explanations
Authors:
Bingsheng Yao,
Prithviraj Sen,
Lucian Popa,
James Hendler,
Dakuo Wang
Abstract:
Human-annotated labels and explanations are critical for training explainable NLP models. However, unlike human-annotated labels whose quality is easier to calibrate (e.g., with a majority vote), human-crafted free-form explanations can be quite subjective. Before blindly using them as ground truth to train ML models, a vital question needs to be asked: How do we evaluate a human-annotated explanation's quality? In this paper, we build on the view that the quality of a human-annotated explanation can be measured based on its helpfulness (or impairment) to the ML models' performance for the desired NLP tasks for which the annotations were collected. In comparison to the commonly used Simulatability score, we define a new metric that can take into consideration the helpfulness of an explanation for model performance at both fine-tuning and inference. With the help of a unified dataset format, we evaluated the proposed metric on five datasets (e.g., e-SNLI) against two model architectures (T5 and BART), and the results show that our proposed metric can objectively evaluate the quality of human-annotated explanations, while Simulatability falls short.
Submitted 22 May, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
KEPLET: Knowledge-Enhanced Pretrained Language Model with Topic Entity Awareness
Authors:
Yichuan Li,
Jialong Han,
Kyumin Lee,
Chengyuan Ma,
Benjamin Yao,
Derek Liu
Abstract:
In recent years, Pre-trained Language Models (PLMs) have shown their superiority by pre-training on unstructured text corpus and then fine-tuning on downstream tasks. On entity-rich textual resources like Wikipedia, Knowledge-Enhanced PLMs (KEPLMs) incorporate the interactions between tokens and mentioned entities in pre-training, and are thus more effective on entity-centric tasks such as entity linking and relation classification. Although exploiting Wikipedia's rich structures to some extent, conventional KEPLMs still neglect a unique layout of the corpus where each Wikipedia page is around a topic entity (identified by the page URL and shown in the page title). In this paper, we demonstrate that KEPLMs without incorporating the topic entities will lead to insufficient entity interaction and biased (relation) word semantics. We thus propose KEPLET, a novel Knowledge-Enhanced Pre-trained LanguagE model with Topic entity awareness. In an end-to-end manner, KEPLET identifies where to add the topic entity's information in a Wikipedia sentence, fuses such information into token and mentioned entities representations, and supervises the network learning, through which it takes topic entities back into consideration. Experiments demonstrated the generality and superiority of KEPLET which was applied to two representative KEPLMs, achieving significant improvements on four entity-centric tasks.
Submitted 2 May, 2023;
originally announced May 2023.
-
Set Theory with Urelements
Authors:
Bokai Yao
Abstract:
This dissertation aims to provide a comprehensive account of set theory with urelements. In Chapter 1, I present mathematical and philosophical motivations for studying urelement set theory and lay out the necessary technical preliminaries. Chapter 2 is devoted to the axiomatization of urelement set theory, where I introduce a hierarchy of axioms and discuss how ZFC with urelements should be axiomatized. The breakdown of this hierarchy of axioms in the absence of the Axiom of Choice is also explored. In Chapter 3, I investigate forcing with urelements and develop a new approach that addresses a drawback of the existing machinery. I demonstrate that forcing can preserve, destroy, and recover the axioms isolated in Chapter 2 and discuss how Boolean ultrapowers can be applied in urelement set theory. Chapter 4 delves into class theory with urelements. I first discuss the issue of axiomatizing urelement class theory and then explore the second-order reflection principle with urelements. In particular, assuming large cardinals, I construct a model of second-order reflection where the principle of limitation of size fails.
Submitted 18 June, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Meter-scale strong coupling between magnons and photons
Authors:
Jinwei Rao,
C. Y. Wang,
Bimu Yao,
Z. J. Chen,
K. X. Zhao,
Wei Lu
Abstract:
We experimentally realize a meter-scale strong coupling effect between magnons and photons at room temperature, with a coherent coupling of 20 m and a dissipative coupling of 7.6 m. To this end, we integrate a saturable gain into a microwave cavity and then couple this active cavity to a magnon mode via a long coaxial cable. The gain compensates for the cavity dissipation, but preserves the cavity radiation that mediates the indirect photon-magnon coupling. It thus enables the long-range strong photon-magnon coupling. With full access to traveling waves, we demonstrate a remote control of photon-magnon coupling by modulating the phase and amplitude of traveling waves, rather than reconfiguring subsystems themselves. Our method for realizing long-range strong coupling in cavity magnonics provides a general idea for other physical systems. Our experimental achievements may promote the construction of information networks based on cavity magnonics.
Submitted 9 August, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Time series anomaly detection with reconstruction-based state-space models
Authors:
Fan Wang,
Keli Wang,
Boyu Yao
Abstract:
Recent advances in digitization have led to the availability of multivariate time series data in various domains, enabling real-time monitoring of operations. Identifying abnormal data patterns and detecting potential failures in these scenarios are important yet rather challenging. In this work, we propose a novel unsupervised anomaly detection method for time series data. The proposed framework jointly learns the observation model and the dynamic model, and model uncertainty is estimated from normal samples. Specifically, a long short-term memory (LSTM)-based encoder-decoder is adopted to represent the mapping between the observation space and the latent space. Bidirectional transitions of states are simultaneously modeled by leveraging backward and forward temporal information. Regularization of the latent space places constraints on the states of normal samples, and Mahalanobis distance is used to evaluate the abnormality level. Empirical studies on synthetic and real-world datasets demonstrate the superior performance of the proposed method in anomaly detection tasks.
Submitted 9 October, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
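The abstract above scores abnormality with the Mahalanobis distance, using statistics estimated from normal samples. The sketch below shows only that scoring step in plain Python. It is an illustration under assumptions, not the paper's pipeline: the abstract does not say which vectors are scored, so the inputs here are arbitrary feature vectors, and the helper names `mean_and_cov`, `solve`, and `mahalanobis` are hypothetical.

```python
import math


def mean_and_cov(samples):
    """Sample mean and unbiased covariance of equal-length vectors."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), computed via a linear solve."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(cov, diff)
    return math.sqrt(sum(di * yi for di, yi in zip(diff, y)))


if __name__ == "__main__":
    # Assumed "normal" reference vectors; a larger score means more abnormal.
    normal = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    mu, cov = mean_and_cov(normal)
    for point in [(1.0, 1.0), (3.0, 1.0), (9.0, 9.0)]:
        print(point, round(mahalanobis(point, mu, cov), 3))
```

In practice a threshold on this score, chosen from held-out normal data, would separate normal from anomalous points; how the threshold is set is left open here.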
-
Coherent Microwave Emission of a Gain-Driven Polariton
Authors:
Bimu Yao,
Y. S. Gui,
J. W. Rao,
Y. H. Zhang,
Wei Lu,
C. -M. Hu
Abstract:
By developing a gain-embedded cavity magnonics platform, we create a gain-driven polariton (GDP) that is activated by an amplified electromagnetic field. Distinct effects of gain-driven light-matter interaction, such as polariton auto-oscillations, polariton phase singularity, self-selection of a polariton bright mode, and gain-induced magnon-photon synchronization, are theoretically studied and experimentally manifested. Utilizing the gain-sustained photon coherence of the GDP, we demonstrate polariton-based coherent microwave amplification (~ 40 dB) and achieve high-quality coherent microwave emission (Q > 10^9).
Submitted 15 February, 2023;
originally announced February 2023.
-
Control of the magnon-polariton hybridization with a microwave pump
Authors:
C. Zhang,
Jinwei Rao,
C. Y. Wang,
Z. J. Chen,
K. X. Zhao,
Bimu Yao,
Xu-Guang Xu,
Wei Lu
Abstract:
Pump-induced magnon modes (PIMs) are recently discovered elementary excitations in ferrimagnets that offer significant tunability to spin dynamics. Here, we investigate the coupling between a PIM and cavity magnon polaritons (CMPs) by driving a cavity magnonic system away from equilibrium with a microwave pump. In our experiment, the Walker mode simultaneously couples with the PIM and cavity photons and thus combines two strongly coherent coupling processes in a single cavity structure. Such a PIM-CMP hybridization system acquires complementary properties from both the PIM and CMPs, allowing it to be freely manipulated by the magnetic field, the pump power and the pump frequency. These coherent manipulations exhibit unique behaviors beyond the intrinsic properties limited by the material nature and electromagnetic boundary conditions, thereby creating opportunities for extending the control of hybrid devices.
Submitted 5 August, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
Authors:
Yuliang Liu,
Shenggui Li,
Jiarui Fang,
Yanjun Shao,
Boyuan Yao,
Yang You
Abstract:
In recent years, large-scale models have demonstrated state-of-the-art performance across various domains. However, training such models requires various techniques to address the problem of limited computing power and memory on devices such as GPUs. Some commonly used techniques include pipeline parallelism, tensor parallelism, and activation checkpointing. While existing works have focused on finding efficient distributed execution plans (Zheng et al. 2022) and activation checkpoint scheduling (Herrmann et al. 2019, Beaumont et al. 2021), there has been no method proposed to optimize these two plans jointly. Moreover, ahead-of-time compilation relies heavily on accurate memory and computing overhead estimation, which is often time-consuming and misleading. Existing training systems and machine learning pipelines either physically execute each operand or estimate memory usage with a scaled input tensor. To address these challenges, we introduce a system that can jointly optimize distributed execution and gradient checkpointing plans. Additionally, we provide an easy-to-use symbolic profiler that generates memory and computing statistics for any PyTorch model with a minimal time cost. Our approach allows users to parallelize their model training on the given hardware with minimal code changes. The source code is publicly available at Colossal-AI GitHub or https://github.com/hpcaitech/ColossalAI
Submitted 21 February, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
A Monolithic Graphene-Functionalized Microlaser for Multispecies Gas Detection
Authors:
Yanhong Guo,
Zhaoyu Li,
Ning An,
Yongzheng Guo,
Yuchen Wang,
Yusen Yuan,
Hao Zhang,
Teng Tan,
Caihao Wu,
Bo Peng,
Giancarlo Soavi,
Yunjiang Rao,
Baicheng Yao
Abstract:
Optical microcavity enhanced light-matter interaction offers a powerful tool to develop fast and precise sensing techniques, spurring applications in the detection of biochemical targets ranging from cells and nanoparticles to large molecules. However, the intrinsic inertness of such pristine microresonators limits their spread in new fields such as gas detection. Here, a functionalized microlaser sensor is realized by depositing graphene in an erbium-doped over-modal microsphere. By using a 980 nm pump, multiple laser lines excited in different mode families of the microresonator are co-generated in a single device. The interference between these splitting mode lasers produces beat notes in the electrical domain (0.2-1.1 MHz) with sub-kHz accuracy, thanks to the graphene-induced intracavity backward scattering. This allows for multispecies gas identification from a mixture, and ultrasensitive gas detection down to individual molecules.
Submitted 19 January, 2023;
originally announced January 2023.
-
Forcing with Urelements
Authors:
Bokai Yao
Abstract:
I first isolate a hierarchy of axioms over ZFC that allows a proper class of urelements. The Collection Principle and Reflection Principle hold precisely when the urelements are arranged in a specific manner. I then turn to forcing with urelements. A new forcing machinery with urelements is proposed to address a problem with the existing approach regarding the property of fullness. Every new forcing relation is full just in case Collection holds in the ground model. Forcing with urelements can preserve, destroy, or recover various axioms within the hierarchy. Ground model definability fails when the ground model contains a proper class of urelements.
Submitted 21 August, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.