-
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
Authors:
Lin Long,
Rui Wang,
Ruixuan Xiao,
Junbo Zhao,
Xiao Ding,
Gang Chen,
Haobo Wang
Abstract:
Within the evolving landscape of deep learning, the dilemma of data quantity and quality has been a long-standing problem. The recent advent of Large Language Models (LLMs) offers a data-centric solution to alleviate the limitations of real-world data with synthetic data generation. However, current investigations into this field lack a unified framework and mostly stay on the surface. Therefore, this paper provides an organization of relevant studies based on a generic workflow of synthetic data generation. By doing so, we highlight the gaps within existing research and outline prospective avenues for future study. This work aims to shepherd the academic and industrial communities towards deeper, more methodical inquiries into the capabilities and applications of LLMs-driven synthetic data generation.
Submitted 14 June, 2024;
originally announced June 2024.
-
FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents
Authors:
Ruixuan Xiao,
Wentao Ma,
Ke Wang,
Yuchuan Wu,
Junbo Zhao,
Haobo Wang,
Fei Huang,
Yongbin Li
Abstract:
LLM-based agents have emerged as promising tools, which are crafted to fulfill complex tasks by iterative planning and action. However, these agents are susceptible to undesired planning hallucinations when lacking specific knowledge for expertise-intensive tasks. To address this, preliminary attempts are made to enhance planning reliability by incorporating external workflow-related knowledge. Despite the promise, such infused knowledge is mostly disorganized and diverse in formats, lacking rigorous formalization and comprehensive comparisons. Motivated by this, we formalize different formats of workflow knowledge and present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 different scenarios from 6 domains, with knowledge presented in diverse formats. To assess different LLMs on FlowBench, we design a multi-tiered evaluation framework. We evaluate the efficacy of workflow knowledge across multiple formats, and the results indicate that current LLM agents need considerable improvements for satisfactory planning. We hope that our challenging benchmark can pave the way for future agent planning research.
Submitted 21 June, 2024;
originally announced June 2024.
-
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
Authors:
Renqiu Xia,
Song Mao,
Xiangchao Yan,
Hongbin Zhou,
Bo Zhang,
Haoyang Peng,
Jiahao Pi,
Daocheng Fu,
Wenjie Wu,
Hancheng Ye,
Shiyang Feng,
Bin Wang,
Chao Xu,
Conghui He,
Pinlong Cai,
Min Dou,
Botian Shi,
Sheng Zhou,
Yongwei Wang,
Bin Wang,
Junchi Yan,
Fei Wu,
Yu Qiao
Abstract:
Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extraction and understanding tasks, and their capacity to process within-document data formats such as charts and equations remains under-explored. To address these issues, we present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community, using our custom auto-labeling pipeline. DocGenome features four key characteristics: 1) Completeness: It is the first dataset to structure data from all modalities including 13 layout attributes along with their LaTeX source codes. 2) Logicality: It provides 6 logical relationships between different entities within each scientific document. 3) Diversity: It covers various document-oriented tasks, including document classification, visual grounding, document layout detection, document transformation, open-ended single-page QA and multi-page QA. 4) Correctness: It undergoes rigorous quality control checks conducted by a specialized team. We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.
Submitted 17 June, 2024;
originally announced June 2024.
-
Accurate Explanation Model for Image Classifiers using Class Association Embedding
Authors:
Ruitao Xie,
Jingbang Chen,
Limai Jiang,
Rui Xiao,
Yi Pan,
Yunpeng Cai
Abstract:
Image classification is a primary task in data analysis, and explainable models are in strong demand across various applications. Although many methods have been proposed to obtain explainable knowledge from black-box classifiers, these approaches cannot efficiently extract global knowledge about the classification task, and are thus vulnerable to local traps and often yield poor accuracy. In this study, we propose a generative explanation model that combines the advantages of global and local knowledge for explaining image classifiers. We develop a representation learning method called class association embedding (CAE), which encodes each sample into a pair of separated class-associated and individual codes. Recombining the individual code of a given sample with an altered class-associated code leads to a synthetic real-looking sample with preserved individual characteristics but modified class-associated features and possibly flipped class assignments. A building-block coherency feature extraction algorithm is proposed that efficiently separates class-associated features from individual ones. The extracted feature space forms a low-dimensional manifold that visualizes the classification decision patterns. An explanation of each individual sample can then be obtained through counterfactual generation, which continuously modifies the sample in one direction by shifting its class-associated code along a guided path until its classification outcome changes. We compare our method with state-of-the-art ones on explaining image classification tasks in the form of saliency maps, demonstrating that our method achieves higher accuracies. The code is available at https://github.com/xrt11/XAI-CODE.
Submitted 12 June, 2024;
originally announced June 2024.
-
Supporting Self-Reflection at Scale with Large Language Models: Insights from Randomized Field Experiments in Classrooms
Authors:
Harsh Kumar,
Ruiwei Xiao,
Benjamin Lawson,
Ilya Musabirov,
Jiakai Shi,
Xinyuan Wang,
Huayin Luo,
Joseph Jay Williams,
Anna Rafferty,
John Stamper,
Michael Liut
Abstract:
Self-reflection on learning experiences constitutes a fundamental cognitive process, essential for the consolidation of knowledge and the enhancement of learning efficacy. However, traditional methods to facilitate reflection often face challenges in personalization, immediacy of feedback, engagement, and scalability. Integration of Large Language Models (LLMs) into the reflection process could mitigate these limitations. In this paper, we conducted two randomized field experiments in undergraduate computer science courses to investigate the potential of LLMs to help students engage in post-lesson reflection. In the first experiment (N=145), students completed a take-home assignment with the support of an LLM assistant; half of these students were then provided access to an LLM designed to facilitate self-reflection. The results indicated that the students assigned to LLM-guided reflection reported increased self-confidence and performed better on a subsequent exam two weeks later than their peers in the control condition. In the second experiment (N=112), we evaluated the impact of LLM-guided self-reflection against other scalable reflection methods, such as questionnaire-based activities and review of key lecture slides, after assignment. Our findings suggest that the students in the questionnaire and LLM-based reflection groups performed equally well and better than those who were only exposed to lecture slides, according to their scores on a proctored exam two weeks later on the same subject matter. These results underscore the utility of LLM-guided reflection and questionnaire-based activities in improving learning outcomes. Our work highlights that focusing solely on the accuracy of LLMs can overlook their potential to enhance metacognitive skills through practices such as self-reflection. We discuss the implications of our research for the Edtech community.
Submitted 31 May, 2024;
originally announced June 2024.
-
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation
Authors:
Jinyuan Li,
Ziyan Li,
Han Li,
Jianfei Yu,
Rui Xia,
Di Sun,
Gang Pan
Abstract:
The Grounded Multimodal Named Entity Recognition (GMNER) task aims to identify named entities, entity types and their corresponding visual regions. The GMNER task exhibits two challenging attributes: 1) The tenuous correlation between images and text on social media contributes to a notable proportion of named entities being ungroundable. 2) There exists a distinction between coarse-grained noun phrases used in similar tasks (e.g., phrase localization) and fine-grained named entities. In this paper, we propose RiVEG, a unified framework that reformulates GMNER into a joint MNER-VE-VG task by leveraging large language models (LLMs) as connecting bridges. This reformulation brings two benefits: 1) It enables us to optimize the MNER module for the best MNER performance and eliminates the need to pre-extract region features using object detection methods, thus naturally addressing the two major limitations of existing GMNER methods. 2) The introduction of the Entity Expansion Expression module and the Visual Entailment (VE) module unifies Visual Grounding (VG) and Entity Grounding (EG). This endows the proposed framework with unlimited data and model scalability. Furthermore, to address the potential ambiguity stemming from the coarse-grained bounding box output in GMNER, we construct the new Segmented Multimodal Named Entity Recognition (SMNER) task and the corresponding Twitter-SMNER dataset, aimed at generating fine-grained segmentation masks, and experimentally demonstrate the feasibility and effectiveness of using the box prompt-based Segment Anything Model (SAM) to empower any GMNER model with the ability to accomplish the SMNER task. Extensive experiments demonstrate that RiVEG significantly outperforms SoTA methods on four datasets across the MNER, GMNER, and SMNER tasks.
Submitted 11 June, 2024;
originally announced June 2024.
-
Isolated anions induced high ionic conductivity
Authors:
Qifan Yang,
Jing Xu,
Yuqi Wang,
Xiao Fu,
Ruijuan Xiao,
Hong Li
Abstract:
Fast ion conductors are one of the key materials in solid-state lithium batteries. However, Li+ ion transport in inorganic crystals involves complex factors, making it difficult to find and design ion conductors with low migration barriers. In this work, a distinctive structural characteristic involving isolated anions has been discovered to enhance ionic conductivity in crystals. It is an effective way to create a smooth energy potential landscape and construct local pathways for lithium ion migration. By adjusting the spacing and arrangement of the isolated anions, these local pathways can connect with each other, leading to high ion conductivity. By designing different space groups and local environments of the Se2- anions in the Li8SiSe6 composition, combined with the ion transport properties obtained from AIMD simulations, we define isolated anions and find that a local environment with higher point group symmetry promotes the formation of cage-like local transport channels. Additionally, an appropriate distance between neighboring isolated anions can create coplanar connections between adjacent cage-like channels. Furthermore, different types of isolated anions can be used to control the distribution of cage-like channels in the lattice. Based on the structural characteristic of isolated anions, we discovered compounds with isolated N3-, Cl-, I-, and S2- features from the crystal structure databases. The confirmation of ion transport in these structures validates the proposed design method of using isolated anions as structural features for fast ion conductors and leads to the discovery of several new fast ion conductor materials.
Submitted 4 June, 2024;
originally announced June 2024.
-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation, which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions (2D vs. 3D). A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of ensemble models, as well as the need for further research to develop efficient methods for uncertainty quantification in 3D segmentation tasks.
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations
Authors:
Fanfan Wang,
Heqing Ma,
Jianfei Yu,
Rui Xia,
Erik Cambria
Abstract:
The ability to understand emotions is an essential component of human-like artificial intelligence, as emotions greatly influence human cognition, decision making, and social interactions. In addition to emotion recognition in conversations, the task of identifying the potential causes behind an individual's emotional state in conversations is of great importance in many application scenarios. We organize SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, which aims at extracting all pairs of emotions and their corresponding causes from conversations. Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE). The shared task has attracted 143 registrations and 216 successful submissions. In this paper, we introduce the task, dataset and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.
Submitted 10 June, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
Spatial-temporal manipulations of visible nanosecond sub-pulse sequences in an actively Q-switched Pr:YLF laser
Authors:
Shengbo Xu,
Yunru Chen,
Ran Xia,
Changcheng Duan,
Qingrui Zeng,
Yu Xiao,
Xiahui Tang,
Gang Xu
Abstract:
Pulsed visible lasers, based on either Q-switching or mode locking, have attracted intense attention in both solid-state and fiber lasers. Here, we report on the simultaneous manipulation of reconfigurable sub-pulse sequences and customizable high-order vortex beams in an actively Q-switched visible laser. On the one hand, pulse sequences with up to 4 sub-pulses could be generated and fully controlled by means of an acousto-optic modulator driven by an arbitrary waveform generator. Both pulse number and pulse intensity can be manipulated through the programmable step-signal, which is also theoretically simulated through the rate equations. On the other hand, assisted by the off-axis pumping technique and the astigmatic mode conversion, the laser cavity could emit high-quality vortex beams carrying Laguerre-Gaussian modes up to 30th order. To the best of our knowledge, this is the most flexible active manipulation of both the intensity distribution of the transverse modes and the temporal distribution of the pulse sequences in a visible laser. The versatile manipulating techniques in this work could be immediately implemented in all other solid-state lasers to obtain sub-pulse vortex beams, which may provide enhanced functionality and flexibility for a large range of laser systems.
Submitted 15 May, 2024;
originally announced May 2024.
-
Active Galactic Nuclei and STaR fOrmation in Nearby Galaxies (AGNSTRONG). I. Sample and Strategy
Authors:
Huynh Anh N. Le,
Chen Qin,
Yongquan Xue,
Shifu Zhu,
Kim Ngan N. Nguyen,
Ruisong Xia,
Xiaozhi Lin
Abstract:
We introduce our project, AGNSTRONG (Active Galactic Nuclei and STaR fOrmation in Nearby Galaxies). Our research goals encompass investigating the kinematic properties of ionized and molecular gas outflows, understanding the impact of AGN feedback, and exploring the coevolution dynamics between AGN strength activity and star formation activity. We aim to conduct a thorough analysis to determine whether there is an increase or suppression in SFRs among targets with and without powerful relativistic jets. Our sample consists of 35 nearby AGNs with and without powerful relativistic jet detections. Utilizing sub-millimeter (sub-mm) continuum observations at 450 μm and 850 μm from SCUBA-2 at the James Clerk Maxwell Telescope, we determine star-formation rates (SFRs) for our sources using spectral energy distribution (SED) fitting models. Additionally, we employ high-quality, spatially resolved spectra from UV-optical to near-infrared bands obtained with the Double Spectrograph and Triple Spectrograph mounted on the 200-inch Hale telescope at Palomar Observatory to study their multiphase gas outflow properties. This paper presents an overview of our sample selection methodology, research strategy, and initial results of our project. We find that the SFRs determined without including the sub-mm data in the SED fitting are overestimated by approximately 0.08 dex compared to those estimated with the inclusion of sub-mm data. Additionally, we compare the estimated SFRs in our work with those traced by the 4000Å break, as provided by the MPA-JHU catalog. We find that our determined SFRs are systematically higher than those traced by the 4000Å break. Finally, we outline our future research plans.
Submitted 15 May, 2024;
originally announced May 2024.
-
Enhancing LLM-Based Feedback: Insights from Intelligent Tutoring Systems and the Learning Sciences
Authors:
John Stamper,
Ruiwei Xiao,
Xinying Hou
Abstract:
The field of Artificial Intelligence in Education (AIED) focuses on the intersection of technology, education, and psychology, placing a strong emphasis on supporting learners' needs with compassion and understanding. The growing prominence of Large Language Models (LLMs) has led to the development of scalable solutions within educational settings, including generating different types of feedback in Intelligent Tutoring Systems (ITS). However, the approach to utilizing these models often involves directly formulating prompts to solicit specific information, lacking a solid theoretical foundation for prompt construction and empirical assessments of their impact on learning. This work advocates careful and caring AIED research by reviewing previous research on feedback generation in ITS, with emphasis on the theoretical frameworks it utilized and the efficacy of the corresponding designs in empirical evaluations, and then suggesting opportunities to apply these evidence-based principles to the design, experiment, and evaluation phases of LLM-based feedback generation. The main contributions of this paper include: advocacy for applying more cautious, theoretically grounded methods in feedback generation in the era of generative AI; and practical suggestions on theory- and evidence-based feedback design for LLM-powered ITS.
Submitted 11 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
A family of air-stable chalcogenide solid electrolytes in Li$_2$BMQ$_4$ (B = Ca, Sr and Ba; M = Si, Ge and Sn; Q = O, S and Se) systems
Authors:
Huican Mao,
Xiang Zhu,
Guangmao Li,
Jie Pang,
Junfeng Hao,
Liqi Wang,
Hailong Yu,
Youguo Shi,
Fan Wu,
Shilie Pan,
Ruijuan Xiao,
Hong Li,
Liquan Chen
Abstract:
Combining high-throughput first-principles calculations and experimental measurements, we have identified a novel family of fast lithium-ion chalcogenide conductors in Li$_2$BMQ$_4$ (2114, B = Ca, Sr and Ba; M = Si, Ge and Sn; Q = O, S and Se) systems. Our calculations demonstrate that most of the thermodynamically and kinetically stable sulfides and selenides in this new system exhibit ultralow Li$^+$ ion migration activation energy (0.16 eV ~ 0.56 eV) and considerable bandgaps varying between ~ 2 eV and 3.5 eV. We have successfully synthesized Li$_2$BaSnS$_4$ and Li$_2$SrSiS$_4$, and they exhibit excellent moisture stability through H$_2$S gas measurements. Electrochemical impedance measurements indicate 2114 systems show the typical features of solid ionic conductors, with a room-temperature Li$^+$ conductivity close to 5$\times$10$^{-4}$ mS/cm aligning with our molecular dynamics simulations. Furthermore, we have theoretically investigated the substitution of Cl$^-$ at S$^{2-}$ site. The doped compounds display significantly higher conductivity, with an increase of about three orders of magnitude (up to a maximum of 0.72 mS/cm) compared to the undoped compounds. These findings offer valuable insights for the further exploration of potential chalcogenide solid electrolyte materials with robust air stability and enhanced ionic conductivity for practical applications in lithium-ion batteries.
Submitted 6 May, 2024;
originally announced May 2024.
-
Anomalous electronic energy relaxation and soft phonons in the Dirac semimetal Cd$_3$As$_2$
Authors:
Rishi Bhandia,
David Barbalas,
Run Xiao,
Juan R. Chamorro,
Tyrel M. McQueen,
Nitin Samarth,
N. P. Armitage
Abstract:
We have used a combination of linear response time-domain THz spectroscopy (TDTS) and high-field non-linear THz spectroscopy to separately probe the electronic momentum and energy relaxation rates, respectively, of the Dirac semimetal Cd$_3$As$_2$. We find, consistent with prior measurements, that Cd$_3$As$_2$ has enormous nonlinearities in the THz frequency range. We extract the momentum relaxation rate of Cd$_3$As$_2$ using Drude fits to the optical conductivity. We also conduct THz range 2D coherent spectroscopy. The dominant response is a pump-probe signal, which allows us to separately extract the energy relaxation rate. We find that the rate of energy relaxation decreases down to the lowest measured temperatures. We connect this to the anomalous lattice dynamics of Cd$_3$As$_2$, evidence for which is found in its low thermal conductivity and soft phonons in Raman scattering. The lack of a peak in the energy relaxation rate as a function of T can be connected to the linear-in-T dependence of the current relaxation, i.e., the phonon scattering is elastic down to the lowest measured temperatures of approximately 120 K.
Submitted 5 May, 2024;
originally announced May 2024.
-
Streamlining Image Editing with Layered Diffusion Brushes
Authors:
Peyman Gholami,
Robert Xiao
Abstract:
Denoising diffusion models have recently gained prominence as powerful tools for a variety of image generation and manipulation tasks. Building on this, we propose a novel tool for real-time editing of images that provides users with fine-grained region-targeted supervision in addition to existing prompt-based controls. Our novel editing technique, termed Layered Diffusion Brushes, leverages prompt-guided and region-targeted alteration of intermediate denoising steps, enabling precise modifications while maintaining the integrity and context of the input image. We provide an editor based on Layered Diffusion Brushes modifications, which incorporates well-known image editing concepts such as layer masks, visibility toggles, and independent manipulation of layers, regardless of their order. Our system renders a single edit on a 512x512 image within 140 ms using a high-end consumer GPU, enabling real-time feedback and rapid exploration of candidate edits. We validated our method and editing system through a user study involving both natural images (using inversion) and generated images, showcasing its usability and effectiveness compared to existing techniques such as InstructPix2Pix and Stable Diffusion Inpainting for refining images. Our approach demonstrates efficacy across a range of tasks, including object attribute adjustments, error correction, and sequential prompt-based object placement and manipulation, demonstrating its versatility and potential for enhancing creative workflows.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Probing Neutral Triple Gauge Couplings via $\boldsymbol{Zγ\,(\ell^+\ell^-γ)}$ Production at $\boldsymbol{e^+e^-}$ Colliders
Authors:
Danning Liu,
Rui-Qing Xiao,
Shu Li,
John Ellis,
Hong-Jian He,
Rui Yuan
Abstract:
Neutral triple gauge couplings (nTGCs) are absent in the Standard Model (SM) and at the dimension-6 level in the Standard Model Effective Field Theory (SMEFT), arising first from dimension-8 operators. As such, they provide a unique window for probing new physics beyond the SM. These dimension-8 operators can be mapped to nTGC form factors whose structure is consistent with the spontaneously-broken electroweak gauge symmetry of the SM. In this work, we study the probes of nTGCs in the reaction $e^+e^-\to Zγ$ with $Z\to\ell^+\ell^-\,(\ell =e,μ)$ at an $e^+e^-$ collider. We perform a detector-level simulation and analysis of this reaction at the Circular Electron Positron Collider (CEPC) with collision energy $\sqrt{s} = 240$ GeV and an integrated luminosity of 20 ab$^{-1}$. We present the sensitivity limits on probing the new physics scales of dimension-8 nTGC operators via measurements of the corresponding nTGC form factors.
Submitted 1 July, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Hi-Gen: Generative Retrieval For Large-Scale Personalized E-commerce Search
Authors:
Yanjing Wu,
Yinfu Feng,
Jian Wang,
Wenji Zhou,
Yunan Ye,
Rong Xiao
Abstract:
Leveraging generative retrieval (GR) techniques to enhance search systems is an emerging methodology that has shown promising results in recent years. In GR, a text-to-text model maps string queries directly to relevant document identifiers (docIDs), so it dramatically simplifies the whole retrieval process. However, when applying most GR models in large-scale E-commerce for personalized item search, we have to face two key problems in encoding and decoding. (1) Existing docID generation methods ignore the encoding of efficiency information, which is critical in E-commerce. (2) The positional information is important in decoding docIDs, while prior studies have not adequately discriminated the significance of positional information or well exploited the inherent interrelation among these positions. To overcome these problems, we introduce an efficient Hierarchical encoding-decoding Generative retrieval method (Hi-Gen) for large-scale personalized E-commerce search systems. Specifically, we first design a representation learning model along with metric learning to learn discriminative feature representations of items to capture both semantic relevance and efficiency information. Then, we propose a category-guided hierarchical clustering scheme that makes full use of the semantic and efficiency information of items to facilitate docID generation. Finally, we design a position-aware loss to discriminate the importance of positions and mine the inherent interrelation between different tokens at the same position. This loss boosts the performance of the language model used in the decoding stage. Besides, we propose two variants of Hi-Gen (i.e., Hi-Gen-I2I and Hi-Gen-Cluster) to support online real-time large-scale recall in the online serving process. Extensive experiments on both public and industry datasets demonstrate the effectiveness and efficiency of Hi-Gen.
Submitted 24 April, 2024;
originally announced April 2024.
-
SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals
Authors:
Runze Yan,
Cheng Ding,
Ran Xiao,
Aleksandr Fedorov,
Randall J Lee,
Fadi Nahab,
Xiao Hu
Abstract:
Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambulatory settings. Conventional approaches typically discard corrupted segments or attempt to reconstruct original signals, allowing for the use of standard machine learning techniques. However, this reduces dataset size and introduces biases, compromising prediction accuracy and the effectiveness of continuous monitoring. We propose a novel deep learning model, Signal Quality Weighted Fusion of Attentional Convolution and Recurrent Neural Network (SQUWA), designed to learn how to retain accurate predictions from partially corrupted PPG. Specifically, SQUWA integrates an attention mechanism that directly considers signal quality during the learning process, dynamically adjusting the weights of time series segments based on their quality. This approach enhances the influence of higher-quality segments while reducing that of lower-quality ones, effectively utilizing partially corrupted segments. It represents a departure from conventional methods that exclude such segments, enabling the use of a broader range of data, with significant implications for less disruptive monitoring of AF risk and more accurate estimation of AF burden. Our extensive experiments show that SQUWA outperforms existing PPG-based models, achieving the highest AUCPR of 0.89 with label noise mitigation. This also exceeds the 0.86 AUCPR of models trained using both electrocardiogram (ECG) and PPG data.
Submitted 14 April, 2024;
originally announced April 2024.
-
Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans
Authors:
Lixing Tan,
Shuang Song,
Kangneng Zhou,
Chengbo Duan,
Lanying Wang,
Huayang Ren,
Linlin Liu,
Wei Zhang,
Ruoxiu Xiao
Abstract:
X-ray images play a vital role in intraoperative processes due to their high resolution and fast imaging speed, and they greatly facilitate subsequent segmentation, registration and reconstruction. However, excessive X-ray doses pose potential risks to human health. Data-driven algorithms that map volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume data. Existing methods are mainly realized by modelling the whole X-ray imaging procedure. In this study, we propose a learning-based approach termed CT2X-GAN to synthesize X-ray images in an end-to-end manner using content and style disentanglement from three different image domains. Our method decouples the anatomical structure information from CT scans and the style information from unpaired real X-ray images and digital reconstructed radiography (DRR) images via a series of decoupling encoders. Additionally, we introduce a novel consistency regularization term to improve the stylistic resemblance between synthesized X-ray images and real X-ray images. Meanwhile, we also impose a supervised process by computing the similarity of computed real DRR and synthesized DRR images. We further develop a pose attention module to fully strengthen the comprehensive information in the decoupled content code from CT scans, facilitating high-quality multi-view image synthesis in the lower 2D space. Extensive experiments were conducted on the publicly available CTSpine1K dataset and achieved 97.8350, 0.0842 and 3.0938 in terms of FID, KID and defined user-scored X-ray similarity, respectively. In comparison with 3D-aware methods ($π$-GAN, EG3D), CT2X-GAN achieves superior synthesis quality and greater realism relative to real X-ray images.
Submitted 18 April, 2024;
originally announced April 2024.
-
Revealing the spatial nature of sublattice symmetry
Authors:
Rong Xiao,
Y. X. Zhao
Abstract:
The sublattice symmetry on a bipartite lattice is commonly regarded as the chiral symmetry in the AIII class of the tenfold Altland-Zirnbauer classification. Here, we reveal the spatial nature of sublattice symmetry, and show that this assertion holds only if the periodicity of primitive unit cells agrees with that of the sublattice labeling. In cases where the periodicity does not agree, sublattice symmetry is represented as a glide reflection in energy-momentum space, which inverts energy and simultaneously translates some $k$ by $π$, leading to substantially different physics. Particularly, it introduces novel constraints on zero modes in semimetals and completely alters the classification table of topological insulators compared to class AIII. Notably, the dimensions corresponding to trivial and nontrivial classifications are switched, and the nontrivial classification becomes $\mathbb{Z}_2$ instead of $\mathbb{Z}$. We have applied these results to several models, including the Hofstadter model both with and without dimerization.
Submitted 8 May, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
Monochromatic Polynomial sumset structures on $\mathbb{N}$
Authors:
Zhengxing Lian,
Rongzhong Xiao
Abstract:
In the paper, we search for monochromatic infinite additive structures involving polynomials on $\mathbb{N}$. Ultimately, we can prove that for any $r\in \mathbb{N}$, any distinct natural numbers $a,b$ and any $2$-coloring of $\mathbb{N}$, there exist subsets $B,C\subset \mathbb{N}$ with $|B|=r$ and $|C|=\infty$ such that there exists a color containing $B+aC$ and $B+bC$. In fact, for the specific question considered by us, we give a complete answer.
Submitted 15 April, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices
Authors:
Ruiwei Xiao,
Xinying Hou,
John Stamper
Abstract:
Recent studies have integrated large language models (LLMs) into diverse educational contexts, including providing adaptive programming hints, a type of feedback that focuses on helping students move forward during problem-solving. However, most existing LLM-based hint systems are limited to one single hint type. To investigate whether and how different levels of hints can support students' problem-solving and learning, we conducted a think-aloud study with 12 novices using the LLM Hint Factory, a system providing four levels of hints from general natural language guidance to concrete code assistance, varying in format and granularity. We discovered that high-level natural language hints alone can be unhelpful or even misleading, especially when addressing next-step or syntax-related help requests. Adding lower-level hints, like code examples with in-line comments, can better support students. The findings open up future work on customizing help responses from content, format, and granularity levels to accurately identify and meet students' learning needs.
Submitted 2 April, 2024;
originally announced April 2024.
-
Deconvolution from two order statistics
Authors:
JoonHwan Cho,
Yao Luo,
Ruli Xiao
Abstract:
Economic data are often contaminated by measurement errors and truncated by ranking. This paper shows that the classical measurement error model with independent and additive measurement errors is identified nonparametrically using only two order statistics of repeated measurements. The identification result confirms a hypothesis by Athey and Haile (2002) for a symmetric ascending auction model with unobserved heterogeneity. Extensions allow for heterogeneous measurement errors, broadening the applicability to additional empirical settings, including asymmetric auctions and wage offer models. We adapt an existing simulated sieve estimator and illustrate its performance in finite samples.
Submitted 26 March, 2024;
originally announced March 2024.
-
MatchSeg: Towards Better Segmentation via Reference Image Matching
Authors:
Ruiqiang Xiao,
Jiayu Huo,
Haotian Zheng,
Yang Liu,
Sebastien Ourselin,
Rachel Sparks
Abstract:
Recently, automated medical image segmentation methods based on deep learning have achieved great success. However, they heavily rely on large annotated datasets, which are costly and time-consuming to acquire. Few-shot learning aims to overcome the need for annotated data by using a small labeled dataset, known as a support set, to guide predicting labels for new, unlabeled images, known as the query set. Inspired by this paradigm, we introduce MatchSeg, a novel framework that enhances medical image segmentation through strategic reference image matching. We leverage contrastive language-image pre-training (CLIP) to select highly relevant samples when defining the support set. Additionally, we design a joint attention module to strengthen the interaction between support and query features, facilitating a more effective knowledge transfer between support and query sets. We validated our method across four public datasets. Experimental results demonstrate superior segmentation performance and powerful domain generalization ability of MatchSeg against existing methods for domain-specific and cross-domain segmentation tasks. Our code is made available at https://github.com/keeplearning-again/MatchSeg
Submitted 19 June, 2024; v1 submitted 23 March, 2024;
originally announced March 2024.
-
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
Authors:
Hancheng Ye,
Chong Yu,
Peng Ye,
Renqiu Xia,
Yansong Tang,
Jiwen Lu,
Tao Chen,
Bo Zhang
Abstract:
Recent Vision Transformer Compression (VTC) works mainly follow a two-stage scheme, where the importance score of each model unit is first evaluated or preset in each submodule, followed by the sparsity score evaluation according to the target sparsity constraint. Such a separate evaluation process induces the gap between importance and sparsity score distributions, thus causing high search costs for VTC. In this work, for the first time, we investigate how to integrate the evaluations of importance and sparsity scores into a single stage, searching the optimal subnets in an efficient manner. Specifically, we present OFB, a cost-efficient approach that simultaneously evaluates both importance and sparsity scores, termed Once for Both (OFB), for VTC. First, a bi-mask scheme is developed by entangling the importance score and the differentiable sparsity score to jointly determine the pruning potential (prunability) of each unit. Such a bi-mask search strategy is further used together with a proposed adaptive one-hot loss to realize the progressive-and-efficient search for the most important subnet. Finally, Progressive Masked Image Modeling (PMIM) is proposed to regularize the feature space to be more representative during the search process, which may be degraded by the dimension reduction. Extensive experiments demonstrate that OFB can achieve superior compression performance over state-of-the-art searching-based and pruning-based methods under various Vision Transformer architectures, meanwhile promoting search efficiency significantly, e.g., costing one GPU search day for the compression of DeiT-S on ImageNet-1K.
Submitted 23 March, 2024;
originally announced March 2024.
-
Efficient Learning Strategy for Predicting Glass Forming Ability in Imbalanced Datasets of Bulk Metallic Glasses
Authors:
Xuhe Gong,
Jiazi Bi,
Xiaobin Liu,
Ran Li,
Ruijuan Xiao,
Tao Zhang,
Hong Li
Abstract:
The prediction of glass forming ability (GFA) and various properties in bulk metallic glasses (BMGs) poses a challenge due to the unique disordered atomic structure of this type of material. Machine learning shows potential for addressing this challenge. However, the training set from the experimental data of BMGs faces the issue of data imbalance, including the distribution of data related to elements, the range of performance data, and the distribution of sparse and dense data areas in each specific system. In this work, the origin of the data imbalance and its impact on the GFA prediction ability of machine learning models are analyzed. We propose solutions that train the model on a pruned dataset to mitigate the imbalance and perform active experimental iterative learning to compensate for the information loss resulting from data reduction. The strategy is validated in the Zr-Al-Cu system, and an automated workflow has been established. It effectively prevents the prediction results from being trapped in the dense training-data area or being biased by the data distribution of similar element systems. This approach will expedite the development of new BMG compositions, especially for unexplored systems.
Submitted 21 March, 2024;
originally announced March 2024.
-
Mechanistic Insights into Temperature Effects for Ionic Conductivity in Li6PS5Cl
Authors:
Zicun Li,
Jianxing Huang,
Xinguo Ren,
Jinbin Li,
Ruijuan Xiao,
Hong Li
Abstract:
Ensuring solid-state lithium batteries perform well across a wide temperature range is crucial for their practical use. Molecular dynamics (MD) simulations can provide valuable insights into the temperature dependence of the battery materials; however, the high computational cost of ab initio MD poses challenges for simulating ion migration dynamics at low temperatures. To address this issue, accurate machine-learning interatomic potentials were trained, which enable efficient and reliable simulations of the ionic diffusion processes in Li6PS5Cl over a large temperature range for long-time evolution. Our study revealed the significant impact of subtle lattice parameter variations on Li+ diffusion at low temperatures and identified the increasing influence of surface contributions as the temperature decreases. Our findings elucidate the factors influencing low temperature performance and present strategic guidance towards improving the performance of solid-state lithium batteries under these conditions.
Submitted 21 March, 2024;
originally announced March 2024.
-
DPPA: Pruning Method for Large Language Model to Model Merging
Authors:
Yaochen Zhu,
Rui Xia,
Jiajun Zhang
Abstract:
Model merging combines fine-tuned models derived from multiple domains, with the intent of enhancing the model's proficiency across various domains. The principal concern is the resolution of parameter conflicts. A substantial amount of existing research remedies this issue during the merging stage, with the latest study focusing on resolving this issue throughout the pruning stage. The DARE approach has exhibited promising outcomes when applied to a simplistic fine-tuned model. However, the efficacy of this method tends to wane when employed on complex fine-tuned models that show a significant parameter bias relative to the baseline model. In this paper, we introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA), devised to tackle the challenge of merging complex fine-tuned models. Initially, we introduce Dynamically Pruning (DP), an improved approach based on magnitude pruning, whose aim is to enhance performance at higher pruning rates. Subsequently, we propose Dynamically Partition Amplification (DPA), a rescaling strategy designed to dynamically amplify parameter partitions in relation to their significance levels. The experimental results show that our method maintains a mere 20% of domain-specific parameters and yet delivers a performance comparable to other methodologies that preserve up to 90% of parameters. Furthermore, our method displays outstanding performance post-pruning, leading to a significant improvement of nearly 20% performance in model merging. We make our code available on GitHub.
Submitted 5 March, 2024;
originally announced March 2024.
-
VCD: Knowledge Base Guided Visual Commonsense Discovery in Images
Authors:
Xiangqing Shen,
Yurun Song,
Siwei Wu,
Rui Xia
Abstract:
Visual commonsense contains knowledge about object properties, relationships, and behaviors in visual data. Discovering visual commonsense can provide a more comprehensive and richer understanding of images, and enhance the reasoning and decision-making capabilities of computer vision systems. However, the visual commonsense defined in existing visual commonsense discovery studies is coarse-grained and incomplete. In this work, we draw inspiration from a commonsense knowledge base ConceptNet in natural language processing, and systematically define the types of visual commonsense. Based on this, we introduce a new task, Visual Commonsense Discovery (VCD), aiming to extract fine-grained commonsense of different types contained within different objects in the image. We accordingly construct a dataset (VCDD) from Visual Genome and ConceptNet for VCD, featuring over 100,000 images and 14 million object-commonsense pairs. We furthermore propose a generative model (VCDM) that integrates a vision-language model with instruction tuning to tackle VCD. Automatic and human evaluations demonstrate VCDM's proficiency in VCD, particularly outperforming GPT-4V in implicit commonsense discovery. The value of VCD is further demonstrated by its application to two downstream tasks, including visual commonsense evaluation and visual question answering. The data and code will be made available on GitHub.
Submitted 27 February, 2024;
originally announced February 2024.
-
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
Authors:
Renqiu Xia,
Bo Zhang,
Hancheng Ye,
Xiangchao Yan,
Qi Liu,
Hongbin Zhou,
Zijun Chen,
Min Dou,
Botian Shi,
Junchi Yan,
Yu Qiao
Abstract:
Recently, many versatile Multi-modal Large Language Models (MLLMs) have emerged continuously. However, their capacity to query information depicted in visual charts and engage in reasoning based on the queried contents remains under-explored. In this paper, to comprehensively and rigorously benchmark the ability of the off-the-shelf MLLMs in the chart domain, we construct ChartX, a multi-modal evaluation set covering 18 chart types, 7 chart tasks, 22 disciplinary topics, and high-quality chart data. Besides, we develop ChartVLM to offer a new perspective on handling multi-modal tasks that strongly depend on interpretable patterns, such as reasoning tasks in the field of charts or geometric images. We evaluate the chart-related ability of mainstream MLLMs and our ChartVLM on the proposed ChartX evaluation set. Extensive experiments demonstrate that ChartVLM surpasses both versatile and chart-related large models, achieving results comparable to GPT-4V. We believe that our study can pave the way for further exploration in creating a more comprehensive chart evaluation set and developing more interpretable multi-modal models. Both ChartX and ChartVLM are available at: https://github.com/UniModal4Reasoning/ChartVLM
Submitted 19 February, 2024;
originally announced February 2024.
-
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding
Authors:
Hanling Yi,
Feng Lin,
Hongbin Li,
Peiyang Ning,
Xiaotian Yu,
Rong Xiao
Abstract:
This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters. We propose Smart Parallel Auto-Correct dEcoding (SPACE), an innovative approach designed for achieving lossless acceleration of LLMs. By integrating semi-autoregressive inference and speculative decoding capabilities, SPACE uniquely enables autoregressive LLMs to parallelize token generation and verification. This is realized through a specialized semi-autoregressive supervised fine-tuning process that equips existing LLMs with the ability to simultaneously predict multiple tokens. Additionally, an auto-correct decoding algorithm facilitates the simultaneous generation and verification of token sequences within a single model invocation. Through extensive experiments on a range of LLMs, SPACE has demonstrated inference speedup ranging from 2.7x-4.0x on HumanEval-X while maintaining output quality.
Submitted 19 May, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
QACP: An Annotated Question Answering Dataset for Assisting Chinese Python Programming Learners
Authors:
Rui Xiao,
Lu Han,
Xiaoying Zhou,
Jiong Wang,
Na Zong,
Pengyu Zhang
Abstract:
In online learning platforms, particularly in rapidly growing computer programming courses, addressing the thousands of students' learning queries requires considerable human cost. The creation of intelligent assistant large language models (LLMs) tailored for programming education necessitates distinct data support. However, in real application scenarios, the data resources for training such LLMs…
▽ More
In online learning platforms, particularly in rapidly growing computer programming courses, addressing the thousands of students' learning queries requires considerable human cost. The creation of intelligent assistant large language models (LLMs) tailored for programming education necessitates distinct data support. However, in real application scenarios, the data resources for training such LLMs are relatively scarce. Therefore, to address the data scarcity in intelligent educational systems for programming, this paper proposes a new Chinese question-and-answer dataset for Python learners. To ensure the authenticity and reliability of the sources of the questions, we collected questions from actual student questions and categorized them according to various dimensions such as the type of questions and the type of learners. This annotation principle is designed to enhance the effectiveness and quality of online programming education, providing a solid data foundation for develo** the programming teaching assists (TA). Furthermore, we conducted comprehensive evaluations of various LLMs proficient in processing and generating Chinese content, highlighting the potential limitations of general LLMs as intelligent teaching assistants in computer programming courses.
Submitted 22 February, 2024; v1 submitted 30 January, 2024;
originally announced February 2024.
-
Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes
Authors:
Darren Liu,
Cheng Ding,
Delgersuren Bold,
Monique Bouvier,
Jiaying Lu,
Benjamin Shickel,
Craig S. Jabaley,
Wenhui Zhang,
Soojin Park,
Michael J. Young,
Mark S. Wainwright,
Gilles Clermont,
Parisa Rashidi,
Eric S. Rosenthal,
Laurie Dimisko,
Ran Xiao,
Joo Heung Yoon,
Carl Yang,
Xiao Hu
Abstract:
The field of healthcare has increasingly turned its focus towards Large Language Models (LLMs) due to their remarkable performance. However, their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks don't fully capture the nuanced contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in real-world healthcare settings. Objective: We sought to evaluate the performance of LLMs in the complex clinical context of adult critical care medicine using systematic and comprehensible analytic methods, including clinician annotation and adjudication. Methods: We investigated the performance of three general LLMs in understanding and processing real-world clinical notes. Concepts from 150 clinical notes were identified by MetaMap and then labeled by 9 clinicians. Each LLM's proficiency was evaluated by identifying the temporality and negation of these concepts using different prompts for an in-depth analysis. Results: GPT-4 showed overall superior performance compared to other LLMs. In contrast, both GPT-3.5 and text-davinci-003 exhibit enhanced performance when the appropriate prompting strategies are employed. The GPT family models have demonstrated considerable efficiency, evidenced by their cost-effectiveness and time-saving capabilities. Conclusion: A comprehensive qualitative performance evaluation framework for LLMs is developed and operationalized. This framework goes beyond singular performance aspects. With expert annotations, this methodology not only validates LLMs' capabilities in processing complex medical data but also establishes a benchmark for future LLM evaluations across specialized domains.
Submitted 24 January, 2024;
originally announced January 2024.
-
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Authors:
Feng Lin,
Hanling Yi,
Hongbin Li,
Yifan Yang,
Xiaotian Yu,
Guangming Lu,
Rong Xiao
Abstract:
Large language models (LLMs) commonly employ autoregressive generation during inference, leading to high memory bandwidth demand and consequently extended latency. To mitigate this inefficiency, we present Bi-directional Tuning for lossless Acceleration (BiTA), an innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. Inspired by the concept of prompt tuning, we enhance LLMs with a parameter-efficient design called bi-directional tuning for the capability in semi-autoregressive generation. Employing efficient tree-based decoding, the models perform draft candidate generation and verification in parallel, ensuring outputs identical to their autoregressive counterparts under greedy sampling. BiTA serves as a lightweight plug-in module, seamlessly boosting the inference efficiency of existing LLMs without requiring additional assistance models or incurring significant extra memory costs. Applying the proposed BiTA, LLaMA-2-70B-Chat achieves a 2.7$\times$ speedup on the MT-Bench benchmark. Extensive experiments confirm our method surpasses state-of-the-art acceleration techniques.
Submitted 25 January, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
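The losslessness claim in the BiTA abstract above (outputs identical to autoregressive decoding under greedy sampling) rests on a draft-then-verify step. The following is a minimal sketch of greedy draft verification in general, not the authors' implementation: `verify_draft`, `toy_greedy_next`, and `autoregressive` are hypothetical names, the toy "model" is an assumption made purely for illustration, and the verification loop here runs sequentially, whereas a real system would score all draft positions in one parallel forward pass.

```python
def verify_draft(prefix, draft, greedy_next):
    """Accept the longest draft prefix that matches greedy decoding.

    Returns the tokens to append to `prefix`: every accepted draft token,
    followed by one token chosen by the model itself.
    """
    accepted = []
    for token in draft:
        expected = greedy_next(prefix + accepted)
        if token != expected:
            # First mismatch: discard the rest of the draft and keep the
            # token the model would have produced autoregressively.
            accepted.append(expected)
            return accepted
        accepted.append(token)
    # Whole draft accepted: the model still contributes one extra token.
    accepted.append(greedy_next(prefix + accepted))
    return accepted


def toy_greedy_next(tokens):
    """Hypothetical stand-in for a model: next token is (last + 1) mod 10."""
    return (tokens[-1] + 1) % 10


def autoregressive(prefix, steps, greedy_next):
    """Plain one-token-at-a-time greedy decoding, used as the reference."""
    out = list(prefix)
    for _ in range(steps):
        out.append(greedy_next(out))
    return out[len(prefix):]


if __name__ == "__main__":
    print(verify_draft([1], [2, 3, 9], toy_greedy_next))
    print(autoregressive([1], 3, toy_greedy_next))
```

Because a draft token is kept only when it equals the token greedy decoding would have produced, the accepted sequence always matches the autoregressive reference, whatever the quality of the draft.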
-
First Observational Evidence for an Interconnected Evolution between Time Lag and QPO Frequency among AGNs
Authors:
Ruisong Xia,
Hao Liu,
Yongquan Xue
Abstract:
Quasi-periodic oscillations (QPOs) have been widely observed in black hole X-ray binaries (BHBs), which often exhibit significant X-ray variations. Extensive research has explored the long-term evolution of the properties of QPOs in BHBs. In contrast, such evolution in active galactic nuclei (AGNs) has remained largely unexplored due to limited observational data. By using the 10 new XMM-Newton observations for the narrow-line Seyfert 1 galaxy RE J1034+396 from publicly available data, we analyze the characteristics of its X-ray QPOs and examine their long-term evolution. The hard-band (1--4 keV) QPOs are found in all 10 observations and the frequency of these QPOs evolves ranging at $(2.47\text{--}2.83)\times10^{-4}\rm\ Hz$. Furthermore, QPO signals in the soft (0.3--1 keV) and hard bands exhibit strong coherence, although, at times, the variations in the soft band lead those in the hard band (the hard-lag mode), while at other times, it is the reverse (the soft-lag mode). The observations presented here serendipitously captured two ongoing lag reversals within about two weeks, which are first seen in RE J1034+396 and also among all AGNs. A transition in QPO frequency also takes place within a two-week timeframe, two weeks prior to its corresponding lag reversal, indicating a possible coherence between the transitions of QPO frequency and lag mode with delay. The diagram of time lag versus QPO frequency clearly evidences this interconnected evolution with hysteresis, which is, for the first time, observed among AGNs.
Submitted 9 January, 2024;
originally announced January 2024.
-
Generating Non-Stationary Textures using Self-Rectification
Authors:
Yang Zhou,
Rongjun Xiao,
Dani Lischinski,
Daniel Cohen-Or,
Hui Huang
Abstract:
This paper addresses the challenge of example-based non-stationary texture synthesis. We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools, yielding an initial rough target for the synthesis. Subsequently, our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture, while faithfully preserving the distinct visual characteristics of the reference exemplar. Our method leverages a pre-trained diffusion network, and uses self-attention mechanisms, to gradually align the synthesized texture with the reference, ensuring the retention of the structures in the provided target. Through experimental validation, our approach exhibits exceptional proficiency in handling non-stationary textures, demonstrating significant advancements in texture synthesis when compared to existing state-of-the-art techniques. Code is available at https://github.com/xiaorongjun000/Self-Rectification
Submitted 30 January, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
Female Entrepreneur on Board: Assessing the Effect of Gender on Corporate Financial Constraints
Authors:
Ruiying Xiao
Abstract:
This study investigates the impact of female leadership on the financial constraints of firms, which are publicly listed entrepreneurial enterprises in China. Utilizing data from 938 companies on the China Growth Enterprise Market (GEM) over a period of 2013-2022, this paper explores how the female presence in CEO positions, senior management, and board membership influences a firm's ability to manage financial constraints. Our analysis employs the Kaplan-Zingales (KZ) Index to measure these constraints, encompassing some key financial factors such as cash flow, dividends, and leverage. The findings reveal that companies with female CEOs or a higher proportion of women in top management are associated with reduced financial constraints. However, the influence of female board members is less clear-cut. Our study also delves into the variances of these effects between high-tech and low-tech industry sectors, emphasizing how internal gender biases in high-tech industries may impede the alleviation of financing constraints on firms. This research contributes to a nuanced understanding of the role of gender dynamics in corporate financial management, especially in the context of China's evolving economic landscape. It underscores the importance of promoting female leadership not only for gender equity but also for enhancing corporate financial resilience.
Submitted 4 January, 2024;
originally announced January 2024.
-
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math
Authors:
Zengzhi Wang,
Rui Xia,
Pengfei Liu
Abstract:
High-quality, large-scale corpora are the cornerstone of building foundation models. In this work, we introduce \textsc{MathPile}, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens. Throughout its creation, we adhered to the principle of ``\emph{less is more}'', firmly believing in the supremacy of data quality over quantity, even in the pre-training phase. Our meticulous data collection and processing efforts included a complex suite of preprocessing, prefiltering, language identification, cleaning, filtering, and deduplication, ensuring the high quality of our corpus. Furthermore, we performed data contamination detection on downstream benchmark test sets to eliminate duplicates. We hope our \textsc{MathPile} can help to enhance the mathematical reasoning abilities of language models. We plan to open-source different versions of \textsc{MathPile} with the scripts used for processing, to facilitate future developments in this field.
Submitted 28 December, 2023;
originally announced December 2023.
-
Achieving 100% amplitude modulation depth in a graphene-based tuneable capacitance metamaterial
Authors:
Ruqiao Xia,
Nikita W. Almond,
Stephen J. Kindness,
Sergey A. Mikhailov,
Wadood Tadbier,
Riccardo Degl'Innocenti,
Yuezhen Lu,
Abbie Lowe,
Ben Ramsay,
Lukas A. Jakob,
James Dann,
Stephan Hofmann,
Harvey E. Beere,
David A. Ritchie,
Wladislaw Michailow
Abstract:
Effective control of terahertz radiation requires the development of efficient and fast modulators with a large modulation depth. This challenge is often tackled by using metamaterials, artificial sub-wavelength optical structures engineered to resonate at the desired terahertz frequency. Metamaterial-based devices exploiting graphene as the active tuneable element have been proven to be a highly effective solution for THz modulation. However, whilst the graphene conductivity can be tuned over a wide range, it cannot be reduced to zero due to the gapless nature of graphene, which directly limits the maximum achievable modulation depth for single-layer metamaterial modulators. Here, we demonstrate two novel solutions to circumvent this restriction: Firstly, we excite the modulator from the back of the substrate, and secondly, we incorporate air gaps into the graphene patches. This results in a ground-breaking graphene-metal metamaterial terahertz modulator, operating at 2.0-2.5 THz, which demonstrates a 99.01 % amplitude and a 99.99 % intensity modulation depth at 2.15 THz, with a reconfiguration speed in excess of 3 MHz. Our results open up new frontiers in the area of terahertz technology.
Submitted 26 December, 2023;
originally announced December 2023.
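The amplitude and intensity figures quoted in the abstract above can be cross-checked with a short calculation. This is a sketch under two assumptions that the abstract does not state: that modulation depth is defined as $m = 1 - X_{\min}/X_{\max}$ for the transmitted quantity $X$, and that intensity scales as the square of the field amplitude. Writing $m_A$ for the amplitude modulation depth and $m_I$ for the intensity modulation depth, the residual intensity ratio is the square of the residual amplitude ratio, so
\begin{equation*}
m_I = 1 - (1 - m_A)^2 = 1 - (1 - 0.9901)^2 = 1 - (0.0099)^2 \approx 0.9999,
\end{equation*}
which agrees with the reported 99.99 % intensity modulation depth for a 99.01 % amplitude modulation depth.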
-
Trajectory Planning and Tracking of Hybrid Flying-Crawling Quadrotors
Authors:
Dongnan Hu,
Ruihao Xia,
Xin Jin,
Yang Tang
Abstract:
Hybrid Flying-Crawling Quadrotors (HyFCQs) are transformable robots with the ability of terrestrial and aerial hybrid motion. This article presents a trajectory planning and tracking framework designed for HyFCQs. In this framework, a terrestrial-aerial path-searching method with the crawling limitation of HyFCQs is proposed to guarantee the dynamical feasibility of trajectories. Additionally, a trajectory tracking method is proposed to address the challenges associated with the deformation time required by HyFCQs, which makes tracking hybrid trajectories at the junction between terrestrial and aerial segments difficult. Simulations and real-world experiments in diverse scenarios validate the exceptional performance of the proposed approach.
Submitted 14 May, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Motion Planning and Control of A Morphing Quadrotor in Restricted Scenarios
Authors:
Guiyang Cui,
Ruihao Xia,
Xin Jin,
Yang Tang
Abstract:
Morphing quadrotors with four external actuators can adapt to different restricted scenarios by changing their geometric structure. However, previous works mainly focus on the improvements in structures and controllers, and existing planning algorithms don't consider the morphological modifications, which leads to safety and dynamic feasibility issues. In this paper, we propose a unified planning and control framework for morphing quadrotors to deform autonomously and efficiently. The framework consists of a milliseconds-level spatial-temporal trajectory optimizer that takes into account the morphological modifications of quadrotors. The optimizer can generate full-body safety trajectories including position and attitude. Additionally, it incorporates a nonlinear attitude controller that accounts for aerodynamic drag and dynamically adjusts dynamic parameters such as the inertia tensor and Center of Gravity. The controller can also online compute the thrust coefficient during morphing. Benchmark experiments compared with existing methods validate the robustness of the proposed controller. Extensive simulations and real-world experiments are performed to demonstrate the effectiveness of the proposed framework.
Submitted 12 December, 2023;
originally announced December 2023.
-
Interface-Induced Superconductivity in Magnetic Topological Insulator-Iron Chalcogenide Heterostructures
Authors:
Hemian Yi,
Yi-Fan Zhao,
Ying-Ting Chan,
Jiaqi Cai,
Ruobing Mei,
Xianxin Wu,
Zi-Jie Yan,
Ling-Jie Zhou,
Ruoxi Zhang,
Zihao Wang,
Stephen Paolini,
Run Xiao,
Ke Wang,
Anthony R. Richardella,
John Singleton,
Laurel E. Winter,
Thomas Prokscha,
Zaher Salman,
Andreas Suter,
Purnima P. Balakrishnan,
Alexander J. Grutter,
Moses H. W. Chan,
Nitin Samarth,
Xiaodong Xu,
Weida Wu
, et al. (2 additional authors not shown)
Abstract:
When two different electronic materials are brought together, the resultant interface often shows unexpected quantum phenomena, including interfacial superconductivity and Fu-Kane topological superconductivity (TSC). Here, we use molecular beam epitaxy (MBE) to synthesize heterostructures formed by stacking together two magnetic materials, a ferromagnetic topological insulator (TI) and an antiferromagnetic iron chalcogenide (FeTe). We discover emergent interface-induced superconductivity in these heterostructures and demonstrate the trifecta occurrence of superconductivity, ferromagnetism, and topological band structure in the magnetic TI layer, the three essential ingredients of chiral TSC. The unusual coexistence of ferromagnetism and superconductivity can be attributed to the high upper critical magnetic field that exceeds the Pauli paramagnetic limit for conventional superconductors at low temperatures. The magnetic TI/FeTe heterostructures with robust superconductivity and atomically sharp interfaces provide an ideal wafer-scale platform for the exploration of chiral TSC and Majorana physics, constituting an important step toward scalable topological quantum computation.
Submitted 7 December, 2023;
originally announced December 2023.
-
The Extended Resonant Modal Theory and Its Applications
Authors:
Ruqi Xiao,
Wen Geyi,
Guo Yang,
Wen Wu
Abstract:
In this paper, we extend the resonant modal theory (RMT) developed previously for a metal object to an arbitrary source region consisting of metals, dielectrics, or the combination of both. The influences of dielectrics on the fields are replaced by equivalent volume sources through the use of the compensation theorem in electromagnetic theory. The resonant frequencies can be determined by finding the roots of the determinant of the matrix resulting from the discretization of the real homogeneous volume-surface integral equation derived from the requirement that the difference of stored field energies in the source region vanishes. As applications of the extended RMT, three examples have been investigated. The first example is a dielectric resonator antenna, designed by exciting the first resonant mode of the composite structure in which the dielectric cylinder is combined with a conformal metallic strip. The second example is a dual-band dielectric-coated metallic wire antenna. The third example studies the resonant modes of a rectangular patch antenna.
Submitted 6 December, 2023;
originally announced December 2023.
-
Reconsideration on evaluation of machine learning models in continuous monitoring using wearables
Authors:
Cheng Ding,
Zhicheng Guo,
Cynthia Rudin,
Ran Xiao,
Fadi B Nahab,
Xiao Hu
Abstract:
This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart studies, the paper offers a comprehensive guideline for robust ML model evaluation on continuous health monitoring.
Submitted 4 December, 2023;
originally announced December 2023.
-
Audio Prompt Tuning for Universal Sound Separation
Authors:
Yuzhuo Liu,
Xubo Liu,
Yan Zhao,
Yuanyuan Wang,
Rui Xia,
Jinchuan Tain,
Yuxuan Wang
Abstract:
Universal sound separation (USS) is a task to separate arbitrary sounds from an audio mixture. Existing USS systems are capable of separating arbitrary sources, given a few examples of the target sources as queries. However, separating arbitrary sounds with a single system is challenging, and the robustness is not always guaranteed. In this work, we propose audio prompt tuning (APT), a simple yet effective approach to enhance existing USS systems. Specifically, APT improves the separation performance of specific sources through training a small number of prompt parameters with limited audio samples, while maintaining the generalization of the USS model by keeping its parameters frozen. We evaluate the proposed method on MUSDB18 and ESC-50 datasets. Compared with the baseline model, APT can improve the signal-to-distortion ratio performance by 0.67 dB and 2.06 dB using the full training set of two datasets. Moreover, APT with only 5 audio samples even outperforms the baseline systems utilizing full training data on the ESC-50 dataset, indicating the great potential of few-shot APT.
Submitted 30 November, 2023;
originally announced November 2023.
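The improvements in the abstract above are reported as signal-to-distortion ratio (SDR) in decibels. As a reading aid, here is a minimal sketch of the basic SDR definition, $10\log_{10}(\lVert s\rVert^2 / \lVert s-\hat{s}\rVert^2)$. It assumes this plain definition; the exact SDR variant and evaluation toolkit used by the authors are not stated in the abstract, and `sdr_db` is a hypothetical helper name.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB for two equal-length signals."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    # Distortion is everything in the estimate that differs from the reference.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


if __name__ == "__main__":
    # Energy ratio of 100 between signal and error corresponds to 20 dB.
    print(sdr_db([10.0, 0.0], [9.0, 0.0]))
```

On this scale, a gain of 2.06 dB corresponds to reducing the error energy relative to the signal energy by a factor of about $10^{0.206} \approx 1.6$.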
-
FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models
Authors:
Ruixuan Xiao,
Yiwen Dong,
Junbo Zhao,
Runze Wu,
Minmin Lin,
Gang Chen,
Haobo Wang
Abstract:
Collecting high-quality labeled data for model training is notoriously time-consuming and labor-intensive for various NLP tasks. While copious solutions, such as active learning for small language models (SLMs) and prevalent in-context learning in the era of large language models (LLMs), have been proposed and alleviate the labeling burden to some extent, their performances are still subject to human intervention. It is still underexplored how to reduce the annotation cost in the LLMs era. To bridge this, we revolutionize traditional active learning and propose an innovative collaborative learning framework FreeAL to interactively distill and filter the task-specific knowledge from LLMs. During collaborative training, an LLM serves as an active annotator inculcating its coarse-grained knowledge, while a downstream SLM is incurred as a student to filter out high-quality in-context samples to feedback LLM for the subsequent label refinery. Extensive experiments on eight benchmark datasets demonstrate that FreeAL largely enhances the zero-shot performances for both SLM and LLM without any human supervision. The code is available at https://github.com/Justherozen/FreeAL .
Submitted 27 November, 2023;
originally announced November 2023.
-
In-Context Learning for Knowledge Base Question Answering for Unmanned Systems based on Large Language Models
Authors:
Yunlong Chen,
Yaming Zhang,
Jianfei Yu,
Li Yang,
Rui Xia
Abstract:
Knowledge Base Question Answering (KBQA) aims to answer factoid questions based on knowledge bases. However, generating the most appropriate knowledge base query code based on Natural Language Questions (NLQ) poses a significant challenge in KBQA. In this work, we focus on the CCKS2023 Competition of Question Answering with Knowledge Graph Inference for Unmanned Systems. Inspired by the recent success of large language models (LLMs) like ChatGPT and GPT-3 in many QA tasks, we propose a ChatGPT-based Cypher Query Language (CQL) generation framework to generate the most appropriate CQL based on the given NLQ. Our generative framework contains six parts: an auxiliary model predicting the syntax-related information of CQL based on the given NLQ, a proper noun matcher extracting proper nouns from the given NLQ, a demonstration example selector retrieving similar examples of the input sample, a prompt constructor designing the input template of ChatGPT, a ChatGPT-based generation model generating the CQL, and an ensemble model to obtain the final answers from diversified outputs. With our ChatGPT-based CQL generation framework, we achieved the second place in the CCKS 2023 Question Answering with Knowledge Graph Inference for Unmanned Systems competition, achieving an F1-score of 0.92676.
Submitted 6 November, 2023;
originally announced November 2023.
-
A Unified Framework for Rank-based Loss Minimization
Authors:
Rufeng Xiao,
Yuze Ge,
Rujun Jiang,
Yifan Yan
Abstract:
The empirical loss, commonly referred to as the average loss, is extensively utilized for training machine learning models. However, in order to address the diverse performance requirements of machine learning models, the use of the rank-based loss is prevalent, replacing the empirical loss in many cases. The rank-based loss comprises a weighted sum of sorted individual losses, encompassing both convex losses like the spectral risk, which includes the empirical risk and conditional value-at-risk, and nonconvex losses such as the human-aligned risk and the sum of the ranked range loss. In this paper, we introduce a unified framework for the optimization of the rank-based loss through the utilization of a proximal alternating direction method of multipliers. We demonstrate the convergence and convergence rate of the proposed algorithm under mild conditions. Experiments conducted on synthetic and real datasets illustrate the effectiveness and efficiency of the proposed algorithm.
Submitted 3 January, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
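The abstract above describes a rank-based loss as a weighted sum of sorted individual losses. The following minimal sketch shows that definition and two familiar special cases; it illustrates only the objective, not the proximal ADMM algorithm the paper proposes. The function names are hypothetical, and the top-k average is used here as a simple stand-in for a conditional value-at-risk style weighting.

```python
def rank_based_loss(losses, weights):
    """Weighted sum of individual losses after sorting them in ascending order."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * l for w, l in zip(weights, sorted(losses)))


def average_weights(n):
    """Uniform weights: recovers the empirical (average) loss."""
    return [1.0 / n] * n


def top_k_weights(n, k):
    """Weights that average only the k largest losses (a tail-risk style loss)."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return [0.0] * (n - k) + [1.0 / k] * k


if __name__ == "__main__":
    losses = [4.0, 1.0, 3.0, 2.0]
    print(rank_based_loss(losses, average_weights(4)))   # average of all losses
    print(rank_based_loss(losses, top_k_weights(4, 2)))  # average of the two largest
```

Because the weights attach to ranks rather than to specific samples, changing the weight vector alone switches between the average loss and losses that emphasize the worst-performing samples.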
-
Pointwise convergence of some continuous-time polynomial ergodic averages
Authors:
Wen Huang,
Song Shao,
Rongzhong Xiao
Abstract:
In this paper, we study the pointwise convergence of some continuous-time polynomial ergodic averages. Our method is based on the topological models of measurable flows. One of the main results of the paper is as follows. Let $(X,\mathcal{X},\mu, (T^{t})_{t\in \mathbb{R}})$ and $(X,\mathcal{X},\mu, (S^{t})_{t\in \mathbb{R}})$ be two measurable flows, $a\in \mathbb{Q}$, and $Q\in \mathbb{R}[t]$ with $\text{deg}\ Q\ge 2$. Then for any $f_1, f_2, g\in L^{\infty}(\mu)$, the limit
\begin{equation*}
\lim\limits_{M\to\infty}\frac{1}{M}\int_{0}^{M}f_1(T^{t}x)f_2(T^{at}x)g(S^{Q(t)}x)\,dt
\end{equation*}
exists for $\mu$-a.e. $x\in X$.
Submitted 25 October, 2023;
originally announced October 2023.
-
Photoplethysmography based atrial fibrillation detection: an updated review from July 2019
Authors:
Cheng Ding,
Ran Xiao,
Weijia Wang,
Elizabeth Holdsworth,
Xiao Hu
Abstract:
Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with significant health ramifications, including an elevated susceptibility to ischemic stroke, heart disease, and heightened mortality. Photoplethysmography (PPG) has emerged as a promising technology for continuous AF monitoring for its cost-effectiveness and widespread integration into wearable devices. Our team previously conducted an exhaustive review on PPG-based AF detection before June 2019. However, since then, more advanced technologies have emerged in this field. This paper offers a comprehensive review of the latest advancements in PPG-based AF detection, utilizing digital health and artificial intelligence (AI) solutions, within the timeframe spanning from July 2019 to December 2022. Through extensive exploration of scientific databases, we have identified 59 pertinent studies. Our comprehensive review encompasses an in-depth assessment of the statistical methodologies, traditional machine learning techniques, and deep learning approaches employed in these studies. In addition, we address the challenges encountered in the domain of PPG-based AF detection. Furthermore, we maintain a dedicated website to curate the latest research in this area, with regular updates.
Submitted 21 October, 2023;
originally announced October 2023.