Search | arXiv e-print repository

Native approach to controlled-Z gates in inductively coupled fluxonium qubits

Authors: Xizheng Ma, Gengyan Zhang, Feng Wu, Feng Bao, Xu Chang, Jianjun Chen, Hao Deng, Ran Gao, Xun Gao, Lijuan Hu, Honghong Ji, Hsiang-Sheng Ku, Kannan Lu, Lu Ma, Liyong Mao, Zhijun Song, Hantao Sun, Chengchun Tang, Fei Wang, Hongcheng Wang, Tenghui Wang, Tian Xia, Make Ying, Huijuan Zhan, Tao Zhou, et al. (5 additional authors not shown)

Abstract: The fluxonium qubits have emerged as a promising platform for gate-based quantum information processing. However, their extraordinary protection against charge fluctuations comes at a cost: when coupled capacitively, the qubit-qubit interactions are restricted to XX-interactions. Consequently, effective XX- or XZ-interactions are only constructed either by temporarily populating higher-energy states, or by exploiting perturbative effects under microwave driving. Instead, we propose and demonstrate an inductive coupling scheme, which offers a wide selection of native qubit-qubit interactions for fluxonium. In particular, we leverage a built-in, flux-controlled ZZ-interaction to perform qubit entanglement. To combat the increased flux-noise-induced dephasing away from the flux-insensitive position, we use a continuous version of the dynamical decoupling scheme to perform noise filtering. Combining these, we demonstrate a 20 ns controlled-Z (CZ) gate with a mean fidelity of 99.53%. More than confirming the efficacy of our gate scheme, this high-fidelity result also reveals a promising but rarely explored parameter space uniquely suitable for gate operations between fluxonium qubits.

Submitted 30 August, 2023; originally announced August 2023.
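
Note on the entry above: the link between a ZZ-interaction and a CZ gate can be checked with a standard textbook identity, CZ = exp[i(π/4)(I − Z₁ − Z₂ + Z₁Z₂)]. The sketch below verifies this numerically on the diagonal of the two-qubit computational basis. It is only an illustration of the identity, not the pulse scheme or calibration used in the paper; the function name `cz_diagonal` is hypothetical.

```python
import cmath
import math

# Diagonal entries of Z on each qubit in the basis |00>, |01>, |10>, |11>.
Z1 = [1, 1, -1, -1]                    # Z acting on the first qubit
Z2 = [1, -1, 1, -1]                    # Z acting on the second qubit
ZZ = [a * b for a, b in zip(Z1, Z2)]   # Z (x) Z, also diagonal


def cz_diagonal():
    """Diagonal of exp[i*(pi/4)*(I - Z1 - Z2 + ZZ)] (hypothetical helper).

    Every operator involved is diagonal, so the matrix exponential
    reduces to exponentiating each diagonal entry separately.
    """
    diag = []
    for z1, z2, zz in zip(Z1, Z2, ZZ):
        phase = (math.pi / 4) * (1 - z1 - z2 + zz)
        diag.append(cmath.exp(1j * phase))
    return diag


if __name__ == "__main__":
    for state, amp in zip(["00", "01", "10", "11"], cz_diagonal()):
        print(state, amp)
```

In this form the ZZ term supplies the entangling phase, while the single-qubit Z terms and the identity term are local rotations and a global phase.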

arXiv:2308.14507 [pdf, other]

Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing

Authors: Yihan Zhang, Hong Chang Ji, Ramji Venkataramanan, Marco Mondelli

Abstract: We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured (i.i.d. Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are highly structured and exhibit non-trivial correlations. To address the problem, we consider correlated Gaussian designs capturing the anisotropic nature of the features via a covariance matrix $\Sigma$. Our main result is a precise asymptotic characterization of the performance of spectral estimators. This allows us to identify the optimal preprocessing that minimizes the number of samples needed for parameter estimation. Surprisingly, such preprocessing is universal across a broad set of designs, which partly addresses a conjecture on optimal spectral estimators for rotationally invariant models. Our principled approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics. The proposed methodology, based on approximate message passing, is broadly applicable and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.

Submitted 3 July, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.12537 [pdf, other]

HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks

Authors: Zichao Dong, Weikun Zhang, Xufeng Huang, Hang Ji, Xin Zhan, Junbo Chen

Abstract: Human-robot interaction is an exciting task that aims to guide robots to follow human instructions. Because a large gap lies between human natural language and machine code, building end-to-end human-robot interaction models is fairly challenging. Further, visual information received from a robot's sensors is also hard for the robot to interpret. In this work, HuBo-VLM is proposed to tackle perception tasks associated with human-robot interaction, including object detection and visual grounding, with a unified transformer-based vision-language model. Extensive experiments on the Talk2Car benchmark demonstrate the effectiveness of our approach. Code will be made publicly available at https://github.com/dzcgaara/HuBo-VLM.

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.11768 [pdf]

doi 10.1007/s44214-023-00035-z

Ferromagnetic and insulating behavior in both half magnetic levitation and non-levitation LK-99 like samples

Authors: Pinyuan Wang, Xiaoqi Liu, Jun Ge, Chengcheng Ji, Haoran Ji, Yanzhao Liu, Yiwen Ai, Gaoxing Ma, Shichao Qi, Jian Wang

Abstract: Finding materials exhibiting superconductivity at room temperature has long been one of the ultimate goals in physics and materials science. Recently, room-temperature superconducting properties have been claimed in a copper-substituted lead phosphate apatite (Pb$_{10-x}$Cu$_x$(PO$_4$)$_6$O, also called LK-99) [1-3]. Using a similar approach, we have prepared LK-99-like samples and confirmed the half-levitation behavior in some small specimens under the influence of a magnet at room temperature. To examine the magnetic properties of our samples, we have performed systematic magnetization measurements on the as-grown LK-99-like samples, including the half-levitated and non-levitated samples. The magnetization measurements show the coexistence of soft-ferromagnetic and diamagnetic signals in both half-levitated and non-levitated samples. The electrical transport measurements on the as-grown LK-99-like samples, including both half-levitated and non-levitated samples, show an insulating behavior characterized by increasing resistivity with decreasing temperature.

Submitted 28 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Journal ref: Quantum Front 2, 10 (2023)

arXiv:2308.10705 [pdf, other]

Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling

Authors: Haorui Ji, Hui Deng, Yuchao Dai, Hongdong Li

Abstract: Most of the previous 3D human pose estimation work relied on the powerful memory capability of the network to obtain suitable 2D-3D mappings from the training data. Few works have studied the modeling of human posture deformation in motion. In this paper, we propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior. Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and a frame-by-frame skeleton deformation. A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from a sequence of 2D observations, and then sum them to obtain the pose of each frame. Subsequently, a loss term based on the diffusion model is used to ensure that the pipeline learns the correct prior motion knowledge. Finally, we have evaluated our proposed method on mainstream datasets and obtained superior results outperforming the state-of-the-art.

Submitted 18 August, 2023; originally announced August 2023.

arXiv:2308.01227 [pdf, other]

Towards Integrated Sensing and Communications for 6G: A Standardization Perspective

Authors: Aryan Kaushik, Rohit Singh, Shalanika Dayarathna, Rajitha Senanayake, Marco Di Renzo, Miguel Dajer, Hyoungju Ji, Younsun Kim, Vincenzo Sciancalepore, Alessio Zappone, Wonjae Shin

Abstract: The radio communication division of the International Telecommunication Union (ITU-R) has recently adopted Integrated Sensing and Communication (ISAC) among the key usage scenarios for IMT-2030/6G. ISAC is envisioned to play a vital role in the upcoming wireless generation standards. In this work, we bring together several paramount and innovative aspects of ISAC technology from a global 6G standardization perspective, including both industrial and academic progress. Specifically, this article provides 6G requirements and ISAC-enabled vision, including various aspects of 6G standardization, benefits of ISAC co-existence, and integration challenges. Moreover, we present key enabling technologies, including intelligent metasurface-aided ISAC, as well as Orthogonal Time Frequency Space (OTFS) waveform design and interference management for ISAC. Finally, future aspects are discussed to open various research opportunities and challenges on the ISAC technology towards 6G wireless communications.

Submitted 2 August, 2023; originally announced August 2023.

Comments: 7 pages, 5 figures

arXiv:2308.00910 [pdf, ps, other]

A Mini Immersed Finite Element Method for Two-Phase Stokes Problems on Cartesian Meshes

Authors: Haifeng Ji, Dong Liang, Qian Zhang

Abstract: This paper presents a mini immersed finite element (IFE) method for solving two- and three-dimensional two-phase Stokes problems on Cartesian meshes. The IFE space is constructed from the conventional mini element with shape functions modified on interface elements according to interface jump conditions, while keeping the degrees of freedom unchanged. Both discontinuous viscosity coefficients and surface forces are considered in the construction. The interface is approximated via discrete level set functions and explicit formulas of IFE basis functions and correction functions are derived, which make the IFE method easy to implement. The optimal approximation capabilities of the IFE space and the inf-sup stability and the optimal a priori error estimate of the IFE method are derived rigorously with constants independent of the mesh size and how the interface cuts the mesh. It is also proved that the condition number has the usual bound independent of the interface. Numerical experiments are provided to confirm the theoretical results.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.11694 [pdf, other]

SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design

Authors: Carl Edwards, Aakanksha Naik, Tushar Khot, Martin Burke, Heng Ji, Tom Hope

Abstract: Predicting synergistic drug combinations can help accelerate discovery of cancer treatments, particularly therapies personalized to a patient's specific tumor via biopsied cells. In this paper, we propose a novel setting and models for in-context drug synergy learning. We are given a small "personalized dataset" of 10-20 drug synergy relationships in the context of specific cancer cell targets. Our goal is to predict additional drug synergy relationships in that context. Inspired by recent work that pre-trains a GPT language model (LM) to "in-context learn" common function classes, we devise novel pre-training schemes that enable a GPT model to in-context learn "drug synergy functions". Our model -- which does not use any textual corpora, molecular fingerprints, protein interaction or any other domain-specific knowledge -- is able to achieve competitive results. We further integrate our in-context approach with a genetic algorithm to optimize model prompts and select synergy candidates to test after conducting a patient biopsy. Finally, we explore a novel task of inverse drug design which can potentially enable the design of drugs that synergize specifically to target a given patient's "personalized dataset". Our findings can potentially have an important impact on precision cancer medicine, and also raise intriguing questions on non-textual pre-training for LMs.

Submitted 24 October, 2023; v1 submitted 19 June, 2023; originally announced July 2023.

arXiv:2307.11316 [pdf, other]

Making Pre-trained Language Models both Task-solvers and Self-calibrators

Authors: Yangyi Chen, Xingyao Wang, Heng Ji

Abstract: Pre-trained language models (PLMs) serve as backbones for various real-world systems. For high-stakes applications, it's equally essential to have reasonable confidence estimations in predictions. While the vanilla confidence scores of PLMs can already be effectively utilized, PLMs consistently become overconfident in their wrong predictions, which is not desirable in practice. Previous work shows that introducing an extra calibration task can mitigate this issue. The basic idea involves acquiring additional data to train models in predicting the confidence of their initial predictions. However, it only demonstrates the feasibility of this kind of method, assuming that there are abundant extra available samples for the introduced calibration task. In this work, we consider the practical scenario that we need to effectively utilize training samples to make PLMs both task-solvers and self-calibrators. Three challenges are presented, including limited training samples, data imbalance, and distribution shifts. We first conduct pilot experiments to quantify various decisive factors in the calibration task. Based on the empirical analysis results, we propose a training algorithm LM-TOAST to tackle the challenges. Experimental results show that LM-TOAST can effectively utilize the training data to make PLMs have reasonable confidence estimations while maintaining the original task performance. Further, we consider three downstream applications, namely selective classification, adversarial defense, and model cascading, to show the practical usefulness of LM-TOAST. The code will be made public at https://github.com/Yangyi-Chen/LM-TOAST.

Submitted 20 July, 2023; originally announced July 2023.

Comments: Accepted to Findings of ACL 2023
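
Note on the entry above: selective classification, one of the downstream applications named in the abstract, can be sketched generically as "answer only when the confidence estimate clears a threshold". The code below is a minimal illustration under that assumption; it is not LM-TOAST itself, and the function names, the threshold value, and the toy data are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def selective_classify(
    labels: Sequence[str],
    confidences: Sequence[float],
    threshold: float = 0.5,
) -> List[Optional[str]]:
    """Keep a predicted label only if its confidence reaches the threshold.

    Abstentions are represented by None. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    return [
        label if conf >= threshold else None
        for label, conf in zip(labels, confidences)
    ]


def coverage_and_accuracy(
    decisions: Sequence[Optional[str]],
    gold: Sequence[str],
) -> Tuple[float, float]:
    """Return (fraction of inputs answered, accuracy on the answered ones)."""
    answered = [(d, g) for d, g in zip(decisions, gold) if d is not None]
    coverage = len(answered) / len(gold) if gold else 0.0
    accuracy = (
        sum(d == g for d, g in answered) / len(answered) if answered else 0.0
    )
    return coverage, accuracy


if __name__ == "__main__":
    decisions = selective_classify(["a", "b", "a"], [0.9, 0.4, 0.7])
    print(decisions)
    print(coverage_and_accuracy(decisions, ["a", "a", "b"]))
```

A well-calibrated confidence estimator should make accuracy on the answered subset rise as the threshold is raised, at the cost of lower coverage.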

arXiv:2307.08626 [pdf, other]

Density of Brown measure of free circular Brownian motion

Authors: László Erdős, Hong Chang Ji

Abstract: We consider the Brown measure of free circular Brownian motion $\boldsymbol{a}+\sqrt{t}\boldsymbol{x}$, where $\boldsymbol{a}$ is a general non-normal operator and $\boldsymbol{x}$ is a circular element $*$-free from $\boldsymbol{a}$. We prove that, under a mild assumption on $\boldsymbol{a}$, the density of the Brown measure has one of the following two types of behavior around each point on the boundary of its support -- either (i) sharp cut, i.e. a jump discontinuity along the boundary, or (ii) quadratic decay at certain critical points on the boundary. Our result is in direct analogy with the previously known phenomenon for the spectral density of free semicircular Brownian motion, whose singularities are either a square-root edge or a cubic cusp. We also provide several examples and counterexamples, one of which shows that our assumption on $\boldsymbol{a}$ is necessary.

Submitted 17 July, 2023; originally announced July 2023.

Comments: 26 pages, 4 figures

MSC Class: 46L54; 60B20

arXiv:2307.08423 [pdf, other]

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Authors: Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, YuQing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, et al. (38 additional authors not shown)

Abstract: Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification.
To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.

Submitted 15 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

arXiv:2307.07109 [pdf, other]

Laboratory Study of Collisionless Magnetic Reconnection

Authors: H. Ji, J. Yoo, W. Fox, M. Yamada, M. Argall, J. Egedal, Y. -H. Liu, R. Wilder, S. Eriksson, W. Daughton, K. Bergstedt, S. Bose, J. Burch, R. Torbert, J. Ng, L. -J. Chen

Abstract: A concise review is given on the past two decades' results from laboratory experiments on collisionless magnetic reconnection in direct relation with space measurements, especially by the Magnetospheric Multiscale (MMS) mission. Highlights include spatial structures of electromagnetic fields in ion and electron diffusion regions as a function of upstream symmetry and guide field strength; energy conversion and partition from magnetic field to ions and electrons including particle acceleration; electrostatic and electromagnetic kinetic plasma waves with various wavelengths; and plasmoid-mediated multiscale reconnection. Combined with the progress in theoretical, numerical, and observational studies, the physics foundation of fast reconnection in collisionless plasmas has been largely established, at least within the parameter ranges and spatial scales that were studied. Immediate and long-term future opportunities based on multiscale experiments and space missions supported by exascale computation are discussed, including dissipation by kinetic plasma waves, particle heating and acceleration, and multiscale physics across fluid and kinetic scales.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 40 pages, 15 figures

Journal ref: ISSI book titled "Magnetic Reconnection: Explosive Energy Conversion in Space Plasmas" (2023)

arXiv:2307.05873 [pdf, other]

OG: Equip vision occupancy with instance segmentation and visual grounding

Authors: Zichao Dong, Hang Ji, Weikun Zhang, Xufeng Huang, Junbo Chen

Abstract: Occupancy prediction tasks focus on the inference of both geometry and semantic labels for each voxel, which is an important perception mission. However, it is still a semantic segmentation task without distinguishing various instances. Further, although some existing works, such as Open-Vocabulary Occupancy (OVO), have already solved the problem of open vocabulary detection, visual grounding in occupancy has not been solved to the best of our knowledge. To tackle the above two limitations, this paper proposes Occupancy Grounding (OG), a novel method that equips vanilla occupancy with instance segmentation ability and can perform visual grounding in a voxel manner with the help of grounded-SAM. Keys to our approach are (1) affinity field prediction for instance clustering and (2) an association strategy for aligning 2D instance masks and 3D occupancy instances. Extensive experiments have been conducted whose visualization results and analysis are shown below. Our code will be publicly released soon.

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2307.05300 [pdf, other]

Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration

Authors: Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji

Abstract: Human intelligence thrives on cognitive synergy, where collaboration among different minds yields superior outcomes compared to isolated individuals. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, experimental results demonstrate that SPP effectively reduces factual hallucination, and maintains strong reasoning capabilities. Additionally, comparative experiments show that cognitive synergy only emerges in GPT-4 and does not appear in less capable models, such as GPT-3.5-turbo and Llama2-13b-chat, which draws an interesting analogy to human development. Code, data, and prompts can be found at: https://github.com/MikeWangWZHL/Solo-Performance-Prompting.git.

Submitted 26 March, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

Comments: Accepted as a main conference paper at NAACL 2024

arXiv:2307.01972 [pdf, other]

Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification

Authors: Sha Li, Ruining Zhao, Manling Li, Heng Ji, Chris Callison-Burch, Jiawei Han

Abstract: Event schemas are a form of world knowledge about the typical progression of events. Recent methods for event schema induction use information extraction systems to construct a large number of event graph instances from documents, and then learn to generalize the schema from such instances. In contrast, we propose to treat event schemas as a form of commonsense knowledge that can be derived from large language models (LLMs). This new paradigm greatly simplifies the schema induction process and allows us to handle both hierarchical relations and temporal relations between events in a straightforward way. Since event schemas have complex graph structures, we design an incremental prompting and verification method to break down the construction of a complex event graph into three stages: event skeleton construction, event expansion, and event-event relation verification. Compared to directly using LLMs to generate a linearized graph, our method can generate large and complex schemas with 7.2% F1 improvement in temporal relations and 31.0% F1 improvement in hierarchical relations. In addition, compared to the previous state-of-the-art closed-domain schema induction model, human assessors were able to cover $\sim$10% more events when translating the schemas into coherent stories and rated our schemas 1.3 points higher (on a 5-point scale) in terms of readability.

Submitted 4 July, 2023; originally announced July 2023.

Comments: Accepted to ACL 2023. 19 pages with appendix

arXiv:2306.16602 [pdf, other]

An electro-hydrodynamics modeling of droplet actuation on solid surface by surfactant-mediated electro-dewetting

Authors: Weiqi Chu, Hangjie Ji, Qining Wang, Chang-Jin "CJ" Kim, Andrea L. Bertozzi

Abstract: We propose an electro-hydrodynamics model to describe the dynamic evolution of a slender drop containing a dilute ionic surfactant on a naturally wettable surface, with a varying external electric field. This unified model reproduces fundamental microfluidic operations controlled by electrical signals, including dewetting, rewetting, and droplet shifting. In this paper, lubrication theory analysis and numerical simulations illustrate how to electrically control the wettability of the surface via the charged surfactant. Our numerical results show that the electric field promotes dewetting by attracting ionic surfactants onto the transition thin-film region and promotes rewetting by attracting them away from the region.

Submitted 28 June, 2023; originally announced June 2023.

Comments: 16 pages, 13 figures

arXiv:2306.15245 [pdf, other]

C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation

Authors: Liliang Ren, Mankeerat Sidhu, Qi Zeng, Revanth Gangi Reddy, Heng Ji, ChengXiang Zhai

Abstract: Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the user based on a given evaluation dimension. Experimental results on the widely used FED dialogue evaluation dataset demonstrate that our approach significantly improves the correlation with human judgment compared with existing evaluation systems. By replacing the negative log-likelihood-based scorer with our proposed C-PMI scorer, we achieve a relative 62.6% higher Spearman correlation on average for the FED evaluation metric. Our code is publicly available at https://github.com/renll/C-PMI.

Submitted 1 September, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: Published at ACL2023 DialDoc Workshop; Updated Results

arXiv:2306.10599 [pdf, ps, other]

An Empirical Study of Untangling Patterns of Two-Class Dependency Cycles

Authors: Qiong Feng, Shuwen Liu, Huan Ji, Xiaotian Ma, Peng Liang

Abstract: Dependency cycles pose a significant challenge to software quality and maintainability. However, there is limited understanding of how practitioners resolve dependency cycles in real-world scenarios. This paper presents an empirical study investigating the recurring patterns employed by software developers to resolve dependency cycles between two classes in practice. We analyzed the data from 38 open-source projects across different domains and manually inspected hundreds of cycle untangling cases. Our findings reveal that developers tend to employ five recurring patterns to address dependency cycles. The chosen patterns are not only determined by dependency relations between cyclic classes, but also highly related to their design context, i.e., how cyclic classes depend on or are depended on by their neighbor classes. Through this empirical study, we also discovered three common counterintuitive solutions developers usually adopted when handling cycles. These recurring patterns and common counterintuitive solutions observed in practice can serve as a taxonomy to improve developers' awareness and also be used as learning materials for students in software engineering and inexperienced developers. Our results also suggest that, in addition to considering the internal structure of dependency cycles, automatic tools need to consider the design context of cycles to provide better support for refactoring dependency cycles.

Submitted 17 December, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: Preprint accepted for publication in Empirical Software Engineering, 2023

arXiv:2306.04909 [pdf]

doi 10.3847/25c2cfeb.c1b1eb07

Particle acceleration in solar flares with imaging-spectroscopy in soft X-rays

Authors: Mitsuo Oka, Amir Caspi, Bin Chen, Mark Cheung, James Drake, Dale Gary, Lindsay Glesener, Fan Guo, Hantao Ji, Xiaocan Li, Takuma Nakamura, Noriyuki Narukage, Katharine Reeves, Pascal Saint-Hilaire, Taro Sakao, Chengcai Shen, Amy Winebarger, Tom Woods

Abstract: Particles are accelerated to very high, non-thermal energies during explosive energy-release phenomena in space, solar, and astrophysical plasma environments. In the case of solar flares, it has been established that magnetic reconnection plays an important role for releasing the magnetic energy, but it remains unclear if or how magnetic reconnection can further explain particle acceleration during flares. Here we argue that the key issue is the lack of understanding of the precise context of particle acceleration but it can be overcome, in the near future, by performing imaging-spectroscopy in soft X-rays (SXRs). Such observations should be complemented by observations in other wavelengths such as extreme-ultraviolets (EUVs), microwaves, hard X-rays (HXRs), and gamma-rays. Also, numerical simulations will be crucial for further narrowing down the particle acceleration mechanism in the context revealed by the observations. Of all these efforts, imaging-spectroscopy in SXRs, if successfully applied to large limb flares, will be a milestone in our challenge of understanding electron acceleration in solar flares and beyond, i.e. the Plasma Universe.

Submitted 7 June, 2023; originally announced June 2023.

Comments: White paper submitted to the Decadal Survey for Solar and Space Physics (Heliophysics) 2024-2033; 10 pages, 2 figures

Journal ref: Bulletin of the AAS, Vol. 55, Issue 3, Whitepaper #302 (10pp); 2023 July 31

arXiv:2306.04618 [pdf, other]

Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations

Authors: Lifan Yuan, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Fangyuan Zou, Xingyi Cheng, Heng Ji, Zhiyuan Liu, Maosong Sun

Abstract: This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP. We find that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of OOD robustness. To address these issues, we propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. Then we introduce BOSS, a Benchmark suite for Out-of-distribution robustneSS evaluation covering 5 tasks and 20 datasets. Based on BOSS, we conduct a series of experiments on pre-trained language models for analysis and evaluation of OOD robustness. First, for vanilla fine-tuning, we examine the relationship between in-distribution (ID) and OOD performance. We identify three typical types that unveil the inner learning mechanism, which could potentially facilitate the forecasting of OOD robustness, correlating with the advancements on ID datasets. Then, we evaluate 5 classic methods on BOSS and find that, despite exhibiting some effectiveness in specific cases, they do not offer significant improvement compared to vanilla fine-tuning. Further, we evaluate 5 LLMs with various adaptation paradigms and find that when sufficient ID data is available, fine-tuning domain-specific models outperform LLMs on ID examples significantly. However, in the case of OOD instances, prioritizing LLMs with in-context learning yields better results. We identify that both fine-tuned small models and LLMs face challenges in effectively addressing downstream tasks. The code is public at https://github.com/lifan-yuan/OOD_NLP.

Submitted 26 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted to NeurIPS 2023 Dataset and Benchmark Track. Code is available at https://github.com/lifan-yuan/OOD_NLP

arXiv:2306.00887 [pdf, other]

OpenPI-C: A Better Benchmark and Stronger Baseline for Open-Vocabulary State Tracking

Authors: Xueqing Wu, Sha Li, Heng Ji

Abstract: Open-vocabulary state tracking is a more practical version of state tracking that aims to track state changes of entities throughout a process without restricting the state space and entity space. OpenPI is to date the only dataset annotated for open-vocabulary state tracking. However, we identify issues with the dataset quality and evaluation metric. For the dataset, we categorize 3 types of problems on the procedure level, step level and state change level respectively, and build a clean dataset OpenPI-C using multiple rounds of human judgment. For the evaluation metric, we propose a cluster-based metric to fix the original metric's preference for repetition. Model-wise, we enhance the seq2seq generation baseline by reinstating two key properties for state tracking: temporal dependency and entity awareness. The state of the world after an action is inherently dependent on the previous state. We model this dependency through a dynamic memory bank and allow the model to attend to the memory slots during decoding. On the other hand, the state of the world is naturally a union of the states of involved entities. Since the entities are unknown in the open-vocabulary setting, we propose a two-stage model that refines the state change prediction conditioned on entities predicted from the first stage. Empirical results show the effectiveness of our proposed model especially on the cluster-based metric. The code and data are released at https://github.com/shirley-wu/openpi-c

Submitted 20 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: ACL 2023 findings (fix typo)

arXiv:2305.18641 [pdf, other]

Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs

Authors: Mingyang Zhou, Yi R. Fung, Long Chen, Christopher Thomas, Heng Ji, Shih-Fu Chang

Abstract: Building cross-model intelligence that can understand charts and communicate the salient information hidden behind them is an appealing challenge in the vision and language (V+L) community. The capability to uncover the underlined table data of chart figures is a critical key to automatic chart understanding. We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot table pairs. Specifically, we propose two novel pre-training objectives: Masked Header Prediction (MHP) and Masked Value Prediction (MVP) to facilitate the model with different skills to interpret the table information. We have conducted extensive experiments on chart question answering and chart summarization to verify the effectiveness of the proposed pre-training strategies. In particular, on the ChartQA benchmark, our ChartT5 outperforms the state-of-the-art non-pretraining methods by over 8% performance gains.

Submitted 29 May, 2023; originally announced May 2023.

Comments: Accepted by Findings of ACL 2023

arXiv:2305.18582 [pdf, other]

Information Association for Language Model Updating by Mitigating LM-Logical Discrepancy

Authors: Pengfei Yu, Heng Ji

Abstract: Large Language Models (LLMs) struggle with providing current information due to the outdated pre-training data. Existing methods for updating LLMs, such as knowledge editing and continual fine-tuning, have significant drawbacks in generalizability of new information and the requirements on structured updating corpus. We identify the core challenge behind these drawbacks: the LM-logical discrepancy featuring the difference between language modeling probabilities and logical probabilities. To evaluate and address the core challenge, we propose a new task formulation of the information updating task that only requires the provision of an unstructured updating corpus and evaluates the performance of information updating on the generalizability to question-answer pairs pertaining to the updating information. We further propose a novel and effective pipeline approach for the task, highlighting a self-prompting-based question-answer generation process and an associative distillation method to bridge the LM-logical discrepancy. We develop two datasets for evaluation, one sourced from news articles published in March and April 2023, and the other from the Natural Questions benchmark. Experimental results demonstrate the superiority of our approach, significantly increasing the factual consistency score (on a scale from 0 to 1) by up to 0.16. Furthermore, our method effectively mitigates forgetting utilizing a compact replay buffer with only 2.3% of the training tokens.

Submitted 9 February, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.18503 [pdf, other]

From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework

Authors: Yangyi Chen, Hongcheng Gao, Ganqu Cui, Lifan Yuan, Dehan Kong, Hanlu Wu, Ning Shi, Bo Yuan, Longtao Huang, Hui Xue, Zhiyuan Liu, Maosong Sun, Heng Ji

Abstract: Textual adversarial attacks can discover models' weaknesses by adding semantic-preserved but misleading perturbations to the inputs. The long-lasting adversarial attack-and-defense arms race in Natural Language Processing (NLP) is algorithm-centric, providing valuable techniques for automatic robustness evaluation. However, the existing practice of robustness evaluation may exhibit issues of incomprehensive evaluation, impractical evaluation protocol, and invalid adversarial samples. In this paper, we aim to set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to further exploit the advantages of adversarial attacks. To address the above challenges, we first determine robustness evaluation dimensions based on model capabilities and specify the reasonable algorithm to generate adversarial samples for each dimension. Then we establish the evaluation protocol, including evaluation settings and metrics, under realistic demands. Finally, we use the perturbation degree of adversarial samples to control the sample validity. We implement a toolkit RobTest that realizes our automatic robustness evaluation framework. In our experiments, we conduct a robustness evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation framework, and further show the rationality of each component in the framework. The code will be made public at https://github.com/thunlp/RobTest.

Submitted 29 May, 2023; originally announced May 2023.

Comments: Accepted to Findings of ACL 2023

arXiv:2305.17542 [pdf, other]

Non-Sequential Graph Script Induction via Multimedia Grounding

Authors: Yu Zhou, Sha Li, Manling Li, Xudong Lin, Shih-Fu Chang, Mohit Bansal, Heng Ji

Abstract: Online resources such as WikiHow compile a wide range of scripts for performing everyday tasks, which can assist models in learning to reason about procedures. However, the scripts are always presented in a linear manner, which does not reflect the flexibility displayed by people executing tasks in real life. For example, in the CrossTask Dataset, 64.5% of consecutive step pairs are also observed in the reverse order, suggesting their ordering is not fixed. In addition, each step has an average of 2.56 frequent next steps, demonstrating "branching". In this paper, we propose the new challenging task of non-sequential graph script induction, aiming to capture optional and interchangeable steps in procedural planning. To automate the induction of such graph scripts for given tasks, we propose to take advantage of loosely aligned videos of people performing the tasks. In particular, we design a multimodal framework to ground procedural videos to WikiHow textual steps and thus transform each video into an observed step path on the latent ground truth graph script. This key transformation enables us to train a script knowledge model capable of both generating explicit graph scripts for learnt tasks and predicting future steps given a partial step sequence. Our best model outperforms the strongest pure text/vision baselines by 17.52% absolute gains on F1@3 for next step prediction and 13.8% absolute gains on Acc@1 for partial sequence completion. Human evaluation shows our model outperforming the WikiHow linear baseline by 48.76% absolute gains in capturing sequential and non-sequential step relationships.

Submitted 27 May, 2023; originally announced May 2023.

arXiv:2305.17373 [pdf, other]

Zero- and Few-Shot Event Detection via Prompt-Based Meta Learning

Authors: Zhenrui Yue, Huimin Zeng, Mengfei Lan, Heng Ji, Dong Wang

Abstract: With emerging online topics as a source for numerous new events, detecting unseen / rare event types presents an elusive challenge for existing event detection methods, where only limited data access is provided for training. To address the data scarcity problem in event detection, we propose MetaEvent, a meta learning-based framework for zero- and few-shot event detection. Specifically, we sample training tasks from existing event types and perform meta training to search for optimal parameters that quickly adapt to unseen tasks. In our framework, we propose to use the cloze-based prompt and a trigger-aware soft verbalizer to efficiently project output to unseen event types. Moreover, we design a contrastive meta objective based on maximum mean discrepancy (MMD) to learn class-separating features. As such, the proposed MetaEvent can perform zero-shot event detection by mapping features to event types without any prior knowledge. In our experiments, we demonstrate the effectiveness of MetaEvent in both zero-shot and few-shot scenarios, where the proposed method achieves state-of-the-art performance in extensive experiments on benchmark datasets FewEvent and MAVEN.

Submitted 27 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023

arXiv:2305.16470 [pdf, other]

Measuring the Effect of Influential Messages on Varying Personas

Authors: Chenkai Sun, Jinning Li, Hou Pong Chan, ChengXiang Zhai, Heng Ji

Abstract: Predicting how a user responds to news events enables important applications such as allowing intelligent agents or content producers to estimate the effect on different communities and revise unreleased messages to prevent unexpected bad outcomes such as social conflict and moral injury. We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona (characterizing an individual or a group) might have upon seeing a news message. Compared to the previous efforts which only predict generic comments to news, the proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response. This enables more accurate and comprehensive inference on the mental state of the persona. Meanwhile, the generated sentiment dimensions make the evaluation and application more reliable. We create the first benchmark dataset, which consists of 13,357 responses to 3,847 news headlines from Twitter. We further evaluate the SOTA neural language models with our dataset. The empirical results suggest that the included persona attributes are helpful for the performance of all response dimensions. Our analysis shows that the best-performing models are capable of predicting responses that are consistent with the personas, and as a byproduct, the task formulation also enables many interesting applications in the analysis of social network groups and their opinions, such as the discovery of extreme opinion groups.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.16133 [pdf, other]

OVO: Open-Vocabulary Occupancy

Authors: Zhiyu Tan, Zichao Dong, Cheng Zhang, Weikun Zhang, Hang Ji, Hao Li

Abstract: Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment. Existing occupancy prediction methods are almost entirely trained on human-annotated volumetric data. Although of high quality, the generation of such 3D annotations is laborious and costly, restricting them to a few specific object categories in the training dataset. To address this limitation, this paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows semantic occupancy prediction of arbitrary classes but without the need for 3D annotations during training. Keys to our approach are (1) knowledge distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D occupancy network, and (2) pixel-voxel filtering for high-quality training data generation. The resulting framework is simple, compact, and compatible with most state-of-the-art semantic occupancy prediction models. On NYUv2 and SemanticKITTI datasets, OVO achieves competitive performance compared to supervised semantic occupancy prediction approaches. Furthermore, we conduct extensive analyses and ablation studies to offer insights into the design of the proposed framework. Our code is publicly available at https://github.com/dzcgaara/OVO.

Submitted 14 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.14647 [pdf, other]

Scientific Opinion Summarization: Paper Meta-review Generation Dataset, Methods, and Evaluation

Authors: Qi Zeng, Mankeerat Sidhu, Ansel Blume, Hou Pong Chan, Lu Wang, Heng Ji

Abstract: Opinions in scientific research papers can be divergent, leading to controversies among reviewers. However, most existing datasets for opinion summarization are centered around product reviews and assume that the analyzed opinions are non-controversial, failing to account for the variability seen in other contexts such as academic papers, political debates, or social media discussions. To address this gap, we propose the task of scientific opinion summarization, where research paper reviews are synthesized into meta-reviews. To facilitate this task, we introduce the ORSUM dataset covering 15,062 paper meta-reviews and 57,536 paper reviews from 47 conferences. Furthermore, we propose the Checklist-guided Iterative Introspection approach, which breaks down scientific opinion summarization into several stages, iteratively refining the summary under the guidance of questions from a checklist. Our experiments show that (1) human-written summaries do not always satisfy all necessary criteria such as depth of discussion, and identifying consensus and controversy for the specific domain, and (2) the combination of task decomposition and iterative self-refinement shows strong potential for enhancing the opinions and can be applied to other complex text generation using black-box LLMs.

Submitted 15 June, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: IJCAI 2024 AI4Research Workshop

arXiv:2305.14548 [pdf, other]

Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization

Authors: Hou Pong Chan, Qi Zeng, Heng Ji

Abstract: Existing factual consistency evaluation approaches for text summarization provide binary predictions and limited insights into the weakness of summarization systems. Therefore, we propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary. Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact, which explicitly represents the facts in the documents and summaries with semantic frames extracted by semantic role labeling, and highlights the related semantic frames to predict inconsistency. The highlighted semantic frames help verify predicted error types and correct inconsistent summaries. Experiment results demonstrate that our model outperforms strong baselines and provides evidence to support or refute the summary.

Submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted by ACL Findings 2023. Code and data are available at https://github.com/kenchan0226/fineGrainedFact

arXiv:2305.14318 [pdf, other]

CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models

Authors: Cheng Qian, Chi Han, Yi R. Fung, Yujia Qin, Zhiyuan Liu, Heng Ji

Abstract: Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability and the instability of implicit reasoning, particularly when both planning and execution are involved. To overcome these limitations, we propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization. CREATOR disentangles abstract tool creation and concrete decision execution, resulting in improved performance. We evaluate CREATOR on MATH and TabMWP benchmarks, respectively consisting of challenging math competition problems and diverse tabular contents. Remarkably, CREATOR outperforms existing chain-of-thought, program-of-thought, and tool-using baselines. Additionally, we introduce the Creation Challenge dataset, featuring 2K diverse questions, to emphasize the necessity and benefits of LLMs' tool creation ability. Further research demonstrates that leveraging LLMs as tool creators facilitates knowledge transfer, and LLMs exhibit varying levels of tool creation abilities, enabling them to adapt to diverse situations. The tool creation ability revolutionizes the LLM's problem-solving paradigm, driving us closer to the next frontier of artificial intelligence. All the codes and data are released.

Submitted 21 June, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Findings of EMNLP 2023

arXiv:2305.14259 [pdf, other]

SciMON: Scientific Inspiration Machines Optimized for Novelty

Authors: Qingyun Wang, Doug Downey, Heng Ji, Tom Hope

Abstract: We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature. Work on literature-based hypothesis generation has traditionally focused on binary link prediction, severely limiting the expressivity of hypotheses. This line of work also does not focus on optimizing novelty. We take a dramatic departure with a novel setting in which models use as input background contexts (e.g., problems, experimental settings, goals), and output natural language ideas grounded in literature. We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers, and explicitly optimizes for novelty by iteratively comparing to prior papers and updating idea suggestions until sufficient novelty is achieved. Comprehensive evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our methods partially mitigate this issue. Our work represents a first step toward evaluating and developing language models that generate new ideas derived from the scientific literature.

Submitted 3 June, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 21 pages. Code and resources are available at https://github.com/EagleW/CLBD. Accepted by the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

arXiv:2305.14225 [pdf, other]

ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media

Authors: Kung-Hsiang Huang, Hou Pong Chan, Kathleen McKeown, Heng Ji

Abstract: Considerable advancements have been made to tackle the misrepresentation of information derived from reference articles in the domains of fact-checking and faithful summarization. However, an unaddressed aspect remains - the identification of social media posts that manipulate information within associated news articles. This task presents a significant challenge, primarily due to the prevalence of personal opinions in such posts. We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information. To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance. Additionally, we have developed a simple yet effective basic model that outperforms LLMs significantly on the ManiTweet dataset. Finally, we have conducted an exploratory analysis of human-written tweets, unveiling intriguing connections between manipulation and the domain and factuality of news articles, as well as revealing that manipulated sentences are more likely to encapsulate the main story or consequences of a news outlet.

Submitted 12 June, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.12798 [pdf, other]

Word Embeddings Are Steers for Language Models

Authors: Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji

Abstract: Language models (LMs) automatically learn word embeddings during pre-training on language corpora. Although word embeddings are usually interpreted as feature vectors for individual words, their roles in language model generation remain underexplored. In this work, we theoretically and empirically revisit output word embeddings and find that their linear transformations are equivalent to steering language model generation styles. We name such steers LM-Steers and find them existing in LMs of all sizes. It requires learning parameters equal to 0.2% of the original LMs' size for steering each style. On tasks such as language model detoxification and sentiment control, LM-Steers can achieve comparable or superior performance compared with state-of-the-art controlled generation methods while maintaining a better balance with generation quality. The learned LM-Steer serves as a lens in text styles: it reveals that word embeddings are interpretable when associated with language model generations and can highlight text spans that most indicate the style differences. An LM-Steer is transferrable between different language models by an explicit form calculation. One can also continuously steer LMs simply by scaling the LM-Steer or compose multiple LM-Steers by adding their transformations. Our codes are publicly available at https://github.com/Glaciohound/LM-Steer.

Submitted 6 June, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: ACL 2024 Long Paper, 9 pages, 3 figures

arXiv:2305.12766 [pdf, other]

Explaining Emergent In-Context Learning as Kernel Regression

Authors: Chi Han, Ziqi Wang, Han Zhao, Heng Ji

Abstract: Large language models (LLMs) have initiated a paradigm shift in transfer learning. In contrast to the classic pretraining-then-finetuning procedure, in order to use LLMs for downstream prediction tasks, one only needs to provide a few demonstrations, known as in-context examples, without adding more or updating existing model parameters. This in-context learning (ICL) capability of LLMs is intriguing, and it is not yet fully understood how pretrained LLMs acquire such capabilities. In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training on a general language corpus by proposing one hypothesis that LLMs can simulate kernel regression with internal representations when faced with in-context examples. More concretely, we first prove that Bayesian inference on in-context prompts can be asymptotically understood as kernel regression $\hat y = \sum_i y_i K(x, x_i)/\sum_i K(x, x_i)$ as the number of in-context demonstrations grows. Then, we empirically investigate the in-context behaviors of language models. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression. Finally, our theory provides insights into multiple phenomena observed in the ICL field: why retrieving demonstrative samples similar to test samples can help, why ICL performance is sensitive to the output formats, and why ICL accuracy benefits from selecting in-distribution and representative samples.

Submitted 5 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 9 pages, 4 figures
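
As an illustrative aside, the kernel regression estimator quoted in the abstract above, $\hat y = \sum_i y_i K(x, x_i)/\sum_i K(x, x_i)$, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel` and `kernel_regression` are hypothetical choices made here for illustration and are not taken from the paper.

```python
import math


def gaussian_kernel(x, xi, bandwidth=1.0):
    """Gaussian (RBF) similarity between two equal-length points.

    Assumption for illustration only: the listing does not say which
    kernel K the paper uses.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def kernel_regression(x, xs, ys, kernel=gaussian_kernel):
    """Return y_hat = sum_i y_i K(x, x_i) / sum_i K(x, x_i).

    xs are the demonstration inputs, ys their labels, x the query.
    """
    weights = [kernel(x, xi) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed to zero, so the estimate is undefined.
        raise ValueError("all kernel weights are zero")
    return sum(w * y for w, y in zip(weights, ys)) / total


if __name__ == "__main__":
    demo_inputs = [(0.0,), (1.0,)]
    demo_labels = [0.0, 1.0]
    # The query is equidistant from both demonstrations, so the two
    # weights are equal and the estimate is the mean of the labels.
    print(kernel_regression((0.5,), demo_inputs, demo_labels))
```

In this sketch each demonstration contributes to the prediction in proportion to its similarity to the query, which is the intuition the abstract gives for why retrieving demonstrations similar to the test sample can help.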

arXiv:2305.12738 [pdf, other]

Logical Entity Representation in Knowledge-Graphs for Differentiable Rule Learning

Authors: Chi Han, Qizheng He, Charles Yu, Xinya Du, Hanghang Tong, Heng Ji

Abstract: Probabilistic logical rule learning has shown great strength in logical rule mining and knowledge graph completion. It learns logical rules to predict missing edges by reasoning on existing edges in the knowledge graph. However, previous efforts have largely been limited to only modeling chain-like Horn clauses such as $R_1(x,z)\land R_2(z,y)\Rightarrow H(x,y)$. This formulation overlooks additional contextual information from neighboring sub-graphs of entity variables $x$, $y$ and $z$. Intuitively, there is a large gap here, as local sub-graphs have been found to provide important information for knowledge graph completion. Inspired by these observations, we propose Logical Entity RePresentation (LERP) to encode contextual information of entities in the knowledge graph. A LERP is designed as a vector of probabilistic logical functions on the entity's neighboring sub-graph. It is an interpretable representation while allowing for differentiable optimization. We can then incorporate LERP into probabilistic logical rule learning to learn more expressive rules. Empirical results demonstrate that with LERP, our model outperforms other rule learning methods in knowledge graph completion and is comparable or even superior to state-of-the-art black-box methods. Moreover, we find that our model can discover a more expressive family of logical rules. LERP can also be further combined with embedding learning methods like TransE to make it more interpretable.

Submitted 22 May, 2023; originally announced May 2023.

Comments: 9 pages, 5 figures; accepted by 11th International Conference on Learning Representations (ICLR 2023)

arXiv:2305.12565 [pdf, other]

Understanding the Effect of Data Augmentation on Knowledge Distillation

Authors: Ziqi Wang, Chi Han, Wenxuan Bao, Heng Ji

Abstract: Knowledge distillation (KD) requires sufficient data to transfer knowledge from large-scale teacher models to small-scale student models. Therefore, data augmentation has been widely used to mitigate the shortage of data under specific scenarios. Classic data augmentation techniques, such as synonym replacement and k-nearest-neighbors, are initially designed for fine-tuning. To avoid severe semantic shifts and preserve task-specific labels, those methods prefer to change only a small proportion of tokens (e.g., changing 10% tokens is generally the best option for fine-tuning). However, such data augmentation methods are sub-optimal for knowledge distillation since the teacher model could provide label distributions and is more tolerant to semantic shifts. We first observe that KD prefers as much data as possible, unlike fine-tuning, where too much data does not yield further performance gains. Since changing more tokens leads to more semantic shifts, we use the proportion of changed tokens to reflect semantic shift degrees. Then we find that KD prefers augmented data with a larger semantic shift degree (e.g., changing 30% tokens is generally the best option for KD) than fine-tuning (changing 10% tokens). Besides, our findings show that smaller datasets prefer larger degrees until the out-of-distribution problem occurs (e.g., datasets with less than 10k inputs may prefer the 50% degree, and datasets with more than 100k inputs may prefer the 10% degree). Our work sheds light on the preference difference in data augmentation between fine-tuning and knowledge distillation and encourages the community to explore KD-specific data augmentation methods.

Submitted 21 May, 2023; originally announced May 2023.

Comments: 14 pages, 8 tables, 5 figures

arXiv:2305.11744 [pdf, other]

ReFIT: Relevance Feedback from a Reranker during Inference

Authors: Revanth Gangi Reddy, Pradeep Dasigi, Md Arafat Sultan, Arman Cohan, Avirup Sil, Heng Ji, Hannaneh Hajishirzi

Abstract: Retrieve-and-rerank is a prevalent framework in neural information retrieval, wherein a bi-encoder network initially retrieves a pre-defined number of candidates (e.g., K=100), which are then reranked by a more powerful cross-encoder model. While the reranker often yields improved candidate scores compared to the retriever, its scope is confined to only the top K retrieved candidates. As a result, the reranker cannot improve retrieval performance in terms of Recall@K. In this work, we propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time. Specifically, given a test instance during inference, we distill the reranker's predictions for that instance into the retriever's query representation using a lightweight update mechanism. The aim of the distillation loss is to align the retriever's candidate scores more closely with those produced by the reranker. The algorithm then proceeds by executing a second retrieval step using the updated query vector. We empirically demonstrate that this method, applicable to various retrieve-and-rerank frameworks, substantially enhances retrieval recall across multiple domains, languages, and modalities.

Submitted 28 May, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Preprint

arXiv:2305.11499 [pdf, other]

RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought

Authors: Tianci Xue, Ziqi Wang, Zhenhailong Wang, Chi Han, Pengfei Yu, Heng Ji

Abstract: Large language models (LLMs) have achieved promising performance on arithmetic reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting. However, LLMs face challenges in maintaining factual consistency during reasoning, exhibiting tendencies to condition overlooking, question misinterpretation, and condition hallucination over given problems. Existing methods use coarse-grained feedback (e.g., whether the answer is correct) to improve factual consistency. In this work, we propose RCoT (Reversing Chain-of-Thought), a novel method to improve LLMs' reasoning abilities by automatically detecting and rectifying factual inconsistency in LLM-generated solutions. To detect factual inconsistency, RCoT first asks LLMs to reconstruct the problem based on generated solutions. Then fine-grained comparisons between the original problem and the reconstructed problem expose the factual inconsistency in the original solutions. To rectify the solution, RCoT formulates detected factual inconsistency into fine-grained feedback to guide LLMs in revising solutions. Experimental results demonstrate improvements of RCoT over standard CoT, Self-Consistency and Self-Refine across seven arithmetic datasets. Moreover, we find that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities (e.g., ChatGPT reaches 94.6% accuracy on GSM8K), encouraging the community to further explore the fine-grained feedback generation methods.

Submitted 1 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: 24 pages, 21 figures

arXiv:2305.10683 [pdf, other]

Paxion: Patching Action Knowledge in Video-Language Foundation Models

Authors: Zhenhailong Wang, Ansel Blume, Sha Li, Genglin Liu, Jaemin Cho, Zineng Tang, Mohit Bansal, Heng Ji

Abstract: Action knowledge involves the understanding of textual, visual, and temporal aspects of actions. We introduce the Action Dynamics Benchmark (ActionBench) containing two carefully designed probing tasks: Action Antonym and Video Reversal, which target multimodal alignment capabilities and temporal understanding skills of the model, respectively. Despite recent video-language models' (VidLM) impressive performance on various benchmark tasks, our diagnostic tasks reveal their surprising deficiency (near-random performance) in action knowledge, suggesting that current models rely on object recognition abilities as a shortcut for action understanding. To remedy this, we propose a novel framework, Paxion, along with a new Discriminative Video Dynamics Modeling (DVDM) objective. The Paxion framework utilizes a Knowledge Patcher network to encode new action knowledge and a Knowledge Fuser component to integrate the Patcher into frozen VidLMs without compromising their existing capabilities. Due to limitations of the widely-used Video-Text Contrastive (VTC) loss for learning action knowledge, we introduce the DVDM objective to train the Knowledge Patcher. DVDM forces the model to encode the correlation between the action text and the correct ordering of video frames. Our extensive analyses show that Paxion and DVDM together effectively fill the gap in action knowledge understanding (~50% to 80%), while maintaining or improving performance on a wide spectrum of both object- and action-centric downstream tasks. The code and data will be made publicly available for research purposes at https://github.com/MikeWangWZHL/Paxion.git.

Submitted 21 October, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023 spotlight

arXiv:2305.10314 [pdf, other]

LeTI: Learning to Generate from Textual Interactions

Authors: Xingyao Wang, Hao Peng, Reyhaneh Jabbarvand, Heng Ji

Abstract: Fine-tuning pre-trained language models (LMs) is essential for enhancing their capabilities. Existing techniques commonly fine-tune on input-output pairs (e.g., instruction tuning) or with numerical rewards that gauge the output quality (e.g., RLHF). We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback. Our focus is the code generation task, where the model produces code based on natural language instructions. This setting invites a natural and scalable way to acquire textual feedback: the error messages and stack traces from code execution using a Python interpreter. LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback. Prepended to this fine-tuning text, a binary reward token is used to differentiate correct and buggy solutions. LETI requires no ground-truth outputs for training and even outperforms a fine-tuned baseline that does. LETI not only improves the performance of LMs on a code generation dataset MBPP, but also generalizes to other datasets. Trained on MBPP, it achieves comparable or better performance than the base LMs on unseen problems in HumanEval. Furthermore, compared to binary feedback, we observe that textual feedback leads to improved generation quality and sample efficiency, achieving the same performance with fewer than half of the gradient steps. LETI is equally applicable in natural language tasks when they can be formulated as code generation, which we empirically verified on event argument extraction.

Submitted 19 March, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: NAACL 2024 Findings

arXiv:2305.07982 [pdf, other]

Zero-shot Faithful Factual Error Correction

Authors: Kung-Hsiang Huang, Hou Pong Chan, Heng Ji

Abstract: Faithfully correcting factual errors is critical for maintaining the integrity of textual knowledge bases and preventing hallucinations in sequence-to-sequence models. Drawing on humans' ability to identify and correct factual errors, we present a zero-shot framework that formulates questions about input claims, looks for correct answers in the given evidence, and assesses the faithfulness of each correction based on its consistency with the evidence. Our zero-shot framework outperforms fully-supervised approaches, as demonstrated by experiments on the FEVER and SciFact datasets, where our outputs are shown to be more faithful. More importantly, the decomposability nature of our framework inherently provides interpretability. Additionally, to reveal the most suitable metrics for evaluating factual error corrections, we analyze the correlation between commonly used metrics with human judgments in terms of three different dimensions regarding intelligibility and faithfulness.

Submitted 27 May, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

Comments: Accepted by ACL 2023

arXiv:2305.06407 [pdf, other]

Combo of Thinking and Observing for Outside-Knowledge VQA

Authors: Qingyi Si, Yuchen Mo, Zheng Lin, Huishan Ji, Weiping Wang

Abstract: Outside-knowledge visual question answering is a challenging task that requires both the acquisition and the use of open-ended real-world knowledge. Some existing solutions draw external knowledge into the cross-modality space which overlooks the much vaster textual knowledge in natural-language space, while others transform the image into a text that further fuses with the textual knowledge into the natural-language space and completely abandons the use of visual features. In this paper, we are inspired to constrain the cross-modality space into the same space of natural-language space which makes the visual features preserved directly, and the model still benefits from the vast knowledge in natural-language space. To this end, we propose a novel framework consisting of a multimodal encoder, a textual encoder and an answer decoder. Such structure allows us to introduce more types of knowledge including explicit and implicit multimodal and textual knowledge. Extensive experiments validate the superiority of the proposed method which outperforms the state-of-the-art by 6.17% accuracy. We also conduct comprehensive ablations of each component, and systematically study the roles of varying types of knowledge. Codes and knowledge data can be found at https://github.com/PhoebusSi/Thinking-while-Observing.

Submitted 10 May, 2023; originally announced May 2023.

Comments: ACL-23, Main Conference

arXiv:2304.10689 [pdf, ps, other]

Decay of geometry for a class of cubic polynomials

Authors: Haoyang Ji, Wenxiu Ma

Abstract: In this paper we study a class of bimodal cubic polynomials for which its critical points have the same $ω$-limit set which is an invariant Cantor set. These maps have generalized Fibonacci combinatorics in terms of generalized renormalization on the twin principal nest. It is proved that such maps possess `decay of geometry' in the sense that the scaling factor of the twin principal nest decreases at least exponentially fast. As an application, we prove that they have no Cantor attractor.

Submitted 5 July, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

arXiv:2304.08354 [pdf, other]

Tool Learning with Foundation Models

Authors: Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, Yi Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, et al. (16 additional authors not shown)

Abstract: Humans possess an extraordinary ability to create and utilize tools, allowing them to overcome physical limitations and explore new frontiers. With the advent of foundation models, AI systems have the potential to be equally adept in tool use as humans. This paradigm, i.e., tool learning with foundation models, combines the strengths of specialized tools and foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving. Despite its immense potential, there is still a lack of a comprehensive understanding of key challenges, opportunities, and future endeavors in this field. To this end, we present a systematic investigation of tool learning in this paper. We first introduce the background of tool learning, including its cognitive origins, the paradigm shift of foundation models, and the complementary roles of tools and models. Then we recapitulate existing tool learning research into tool-augmented and tool-oriented learning. We formulate a general tool learning framework: starting from understanding the user instruction, models should learn to decompose a complex task into several subtasks, dynamically adjust their plan through reasoning, and effectively conquer each sub-task by selecting appropriate tools. We also discuss how to train models for improved tool-use capabilities and facilitate the generalization in tool learning. Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools. Finally, we discuss several open problems that require further investigation for tool learning. Overall, we hope this paper could inspire future research in integrating tools with foundation models.

Submitted 15 June, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

arXiv:2303.15375 [pdf, other]

doi 10.1145/3613424.3614256

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices

Authors: Yan Sun, Yifan Yuan, Zeduo Yu, Reese Kuper, Chihun Song, Jinghan Huang, Houxiang Ji, Siddharth Agarwal, Jiaqi Lou, Ipoom Jeong, Ren Wang, Jung Ho Ahn, Tianyin Xu, Nam Sung Kim

Abstract: The ever-growing demands for memory with larger capacity and higher bandwidth have driven recent innovations on memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). Especially, CXL-based memory expansion technology has recently gained notable attention for its ability not only to economically expand memory capacity and bandwidth but also to decouple memory technologies from a specific memory interface of the CPU. However, since CXL memory devices have not been widely available, they have been emulated using DDR memory in a remote NUMA node. In this paper, for the first time, we comprehensively evaluate a true CXL-ready system based on the latest 4th-generation Intel Xeon CPU with three CXL memory devices from different manufacturers. Specifically, we run a set of microbenchmarks not only to compare the performance of true CXL memory with that of emulated CXL memory but also to analyze the complex interplay between the CPU and CXL memory in depth. This reveals important differences between emulated CXL memory and true CXL memory, some of which will compel researchers to revisit the analyses and proposals from recent work. Next, we identify opportunities for memory-bandwidth-intensive applications to benefit from the use of CXL memory. Lastly, we propose a CXL-memory-aware dynamic page allocation policy, Caption, to more efficiently use CXL memory as a bandwidth expander. We demonstrate that Caption can automatically converge to an empirically favorable percentage of pages allocated to CXL memory, which improves the performance of memory-bandwidth-intensive applications by up to 24% when compared to the default page allocation policy designed for traditional NUMA systems.

Submitted 4 October, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: This paper has been accepted by MICRO'23. Please refer to the https://doi.org/10.1145/3613424.3614256 for the official version of this paper

ACM Class: C.4; D.4; C.0

arXiv:2303.14794 [pdf, other]

Anomalous open orbits in Hofstadter spectrum of Chern insulator

Authors: Haijiao Ji, Noah F. Q. Yuan, Hua Jiang, Haiwen Liu, X. C. Xie

Abstract: The nontrivial band topology can influence the Hofstadter spectrum. We investigate the Hofstadter spectrum for various models of Chern insulators under a rational flux $\frac{φ_{0}}{q}$, where $φ_{0}=\frac{h}{e}$ and $q$ is an integer. We find two major features. First, the number of splitting subbands is $|q-C|$ with Chern number $C$. Second, the anomalous open-orbit subbands with Chern numbers $q-1$ and $-q-1$ emerge, which are beyond the parameter window $(-q/2,q/2)$ of the Diophantine equation studied by Thouless-Kohmoto-Nightingale-den Nijs [Phys. Rev. Lett. 49, 405 (1982)]. These two findings are explained by semiclassical dynamics. We propose that the number of splitting subbands can be utilized to determine Chern number in cold atom systems, and the open-orbit subbands can provide routes to study exotic features beyond the Landau level physics.

Submitted 26 March, 2023; originally announced March 2023.

arXiv:2303.14728 [pdf, other]

Coarsening of thin films with weak condensation

Authors: Hangjie Ji, Thomas P. Witelski

Abstract: A lubrication model can be used to describe the dynamics of a weakly volatile viscous fluid layer on a hydrophobic substrate. Thin layers of the fluid are unstable to perturbations and break up into slowly evolving interacting droplets. A reduced-order dynamical system is derived from the lubrication model based on the nearest-neighbor droplet interactions in the weak condensation limit. Dynamics for periodic arrays of identical drops and pairwise droplet interactions are investigated which provide insights into the coarsening dynamics for large systems. Weak condensation is shown to be a singular perturbation, fundamentally changing the long-time coarsening dynamics for the droplets and the overall mass of the fluid in two additional regimes of long-time dynamics.

Submitted 16 January, 2024; v1 submitted 26 March, 2023; originally announced March 2023.

Comments: 24 pages, 10 figures

arXiv:2303.14337 [pdf, other]

SmartBook: AI-Assisted Situation Report Generation for Intelligence Analysts

Authors: Revanth Gangi Reddy, Daniel Lee, Yi R. Fung, Khanh Duy Nguyen, Qi Zeng, Manling Li, Ziqi Wang, Clare Voss, Heng Ji

Abstract: Timely and comprehensive understanding of emerging events is crucial for effective decision-making; automating situation report generation can significantly reduce the time, effort, and cost for intelligence analysts. In this work, we identify intelligence analysts' practices and preferences for AI assistance in situation report generation to guide the design strategies for an effective, trust-building interface that aligns with their thought processes and needs. Next, we introduce SmartBook, an automated framework designed to generate situation reports from large volumes of news data, creating structured reports by automatically discovering event-related strategic questions. These reports include multiple hypotheses (claims), summarized and grounded to sources with factual evidence, to promote in-depth situation understanding. Our comprehensive evaluation of SmartBook, encompassing a user study alongside a content review with an editing study, reveals SmartBook's effectiveness in generating accurate and relevant situation reports. Qualitative evaluations indicate over 80% of questions probe for strategic information, and over 90% of summaries produce tactically useful content, being consistently favored over summaries from a large language model integrated with web search. The editing study reveals that minimal information is removed from the generated text (under 2.5%), suggesting that SmartBook provides analysts with a valuable foundation for situation reports.

Submitted 27 May, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: Preprint

arXiv:2303.09093 [pdf, other]

GLEN: General-Purpose Event Detection for Thousands of Types

Authors: Qiusi Zhan, Sha Li, Kathryn Conger, Martha Palmer, Heng Ji, Jiawei Han

Abstract: The progress of event extraction research has been hindered by the absence of wide-coverage, large-scale datasets. To make event extraction systems more accessible, we build a general-purpose event detection dataset GLEN, which covers 205K event mentions with 3,465 different types, making it more than 20x larger in ontology than today's largest event dataset. GLEN is created by utilizing the DWD Overlay, which provides a mapping between Wikidata Qnodes and PropBank rolesets. This enables us to use the abundant existing annotation for PropBank as distant supervision. In addition, we also propose a new multi-stage event detection model CEDAR specifically designed to handle the large ontology size in GLEN. We show that our model exhibits superior performance compared to a range of baselines including InstructGPT. Finally, we perform error analysis and show that label noise is still the largest challenge for improving performance for this new dataset. Our dataset, code, and models are released at https://github.com/ZQS1943/GLEN.

Submitted 31 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: Accepted to EMNLP 2023. The first two authors contributed equally. (16 pages)

Showing 101–150 of 551 results for author: Ji, H