-
Native approach to controlled-Z gates in inductively coupled fluxonium qubits
Authors:
Xizheng Ma,
Gengyan Zhang,
Feng Wu,
Feng Bao,
Xu Chang,
Jianjun Chen,
Hao Deng,
Ran Gao,
Xun Gao,
Lijuan Hu,
Honghong Ji,
Hsiang-Sheng Ku,
Kannan Lu,
Lu Ma,
Liyong Mao,
Zhijun Song,
Hantao Sun,
Chengchun Tang,
Fei Wang,
Hongcheng Wang,
Tenghui Wang,
Tian Xia,
Make Ying,
Huijuan Zhan,
Tao Zhou
, et al. (5 additional authors not shown)
Abstract:
The fluxonium qubits have emerged as a promising platform for gate-based quantum information processing. However, their extraordinary protection against charge fluctuations comes at a cost: when coupled capacitively, the qubit-qubit interactions are restricted to XX-interactions. Consequently, effective ZZ- or XZ-interactions are only constructed either by temporarily populating higher-energy states, or by exploiting perturbative effects under microwave driving. Instead, we propose and demonstrate an inductive coupling scheme, which offers a wide selection of native qubit-qubit interactions for fluxonium. In particular, we leverage a built-in, flux-controlled ZZ-interaction to perform qubit entanglement. To combat the increased flux-noise-induced dephasing away from the flux-insensitive position, we use a continuous version of the dynamical decoupling scheme to perform noise filtering. Combining these, we demonstrate a 20 ns controlled-Z (CZ) gate with a mean fidelity of 99.53%. More than confirming the efficacy of our gate scheme, this high-fidelity result also reveals a promising but rarely explored parameter space uniquely suitable for gate operations between fluxonium qubits.
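A minimal numerical sketch of how a ZZ-interaction yields a CZ gate, under the textbook idealization H = (ζ/4) Z⊗Z with ħ = 1 and a constant ZZ rate ζ in rad/s (both are assumptions; the abstract does not specify the pulse shape or rate). The combination u00·u11/(u01·u10) of the resulting diagonal unitary is unchanged by single-qubit Z rotations and equals exp(-iζT), so a CZ (value -1) needs ζT = π. For the quoted 20 ns gate, that corresponds to a time-averaged ζ/2π of about 25 MHz.

```python
import numpy as np

ZZ_EIGENVALUES = np.array([1.0, -1.0, -1.0, 1.0])  # kron(Z, Z) on |00>, |01>, |10>, |11>

def zz_evolution(zeta, t):
    """Diagonal of U = exp(-i (zeta/4) kron(Z, Z) t), assuming hbar = 1 and a constant rate zeta."""
    return np.exp(-1j * (zeta / 4.0) * ZZ_EIGENVALUES * t)

def conditional_phase_factor(u_diag):
    """u00*u11 / (u01*u10): invariant under local Z rotations, equal to -1 for a CZ."""
    u00, u01, u10, u11 = u_diag
    return u00 * u11 / (u01 * u10)

T = 20e-9              # gate duration quoted in the abstract (20 ns)
zeta = np.pi / T       # constant ZZ rate giving zeta * T = pi (rad/s)
print(f"required zeta/2pi ~ {zeta / (2 * np.pi) / 1e6:.1f} MHz")
print(conditional_phase_factor(zz_evolution(zeta, T)))  # approximately -1
```

A real flux pulse would ramp ζ(t) on and off, in which case the condition becomes that the time integral of ζ(t) over the gate equals π.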
Submitted 30 August, 2023;
originally announced August 2023.
-
Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing
Authors:
Yihan Zhang,
Hong Chang Ji,
Ramji Venkataramanan,
Marco Mondelli
Abstract:
We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured (i.i.d. Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are highly structured and exhibit non-trivial correlations. To address the problem, we consider correlated Gaussian designs capturing the anisotropic nature of the features via a covariance matrix $Σ$. Our main result is a precise asymptotic characterization of the performance of spectral estimators. This allows us to identify the optimal preprocessing that minimizes the number of samples needed for parameter estimation. Surprisingly, such preprocessing is universal across a broad set of designs, which partly addresses a conjecture on optimal spectral estimators for rotationally invariant models. Our principled approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics. The proposed methodology, based on approximate message passing, is broadly applicable and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.
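A minimal sketch of a generic spectral estimator of the kind described above: apply a preprocessing map T to the responses, form D = (1/n) Σ T(y_i) x_i x_iᵀ, and return its principal eigenvector. The example assumes an isotropic design (Σ = I) and a noiseless phase-retrieval-type link y = (xᵀβ)²; the truncation T(y) = min(y, 5) is a hypothetical choice for illustration, not the optimal preprocessing derived in the paper.

```python
import numpy as np

def spectral_estimator(X, y, preprocess):
    """Principal eigenvector of D = (1/n) * sum_i T(y_i) x_i x_i^T.

    X: (n, d) design matrix, y: (n,) responses, preprocess: the map T applied entrywise.
    """
    n = X.shape[0]
    t = preprocess(y)
    D = (X * t[:, None]).T @ X / n        # preprocessed, response-weighted sample covariance
    _, eigvecs = np.linalg.eigh(D)        # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]                 # unit-norm eigenvector of the largest eigenvalue

rng = np.random.default_rng(0)
n, d = 4000, 20
beta = rng.standard_normal(d)
beta /= np.linalg.norm(beta)
X = rng.standard_normal((n, d))           # isotropic design (Sigma = I) for simplicity
y = (X @ beta) ** 2                       # noiseless phase-retrieval-type GLM
v = spectral_estimator(X, y, lambda y: np.minimum(y, 5.0))  # hypothetical truncation T
overlap = abs(v @ beta)                   # an eigenvector's sign is arbitrary
print(f"overlap with the true signal: {overlap:.3f}")
```

With a correlated design the principal eigenvector of this plain D is biased by Σ, which is the regime where the paper's analysis and preprocessing matter.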
Submitted 3 July, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
-
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks
Authors:
Zichao Dong,
Weikun Zhang,
Xufeng Huang,
Hang Ji,
Xin Zhan,
Junbo Chen
Abstract:
Human robot interaction is an exciting task that aims to guide robots to follow instructions from humans. Since a huge gap lies between human natural language and machine code, end-to-end human robot interaction models are fairly challenging. Further, visual information received from the robot's sensors is also a hard language for the robot to perceive. In this work, HuBo-VLM is proposed to tackle perception tasks associated with human robot interaction, including object detection and visual grounding, with a unified transformer-based vision language model. Extensive experiments on the Talk2Car benchmark demonstrate the effectiveness of our approach. Code will be publicly available at https://github.com/dzcgaara/HuBo-VLM.
Submitted 23 August, 2023;
originally announced August 2023.
-
Ferromagnetic and insulating behavior in both half magnetic levitation and non-levitation LK-99 like samples
Authors:
Pinyuan Wang,
Xiaoqi Liu,
Jun Ge,
Chengcheng Ji,
Haoran Ji,
Yanzhao Liu,
Yiwen Ai,
Gaoxing Ma,
Shichao Qi,
Jian Wang
Abstract:
Finding materials exhibiting superconductivity at room temperature has long been one of the ultimate goals in physics and material science. Recently, room-temperature superconducting properties have been claimed in a copper-substituted lead phosphate apatite (Pb$_{10-x}$Cu$_x$(PO$_4$)$_6$O, also called LK-99) [1-3]. Using a similar approach, we have prepared LK-99-like samples and confirmed the half-levitation behavior in some small specimens under the influence of a magnet at room temperature. To examine the magnetic properties of our samples, we have performed systematic magnetization measurements on the as-grown LK-99-like samples, including the half-levitated and non-levitated samples. The magnetization measurements show the coexistence of soft-ferromagnetic and diamagnetic signals in both half-levitated and non-levitated samples. The electrical transport measurements on the as-grown LK-99-like samples, including both half-levitated and non-levitated samples, show an insulating behavior characterized by increasing resistivity with decreasing temperature.
Submitted 28 August, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling
Authors:
Haorui Ji,
Hui Deng,
Yuchao Dai,
Hongdong Li
Abstract:
Most previous 3D human pose estimation work relied on the powerful memory capability of the network to obtain suitable 2D-3D mappings from the training data. Few works have studied the modeling of human posture deformation in motion. In this paper, we propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior. Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and a frame-by-frame skeleton deformation. A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from a sequence of 2D observations; the two are then summed to obtain the pose of each frame. Subsequently, a loss term based on the diffusion model is used to ensure that the pipeline learns the correct prior motion knowledge. Finally, we have evaluated our proposed method on mainstream datasets and obtained superior results outperforming the state-of-the-art.
Submitted 18 August, 2023;
originally announced August 2023.
-
Towards Integrated Sensing and Communications for 6G: A Standardization Perspective
Authors:
Aryan Kaushik,
Rohit Singh,
Shalanika Dayarathna,
Rajitha Senanayake,
Marco Di Renzo,
Miguel Dajer,
Hyoungju Ji,
Younsun Kim,
Vincenzo Sciancalepore,
Alessio Zappone,
Wonjae Shin
Abstract:
The radio communication division of the International Telecommunication Union (ITU-R) has recently adopted Integrated Sensing and Communication (ISAC) among the key usage scenarios for IMT-2030/6G. ISAC is envisioned to play a vital role in the upcoming wireless generation standards. In this work, we bring together several paramount and innovative aspects of ISAC technology from a global 6G standardization perspective, including both industrial and academic progress. Specifically, this article provides 6G requirements and ISAC-enabled vision, including various aspects of 6G standardization, benefits of ISAC co-existence, and integration challenges. Moreover, we present key enabling technologies, including intelligent metasurface-aided ISAC, as well as Orthogonal Time Frequency Space (OTFS) waveform design and interference management for ISAC. Finally, future aspects are discussed to open various research opportunities and challenges on the ISAC technology towards 6G wireless communications.
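Since OTFS waveform design is named above as a key enabler, here is a minimal sketch of the textbook OTFS modulator, assuming rectangular pulses, unitary DFT normalization, no cyclic prefix, and an ideal channel; it is not the specific design discussed in the article. Under these assumptions the ISFFT followed by the Heisenberg transform collapses to an inverse DFT along the Doppler axis. The grid size M, N and the QPSK mapping are illustrative choices.

```python
import numpy as np

def otfs_modulate(x_dd):
    """Map an M x N delay-Doppler grid to M*N time samples (rectangular pulses, no CP).

    With unitary DFTs, ISFFT followed by the Heisenberg transform reduces to an
    inverse DFT along the Doppler axis; samples are then read out column by column.
    """
    s = np.fft.ifft(x_dd, axis=1, norm="ortho")
    return s.flatten(order="F")

def otfs_demodulate(samples, m, n):
    """Inverse of otfs_modulate over an ideal (identity) channel."""
    s = samples.reshape((m, n), order="F")
    return np.fft.fft(s, axis=1, norm="ortho")

rng = np.random.default_rng(1)
M, N = 8, 4                                                       # delay bins, Doppler bins (illustrative)
bits = rng.integers(0, 2, size=(2, M, N))
x_dd = ((2 * bits[0] - 1) + 1j * (2 * bits[1] - 1)) / np.sqrt(2)  # QPSK symbols
tx = otfs_modulate(x_dd)
print(np.allclose(otfs_demodulate(tx, M, N), x_dd))               # True over an ideal channel
```

In a sensing-and-communication setting, the interest of this grid is that a target's delay and Doppler shift act as (approximately) 2D shifts of the delay-Doppler symbols, which a real receiver would have to equalize or estimate.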
Submitted 2 August, 2023;
originally announced August 2023.
-
A Mini Immersed Finite Element Method for Two-Phase Stokes Problems on Cartesian Meshes
Authors:
Haifeng Ji,
Dong Liang,
Qian Zhang
Abstract:
This paper presents a mini immersed finite element (IFE) method for solving two- and three-dimensional two-phase Stokes problems on Cartesian meshes. The IFE space is constructed from the conventional mini element with shape functions modified on interface elements according to interface jump conditions, while keeping the degrees of freedom unchanged. Both discontinuous viscosity coefficients and surface forces are considered in the construction. The interface is approximated via discrete level set functions and explicit formulas of IFE basis functions and correction functions are derived, which make the IFE method easy to implement. The optimal approximation capabilities of the IFE space and the inf-sup stability and the optimal a priori error estimate of the IFE method are derived rigorously with constants independent of the mesh size and how the interface cuts the mesh. It is also proved that the condition number has the usual bound independent of the interface. Numerical experiments are provided to confirm the theoretical results.
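The abstract states that the interface is approximated via discrete level set functions. A minimal sketch of the usual first step, locating the interface elements of a Cartesian mesh from nodal level-set values, is shown below; it assumes a 2D square mesh and a circular interface chosen for illustration, and it does not reproduce the authors' IFE basis construction.

```python
import numpy as np

def interface_cells(phi_nodes):
    """Flag cells of a 2D Cartesian mesh cut by the zero level set of phi.

    phi_nodes has shape (nx + 1, ny + 1) and holds level-set values at mesh vertices.
    A cell counts as an interface element when its vertex values include both
    strictly negative and strictly positive entries (vertices with phi == 0 exactly
    would need extra handling in a real implementation).
    """
    corners = np.stack([phi_nodes[:-1, :-1], phi_nodes[1:, :-1],
                        phi_nodes[:-1, 1:], phi_nodes[1:, 1:]])
    return (corners.min(axis=0) < 0) & (corners.max(axis=0) > 0)

n = 16
xs = np.linspace(0.0, 1.0, n + 1)
xn, yn = np.meshgrid(xs, xs, indexing="ij")
phi = np.hypot(xn - 0.5, yn - 0.5) - 0.3   # signed distance to a circle of radius 0.3
cut = interface_cells(phi)
print(cut.sum(), "interface elements out of", cut.size)
```

Only the flagged cells would receive modified shape functions; all other cells keep the standard mini-element basis.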
Submitted 1 August, 2023;
originally announced August 2023.
-
SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design
Authors:
Carl Edwards,
Aakanksha Naik,
Tushar Khot,
Martin Burke,
Heng Ji,
Tom Hope
Abstract:
Predicting synergistic drug combinations can help accelerate discovery of cancer treatments, particularly therapies personalized to a patient's specific tumor via biopsied cells. In this paper, we propose a novel setting and models for in-context drug synergy learning. We are given a small "personalized dataset" of 10-20 drug synergy relationships in the context of specific cancer cell targets. Our goal is to predict additional drug synergy relationships in that context. Inspired by recent work that pre-trains a GPT language model (LM) to "in-context learn" common function classes, we devise novel pre-training schemes that enable a GPT model to in-context learn "drug synergy functions". Our model -- which does not use any textual corpora, molecular fingerprints, protein interaction or any other domain-specific knowledge -- is able to achieve competitive results. We further integrate our in-context approach with a genetic algorithm to optimize model prompts and select synergy candidates to test after conducting a patient biopsy. Finally, we explore a novel task of inverse drug design which can potentially enable the design of drugs that synergize specifically to target a given patient's "personalized dataset". Our findings can potentially have an important impact on precision cancer medicine, and also raise intriguing questions on non-textual pre-training for LMs.
Submitted 24 October, 2023; v1 submitted 19 June, 2023;
originally announced July 2023.
-
Making Pre-trained Language Models both Task-solvers and Self-calibrators
Authors:
Yangyi Chen,
Xingyao Wang,
Heng Ji
Abstract:
Pre-trained language models (PLMs) serve as backbones for various real-world systems. For high-stakes applications, it is equally essential to have reasonable confidence estimations in predictions. While the vanilla confidence scores of PLMs can already be effectively utilized, PLMs consistently become overconfident in their wrong predictions, which is not desirable in practice. Previous work shows that introducing an extra calibration task can mitigate this issue. The basic idea involves acquiring additional data to train models in predicting the confidence of their initial predictions. However, it only demonstrates the feasibility of this kind of method, assuming that there are abundant extra available samples for the introduced calibration task. In this work, we consider the practical scenario in which we need to effectively utilize training samples to make PLMs both task-solvers and self-calibrators. Three challenges are presented, including limited training samples, data imbalance, and distribution shifts. We first conduct pilot experiments to quantify various decisive factors in the calibration task. Based on the empirical analysis results, we propose a training algorithm LM-TOAST to tackle the challenges. Experimental results show that LM-TOAST can effectively utilize the training data to make PLMs have reasonable confidence estimations while maintaining the original task performance. Further, we consider three downstream applications, namely selective classification, adversarial defense, and model cascading, to show the practical usefulness of LM-TOAST. The code will be made public at https://github.com/Yangyi-Chen/LM-TOAST.
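Selective classification, one of the downstream applications named above, is where calibrated confidence pays off: the model answers only when its confidence clears a threshold. A minimal sketch of the two standard quantities, coverage and selective accuracy, follows; the confidence scores and correctness flags are hypothetical toy values, and this is not the LM-TOAST algorithm itself.

```python
import numpy as np

def selective_metrics(confidence, correct, threshold):
    """Coverage and selective accuracy when the model abstains below a confidence threshold."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    accepted = confidence >= threshold          # predictions the model commits to
    coverage = float(accepted.mean())
    if not accepted.any():
        return coverage, float("nan")
    return coverage, float(correct[accepted].mean())

# Hypothetical confidence scores and correctness flags for six predictions.
conf = [0.95, 0.90, 0.80, 0.60, 0.55, 0.30]
hits = [True, True, True, False, True, False]
print(selective_metrics(conf, hits, threshold=0.7))   # (0.5, 1.0)
print(selective_metrics(conf, hits, threshold=0.0))   # full coverage
```

An overconfident model inflates the scores of wrong predictions, so they survive the threshold and drag selective accuracy down; better-calibrated scores make the coverage/accuracy trade-off more useful.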
Submitted 20 July, 2023;
originally announced July 2023.
-
Density of Brown measure of free circular Brownian motion
Authors:
László Erdős,
Hong Chang Ji
Abstract:
We consider the Brown measure of free circular Brownian motion $\boldsymbol{a}+\sqrt{t}\boldsymbol{x}$, where $\boldsymbol{a}$ is a general non-normal operator and $\boldsymbol{x}$ is a circular element $*$-free from $\boldsymbol{a}$. We prove that, under a mild assumption on $\boldsymbol{a}$, the density of the Brown measure has one of the following two types of behavior around each point on the boundary of its support -- either (i) sharp cut, i.e. a jump discontinuity along the boundary, or (ii) quadratic decay at certain critical points on the boundary. Our result is in direct analogy with the previously known phenomenon for the spectral density of free semicircular Brownian motion, whose singularities are either a square-root edge or a cubic cusp. We also provide several examples and counterexamples, one of which shows that our assumption on $\boldsymbol{a}$ is necessary.
Submitted 17 July, 2023;
originally announced July 2023.
-
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems
Authors:
Xuan Zhang,
Limei Wang,
Jacob Helwig,
Youzhi Luo,
Cong Fu,
Yaochen Xie,
Meng Liu,
Yuchao Lin,
Zhao Xu,
Keqiang Yan,
Keir Adams,
Maurice Weiler,
Xiner Li,
Tianfan Fu,
Yucheng Wang,
Haiyang Yu,
YuQing Xie,
Xiang Fu,
Alex Strasser,
Shenglong Xu,
Yi Liu,
Yuanqi Du,
Alexandra Saxton,
Hongyi Ling,
Hannah Lawrence
, et al. (38 additional authors not shown)
Abstract:
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interest and effort to further advance AI4Science.
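The simplest way to respect the rotation and translation symmetries mentioned above is to build a model's output only from invariant quantities such as pairwise distances. A minimal sketch, using a toy Lennard-Jones-like pair energy as a stand-in for a learned model (an assumption for illustration, not a method from the survey), checks that the output is unchanged under a random rotation plus translation.

```python
import numpy as np

def pair_energy(positions):
    """Toy invariant energy: a Lennard-Jones-like function of pairwise distances only."""
    diff = positions[:, None, :] - positions[None, :, :]
    iu = np.triu_indices(len(positions), k=1)          # each unordered pair once
    r = np.linalg.norm(diff, axis=-1)[iu]
    return float(np.sum(1.0 / r**12 - 2.0 / r**6))

def random_rotation(rng):
    """Random proper rotation matrix (det = +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs of the QR factor
    if np.linalg.det(q) < 0:         # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 3.0, size=(6, 3))
rot = random_rotation(rng)
shift = np.array([1.0, -2.0, 0.5])
print(pair_energy(pos), pair_energy(pos @ rot.T + shift))  # equal up to round-off
```

Distance-only features give invariance but discard directional information; equivariant architectures keep vector or tensor features that rotate together with the input, which is what the survey's treatment of equivariance addresses.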
Submitted 15 November, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
Laboratory Study of Collisionless Magnetic Reconnection
Authors:
H. Ji,
J. Yoo,
W. Fox,
M. Yamada,
M. Argall,
J. Egedal,
Y. -H. Liu,
R. Wilder,
S. Eriksson,
W. Daughton,
K. Bergstedt,
S. Bose,
J. Burch,
R. Torbert,
J. Ng,
L. -J. Chen
Abstract:
A concise review is given on the past two decades' results from laboratory experiments on collisionless magnetic reconnection in direct relation with space measurements, especially by the Magnetospheric Multiscale (MMS) mission. Highlights include spatial structures of electromagnetic fields in ion and electron diffusion regions as a function of upstream symmetry and guide field strength; energy conversion and partition from magnetic field to ions and electrons including particle acceleration; electrostatic and electromagnetic kinetic plasma waves with various wavelengths; and plasmoid-mediated multiscale reconnection. Combined with the progress in theoretical, numerical, and observational studies, the physics foundation of fast reconnection in collisionless plasmas has been largely established, at least within the parameter ranges and spatial scales that were studied. Immediate and long-term future opportunities based on multiscale experiments and space missions supported by exascale computation are discussed, including dissipation by kinetic plasma waves, particle heating and acceleration, and multiscale physics across fluid and kinetic scales.
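The ion and electron diffusion regions mentioned above are set by the species inertial lengths d_s = c/ω_ps, with ω_ps = sqrt(n e²/(ε0 m_s)). A minimal sketch compares lab and space scales; the two densities (about 10 cm⁻³ for a magnetopause-like plasma and 10¹⁹ m⁻³ for a laboratory plasma) are illustrative assumptions, not values from the review, and a hydrogen (proton) plasma is assumed.

```python
import math

# CODATA 2018 values in SI units
C = 2.99792458e8            # speed of light (m/s)
E_CHARGE = 1.602176634e-19  # elementary charge (C)
EPS0 = 8.8541878128e-12     # vacuum permittivity (F/m)
M_E = 9.1093837015e-31      # electron mass (kg)
M_P = 1.67262192369e-27     # proton mass (kg)

def inertial_length(n_m3, mass_kg):
    """Inertial length c / omega_p for a species with density n (m^-3) and mass m (kg)."""
    omega_p = math.sqrt(n_m3 * E_CHARGE**2 / (EPS0 * mass_kg))
    return C / omega_p

# Illustrative densities (assumptions): ~10 cm^-3 in space, ~1e19 m^-3 in a lab plasma.
for label, n in [("space", 1e7), ("lab", 1e19)]:
    print(label, f"d_i = {inertial_length(n, M_P):.3g} m, d_e = {inertial_length(n, M_E):.3g} m")
```

The ratio d_i/d_e = sqrt(m_p/m_e) ≈ 43 is the same in both settings, which is one reason scaled laboratory experiments can be compared with MMS measurements once lengths are normalized to d_i.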
Submitted 13 July, 2023;
originally announced July 2023.
-
OG: Equip vision occupancy with instance segmentation and visual grounding
Authors:
Zichao Dong,
Hang Ji,
Weikun Zhang,
Xufeng Huang,
Junbo Chen
Abstract:
Occupancy prediction tasks focus on inferring both geometry and semantic labels for each voxel, which is an important perception mission. However, it is still a semantic segmentation task that does not distinguish between instances. Further, although some existing works, such as Open-Vocabulary Occupancy (OVO), have already solved the problem of open vocabulary detection, visual grounding in occupancy has not been solved to the best of our knowledge. To tackle the above two limitations, this paper proposes Occupancy Grounding (OG), a novel method that equips vanilla occupancy with instance segmentation ability and can perform visual grounding in a voxel manner with the help of grounded-SAM. Keys to our approach are (1) affinity field prediction for instance clustering and (2) an association strategy for aligning 2D instance masks and 3D occupancy instances. Extensive experiments have been conducted whose visualization results and analysis are shown below. Our code will be publicly released soon.
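One common way to turn predicted affinities into instances, as in key (1) above, is to link neighbouring occupied voxels whose affinity exceeds a threshold and take connected components. A minimal union-find sketch follows; the edge list, the affinities, and the 0.5 threshold are hypothetical, and OG's actual clustering procedure may differ.

```python
def cluster_by_affinity(num_voxels, edges, threshold=0.5):
    """Group occupied voxels into instances via union-find over high-affinity edges.

    edges: iterable of (i, j, affinity) for neighbouring voxel pairs, affinity in [0, 1].
    Returns one instance id (0..K-1) per voxel, numbered in order of first appearance.
    """
    parent = list(range(num_voxels))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    for i, j, affinity in edges:
        if affinity >= threshold:          # only confident links merge instances
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    roots = {}
    return [roots.setdefault(find(v), len(roots)) for v in range(num_voxels)]

# Hypothetical chain of six voxels with one weak link between voxels 2 and 3.
edges = [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.1), (3, 4, 0.95), (4, 5, 0.7)]
print(cluster_by_affinity(6, edges))  # [0, 0, 0, 1, 1, 1]
```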
Submitted 11 July, 2023;
originally announced July 2023.
-
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
Authors:
Zhenhailong Wang,
Shaoguang Mao,
Wenshan Wu,
Tao Ge,
Furu Wei,
Heng Ji
Abstract:
Human intelligence thrives on cognitive synergy, where collaboration among different minds yields superior outcomes compared to isolated individuals. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, experimental results demonstrate that SPP effectively reduces factual hallucination, and maintains strong reasoning capabilities. Additionally, comparative experiments show that cognitive synergy only emerges in GPT-4 and does not appear in less capable models, such as GPT-3.5-turbo and Llama2-13b-chat, which draws an interesting analogy to human development. Code, data, and prompts can be found at: https://github.com/MikeWangWZHL/Solo-Performance-Prompting.git.
Submitted 26 March, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification
Authors:
Sha Li,
Ruining Zhao,
Manling Li,
Heng Ji,
Chris Callison-Burch,
Jiawei Han
Abstract:
Event schemas are a form of world knowledge about the typical progression of events. Recent methods for event schema induction use information extraction systems to construct a large number of event graph instances from documents, and then learn to generalize the schema from such instances. In contrast, we propose to treat event schemas as a form of commonsense knowledge that can be derived from large language models (LLMs). This new paradigm greatly simplifies the schema induction process and allows us to handle both hierarchical relations and temporal relations between events in a straightforward way. Since event schemas have complex graph structures, we design an incremental prompting and verification method to break down the construction of a complex event graph into three stages: event skeleton construction, event expansion, and event-event relation verification. Compared to directly using LLMs to generate a linearized graph, our method can generate large and complex schemas with 7.2% F1 improvement in temporal relations and 31.0% F1 improvement in hierarchical relations. In addition, compared to the previous state-of-the-art closed-domain schema induction model, human assessors were able to cover $\sim$10% more events when translating the schemas into coherent stories and rated our schemas 1.3 points higher (on a 5-point scale) in terms of readability.
Submitted 4 July, 2023;
originally announced July 2023.
-
An electro-hydrodynamics modeling of droplet actuation on solid surface by surfactant-mediated electro-dewetting
Authors:
Weiqi Chu,
Hangjie Ji,
Qining Wang,
Chang-** "CJ'' Kim,
Andrea L. Bertozzi
Abstract:
We propose an electro-hydrodynamics model to describe the dynamic evolution of a slender drop containing a dilute ionic surfactant on a naturally wettable surface, with a varying external electric field. This unified model reproduces fundamental microfluidic operations controlled by electrical signals, including dewetting, rewetting, and droplet shifting. In this paper, lubrication theory analysis and numerical simulations illustrate how to electrically control the wettability of the surface via the charged surfactant. Our numerical results show that the electric field promotes dewetting by attracting ionic surfactants onto the transition thin-film region and promotes rewetting by attracting them away from the region.
Submitted 28 June, 2023;
originally announced June 2023.
-
C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation
Authors:
Liliang Ren,
Mankeerat Sidhu,
Qi Zeng,
Revanth Gangi Reddy,
Heng Ji,
ChengXiang Zhai
Abstract:
Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the user based on a given evaluation dimension. Experimental results on the widely used FED dialogue evaluation dataset demonstrate that our approach significantly improves the correlation with human judgment compared with existing evaluation systems. By replacing the negative log-likelihood-based scorer with our proposed C-PMI scorer, we achieve a relative 62.6% higher Spearman correlation on average for the FED evaluation metric. Our code is publicly available at https://github.com/renll/C-PMI.
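Conditional pointwise mutual information is C-PMI(x; y | z) = log p(x, y | z) − log p(x | z) − log p(y | z): it is positive when x and y co-occur more often than chance given z. A minimal sketch on a discrete joint table shows the quantity itself. Reading x as the user turn, y as the system response, and z as the evaluation dimension is an illustrative assumption, since the abstract does not give the exact formulation; in practice the probabilities would come from a language-model scorer rather than a table.

```python
import numpy as np

def conditional_pmi(p_xyz, x, y, z):
    """C-PMI(x; y | z) = log p(x, y | z) - log p(x | z) - log p(y | z) from a joint table p[x, y, z]."""
    p_z = p_xyz[:, :, z].sum()
    p_xz = p_xyz[x, :, z].sum()
    p_yz = p_xyz[:, y, z].sum()
    # p(x,y|z) / (p(x|z) p(y|z)) simplifies to p(x,y,z) p(z) / (p(x,z) p(y,z))
    return float(np.log(p_xyz[x, y, z] * p_z / (p_xz * p_yz)))

# Toy joint distribution: under z = 0 the response always matches the turn (x == y);
# under z = 1 the two are independent.
p = np.zeros((2, 2, 2))
p[0, 0, 0] = p[1, 1, 0] = 0.25
p[:, :, 1] = 0.125
print(conditional_pmi(p, 0, 0, 0))   # log 2: strong dependence under z = 0
print(conditional_pmi(p, 0, 0, 1))   # 0.0: no dependence under z = 1
```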
Submitted 1 September, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
An Empirical Study of Untangling Patterns of Two-Class Dependency Cycles
Authors:
Qiong Feng,
Shuwen Liu,
Huan Ji,
Xiaotian Ma,
Peng Liang
Abstract:
Dependency cycles pose a significant challenge to software quality and maintainability. However, there is limited understanding of how practitioners resolve dependency cycles in real-world scenarios. This paper presents an empirical study investigating the recurring patterns employed by software developers to resolve dependency cycles between two classes in practice. We analyzed data from 38 open-source projects across different domains and manually inspected hundreds of cycle untangling cases. Our findings reveal that developers tend to employ five recurring patterns to address dependency cycles. The chosen patterns are not only determined by dependency relations between cyclic classes, but are also highly related to their design context, i.e., how cyclic classes depend on, or are depended on by, their neighbor classes. Through this empirical study, we also discovered three common counterintuitive solutions that developers commonly adopted when handling cycles. These recurring patterns and common counterintuitive solutions observed in practice can serve as a taxonomy to improve developers' awareness and can also be used as learning material for software engineering students and inexperienced developers. Our results also suggest that, in addition to considering the internal structure of dependency cycles, automatic tools need to consider the design context of cycles to provide better support for refactoring dependency cycles.
Submitted 17 December, 2023; v1 submitted 18 June, 2023;
originally announced June 2023.
-
Particle acceleration in solar flares with imaging-spectroscopy in soft X-rays
Authors:
Mitsuo Oka,
Amir Caspi,
Bin Chen,
Mark Cheung,
James Drake,
Dale Gary,
Lindsay Glesener,
Fan Guo,
Hantao Ji,
Xiaocan Li,
Takuma Nakamura,
Noriyuki Narukage,
Katharine Reeves,
Pascal Saint-Hilaire,
Taro Sakao,
Chengcai Shen,
Amy Winebarger,
Tom Woods
Abstract:
Particles are accelerated to very high, non-thermal energies during explosive energy-release phenomena in space, solar, and astrophysical plasma environments. In the case of solar flares, it has been established that magnetic reconnection plays an important role in releasing the magnetic energy, but it remains unclear if or how magnetic reconnection can further explain particle acceleration during flares. Here we argue that the key issue is the lack of understanding of the precise context of particle acceleration, and that it can be overcome in the near future by performing imaging-spectroscopy in soft X-rays (SXRs). Such observations should be complemented by observations at other wavelengths such as extreme ultraviolet (EUV), microwaves, hard X-rays (HXRs), and gamma-rays. Numerical simulations will also be crucial for further narrowing down the particle acceleration mechanism in the context revealed by the observations. Of all these efforts, imaging-spectroscopy in SXRs, if successfully applied to large limb flares, will be a milestone in our challenge of understanding electron acceleration in solar flares and beyond, i.e., the Plasma Universe.
Submitted 7 June, 2023;
originally announced June 2023.
-
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Authors:
Lifan Yuan,
Yangyi Chen,
Ganqu Cui,
Hongcheng Gao,
Fangyuan Zou,
Xingyi Cheng,
Heng Ji,
Zhiyuan Liu,
Maosong Sun
Abstract:
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP. We find that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of OOD robustness. To address these issues, we propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. Then we introduce BOSS, a Benchmark suite for Out-of-distribution robustneSS evaluation covering 5 tasks and 20 datasets. Based on BOSS, we conduct a series of experiments on pre-trained language models for analysis and evaluation of OOD robustness. First, for vanilla fine-tuning, we examine the relationship between in-distribution (ID) and OOD performance. We identify three typical types that unveil the inner learning mechanism, which could potentially facilitate the forecasting of OOD robustness, correlating with the advancements on ID datasets. Then, we evaluate 5 classic methods on BOSS and find that, despite exhibiting some effectiveness in specific cases, they do not offer significant improvement compared to vanilla fine-tuning. Further, we evaluate 5 LLMs with various adaptation paradigms and find that when sufficient ID data is available, fine-tuned domain-specific models significantly outperform LLMs on ID examples. However, in the case of OOD instances, prioritizing LLMs with in-context learning yields better results. We identify that both fine-tuned small models and LLMs face challenges in effectively addressing downstream tasks. The code is public at https://github.com/lifan-yuan/OOD_NLP.
Submitted 26 October, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
OpenPI-C: A Better Benchmark and Stronger Baseline for Open-Vocabulary State Tracking
Authors:
Xueqing Wu,
Sha Li,
Heng Ji
Abstract:
Open-vocabulary state tracking is a more practical version of state tracking that aims to track state changes of entities throughout a process without restricting the state space and entity space. OpenPI is to date the only dataset annotated for open-vocabulary state tracking. However, we identify issues with the dataset quality and evaluation metric. For the dataset, we categorize 3 types of problems on the procedure level, step level and state change level respectively, and build a clean dataset OpenPI-C using multiple rounds of human judgment. For the evaluation metric, we propose a cluster-based metric to fix the original metric's preference for repetition.
Model-wise, we enhance the seq2seq generation baseline by reinstating two key properties for state tracking: temporal dependency and entity awareness. The state of the world after an action is inherently dependent on the previous state. We model this dependency through a dynamic memory bank and allow the model to attend to the memory slots during decoding. On the other hand, the state of the world is naturally a union of the states of involved entities. Since the entities are unknown in the open-vocabulary setting, we propose a two-stage model that refines the state change prediction conditioned on entities predicted from the first stage. Empirical results show the effectiveness of our proposed model especially on the cluster-based metric. The code and data are released at https://github.com/shirley-wu/openpi-c
Submitted 20 June, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs
Authors:
Mingyang Zhou,
Yi R. Fung,
Long Chen,
Christopher Thomas,
Heng Ji,
Shih-Fu Chang
Abstract:
Building cross-modal intelligence that can understand charts and communicate the salient information hidden behind them is an appealing challenge in the vision and language (V+L) community. The capability to uncover the underlying table data of chart figures is key to automatic chart understanding. We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot table pairs. Specifically, we propose two novel pre-training objectives, Masked Header Prediction (MHP) and Masked Value Prediction (MVP), to equip the model with different skills for interpreting table information. We have conducted extensive experiments on chart question answering and chart summarization to verify the effectiveness of the proposed pre-training strategies. In particular, on the ChartQA benchmark, our ChartT5 outperforms the state-of-the-art non-pretraining methods by over 8%.
Submitted 29 May, 2023;
originally announced May 2023.
-
Information Association for Language Model Updating by Mitigating LM-Logical Discrepancy
Authors:
Pengfei Yu,
Heng Ji
Abstract:
Large Language Models (LLMs) struggle with providing current information due to outdated pre-training data. Existing methods for updating LLMs, such as knowledge editing and continual fine-tuning, have significant drawbacks in the generalizability of new information and in their requirement for a structured updating corpus. We identify the core challenge behind these drawbacks: the LM-logical discrepancy, i.e., the difference between language modeling probabilities and logical probabilities. To evaluate and address the core challenge, we propose a new formulation of the information updating task that only requires an unstructured updating corpus and evaluates information updating by its generalizability to question-answer pairs pertaining to the updating information. We further propose a novel and effective pipeline approach for the task, highlighting a self-prompting-based question-answer generation process and an associative distillation method to bridge the LM-logical discrepancy. We develop two datasets for evaluation, one sourced from news articles published in March and April 2023, and the other from the Natural Questions benchmark. Experimental results demonstrate the superiority of our approach, significantly increasing the factual consistency score (on a scale from 0 to 1) by up to 0.16. Furthermore, our method effectively mitigates forgetting using a compact replay buffer with only 2.3% of the training tokens.
Submitted 9 February, 2024; v1 submitted 29 May, 2023;
originally announced May 2023.
-
From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework
Authors:
Yangyi Chen,
Hongcheng Gao,
Ganqu Cui,
Lifan Yuan,
Dehan Kong,
Hanlu Wu,
Ning Shi,
Bo Yuan,
Longtao Huang,
Hui Xue,
Zhiyuan Liu,
Maosong Sun,
Heng Ji
Abstract:
Textual adversarial attacks can discover models' weaknesses by adding semantics-preserving but misleading perturbations to the inputs. The long-lasting adversarial attack-and-defense arms race in Natural Language Processing (NLP) is algorithm-centric, providing valuable techniques for automatic robustness evaluation. However, the existing practice of robustness evaluation may suffer from non-comprehensive evaluation, impractical evaluation protocols, and invalid adversarial samples. In this paper, we aim to set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to further exploit the advantages of adversarial attacks. To address the above challenges, we first determine robustness evaluation dimensions based on model capabilities and specify an appropriate algorithm to generate adversarial samples for each dimension. Then we establish the evaluation protocol, including evaluation settings and metrics, under realistic demands. Finally, we use the perturbation degree of adversarial samples to control sample validity. We implement a toolkit, RobTest, that realizes our automatic robustness evaluation framework. In our experiments, we conduct a robustness evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation framework, and further show the rationality of each component in the framework. The code will be made public at https://github.com/thunlp/RobTest.
Submitted 29 May, 2023;
originally announced May 2023.
-
Non-Sequential Graph Script Induction via Multimedia Grounding
Authors:
Yu Zhou,
Sha Li,
Manling Li,
Xudong Lin,
Shih-Fu Chang,
Mohit Bansal,
Heng Ji
Abstract:
Online resources such as WikiHow compile a wide range of scripts for performing everyday tasks, which can assist models in learning to reason about procedures. However, the scripts are always presented in a linear manner, which does not reflect the flexibility displayed by people executing tasks in real life. For example, in the CrossTask Dataset, 64.5% of consecutive step pairs are also observed in the reverse order, suggesting their ordering is not fixed. In addition, each step has an average of 2.56 frequent next steps, demonstrating "branching". In this paper, we propose the new challenging task of non-sequential graph script induction, aiming to capture optional and interchangeable steps in procedural planning. To automate the induction of such graph scripts for given tasks, we propose to take advantage of loosely aligned videos of people performing the tasks. In particular, we design a multimodal framework to ground procedural videos to WikiHow textual steps and thus transform each video into an observed step path on the latent ground truth graph script. This key transformation enables us to train a script knowledge model capable of both generating explicit graph scripts for learnt tasks and predicting future steps given a partial step sequence. Our best model outperforms the strongest pure text/vision baselines by 17.52% absolute gains on F1@3 for next step prediction and 13.8% absolute gains on Acc@1 for partial sequence completion. Human evaluation shows our model outperforming the WikiHow linear baseline by 48.76% absolute gains in capturing sequential and non-sequential step relationships.
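The two dataset statistics quoted above (reversed consecutive pairs and "branching") can be made concrete with a small sketch over step sequences. What exactly counts as a pair and as a "frequent" next step is not specified here, so this code assumes distinct ordered pairs of different steps, and treats a successor as frequent if it follows a step at least `min_count` times; the function names and toy sequences are hypothetical, not the authors' procedure.

```python
from collections import Counter

def consecutive_pairs(sequences):
    """Count each ordered (step, next_step) pair across all observed step sequences."""
    pairs = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs

def reversed_pair_ratio(pairs):
    """Fraction of distinct pairs (a, b), a != b, whose reverse (b, a) was also observed."""
    distinct = [(a, b) for (a, b) in pairs if a != b]
    if not distinct:
        return 0.0
    return sum((b, a) in pairs for a, b in distinct) / len(distinct)

def mean_frequent_next_steps(pairs, min_count=2):
    """Average, over steps with any observed successor, of how many successors
    follow that step at least min_count times (a rough proxy for branching)."""
    steps = {a for a, _ in pairs}
    if not steps:
        return 0.0
    frequent = sum(1 for n in pairs.values() if n >= min_count)
    return frequent / len(steps)

# Toy sequences of step labels (made-up data).
sequences = [["a", "b", "c"], ["b", "a", "c"], ["a", "b", "c"]]
pairs = consecutive_pairs(sequences)
print(reversed_pair_ratio(pairs))       # 0.5
print(mean_frequent_next_steps(pairs))  # 1.0
```

Changing `min_count` changes the branching figure substantially (here, `min_count=1` gives 2.0), so any comparison with the reported numbers depends on matching the original definition.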
Submitted 27 May, 2023;
originally announced May 2023.
-
Zero- and Few-Shot Event Detection via Prompt-Based Meta Learning
Authors:
Zhenrui Yue,
Huimin Zeng,
Mengfei Lan,
Heng Ji,
Dong Wang
Abstract:
With emerging online topics as a source for numerous new events, detecting unseen or rare event types presents an elusive challenge for existing event detection methods, where only limited data is available for training. To address the data scarcity problem in event detection, we propose MetaEvent, a meta learning-based framework for zero- and few-shot event detection. Specifically, we sample training tasks from existing event types and perform meta training to search for optimal parameters that quickly adapt to unseen tasks. In our framework, we propose to use the cloze-based prompt and a trigger-aware soft verbalizer to efficiently project output to unseen event types. Moreover, we design a contrastive meta objective based on maximum mean discrepancy (MMD) to learn class-separating features. As such, the proposed MetaEvent can perform zero-shot event detection by mapping features to event types without any prior knowledge. In our experiments, we demonstrate the effectiveness of MetaEvent in both zero-shot and few-shot scenarios, where the proposed method achieves state-of-the-art performance in extensive experiments on benchmark datasets FewEvent and MAVEN.
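For reference, the squared maximum mean discrepancy between samples X and Y under a kernel k has the standard biased (V-statistic) estimate mean k(x, x') + mean k(y, y') - 2 mean k(x, y). The sketch uses a Gaussian kernel with bandwidth `sigma`; that kernel choice and the function names are assumptions, and how MetaEvent folds MMD into its contrastive meta objective is not shown.

```python
import math

def gaussian_kernel(u, v, sigma=1.0):
    """k(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def mmd2_biased(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two samples (lists of feature vectors)."""
    def k(a, b):
        return gaussian_kernel(a, b, sigma)

    kxx = sum(k(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(k(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

# Toy 2-D features for three groups (made-up data).
class_a = [[0.0, 0.0], [0.0, 1.0]]
class_a_shifted = [[0.0, 0.1], [0.0, 1.1]]
class_far = [[5.0, 5.0], [5.0, 6.0]]
print(mmd2_biased(class_a, class_a))          # identical samples give 0
print(mmd2_biased(class_a, class_a_shifted))  # small: distributions nearly overlap
print(mmd2_biased(class_a, class_far))        # large: well-separated distributions
```

A class-separating objective in this spirit would encourage large MMD between features of different event types; the bandwidth `sigma` controls the scale at which differences are detected.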
Submitted 27 May, 2023;
originally announced May 2023.
-
Measuring the Effect of Influential Messages on Varying Personas
Authors:
Chenkai Sun,
**ning Li,
Hou Pong Chan,
ChengXiang Zhai,
Heng Ji
Abstract:
Predicting how a user responds to news events enables important applications such as allowing intelligent agents or content producers to estimate the effect on different communities and revise unreleased messages to prevent unexpected bad outcomes such as social conflict and moral injury. We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona (characterizing an individual or a group) might have upon seeing a news message. Compared to the previous efforts which only predict generic comments to news, the proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response. This enables more accurate and comprehensive inference on the mental state of the persona. Meanwhile, the generated sentiment dimensions make the evaluation and application more reliable. We create the first benchmark dataset, which consists of 13,357 responses to 3,847 news headlines from Twitter. We further evaluate the SOTA neural language models with our dataset. The empirical results suggest that the included persona attributes are helpful for the performance of all response dimensions. Our analysis shows that the best-performing models are capable of predicting responses that are consistent with the personas, and as a byproduct, the task formulation also enables many interesting applications in the analysis of social network groups and their opinions, such as the discovery of extreme opinion groups.
Submitted 25 May, 2023;
originally announced May 2023.
-
OVO: Open-Vocabulary Occupancy
Authors:
Zhiyu Tan,
Zichao Dong,
Cheng Zhang,
Weikun Zhang,
Hang Ji,
Hao Li
Abstract:
Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment. Existing occupancy prediction methods are almost entirely trained on human-annotated volumetric data. Although of high quality, the generation of such 3D annotations is laborious and costly, restricting them to a few specific object categories in the training dataset. To address this limitation, this paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows semantic occupancy prediction of arbitrary classes but without the need for 3D annotations during training. Keys to our approach are (1) knowledge distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D occupancy network, and (2) pixel-voxel filtering for high-quality training data generation. The resulting framework is simple, compact, and compatible with most state-of-the-art semantic occupancy prediction models. On NYUv2 and SemanticKITTI datasets, OVO achieves competitive performance compared to supervised semantic occupancy prediction approaches. Furthermore, we conduct extensive analyses and ablation studies to offer insights into the design of the proposed framework. Our code is publicly available at https://github.com/dzcgaara/OVO.
Submitted 14 June, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Scientific Opinion Summarization: Paper Meta-review Generation Dataset, Methods, and Evaluation
Authors:
Qi Zeng,
Mankeerat Sidhu,
Ansel Blume,
Hou Pong Chan,
Lu Wang,
Heng Ji
Abstract:
Opinions in scientific research papers can be divergent, leading to controversies among reviewers. However, most existing datasets for opinion summarization are centered around product reviews and assume that the analyzed opinions are non-controversial, failing to account for the variability seen in other contexts such as academic papers, political debates, or social media discussions. To address this gap, we propose the task of scientific opinion summarization, where research paper reviews are synthesized into meta-reviews. To facilitate this task, we introduce the ORSUM dataset covering 15,062 paper meta-reviews and 57,536 paper reviews from 47 conferences. Furthermore, we propose the Checklist-guided Iterative Introspection approach, which breaks down scientific opinion summarization into several stages, iteratively refining the summary under the guidance of questions from a checklist. Our experiments show that (1) human-written summaries do not always satisfy all necessary criteria such as depth of discussion, and identifying consensus and controversy for the specific domain, and (2) the combination of task decomposition and iterative self-refinement shows strong potential for enhancing the opinions and can be applied to other complex text generation using black-box LLMs.
Submitted 15 June, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization
Authors:
Hou Pong Chan,
Qi Zeng,
Heng Ji
Abstract:
Existing factual consistency evaluation approaches for text summarization provide binary predictions and limited insights into the weakness of summarization systems. Therefore, we propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary. Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact, which explicitly represents the facts in the documents and summaries with semantic frames extracted by semantic role labeling, and highlights the related semantic frames to predict inconsistency. The highlighted semantic frames help verify predicted error types and correct inconsistent summaries. Experimental results demonstrate that our model outperforms strong baselines and provides evidence to support or refute the summary.
Submitted 23 May, 2023;
originally announced May 2023.
-
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models
Authors:
Cheng Qian,
Chi Han,
Yi R. Fung,
Yujia Qin,
Zhiyuan Liu,
Heng Ji
Abstract:
Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability and the instability of implicit reasoning, particularly when both planning and execution are involved. To overcome these limitations, we propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization. CREATOR disentangles abstract tool creation and concrete decision execution, resulting in improved performance. We evaluate CREATOR on MATH and TabMWP benchmarks, respectively consisting of challenging math competition problems and diverse tabular contents. Remarkably, CREATOR outperforms existing chain-of-thought, program-of-thought, and tool-using baselines. Additionally, we introduce the Creation Challenge dataset, featuring 2K diverse questions, to emphasize the necessity and benefits of LLMs' tool creation ability. Further research demonstrates that leveraging LLMs as tool creators facilitates knowledge transfer, and LLMs exhibit varying levels of tool creation abilities, enabling them to adapt to diverse situations. The tool creation ability revolutionizes the LLM's problem-solving paradigm, driving us closer to the next frontier of artificial intelligence. All code and data are released.
Submitted 21 June, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
SciMON: Scientific Inspiration Machines Optimized for Novelty
Authors:
Qingyun Wang,
Doug Downey,
Heng Ji,
Tom Hope
Abstract:
We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature. Work on literature-based hypothesis generation has traditionally focused on binary link prediction, severely limiting the expressivity of hypotheses. This line of work also does not focus on optimizing novelty. We take a dramatic departure with a novel setting in which models use as input background contexts (e.g., problems, experimental settings, goals), and output natural language ideas grounded in literature. We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers, and explicitly optimizes for novelty by iteratively comparing to prior papers and updating idea suggestions until sufficient novelty is achieved. Comprehensive evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our methods partially mitigate this issue. Our work represents a first step toward evaluating and developing language models that generate new ideas derived from the scientific literature.
Submitted 3 June, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media
Authors:
Kung-Hsiang Huang,
Hou Pong Chan,
Kathleen McKeown,
Heng Ji
Abstract:
Considerable advancements have been made to tackle the misrepresentation of information derived from reference articles in the domains of fact-checking and faithful summarization. However, one aspect remains unaddressed: the identification of social media posts that manipulate information within associated news articles. This task presents a significant challenge, primarily due to the prevalence of personal opinions in such posts. We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information. To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance. Additionally, we have developed a simple yet effective basic model that significantly outperforms LLMs on the ManiTweet dataset. Finally, we have conducted an exploratory analysis of human-written tweets, unveiling intriguing connections between manipulation and the domain and factuality of news articles, as well as revealing that manipulated sentences are more likely to encapsulate the main story or consequences of a news outlet.
Submitted 12 June, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Word Embeddings Are Steers for Language Models
Authors:
Chi Han,
Jialiang Xu,
Manling Li,
Yi Fung,
Chenkai Sun,
Nan Jiang,
Tarek Abdelzaher,
Heng Ji
Abstract:
Language models (LMs) automatically learn word embeddings during pre-training on language corpora. Although word embeddings are usually interpreted as feature vectors for individual words, their roles in language model generation remain underexplored. In this work, we theoretically and empirically revisit output word embeddings and find that their linear transformations are equivalent to steering language model generation styles. We name such steers LM-Steers and find that they exist in LMs of all sizes. Steering each style requires learning a number of parameters equal to 0.2% of the original LM's size. On tasks such as language model detoxification and sentiment control, LM-Steers achieve performance comparable or superior to state-of-the-art controlled generation methods while maintaining a better balance with generation quality. The learned LM-Steer also serves as a lens on text styles: it reveals that word embeddings are interpretable when associated with language model generations and can highlight the text spans that most indicate style differences. An LM-Steer is transferable between different language models through an explicit-form calculation. One can also steer LMs continuously simply by scaling the LM-Steer, or compose multiple LM-Steers by adding their transformations. Our code is publicly available at https://github.com/Glaciohound/LM-Steer.
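As a rough illustration of how a linear transformation of the output embeddings can act as a steer, the sketch below replaces each output embedding $e_v$ with $(I + \epsilon W)e_v$. This parameterization, the function name `steered_logits`, and the toy shapes are assumptions for illustration, not the released LM-Steer code. It shows the two properties the abstract mentions: scaling `epsilon` steers continuously, and adding two steer matrices composes them.

```python
import numpy as np


def steered_logits(h, E, W, epsilon=1.0):
    """Next-token logits after linearly transforming the output embeddings.

    Assumed parameterization: each output embedding e_v becomes (I + epsilon * W) e_v.
    h: (d,) final hidden state; E: (V, d) output embedding matrix;
    W: (d, d) learned steer matrix (hypothetical); epsilon: steering strength.
    """
    d = E.shape[1]
    # Row v of E_steered equals ((I + epsilon * W) @ e_v)^T.
    E_steered = E @ (np.eye(d) + epsilon * W).T
    return E_steered @ h


rng = np.random.default_rng(0)
V, d = 8, 4                          # toy vocabulary size and hidden size
E = rng.normal(size=(V, d))
h = rng.normal(size=d)
W_pos = rng.normal(size=(d, d))      # stand-in for one learned steer
W_formal = rng.normal(size=(d, d))   # stand-in for a second learned steer

base = steered_logits(h, E, W_pos, epsilon=0.0)      # no steering
stronger = steered_logits(h, E, W_pos, epsilon=2.0)  # scaled steer
combined = steered_logits(h, E, W_pos + W_formal)    # composed steers
```

Because the logit shift is `epsilon * E @ W.T @ h`, it is linear in both `epsilon` and `W`, which is what makes scaling and adding steers behave predictably in this sketch.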
Submitted 6 June, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Explaining Emergent In-Context Learning as Kernel Regression
Authors:
Chi Han,
Ziqi Wang,
Han Zhao,
Heng Ji
Abstract:
Large language models (LLMs) have initiated a paradigm shift in transfer learning. In contrast to the classic pretraining-then-finetuning procedure, in order to use LLMs for downstream prediction tasks, one only needs to provide a few demonstrations, known as in-context examples, without adding more or updating existing model parameters. This in-context learning (ICL) capability of LLMs is intriguing, and it is not yet fully understood how pretrained LLMs acquire such capabilities. In this paper, we investigate why a transformer-based language model can accomplish in-context learning after pre-training on a general language corpus, proposing the hypothesis that LLMs can simulate kernel regression with internal representations when faced with in-context examples. More concretely, we first prove that Bayesian inference on in-context prompts can be asymptotically understood as kernel regression $\hat y = \sum_i y_i K(x, x_i)/\sum_i K(x, x_i)$ as the number of in-context demonstrations grows. Then, we empirically investigate the in-context behaviors of language models. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression. Finally, our theory provides insights into multiple phenomena observed in the ICL field: why retrieving demonstrative samples similar to test samples can help, why ICL performance is sensitive to the output formats, and why ICL accuracy benefits from selecting in-distribution and representative samples.
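The kernel regression estimator $\hat y = \sum_i y_i K(x, x_i)/\sum_i K(x, x_i)$ can be written out directly. In this minimal sketch, the exponentiated dot-product kernel (which mirrors softmax attention weights) is an assumed choice of $K$ made for illustration; the helper names are hypothetical and the code is not taken from the paper.

```python
import numpy as np


def kernel_regression(x, xs, ys, kernel):
    """Return y_hat = sum_i y_i K(x, x_i) / sum_i K(x, x_i)."""
    weights = np.array([kernel(x, xi) for xi in xs], dtype=float)
    return float(weights @ np.asarray(ys, dtype=float) / weights.sum())


def exp_dot_kernel(a, b, temperature=1.0):
    """Assumed kernel: exp(<a, b> / temperature), like an attention score."""
    return float(np.exp(np.dot(a, b) / temperature))


# Toy in-context demonstrations (x_i, y_i) and a query x.
demos_x = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
demos_y = [1.0, 0.0]
query = np.array([1.0, 0.0])
y_hat = kernel_regression(query, demos_x, demos_y, exp_dot_kernel)  # e / (e + 1)
```

Demonstrations more similar to the query receive larger weights, which matches the abstract's observation that retrieving demonstrations similar to the test sample helps.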
Submitted 5 October, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Logical Entity Representation in Knowledge-Graphs for Differentiable Rule Learning
Authors:
Chi Han,
Qizheng He,
Charles Yu,
Xinya Du,
Hanghang Tong,
Heng Ji
Abstract:
Probabilistic logical rule learning has shown great strength in logical rule mining and knowledge graph completion. It learns logical rules to predict missing edges by reasoning on existing edges in the knowledge graph. However, previous efforts have largely been limited to only modeling chain-like Horn clauses such as $R_1(x,z)\land R_2(z,y)\Rightarrow H(x,y)$. This formulation overlooks additional contextual information from neighboring sub-graphs of entity variables $x$, $y$ and $z$. Intuitively, there is a large gap here, as local sub-graphs have been found to provide important information for knowledge graph completion. Inspired by these observations, we propose Logical Entity RePresentation (LERP) to encode contextual information of entities in the knowledge graph. A LERP is designed as a vector of probabilistic logical functions on the entity's neighboring sub-graph. It is an interpretable representation while allowing for differentiable optimization. We can then incorporate LERP into probabilistic logical rule learning to learn more expressive rules. Empirical results demonstrate that with LERP, our model outperforms other rule learning methods in knowledge graph completion and is comparable or even superior to state-of-the-art black-box methods. Moreover, we find that our model can discover a more expressive family of logical rules. LERP can also be further combined with embedding learning methods like TransE to make it more interpretable.
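To make the chain-like Horn clause $R_1(x,z)\land R_2(z,y)\Rightarrow H(x,y)$ concrete, the sketch below applies one such rule deterministically to a set of triples. The relations and entities are hypothetical. Probabilistic rule learners score rules with soft weights rather than applying them crisply, and LERP additionally encodes each entity's neighboring sub-graph; this sketch omits both.

```python
from collections import defaultdict


def apply_chain_rule(triples, r1, r2, head):
    """Derive head(x, y) from every pair r1(x, z), r2(z, y) in the graph."""
    r2_targets = defaultdict(set)  # z -> {y : r2(z, y)}
    for s, r, o in triples:
        if r == r2:
            r2_targets[s].add(o)
    derived = set()
    for x, r, z in triples:
        if r == r1:
            for y in r2_targets.get(z, ()):
                derived.add((x, head, y))
    return derived


# Hypothetical toy knowledge graph.
kg = {
    ("alice", "parent_of", "bob"),
    ("bob", "parent_of", "carol"),
    ("bob", "works_at", "acme"),
}
new_edges = apply_chain_rule(kg, "parent_of", "parent_of", "grandparent_of")
```

Here `new_edges` contains the single predicted edge `("alice", "grandparent_of", "carol")`; the rule has no way to use facts like `works_at` about the intermediate entity, which is the kind of context LERP is designed to add.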
Submitted 22 May, 2023;
originally announced May 2023.
-
Understanding the Effect of Data Augmentation on Knowledge Distillation
Authors:
Ziqi Wang,
Chi Han,
Wenxuan Bao,
Heng Ji
Abstract:
Knowledge distillation (KD) requires sufficient data to transfer knowledge from large-scale teacher models to small-scale student models. Therefore, data augmentation has been widely used to mitigate data shortages in specific scenarios. Classic data augmentation techniques, such as synonym replacement and k-nearest-neighbors, were originally designed for fine-tuning. To avoid severe semantic shifts and preserve task-specific labels, those methods prefer to change only a small proportion of tokens (e.g., changing 10% of tokens is generally the best option for fine-tuning). However, such data augmentation methods are sub-optimal for knowledge distillation, since the teacher model can provide label distributions and is more tolerant of semantic shifts. We first observe that KD prefers as much data as possible, unlike fine-tuning, for which too much data does not bring further performance gains. Since changing more tokens leads to larger semantic shifts, we use the proportion of changed tokens to reflect the degree of semantic shift. We then find that KD prefers augmented data with a larger semantic shift degree (e.g., changing 30% of tokens is generally the best option for KD) than fine-tuning does (changing 10% of tokens). In addition, our findings show that smaller datasets prefer larger degrees until the out-of-distribution problem occurs (e.g., datasets with fewer than 10k inputs may prefer the 50% degree, and datasets with more than 100k inputs may prefer the 10% degree). Our work sheds light on the difference in data augmentation preferences between fine-tuning and knowledge distillation and encourages the community to explore KD-specific data augmentation methods.
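The semantic shift degree above is measured as the proportion of changed tokens. A minimal sketch of that knob, assuming random replacement from a small vocabulary as a stand-in for synonym or k-nearest-neighbor replacement; the function and vocabulary are hypothetical.

```python
import random


def replace_tokens(tokens, ratio, vocab, seed=None):
    """Replace round(ratio * len(tokens)) randomly chosen positions with a different vocab word.

    Random replacement is an assumed stand-in for synonym / kNN replacement.
    vocab must contain at least one word different from each token being replaced.
    """
    rng = random.Random(seed)
    n_change = round(ratio * len(tokens))
    out = list(tokens)
    for i in rng.sample(range(len(tokens)), n_change):
        candidates = [w for w in vocab if w != tokens[i]]
        out[i] = rng.choice(candidates)
    return out


sentence = "the movie was slow but the acting was really good".split()
vocab = ["film", "plot", "quick", "great", "dull", "cast"]
aug_finetune = replace_tokens(sentence, 0.1, vocab, seed=1)  # 10% degree
aug_kd = replace_tokens(sentence, 0.3, vocab, seed=1)        # 30% degree
```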
Submitted 21 May, 2023;
originally announced May 2023.
-
ReFIT: Relevance Feedback from a Reranker during Inference
Authors:
Revanth Gangi Reddy,
Pradeep Dasigi,
Md Arafat Sultan,
Arman Cohan,
Avirup Sil,
Heng Ji,
Hannaneh Hajishirzi
Abstract:
Retrieve-and-rerank is a prevalent framework in neural information retrieval, wherein a bi-encoder network initially retrieves a pre-defined number of candidates (e.g., K=100), which are then reranked by a more powerful cross-encoder model. While the reranker often yields improved candidate scores compared to the retriever, its scope is confined to only the top K retrieved candidates. As a result, the reranker cannot improve retrieval performance in terms of Recall@K. In this work, we propose to leverage the reranker to improve recall by making it provide relevance feedback to the retriever at inference time. Specifically, given a test instance during inference, we distill the reranker's predictions for that instance into the retriever's query representation using a lightweight update mechanism. The aim of the distillation loss is to align the retriever's candidate scores more closely with those produced by the reranker. The algorithm then proceeds by executing a second retrieval step using the updated query vector. We empirically demonstrate that this method, applicable to various retrieve-and-rerank frameworks, substantially enhances retrieval recall across multiple domains, languages, and modalities.
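A hedged sketch of the inference-time update, under assumptions the abstract does not fix: the retriever scores candidates by a dot product with the query vector, the distillation loss is the KL divergence from the reranker's softmax distribution to the retriever's, and the lightweight update is a few plain gradient steps on the query vector only. All names and the toy data are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def kl_div(p, r):
    """KL(p || r) for strictly positive distributions."""
    return float(np.sum(p * np.log(p / r)))


def refine_query(q, cand_embs, reranker_scores, lr=0.02, steps=50):
    """Gradient steps on q to minimize KL(softmax(reranker) || softmax(cand_embs @ q))."""
    p_rerank = softmax(np.asarray(reranker_scores, dtype=float))
    q = np.asarray(q, dtype=float).copy()
    for _ in range(steps):
        p_retr = softmax(cand_embs @ q)
        # Gradient of the KL loss w.r.t. q, via the chain rule through the dot-product scores.
        q -= lr * (cand_embs.T @ (p_retr - p_rerank))
    return q


def top_k(corpus_embs, q, k):
    """Indices of the k highest-scoring documents for query vector q."""
    return np.argsort(-(corpus_embs @ q))[:k]


rng = np.random.default_rng(0)
corpus = rng.normal(size=(50, 8))  # toy document embeddings
q0 = rng.normal(size=8)            # initial query embedding
cands = top_k(corpus, q0, 5)       # first-pass top-K
rerank = rng.normal(size=5)        # stand-in for cross-encoder scores
q1 = refine_query(q0, corpus[cands], rerank)
second_pass = top_k(corpus, q1, 10)  # second retrieval with the updated query
```

Only the query vector changes; the document index is reused as-is, which is what keeps this kind of feedback step cheap at inference time.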
Submitted 28 May, 2024; v1 submitted 19 May, 2023;
originally announced May 2023.
-
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought
Authors:
Tianci Xue,
Ziqi Wang,
Zhenhailong Wang,
Chi Han,
Pengfei Yu,
Heng Ji
Abstract:
Large language models (LLMs) have achieved promising performance on arithmetic reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting. However, LLMs face challenges in maintaining factual consistency during reasoning, tending to overlook conditions, misinterpret questions, and hallucinate conditions for the given problems. Existing methods use coarse-grained feedback (e.g., whether the answer is correct) to improve factual consistency. In this work, we propose RCoT (Reversing Chain-of-Thought), a novel method to improve LLMs' reasoning abilities by automatically detecting and rectifying factual inconsistency in LLM-generated solutions. To detect factual inconsistency, RCoT first asks LLMs to reconstruct the problem based on generated solutions. Fine-grained comparisons between the original problem and the reconstructed problem then expose the factual inconsistency in the original solutions. To rectify the solution, RCoT formulates detected factual inconsistency into fine-grained feedback to guide LLMs in revising solutions. Experimental results demonstrate improvements of RCoT over standard CoT, Self-Consistency and Self-Refine across seven arithmetic datasets. Moreover, we find that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities (e.g., ChatGPT reaches 94.6% accuracy on GSM8K), encouraging the community to further explore methods for generating fine-grained feedback.
Submitted 1 October, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Authors:
Zhenhailong Wang,
Ansel Blume,
Sha Li,
Genglin Liu,
Jaemin Cho,
Zineng Tang,
Mohit Bansal,
Heng Ji
Abstract:
Action knowledge involves understanding the textual, visual, and temporal aspects of actions. We introduce the Action Dynamics Benchmark (ActionBench), containing two carefully designed probing tasks, Action Antonym and Video Reversal, which target the model's multimodal alignment capabilities and temporal understanding skills, respectively. Despite the impressive performance of recent video-language models (VidLMs) on various benchmark tasks, our diagnostic tasks reveal their surprising deficiency (near-random performance) in action knowledge, suggesting that current models rely on object recognition abilities as a shortcut for action understanding. To remedy this, we propose a novel framework, Paxion, along with a new Discriminative Video Dynamics Modeling (DVDM) objective. The Paxion framework utilizes a Knowledge Patcher network to encode new action knowledge and a Knowledge Fuser component to integrate the Patcher into frozen VidLMs without compromising their existing capabilities. Because of the limitations of the widely used Video-Text Contrastive (VTC) loss for learning action knowledge, we introduce the DVDM objective to train the Knowledge Patcher. DVDM forces the model to encode the correlation between the action text and the correct ordering of video frames. Our extensive analyses show that Paxion and DVDM together effectively fill the gap in action knowledge understanding (~50% to 80%), while maintaining or improving performance on a wide spectrum of both object- and action-centric downstream tasks. The code and data will be made publicly available for research purposes at https://github.com/MikeWangWZHL/Paxion.git.
Submitted 21 October, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
LeTI: Learning to Generate from Textual Interactions
Authors:
Xingyao Wang,
Hao Peng,
Reyhaneh Jabbarvand,
Heng Ji
Abstract:
Fine-tuning pre-trained language models (LMs) is essential for enhancing their capabilities. Existing techniques commonly fine-tune on input-output pairs (e.g., instruction tuning) or with numerical rewards that gauge the output quality (e.g., RLHF). We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback. Our focus is the code generation task, where the model produces code based on natural language instructions. This setting invites a natural and scalable way to acquire textual feedback: the error messages and stack traces from code execution using a Python interpreter. LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback. Prepended to this fine-tuning text, a binary reward token is used to differentiate correct and buggy solutions. LETI requires no ground-truth outputs for training and even outperforms a fine-tuned baseline that does. LETI not only improves the performance of LMs on a code generation dataset MBPP, but also generalizes to other datasets. Trained on MBPP, it achieves comparable or better performance than the base LMs on unseen problems in HumanEval. Furthermore, compared to binary feedback, we observe that textual feedback leads to improved generation quality and sample efficiency, achieving the same performance with fewer than half of the gradient steps. LETI is equally applicable in natural language tasks when they can be formulated as code generation, which we empirically verified on event argument extraction.
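The fine-tuning text described above (a binary reward token, then the instruction, the generated program, and the interpreter's feedback) can be sketched as follows. The reward-token strings, the "All tests passed." message, and the helper names are hypothetical, and running model-generated code with `exec` is only reasonable inside a sandbox.

```python
import traceback

# Hypothetical reward tokens; the abstract only says a binary reward token is prepended.
GOOD, BAD = "<|good|>", "<|bad|>"


def execute_with_feedback(program, test_code):
    """Run the program and its tests; return (passed, textual feedback)."""
    namespace = {}
    try:
        exec(program, namespace)    # run untrusted code only in a sandbox
        exec(test_code, namespace)
        return True, "All tests passed."
    except Exception:
        return False, traceback.format_exc()  # error message and stack trace


def build_finetuning_text(instruction, program, test_code):
    """Reward token, then instruction, generated program, and interpreter feedback."""
    passed, feedback = execute_with_feedback(program, test_code)
    reward = GOOD if passed else BAD
    return "\n".join([reward, instruction, program, feedback])


instruction = "Write a function add(a, b) that returns the sum of a and b."
tests = "assert add(2, 3) == 5"
good_text = build_finetuning_text(instruction, "def add(a, b):\n    return a + b\n", tests)
bad_text = build_finetuning_text(instruction, "def add(a, b):\n    return a - b\n", tests)
```

For the buggy program, the feedback section contains the Python traceback ending in `AssertionError`, which is the kind of textual signal LETI learns from instead of a bare pass/fail label.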
Submitted 19 March, 2024; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Zero-shot Faithful Factual Error Correction
Authors:
Kung-Hsiang Huang,
Hou Pong Chan,
Heng Ji
Abstract:
Faithfully correcting factual errors is critical for maintaining the integrity of textual knowledge bases and preventing hallucinations in sequence-to-sequence models. Drawing on humans' ability to identify and correct factual errors, we present a zero-shot framework that formulates questions about input claims, looks for correct answers in the given evidence, and assesses the faithfulness of each correction based on its consistency with the evidence. Our zero-shot framework outperforms fully-supervised approaches, as demonstrated by experiments on the FEVER and SciFact datasets, where our outputs are shown to be more faithful. More importantly, the decomposable nature of our framework inherently provides interpretability. Additionally, to identify the most suitable metrics for evaluating factual error corrections, we analyze the correlation between commonly used metrics and human judgments along three different dimensions regarding intelligibility and faithfulness.
Submitted 27 May, 2023; v1 submitted 13 May, 2023;
originally announced May 2023.
-
Combo of Thinking and Observing for Outside-Knowledge VQA
Authors:
Qingyi Si,
Yuchen Mo,
Zheng Lin,
Huishan Ji,
Wei** Wang
Abstract:
Outside-knowledge visual question answering is a challenging task that requires both the acquisition and the use of open-ended real-world knowledge. Some existing solutions draw external knowledge into the cross-modality space, which overlooks the much vaster textual knowledge in natural-language space, while others transform the image into text that is then fused with the textual knowledge in the natural-language space, completely abandoning the use of visual features. In this paper, we constrain the cross-modality space to the same space as the natural-language space, which preserves the visual features directly while still letting the model benefit from the vast knowledge in natural-language space. To this end, we propose a novel framework consisting of a multimodal encoder, a textual encoder and an answer decoder. This structure allows us to introduce more types of knowledge, including explicit and implicit multimodal and textual knowledge. Extensive experiments validate the superiority of the proposed method, which outperforms the state of the art by 6.17% accuracy. We also conduct comprehensive ablations of each component and systematically study the roles of varying types of knowledge. Code and knowledge data can be found at https://github.com/PhoebusSi/Thinking-while-Observing.
Submitted 10 May, 2023;
originally announced May 2023.
-
Decay of geometry for a class of cubic polynomials
Authors:
Haoyang Ji,
Wenxiu Ma
Abstract:
In this paper we study a class of bimodal cubic polynomials whose critical points have the same $\omega$-limit set, which is an invariant Cantor set. These maps have generalized Fibonacci combinatorics in terms of generalized renormalization on the twin principal nest. We prove that such maps possess `decay of geometry' in the sense that the scaling factor of the twin principal nest decreases at least exponentially fast. As an application, we prove that they have no Cantor attractor.
Submitted 5 July, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
Tool Learning with Foundation Models
Authors:
Yujia Qin,
Shengding Hu,
Yankai Lin,
Weize Chen,
Ning Ding,
Ganqu Cui,
Zheni Zeng,
Yufei Huang,
Chaojun Xiao,
Chi Han,
Yi Ren Fung,
Yusheng Su,
Huadong Wang,
Cheng Qian,
Runchu Tian,
Kunlun Zhu,
Shihao Liang,
Xingyu Shen,
Bokai Xu,
Zhen Zhang,
Yining Ye,
Bowen Li,
Ziwei Tang,
**g Yi,
Yuzhang Zhu
, et al. (16 additional authors not shown)
Abstract:
Humans possess an extraordinary ability to create and utilize tools, allowing them to overcome physical limitations and explore new frontiers. With the advent of foundation models, AI systems have the potential to be equally adept in tool use as humans. This paradigm, i.e., tool learning with foundation models, combines the strengths of specialized tools and foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving. Despite its immense potential, there is still a lack of a comprehensive understanding of key challenges, opportunities, and future endeavors in this field. To this end, we present a systematic investigation of tool learning in this paper. We first introduce the background of tool learning, including its cognitive origins, the paradigm shift of foundation models, and the complementary roles of tools and models. Then we recapitulate existing tool learning research into tool-augmented and tool-oriented learning. We formulate a general tool learning framework: starting from understanding the user instruction, models should learn to decompose a complex task into several subtasks, dynamically adjust their plan through reasoning, and effectively conquer each sub-task by selecting appropriate tools. We also discuss how to train models for improved tool-use capabilities and facilitate the generalization in tool learning. Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools. Finally, we discuss several open problems that require further investigation for tool learning. Overall, we hope this paper could inspire future research in integrating tools with foundation models.
Submitted 15 June, 2023; v1 submitted 17 April, 2023;
originally announced April 2023.
-
Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices
Authors:
Yan Sun,
Yifan Yuan,
Zeduo Yu,
Reese Kuper,
Chihun Song,
**ghan Huang,
Houxiang Ji,
Siddharth Agarwal,
Jiaqi Lou,
Ipoom Jeong,
Ren Wang,
Jung Ho Ahn,
Tianyin Xu,
Nam Sung Kim
Abstract:
The ever-growing demand for memory with larger capacity and higher bandwidth has driven recent innovations in memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). In particular, CXL-based memory expansion technology has recently gained notable attention for its ability not only to economically expand memory capacity and bandwidth but also to decouple memory technologies from a specific memory interface of the CPU. However, since CXL memory devices have not been widely available, they have been emulated using DDR memory in a remote NUMA node. In this paper, for the first time, we comprehensively evaluate a true CXL-ready system based on the latest 4th-generation Intel Xeon CPU with three CXL memory devices from different manufacturers. Specifically, we run a set of microbenchmarks not only to compare the performance of true CXL memory with that of emulated CXL memory but also to analyze the complex interplay between the CPU and CXL memory in depth. This reveals important differences between emulated and true CXL memory, some of which will compel researchers to revisit the analyses and proposals of recent work. Next, we identify opportunities for memory-bandwidth-intensive applications to benefit from the use of CXL memory. Lastly, we propose Caption, a CXL-memory-aware dynamic page allocation policy, to use CXL memory more efficiently as a bandwidth expander. We demonstrate that Caption can automatically converge to an empirically favorable percentage of pages allocated to CXL memory, which improves the performance of memory-bandwidth-intensive applications by up to 24% compared to the default page allocation policy designed for traditional NUMA systems.
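The abstract does not spell out how Caption converges. One simple way a dynamic policy could search for a favorable fraction of pages in CXL memory is a feedback loop that measures throughput and hill-climbs on that fraction. The sketch below shows this generic idea against a synthetic throughput curve; it is an assumption for illustration, not Caption's algorithm, and `tune_cxl_fraction` and `synthetic_throughput` are hypothetical.

```python
def tune_cxl_fraction(measure, fraction=0.0, step=0.1, iters=60):
    """Hill-climb on the fraction of pages placed in CXL memory (0.0 to 1.0).

    measure(fraction) returns observed throughput; higher is better.
    When a move does not improve throughput, reverse direction and halve the step.
    """
    best = measure(fraction)
    for _ in range(iters):
        candidate = min(1.0, max(0.0, fraction + step))
        perf = measure(candidate)
        if perf > best:
            fraction, best = candidate, perf
        else:
            step = -step / 2
    return fraction


def synthetic_throughput(f):
    # Hypothetical curve: extra bandwidth helps until ~30% of pages sit in CXL
    # memory, after which CXL's higher latency dominates.
    return 1.0 - (f - 0.3) ** 2


chosen = tune_cxl_fraction(synthetic_throughput)  # converges near 0.3
```

In a real system `measure` would sample hardware counters or application throughput over an epoch, so the step-halving schedule trades convergence speed against sensitivity to measurement noise.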
Submitted 4 October, 2023; v1 submitted 27 March, 2023;
originally announced March 2023.
-
Anomalous open orbits in Hofstadter spectrum of Chern insulator
Authors:
Haijiao Ji,
Noah F. Q. Yuan,
Hua Jiang,
Haiwen Liu,
X. C. Xie
Abstract:
Nontrivial band topology can influence the Hofstadter spectrum. We investigate the Hofstadter spectrum for various models of Chern insulators under a rational flux $\phi_0/q$, where $\phi_0 = h/e$ and $q$ is an integer. We find two major features. First, the number of splitting subbands is $|q-C|$, where $C$ is the Chern number. Second, anomalous open-orbit subbands with Chern numbers $q-1$ and $-q-1$ emerge, which lie beyond the parameter window $(-q/2,q/2)$ of the Diophantine equation studied by Thouless-Kohmoto-Nightingale-den Nijs [Phys. Rev. Lett. 49, 405 (1982)]. These two findings are explained by semiclassical dynamics. We propose that the number of splitting subbands can be used to determine the Chern number in cold atom systems, and that the open-orbit subbands can provide routes to study exotic features beyond Landau level physics.
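For comparison, the TKNN Diophantine equation for a flux of $p/q$ flux quanta per plaquette reads $r = q s_r + p t_r$ with $|t_r| \le q/2$, where $t_r$ is the Hall conductance (in units of $e^2/h$) with the Fermi level in the $r$-th gap. The sketch below solves it for the ordinary square-lattice Hofstadter problem ($p = 1$ for the flux $\phi_0/q$ used here) and also evaluates the abstract's $|q-C|$ subband count. The helper names are hypothetical, sign conventions for $t_r$ differ between references, and the code does not model the Chern-insulator lattices or the anomalous open-orbit subbands.

```python
def tknn_gap_integers(p, q):
    """Solve r = q*s_r + p*t_r with |t_r| <= q/2 for each gap r = 1..q-1.

    Assumes gcd(p, q) == 1. For even q the central gap r = q/2 is ambiguous
    (t = +q/2 or -q/2); this picks +q/2.
    """
    inv = pow(p, -1, q)            # modular inverse of p mod q (Python 3.8+)
    ts = []
    for r in range(1, q):
        t = (r * inv) % q          # ensures p*t == r (mod q)
        if t > q / 2:
            t -= q                 # shift into the window (-q/2, q/2]
        s = (r - p * t) // q
        assert r == q * s + p * t
        ts.append(t)
    return ts


def subband_chern_numbers(p, q):
    """Chern number of each subband as the jump in t_r between adjacent gaps."""
    t = [0] + tknn_gap_integers(p, q) + [0]
    return [t[i + 1] - t[i] for i in range(q)]


def n_split_subbands(q, C):
    """Subband count |q - C| reported in the abstract for Chern number C."""
    return abs(q - C)


print(tknn_gap_integers(1, 3))      # [1, -1]
print(subband_chern_numbers(1, 3))  # [1, -2, 1]
```

Every $t_r$ produced here stays inside $(-q/2, q/2]$, which is why subbands with Chern numbers $q-1$ or $-q-1$, as reported in the abstract, fall outside what this standard solution describes.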
Submitted 26 March, 2023;
originally announced March 2023.
-
Coarsening of thin films with weak condensation
Authors:
Hangjie Ji,
Thomas P. Witelski
Abstract:
A lubrication model can be used to describe the dynamics of a weakly volatile viscous fluid layer on a hydrophobic substrate. Thin layers of the fluid are unstable to perturbations and break up into slowly evolving interacting droplets. A reduced-order dynamical system is derived from the lubrication model based on nearest-neighbor droplet interactions in the weak condensation limit. Dynamics for periodic arrays of identical drops and for pairwise droplet interactions are investigated; these provide insights into the coarsening dynamics of large systems. Weak condensation is shown to be a singular perturbation that fundamentally changes the long-time coarsening dynamics of the droplets and the overall mass of the fluid, in two additional regimes of long-time dynamics.
Submitted 16 January, 2024; v1 submitted 26 March, 2023;
originally announced March 2023.
-
SmartBook: AI-Assisted Situation Report Generation for Intelligence Analysts
Authors:
Revanth Gangi Reddy,
Daniel Lee,
Yi R. Fung,
Khanh Duy Nguyen,
Qi Zeng,
Manling Li,
Ziqi Wang,
Clare Voss,
Heng Ji
Abstract:
Timely and comprehensive understanding of emerging events is crucial for effective decision-making; automating situation report generation can significantly reduce the time, effort, and cost for intelligence analysts. In this work, we identify intelligence analysts' practices and preferences for AI assistance in situation report generation to guide the design strategies for an effective, trust-building interface that aligns with their thought processes and needs. Next, we introduce SmartBook, an automated framework designed to generate situation reports from large volumes of news data, creating structured reports by automatically discovering event-related strategic questions. These reports include multiple hypotheses (claims), summarized and grounded to sources with factual evidence, to promote in-depth situation understanding. Our comprehensive evaluation of SmartBook, encompassing a user study alongside a content review with an editing study, reveals SmartBook's effectiveness in generating accurate and relevant situation reports. Qualitative evaluations indicate that over 80% of questions probe for strategic information and over 90% of summaries produce tactically useful content, and these summaries are consistently favored over summaries from a large language model integrated with web search. The editing study reveals that minimal information is removed from the generated text (under 2.5%), suggesting that SmartBook provides analysts with a valuable foundation for situation reports.
Submitted 27 May, 2024; v1 submitted 24 March, 2023;
originally announced March 2023.
-
GLEN: General-Purpose Event Detection for Thousands of Types
Authors:
Qiusi Zhan,
Sha Li,
Kathryn Conger,
Martha Palmer,
Heng Ji,
Jiawei Han
Abstract:
The progress of event extraction research has been hindered by the absence of wide-coverage, large-scale datasets. To make event extraction systems more accessible, we build a general-purpose event detection dataset, GLEN, which covers 205K event mentions with 3,465 different types, making its ontology more than 20x larger than that of today's largest event dataset. GLEN is created by utilizing the DWD Overlay, which provides a mapping between Wikidata Qnodes and PropBank rolesets. This enables us to use the abundant existing annotation for PropBank as distant supervision. In addition, we propose a new multi-stage event detection model, CEDAR, specifically designed to handle the large ontology size in GLEN. We show that our model exhibits superior performance compared to a range of baselines including InstructGPT. Finally, we perform error analysis and show that label noise is still the largest challenge for improving performance on this new dataset. Our dataset, code, and models are released at https://github.com/ZQS1943/GLEN.
Submitted 31 October, 2023; v1 submitted 16 March, 2023;
originally announced March 2023.