-
Reinforcement Learning Based Gasoline Blending Optimization: Achieving More Efficient Nonlinear Online Blending of Fuels
Authors:
Muyi Huang,
Renchu He,
Xin Dai,
Xin Peng,
Wenli Du,
Feng Qian
Abstract:
The online optimization of gasoline blending benefits refinery economies. However, the nonlinear blending mechanism, oil property fluctuations, and blending model mismatch make the optimization difficult. To address these issues, this paper proposes a novel online optimization method based on a deep reinforcement learning (DRL) algorithm. A Markov decision process (MDP) formulation is given for a practical gasoline blending system. Then, an environment simulator of the gasoline blending process is established based on the MDP formulation and one year of measurement data from a real-world refinery. The soft actor-critic (SAC) DRL algorithm is applied to improve the DRL agent policy using data obtained from the interaction between the DRL agent and the environment simulator. Compared with a traditional method, the proposed method has better economic performance. Meanwhile, it is more robust under property fluctuations and component oil switching. Furthermore, the proposed method maintains performance by automatically adapting to system drift.
Submitted 6 September, 2023;
originally announced September 2023.
-
Collective and non-collective molecular dynamics in a ferroelectric nematic liquid crystal studied by broadband dielectric spectroscopy
Authors:
Aitor Erkoreka,
Alenka Mertelj,
Mingjun Huang,
Satoshi Aya,
Nerea Sebastián,
Josu Martinez-Perdiguero
Abstract:
A great deal of effort has been recently devoted to the study of dielectric relaxation processes in ferroelectric nematic liquid crystals, yet their interpretation remains unclear. In this work, we present the results of broadband dielectric spectroscopy experiments of a prototypical ferroelectric nematogen in the frequency range 10 Hz-110 MHz at different electrode separations and under the application of DC bias fields. The results evidence a complex behavior in all phases due to the magnitude of polar correlations in these systems. The observed modes have been assigned to different relaxation mechanisms based on existing theoretical frameworks.
Submitted 1 February, 2024; v1 submitted 4 September, 2023;
originally announced September 2023.
-
ADI schemes for heat equations with irregular boundaries and interfaces in 3D with applications
Authors:
Han Zhou,
Minsheng Huang,
Wenjun Ying
Abstract:
In this paper, efficient alternating direction implicit (ADI) schemes are proposed to solve three-dimensional heat equations with irregular boundaries and interfaces. Starting from the well-known Douglas-Gunn ADI scheme, a modified ADI scheme is constructed to mitigate the issue of accuracy loss in solving problems with time-dependent boundary conditions. The unconditional stability of the new ADI scheme is also rigorously proven by Fourier analysis. Then, by combining the ADI schemes with a 1D kernel-free boundary integral (KFBI) method, KFBI-ADI schemes are developed to solve the heat equation with irregular boundaries. In 1D sub-problems of the KFBI-ADI schemes, the KFBI discretization takes advantage of the Cartesian grid and preserves the structure of the coefficient matrix so that the fast Thomas algorithm can be applied to solve the linear system efficiently. Second-order accuracy and unconditional stability of the KFBI-ADI schemes are verified through several numerical tests for both the heat equation and a reaction-diffusion equation. For the Stefan problem, which is a free boundary problem of the heat equation, a level set method is incorporated into the ADI method to capture the time-dependent interface. Numerical examples for simulating 3D dendritic solidification phenomena are also presented.
Submitted 2 September, 2023;
originally announced September 2023.
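The abstract above mentions that the 1D sub-problems keep a coefficient matrix on which the Thomas algorithm can be used. As a rough illustration only (this is not the authors' code, and the function name `thomas_solve` and its argument layout are assumptions made here), a textbook Thomas solver for a tridiagonal system might look like this in Python:

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs with the Thomas algorithm.

    lower[i] is A[i][i-1] (lower[0] is ignored), diag[i] is A[i][i],
    upper[i] is A[i][i+1] (upper[-1] is ignored).
    No pivoting is done, so this sketch assumes a well-conditioned
    (e.g. diagonally dominant) matrix.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom  # c[n-1] is computed but never used
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Small check: the matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] times [1,2,3] is [0,0,4].
solution = thomas_solve([0.0, -1.0, -1.0], [2.0, 2.0, 2.0],
                        [-1.0, -1.0, 0.0], [0.0, 0.0, 4.0])
```

The solver runs in time linear in the number of unknowns, which is why a preserved tridiagonal structure matters for each 1D sweep of an ADI step.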
-
Friezes of cluster algebras of geometric type
Authors:
Antoine de Saint Germain,
Min Huang,
Jiang-Hua Lu
Abstract:
For a cluster algebra $\mathcal{A}$ over $\mathbb{Q}$ of geometric type, a $\textit{frieze}$ of $\mathcal{A}$ is defined to be a $\mathbb{Q}$-algebra homomorphism from $\mathcal{A}$ to $\mathbb{Q}$ that takes positive integer values on all cluster variables and all frozen variables. We present some basic facts on friezes, including frieze testing criteria, the notion of $\textit{frieze points}$ when $\mathcal{A}$ is finitely generated, and pullbacks of friezes under certain $\mathbb{Q}$-algebra homomorphisms. When the cluster algebra $\mathcal{A}$ is acyclic, we define $\textit{frieze patterns associated to acyclic seeds of }\mathcal{A}$, generalizing the $\textit{ frieze patterns with coefficients of type } A$ studied by J. Propp and by M. Cuntz, T. Holm, and P. Jorgensen, and we give a sufficient condition for such frieze patterns to be equivalent to friezes. For the special cases when $\mathcal{A}$ has an acyclic seed with either trivial coefficients, principal coefficients, or what we call the $\textit{BFZ coefficients}$ (named after A. Berenstein, S. Fomin, and A. Zelevinsky), we identify frieze points of $\mathcal{A}$ both geometrically as certain positive integral points in explicitly described affine varieties and Lie theoretically (in the finite case) in terms of reduced double Bruhat cells and generalized minors on the associated semi-simple Lie groups. Furthermore, extending the gliding symmetry of the classical Coxeter frieze patterns of type $A$, we determine the symmetry of frieze patterns of any finite type with arbitrary coefficients.
Submitted 3 October, 2023; v1 submitted 2 September, 2023;
originally announced September 2023.
-
Accelerating LSTM-based High-Rate Dynamic System Models
Authors:
Ehsan Kabir,
Daniel Coble,
Joud N. Satme,
Austin R. J. Downey,
Jason D. Bakos,
David Andrews,
Miaoqing Huang
Abstract:
In this paper, we evaluate the use of a trained Long Short-Term Memory (LSTM) network as a surrogate for an Euler-Bernoulli beam model, and then we describe and characterize an FPGA-based deployment of the model for use in real-time structural health monitoring applications. The focus of our efforts is the DROPBEAR (Dynamic Reproduction of Projectiles in Ballistic Environments for Advanced Research) dataset, which was generated as a benchmark for the study of real-time structural modeling applications. The purpose of DROPBEAR is to evaluate models that take vibration data as input and give the initial conditions of the cantilever beam on which the measurements were taken as output. DROPBEAR is meant to serve as an exemplar for emerging high-rate "active structures" that can be actively controlled with feedback latencies of less than one microsecond. Although the Euler-Bernoulli beam model is a well-known solution to this modeling problem, its computational cost is prohibitive for the time scales of interest. It has been previously shown that a properly structured LSTM network can achieve comparable accuracy with less workload, but achieving sub-microsecond model latency remains a challenge. Our approach is to deploy an LSTM optimized specifically for latency on an FPGA. We designed the model using both high-level synthesis (HLS) and hardware description language (HDL). The lowest latency of 1.42 $μ$s and the highest throughput of 7.87 Gops/s were achieved on the Alveo U55C platform with the HDL design.
Submitted 1 September, 2023;
originally announced September 2023.
-
Layer-dependent magnetism and spin fluctuations in atomically thin van der Waals magnet CrPS4
Authors:
Mengqi Huang,
Jazmine C. Green,
Jingcheng Zhou,
Violet Williams,
Senlei Li,
Hanyi Lu,
Dziga Djugba,
Hailong Wang,
Benedetta Flebus,
Ni Ni,
Chunhui Rita Du
Abstract:
van der Waals (vdW) magnets, an emerging family of two-dimensional (2D) materials, have received tremendous attention due to their rich fundamental physics and significant potential for cutting-edge technological applications. In contrast to the conventional bulk counterparts, vdW magnets exhibit significant tunability of local material properties, such as stacking engineered interlayer coupling and layer-number dependent magnetic and electronic interactions, which promise to deliver previously unavailable merits to develop multifunctional microelectronic devices. As a further ingredient of this emerging topic, here we report nanoscale quantum sensing and imaging of atomically thin vdW magnet chromium thiophosphate CrPS4, revealing its characteristic layer-dependent 2D static magnetism and dynamic spin fluctuations. We also show a large tunneling magnetoresistance in CrPS4-based spin filter vdW heterostructures. The excellent material stability, robust strategy against environmental degradation, in combination with tailored magnetic properties highlight the potential of CrPS4 in developing state-of-the-art 2D spintronic devices for next-generation information technologies.
Submitted 29 August, 2023;
originally announced August 2023.
-
The electromagnetic form factors in the $N_{f}=4$ holographic QCD
Authors:
Hiwa A. Ahmed,
Yidian Chen,
Mei Huang
Abstract:
In this study, we employ a modified soft-wall holographic model with four flavors to investigate the meson spectra, decay constants, electromagnetic form factors, and charge radii of various mesons. We obtain the spectra for vector, axial vector, and pseudoscalar mesons. Decay constants are calculated and compared with experimental and lattice QCD data. The pion and kaon electromagnetic form factors are compared with the experimental data, and a good agreement is achieved for the kaon at low $Q^{2}$. For the charmed mesons, the electromagnetic form factors of the $D$ and $D_{s}$ and electric form factors of the $D^{*}$ and $D_{s}^{*}$ are consistent with the lattice QCD data. Moreover, the electric, magnetic, and quadrupole form factors are predicted for the $ρ$, $K^{*}$, $a_1$, $K_1$, $D_1$, and $D_{s1}$ mesons. Furthermore, the charge radii of the vector, axial vector, and pseudoscalar mesons, including the strange and charmed mesons, are computed.
Submitted 28 August, 2023;
originally announced August 2023.
-
Learning from Negative User Feedback and Measuring Responsiveness for Sequential Recommenders
Authors:
Yueqi Wang,
Yoni Halpern,
Shuo Chang,
Jingchen Feng,
Elaine Ya Le,
Longfei Li,
Xujian Liang,
Min-Cheng Huang,
Shane Li,
Alex Beutel,
Yaping Zhang,
Shuchao Bi
Abstract:
Sequential recommenders have been widely used in industry due to their strength in modeling user preferences. While these models excel at learning a user's positive interests, less attention has been paid to learning from negative user feedback. Negative user feedback is an important lever of user control, and comes with an expectation that recommenders should respond quickly and reduce similar recommendations to the user. However, negative feedback signals are often ignored in the training objective of sequential retrieval models, which primarily aim at predicting positive user interactions. In this work, we incorporate explicit and implicit negative user feedback into the training objective of sequential recommenders in the retrieval stage using a "not-to-recommend" loss function that optimizes for the log-likelihood of not recommending items with negative feedback. We demonstrate the effectiveness of this approach using live experiments on a large-scale industrial recommender system. Furthermore, we address a challenge in measuring recommender responsiveness to negative feedback by developing a counterfactual simulation framework to compare recommender responses between different user actions, showing improved responsiveness from the modeling change.
Submitted 23 August, 2023;
originally announced August 2023.
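The abstract above describes a "not-to-recommend" loss that optimizes the log-likelihood of not recommending an item with negative feedback. One plausible reading, sketched here under stated assumptions (a softmax over candidate items and the per-item term -log(1 - p); the function names and this exact form are illustrative and not taken from the paper), is:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of item logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def not_to_recommend_loss(logits, negative_item):
    """Assumed form of the loss: -log(1 - p(negative_item)).

    Minimizing it pushes probability mass away from the item the
    user gave negative feedback on.
    """
    p = softmax(logits)[negative_item]
    # Clamp to avoid log(0) when p rounds to 1.0 (a choice made for this sketch).
    return -math.log(max(1.0 - p, 1e-12))


# With four equally scored items, p = 0.25 and the loss is -log(0.75).
uniform_loss = not_to_recommend_loss([0.0, 0.0, 0.0, 0.0], 2)
```

Under this form the loss grows as the model scores the disliked item more highly, so it can be added alongside the usual positive-interaction objective.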
-
LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles
Authors:
Shulin Huang,
Shirong Ma,
Yinghui Li,
Mengzuo Huang,
Wuhe Zou,
Weidong Zhang,
Hai-Tao Zheng
Abstract:
With the continuous evolution and refinement of LLMs, they are endowed with impressive logical reasoning or vertical thinking capabilities. But can they think out of the box? Do they possess proficient lateral thinking abilities? Following the setup of Lateral Thinking Puzzles, we propose a novel evaluation benchmark, LatEval, which assesses the model's lateral thinking within an interactive framework. In our benchmark, we challenge LLMs on 2 aspects: the quality of questions posed by the model and the model's capability to integrate information for problem-solving. We find that nearly all LLMs struggle with employing lateral thinking during interactions. For example, even the most advanced model, GPT-4, shows some advantage yet still maintains a noticeable gap compared to humans. This evaluation benchmark provides LLMs with a highly challenging and distinctive task that is crucial to an effective AI assistant.
Submitted 17 March, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
-
To Healthier Ethereum: A Comprehensive and Iterative Smart Contract Weakness Enumeration
Authors:
Jiachi Chen,
Mingyuan Huang,
Zewei Lin,
Peilin Zheng,
Zibin Zheng
Abstract:
With the increasing popularity of cryptocurrencies and blockchain technology, smart contracts have become a prominent feature in developing decentralized applications. However, these smart contracts are susceptible to vulnerabilities that hackers can exploit, resulting in significant financial losses. In response to this growing concern, various initiatives have emerged. Notably, the SWC vulnerability list played an important role in raising awareness and understanding of smart contract weaknesses. However, the SWC list lacks maintenance and has not been updated with new vulnerabilities since 2020. To address this gap, this paper introduces the Smart Contract Weakness Enumeration (SWE), a comprehensive and practical vulnerability list up until 2023. We collect 273 vulnerability descriptions from 86 top conference papers and journal papers, employing open card sorting techniques to deduplicate and categorize these descriptions. This process results in the identification of 40 common contract weaknesses, which are further classified into 20 sub-research fields through thorough discussion and analysis. SWE provides a systematic and comprehensive list of smart contract vulnerabilities, covering existing and emerging vulnerabilities in the last few years. Moreover, SWE is a scalable, continuously iterative program. We propose two update mechanisms for the maintenance of SWE. Regular updates involve the inclusion of new vulnerabilities from future top papers, while irregular updates enable individuals to report new weaknesses for review and potential addition to SWE.
Submitted 20 August, 2023;
originally announced August 2023.
-
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
Authors:
Mingxin Huang,
Jiaxin Zhang,
Dezhi Peng,
Hao Lu,
Can Huang,
Yuliang Liu,
Xiang Bai,
Lianwen Jin
Abstract:
In recent years, end-to-end scene text spotting approaches have been evolving toward the Transformer-based framework. While previous studies have shown the crucial importance of the intrinsic synergy between text detection and recognition, recent advances in Transformer-based methods usually adopt an implicit synergy strategy with a shared query, which cannot fully realize the potential of these two interactive tasks. In this paper, we argue that explicit synergy that considers the distinct characteristics of text detection and recognition can significantly improve the performance of text spotting. To this end, we introduce a new model named Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter), which achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder. Specifically, we decompose the conventional shared query into task-aware queries for text polygon and content, respectively. Through the decoder with the proposed vision-language communication module, the queries interact with each other in an explicit manner while preserving discriminative patterns of text detection and recognition, thus improving performance significantly. Additionally, we propose a task-aware query initialization scheme to ensure stable training. Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods. Code is available at https://github.com/mxin262/ESTextSpotter.
Submitted 19 August, 2023;
originally announced August 2023.
-
Positive 2-bridge knots and chirally cosmetic surgeries
Authors:
Michael Huang,
Zelong Li,
Rahi Tanaz,
Chengyi Zhang
Abstract:
In this paper we verify that with the exception of the $(2, 2n+1)$ torus knots, positive 2-bridge knots up to 31 crossings do not admit chirally cosmetic surgeries. A knot $K$ admits chirally cosmetic surgeries if there exist surgeries $S^3_r$ and $S^3_{r'}$ with distinct slopes $r$ and $r'$ such that $S^3_r(K) \cong -S^3_{r'}(K)$, where the negative represents an orientation reversal. To verify this, we use the obstruction formula from arXiv:2112.03144 which relates classical knot invariants to the existence of chirally cosmetic surgeries. To check the formula, we develop a Python program that computes the classical knot invariants $a_2$, $a_4$, $v_3$, $\det$, and $g$ of a positive 2-bridge knot.
Submitted 22 August, 2023; v1 submitted 19 August, 2023;
originally announced August 2023.
-
Language-guided Human Motion Synthesis with Atomic Actions
Authors:
Yuanhao Zhai,
Mingzhen Huang,
Tianyu Luan,
Lu Dong,
Ifeoma Nwogu,
Siwei Lyu,
David Doermann,
Junsong Yuan
Abstract:
Language-guided human motion synthesis has been a challenging task due to the inherent complexity and diversity of human behaviors. Previous methods face limitations in generalization to novel actions, often resulting in unrealistic or incoherent motion sequences. In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem, by decomposing actions into atomic actions, and employing a curriculum learning strategy to learn atomic action composition. First, we disentangle complex human motions into a set of atomic actions during learning, and then assemble novel actions using the learned atomic actions, which offers better adaptability to new actions. Moreover, we introduce a curriculum learning training strategy that leverages masked motion modeling with a gradual increase in the mask ratio, and thus facilitates atomic action assembly. This approach mitigates the overfitting problem commonly encountered in previous methods while enforcing the model to learn better motion representations. We demonstrate the effectiveness of ATOM through extensive experiments, including text-to-motion and action-to-motion synthesis tasks. We further illustrate its superiority in synthesizing plausible and coherent text-guided human motion sequences.
Submitted 18 August, 2023;
originally announced August 2023.
-
Calibration and Physics with ARA Station 1: A Unique Askaryan Radio Array Detector
Authors:
M. F. H Seikh,
D. Z. Besson,
S. Ali,
P. Allison,
S. Archambault,
J. J. Beatty,
A. Bishop,
P. Chen,
Y. C. Chen,
B. A. Clark,
W. Clay,
A. Connolly,
K. Couberly,
L. Cremonesi,
A. Cummings,
P. Dasgupta,
R. Debolt,
S. De Kockere,
K. D. de Vries,
C. Deaconu,
M. A. DuVernois,
J. Flaherty,
E. Friedman,
R. Gaior,
P. Giri
, et al. (48 additional authors not shown)
Abstract:
The Askaryan Radio Array Station 1 (A1), the first among five autonomous stations deployed for the ARA experiment at the South Pole, is a unique ultra-high energy neutrino (UHEN) detector based on the Askaryan effect that uses Antarctic ice as the detector medium. Its 16 radio antennas (distributed across 4 strings, each with 2 Vertically Polarized (VPol), 2 Horizontally Polarized (HPol) receivers), and 2 strings of transmitting antennas (calibration pulsers, CPs), each with 1 VPol and 1 HPol channel, are deployed at depths less than 100 m within the shallow firn zone of the 2.8 km thick South Pole (SP) ice. We apply different methods to calibrate its Ice Ray Sampler second generation (IRS2) chip for timing offset and ADC-to-Voltage conversion factors using a known continuous wave input signal to the digitizer, and achieve a precision of sub-nanoseconds. We achieve better calibration for odd, compared to even samples, and also find that the HPols under-perform relative to the VPol channels. Our timing calibrated data is subsequently used to calibrate the ADC-to-Voltage conversion as well as precise antenna locations, as a precursor to vertex reconstruction. The calibrated data will then be analyzed for UHEN signals in the final step of data compression. The ability of A1 to scan the firn region of SP ice sheet will contribute greatly towards a 5-station analysis and will inform the design of the planned IceCube Gen-2 radio array.
Submitted 14 August, 2023;
originally announced August 2023.
-
FPGA Processor In Memory Architectures (PIMs): Overlay or Overhaul ?
Authors:
MD Arafat Kabir,
Ehsan Kabir,
Joshua Hollis,
Eli Levy-Mackay,
Atiyehsadat Panahi,
Jason Bakos,
Miaoqing Huang,
David Andrews
Abstract:
The dominance of machine learning and the ending of Moore's law have renewed interest in Processor in Memory (PIM) architectures. This interest has produced several recent proposals to modify an FPGA's BRAM architecture to form a next-generation PIM reconfigurable fabric. PIM architectures can also be realized within today's FPGAs as overlays without the need to modify the underlying FPGA architecture. To date, there has been no study to understand the comparative advantages of the two approaches. In this paper, we present a study that explores the comparative advantages between two proposed custom architectures and a PIM overlay running on a commodity FPGA. We created PiCaSO, a Processor in/near Memory Scalable and Fast Overlay architecture, as a representative PIM overlay. The results of this study show that the PiCaSO overlay achieves up to 80% of the peak throughput of the custom designs with 2.56x shorter latency and 25% - 43% better BRAM memory utilization efficiency. We then show how several key features of the PiCaSO overlay can be integrated into the custom PIM designs to further improve their throughput by 18%, latency by 19.5%, and memory efficiency by 6.2%.
Submitted 7 August, 2023;
originally announced August 2023.
-
AgentBench: Evaluating LLMs as Agents
Authors:
Xiao Liu,
Hao Yu,
Hanchen Zhang,
Yifan Xu,
Xuanyu Lei,
Hanyu Lai,
Yu Gu,
Hangliang Ding,
Kaiwen Men,
Kejuan Yang,
Shudan Zhang,
Xiang Deng,
Aohan Zeng,
Zhengxiao Du,
Chenhui Zhang,
Sheng Shen,
Tianjun Zhang,
Yu Su,
Huan Sun,
Minlie Huang,
Yuxiao Dong,
Jie Tang
Abstract:
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there has been an urgent need to evaluate LLMs as agents on challenging tasks in interactive environments. We present AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting. Our extensive test over 27 API-based and open-sourced (OSS) LLMs shows that, while top commercial LLMs present a strong ability of acting as agents in complex environments, there is a significant disparity in performance between them and OSS competitors. We identify the typical reasons for failures in environments and LLMs, showing that poor long-term reasoning, decision-making, and instruction-following abilities are the main obstacles for developing usable LLM agents. Training on code and high quality multi-turn alignment data could improve agent performance. Datasets, environments, and an integrated evaluation package for AgentBench are released at https://github.com/THUDM/AgentBench.
Submitted 25 October, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
The dynamical magnetic-induced quark spin polarization at the early stage in heavy-ion collisions
Authors:
Anping Huang,
Zilin Yuan,
Mei Huang
Abstract:
There has been significant interest in recent years in studying the spin polarization of hyperons in heavy-ion collisions. However, describing the dynamic spin polarization of quarks in quark-gluon plasma fireballs in detail presents a significant challenge. In this study, we conducted a phenomenological investigation of the dynamical spin polarization of quarks induced by the magnetic field at the pre-thermal stage in heavy-ion collisions using the recently developed theoretical tool of chiral kinetic theory. This study presents a comprehensive analysis of the dynamic process of quark spin polarization induced by magnetic fields. Our findings demonstrate that the spin polarization of quarks is highly sensitive to the interactions between quarks. These interactions can delay the decay of the early spin polarization vector while accelerating the decay of the later spin polarization vector. Specifically, our simulations model the detailed process of how magnetic fields polarize quarks within the fireball by the average spin vector and reveal that quark interactions lead to an acceleration effect on the average spin, causing it to rapidly increase from zero before being rapidly destroyed. Notably, the induced magnetic field in the fireball has an incomplete electromagnetic response effect, and its increase in the opposite direction of the external magnetic field decaying has a delay effect.
Submitted 27 July, 2023;
originally announced August 2023.
-
The Splitting of Chiral and Deconfinement Phase Transitions induced by Rotation
Authors:
Fei Sun,
Kun Xu,
Mei Huang
Abstract:
The chiral and deconfinement phase transitions under rotation have been simultaneously investigated in the Polyakov-Nambu-Jona-Lasinio (PNJL) model. An interesting observation has been found that the chiral phase transition is catalyzed and the deconfinement phase transition is decelerated by rotation, therefore a chiral symmetric but confined phase is induced by rotation, which indicates that chiral dynamics and gluon dynamics can be split by rotation.
Submitted 8 November, 2023; v1 submitted 26 July, 2023;
originally announced July 2023.
-
Fluid pendulum explains reversals of the large-scale circulation in thermal convection
Authors:
Nicholas J. Moore,
Jinzi Mac Huang
Abstract:
We introduce a low-dimensional dynamical system to describe thermal convection in an annulus. The model derives systematically from a Fourier-Laurent truncation of the governing Navier-Stokes Boussinesq equations with no adjustable parameters and with the ability to generalize to any order. Comparison with fully resolved numerical solutions shows that the leading-order model captures parameter bifurcations and reversals of the large-scale circulation (LSC) with quantitative accuracy, including states of (i) steady circulating flow, (ii) chaotic LSC reversals, and (iii) periodic LSC reversals. Casting the system in terms of the fluid's angular momentum and center of mass (CoM) reveals equivalence to a damped pendulum with forcing that raises the CoM above the fulcrum. This formulation offers a transparent mechanism for LSC reversals, namely the inertial overshoot of a driven pendulum, and it yields accurate predictions for the frequency of regular LSC reversals in the high Rayleigh-number limit.
Submitted 25 July, 2023; v1 submitted 24 July, 2023;
originally announced July 2023.
-
A convective fluid pendulum revealing states of order and chaos
Authors:
Jinzi Mac Huang,
Nicholas J. Moore
Abstract:
We examine thermal convection in a two-dimensional annulus using fully resolved direct numerical simulation (DNS) in conjunction with a low-dimensional model deriving from Galerkin truncation of the governing Navier-Stokes Boussinesq (NSB) equations. The DNS is based on fast and accurate pseudo-spectral discretization of the full NSB system with implicit-explicit time stepping. Inspired by the numerical results, we propose a reduced model that is based on a Fourier-Laurent truncation of the NSB system and can generalize to any degree of accuracy. We demonstrate that the lowest-order model capable of satisfying all boundary conditions on the annulus successfully captures reversals of the large-scale circulation (LSC) in certain regimes. Based on both the DNS and stability analysis of the reduced model, we identify a sequence of transitions between (i) a motionless conductive state, (ii) a state of steady circulation, (iii) non-periodic dynamics and chaotic reversals of the LSC, (iv) a high Rayleigh-number state in which LSC reversals are periodic despite turbulent fluctuations at the small scale. The reduced model reveals a link to a damped pendulum system with a particular form of external forcing. The oscillatory pendulum motion provides an accurate prediction for the LSC reversal frequency in the high Rayleigh-number regime.
Submitted 25 July, 2023; v1 submitted 24 July, 2023;
originally announced July 2023.
-
The baryon number fluctuation $κσ^2$ as a probe of nuclear matter phase transition at high baryon density
Authors:
Kun Xu,
Mei Huang
Abstract:
Two critical end points (CEPs) of the chiral phase transition and the nuclear liquid-gas phase transition show up at finite baryon chemical potential. The kurtosis $κσ^2$ of baryon number fluctuation on the $T-μ_B$ plane is positive on the first-order side and negative on the crossover side along the phase boundary. The freeze-out line extracted from the heavy ion collisions crosses between these two phase boundaries; one can observe a peak of $κσ^2$ around the collision energy $5 {\rm GeV}$ near the CEP of the chiral phase transition, and negative $κσ^2$ at low collision energies due to the CEP of the nuclear liquid-gas phase transition. This explains the experimental measurement of $κσ^2$ at the collision energies of 2.4 GeV at HADES and 3 GeV and 7.7-200 GeV at STAR for most central collision. Thus we propose that the baryon number fluctuation $κσ^2$ can be used as a probe of nuclear matter phase structure at high baryon density.
Submitted 24 July, 2023;
originally announced July 2023.
-
Quantum LiDAR with Frequency Modulated Continuous Wave
Authors:
Ming-Da Huang,
Zhan-Feng Jiang,
Hong-Yi Chen,
Ying Zuo,
Xiao-Peng Hu,
Hai-Dong Yuan,
Li-Jian Zhang,
Qi Qin
Abstract:
The range and speed of a moving object can be ascertained using the sensing technique known as light detection and ranging (LiDAR). It has recently been suggested that quantum LiDAR, which uses entangled states of light, can enhance the capabilities of LiDAR. Entangled pulsed light is used in prior quantum LiDAR approaches to assess both range and velocity at the same time using the pulses' time of flight and Doppler shift. The entangled pulsed light generation and detection, which are crucial for pulsed quantum LiDAR, are often inefficient. Here, we study a quantum LiDAR that operates on a frequency-modulated continuous wave (FMCW), as opposed to pulses. We first outline the design of the quantum FMCW LiDAR using entangled frequency-modulated photons in a Mach-Zehnder interferometer, and we demonstrate how it can increase accuracy and resolution for range and velocity measurements by $\sqrt{n}$ and $n$, respectively, with $n$ entangled photons. We also demonstrate that quantum FMCW LiDAR may perform simultaneous measurements of the range and velocity without the need for quantum pulsed compression, which is necessary in pulsed quantum LiDAR. Since the generation of entangled photons is the only inefficient nonlinear optical process needed, the quantum FMCW LiDAR is better suited for practical implementations. Additionally, most measurements in the quantum FMCW LiDAR can be carried out electronically by down-converting optical signal to microwave region.
Submitted 21 July, 2023;
originally announced July 2023.
-
Facilitating Multi-turn Emotional Support Conversation with Positive Emotion Elicitation: A Reinforcement Learning Approach
Authors:
Jinfeng Zhou,
Zhuang Chen,
Bo Wang,
Minlie Huang
Abstract:
Emotional support conversation (ESC) aims to provide emotional support (ES) to improve one's mental state. Existing works stay at fitting grounded responses and responding strategies (e.g., question), which ignore the effect on ES and lack explicit goals to guide emotional positive transition. To this end, we introduce a new paradigm to formalize multi-turn ESC as a process of positive emotion elicitation. Addressing this task requires finely adjusting the elicitation intensity in ES as the conversation progresses while maintaining conversational goals like coherence. In this paper, we propose Supporter, a mixture-of-expert-based reinforcement learning model, and well design ES and dialogue coherence rewards to guide policy's learning for responding. Experiments verify the superiority of Supporter in achieving positive emotion elicitation during responding while maintaining conversational goals including coherence.
Submitted 16 July, 2023;
originally announced July 2023.
-
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
Authors:
Pei Ke,
Fei Huang,
Fei Mi,
Yasheng Wang,
Qun Liu,
Xiaoyan Zhu,
Minlie Huang
Abstract:
Existing evaluation metrics for natural language generation (NLG) tasks face the challenges on generalization ability and interpretability. Specifically, most of the well-performed metrics are required to train on evaluation datasets of specific NLG tasks and evaluation dimensions, which may cause over-fitting to task-specific datasets. Furthermore, existing metrics only provide an evaluation score for each dimension without revealing the evidence to interpret how this score is obtained. To deal with these challenges, we propose a simple yet effective metric called DecompEval. This metric formulates NLG evaluation as an instruction-style question answering task and utilizes instruction-tuned pre-trained language models (PLMs) without training on evaluation datasets, aiming to enhance the generalization ability. To make the evaluation process more interpretable, we decompose our devised instruction-style question about the quality of generated texts into the subquestions that measure the quality of each sentence. The subquestions with their answers generated by PLMs are then recomposed as evidence to obtain the evaluation result. Experimental results show that DecompEval achieves state-of-the-art performance in untrained metrics for evaluating text summarization and dialogue generation, which also exhibits strong dimension-level / task-level generalization ability and interpretability.
Submitted 13 July, 2023;
originally announced July 2023.
-
Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Authors:
Zhexin Zhang,
Jiaxin Wen,
Minlie Huang
Abstract:
Large pre-trained language models achieve impressive results across many tasks. However, recent works point out that pre-trained language models may memorize a considerable fraction of their training data, leading to the privacy risk of information leakage. In this paper, we propose a method named Ethicist for targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation, investigating how to recover the suffix in the training data when given a prefix. To elicit memorization in the attacked model, we tune soft prompt embeddings while keeping the model fixed. We further propose a smoothing loss that smooths the loss distribution of the suffix tokens to make it easier to sample the correct suffix. In order to select the most probable suffix from a collection of sampled suffixes and estimate the prediction confidence, we propose a calibrated confidence estimation method, which normalizes the confidence of the generated suffixes with a local estimation. We show that Ethicist significantly improves the extraction performance on a recently proposed public benchmark. We also investigate several factors influencing the data extraction performance, including decoding strategy, model scale, prefix length, and suffix length. Our code is available at https://github.com/thu-coai/Targeted-Data-Extraction.
Submitted 10 July, 2023;
originally announced July 2023.
-
Revealing intrinsic domains and fluctuations of moiré magnetism by a wide-field quantum microscope
Authors:
Mengqi Huang,
Zeliang Sun,
Gerald Yan,
Hongchao Xie,
Nishkarsh Agarwal,
Gaihua Ye,
Suk Hyun Sung,
Hanyi Lu,
Jingcheng Zhou,
Shaohua Yan,
Shangjie Tian,
Hechang Lei,
Robert Hovden,
Rui He,
Hailong Wang,
Liuyan Zhao,
Chunhui Rita Du
Abstract:
Moiré magnetism featured by stacking engineered atomic registry and lattice interactions has recently emerged as an appealing quantum state of matter at the forefront condensed matter physics research. Nanoscale imaging of moiré magnets is highly desirable and serves as a prerequisite to investigate a broad range of intriguing physics underlying the interplay between topology, electronic correlations, and unconventional nanomagnetism. Here we report spin defect-based wide-field imaging of magnetic domains and spin fluctuations in twisted double trilayer (tDT) chromium triiodide CrI3. We explicitly show that intrinsic moiré domains of opposite magnetizations appear over arrays of moiré supercells in low-twist-angle tDT CrI3. In contrast, spin fluctuations measured in tDT CrI3 manifest little spatial variations on the same mesoscopic length scale due to the dominant driving force of intralayer exchange interaction. Our results enrich the current understanding of exotic magnetic phases sustained by moiré magnetism and highlight the opportunities provided by quantum spin sensors in probing microscopic spin related phenomena on two-dimensional flatland.
Submitted 7 July, 2023;
originally announced July 2023.
-
Ultrasonic backscattering model for Rayleigh waves in polycrystals with Born and independent scattering approximations
Authors:
Shan Li,
Ming Huang,
Yongfeng Song,
Bo Lan,
Xiongbing Li
Abstract:
This paper presents theoretical and numerical models for the backscattering of 2D Rayleigh waves in single-phase, untextured polycrystalline materials with statistically equiaxed grains. The theoretical model, based on our prior inclusion-induced Rayleigh wave scattering model and the independent scattering approximation, considers single scattering of Rayleigh-to-Rayleigh (R-R) waves. The numerical finite element model is established to accurately simulate the scattering problem and evaluate the theoretical model. Good quantitative agreement is observed between the theoretical model and the finite element results, especially for weakly scattering materials. The agreement decreases with the increase of the anisotropy index, owing to the reduced applicability of the Born approximation. However, the agreement remains generally good when weak multiple scattering is involved. In addition, the R-R backscattering behaviour of 2D Rayleigh waves is similar to the longitudinal-to-longitudinal and transverse-to-transverse backscattering of bulk waves, with the former exhibiting stronger scattering. These findings establish a foundation for using Rayleigh waves in quantitative characterisation of polycrystalline materials.
Submitted 6 July, 2023;
originally announced July 2023.
-
Did the nHZ Gravitational Waves Signatures Observed By NANOGrav Indicate Multiple Sector SUSY Breaking?
Authors:
Xiao Kang Du,
Ming Xia Huang,
Fei Wang,
Ying Kai Zhang
Abstract:
Discrete R symmetries always play an important role in low energy SUSY. The spontaneous breaking of such discrete R symmetries, for example, by gaugino condensation, can lead to domain walls, which need to be either inflated away or collapse to avoid cosmic difficulties. We propose that explicit R symmetry violation needed for collapse of domain walls can be the consequence of multiple sector SUSY breaking. The consistency constraints for the generation of non-problematic domain walls from gaugino condensation are discussed. We also study the emitted gravitational waves related to the collapse of domain walls. We find that, for SUSY breaking scale of order ${\cal O}(1)$ ${\rm GeV}$ in one of the sequestered sectors (and also a low reheating temperature of order ${\rm MeV}$ if the reheating is not completed when the domain walls collapse), the peak frequency of gravitational waves emitted can lie at nHz. Such a low SUSY breaking scale can be consistent and natural in multiple sector SUSY breaking scenario. The GWs signal by NANOGrav could be a signal of such multiple sector SUSY breaking scenario and it may also indicate the existence of light goldstini at ${\rm eV}$ mass scale.
Submitted 6 July, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Thermal Entropy in Calabi-Yau Quantum Mechanics
Authors:
Min-xin Huang
Abstract:
We consider the von Neumann entropy of a thermal mixed state in quantum systems derived from mirror curves, where the kinetic terms are exponential functions of the momentum operators. Using the mathematical results on the asymptotics of the energy eigenvalues, we compute the asymptotic entropy in high temperature limit and compare with that of the conventional models. We discuss the connections with some folklores in quantum gravity, particularly on the finiteness of entropy.
Submitted 8 October, 2023; v1 submitted 4 July, 2023;
originally announced July 2023.
-
Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation
Authors:
Jian Guan,
Minlie Huang
Abstract:
Despite the huge progress in myriad generation tasks, pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts with maximization-based decoding algorithms for open-ended generation. We attribute their overestimation of token-level repetition probabilities to the learning bias: LMs capture simple repetitive patterns faster with the MLE loss. We propose self-contrastive training to penalize the output of a premature checkpoint of the same model when it incorrectly predicts repetition, which is shown to mitigate repetition effectively while maintaining fluency on two datasets. Furthermore, we find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
Submitted 4 July, 2023;
originally announced July 2023.
-
DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation
Authors:
Zhuowei Chen,
Shancheng Fang,
Wei Liu,
Qian He,
Mengqi Huang,
Yongdong Zhang,
Zhendong Mao
Abstract:
While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity for conditioned face images. Existing methods either require time-consuming optimization for each face-identity or learning an efficient encoder at the cost of harming the editability of models. In this work, we present an optimization-free method for each face identity, meanwhile keeping the editability for text-to-image models. Specifically, we propose a novel face-identity encoder to learn an accurate representation of human faces, which applies multi-scale face features followed by a multi-embedding projector to directly generate the pseudo words in the text embedding space. Besides, we propose self-augmented editability learning to enhance the editability of models, which is achieved by constructing paired generated face and edited face images using celebrity names, aiming at transferring mature ability of off-the-shelf text-to-image models in celebrity faces to unseen faces. Extensive experiments show that our methods can generate identity-preserved images under different scenes at a much faster speed.
Submitted 1 July, 2023;
originally announced July 2023.
-
Design Sensitivity and Its Implications for Weighted Observational Studies
Authors:
Melody Huang,
Dan Soriano,
Samuel D. Pimentel
Abstract:
Sensitivity to unmeasured confounding is not typically a primary consideration in designing treated-control comparisons in observational studies. We introduce a framework allowing researchers to optimize robustness to omitted variable bias at the design stage using a measure called design sensitivity. Design sensitivity, which describes the asymptotic power of a sensitivity analysis, allows transparent assessment of the impact of different estimation strategies on sensitivity. We apply this general framework to two commonly-used sensitivity models, the marginal sensitivity model and the variance-based sensitivity model. By comparing design sensitivities, we interrogate how key features of weighted designs, including choices about trimming of weights and model augmentation, impact robustness to unmeasured confounding, and how these impacts may differ for the two different sensitivity models. We illustrate the proposed framework on a study examining drivers of support for the 2016 Colombian peace agreement.
Submitted 30 June, 2023;
originally announced July 2023.
-
Searching for the nano-Hertz stochastic gravitational wave background with the Chinese Pulsar Timing Array Data Release I
Authors:
Heng Xu,
Siyuan Chen,
Yanjun Guo,
Jinchen Jiang,
Bojun Wang,
Jiangwei Xu,
Zihan Xue,
R. Nicolas Caballero,
Jianping Yuan,
Yonghua Xu,
Jingbo Wang,
Longfei Hao,
Jingtao Luo,
Kejia Lee,
Jinlin Han,
Peng Jiang,
Zhiqiang Shen,
Min Wang,
Na Wang,
Renxin Xu,
** Wu,
Richard Manchester,
Lei Qian,
Xin Guan,
Menglin Huang
, et al. (2 additional authors not shown)
Abstract:
Observing and timing a group of millisecond pulsars (MSPs) with high rotational stability enables the direct detection of gravitational waves (GWs). The GW signals can be identified from the spatial correlations encoded in the times-of-arrival of widely spaced pulsar-pairs. The Chinese Pulsar Timing Array (CPTA) is a collaboration aiming at the direct GW detection with observations carried out using Chinese radio telescopes. This short article serves as a `table of contents' for a forthcoming series of papers related to the CPTA Data Release 1 (CPTA DR1) which uses observations from the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Here, after summarizing the time span and accuracy of CPTA DR1, we report the key results of our statistical inference finding a correlated signal with amplitude $\log A_{\rm c}= -14.4 \,^{+1.0}_{-2.8}$ for spectral index in the range of $α\in [-1.8, 1.5]$ assuming a GW background (GWB) induced quadrupolar correlation. The search for the Hellings-Downs (HD) correlation curve is also presented, where some evidence for the HD correlation has been found that a 4.6-$σ$ statistical significance is achieved using the discrete frequency method around the frequency of 14 nHz. We expect that the future International Pulsar Timing Array data analysis and the next CPTA data release will be more sensitive to the nHz GWB, which could verify the current results.
Submitted 28 June, 2023;
originally announced June 2023.
-
Covering convection with a thermal blanket: numerical simulation and stochastic modeling
Authors:
Jinzi Mac Huang
Abstract:
Adding moving boundaries to convective fluids is known to result in nontrivial and surprising dynamics, leading to spectacular geoformations ranging from the kilometer-scale karst terrains to the planetary-scale plate tectonics. On one hand, the moving solid alters the surrounding flow field, but on the other hand, the flow modifies the motion and shape of the solid. This leads to a two-way coupling that is significant in the study of fluid-structure interactions and in the understanding of geomorphologies. In this work, we investigate the coupling between a floating plate and the convective fluid below it. Through numerical experiments, we show the motion of this plate is driven by the flow beneath. However the flow structure is also modified by the presence of this plate, leading to the "thermal blanket" effect where the trapped heat beneath the plate results in buoyant and upwelling flows that in turn push the plate away. By analyzing this two-way coupling between moving boundary and fluid, we are able to capture the dynamical behaviors of this plate through a low-dimensional stochastic model. Geophysically, the thermal blanket effect is believed to drive the continental drift, therefore understanding this mechanism has significance beyond fluid dynamics.
Submitted 13 December, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Quantum metric nonlinear Hall effect in a topological antiferromagnetic heterostructure
Authors:
Anyuan Gao,
Yu-Fei Liu,
Jian-Xiang Qiu,
Barun Ghosh,
Thaís V. Trevisan,
Yugo Onishi,
Chaowei Hu,
Tiema Qian,
Hung-Ju Tien,
Shao-Wen Chen,
Mengqi Huang,
Damien Bérubé,
Houchen Li,
Christian Tzschaschel,
Thao Dinh,
Zhe Sun,
Sheng-Chin Ho,
Shang-Wei Lien,
Bahadur Singh,
Kenji Watanabe,
Takashi Taniguchi,
David C. Bell,
Hsin Lin,
Tay-Rong Chang,
Chunhui Rita Du
, et al. (6 additional authors not shown)
Abstract:
Quantum geometry - the geometry of electron Bloch wavefunctions - is central to modern condensed matter physics. Due to the quantum nature, quantum geometry has two parts, the real part quantum metric and the imaginary part Berry curvature. The studies of Berry curvature have led to countless breakthroughs, ranging from the quantum Hall effect in 2DEGs to the anomalous Hall effect (AHE) in ferromagnets. However, in contrast to Berry curvature, the quantum metric has rarely been explored. Here, we report a new nonlinear Hall effect induced by quantum metric by interfacing even-layered MnBi2Te4 (a PT-symmetric antiferromagnet (AFM)) with black phosphorus. This novel nonlinear Hall effect switches direction upon reversing the AFM spins and exhibits distinct scaling that suggests a non-dissipative nature. Like the AHE brought Berry curvature under the spotlight, our results open the door to discovering quantum metric responses. Moreover, we demonstrate that the AFM can harvest wireless electromagnetic energy via the new nonlinear Hall effect, therefore enabling intriguing applications that bridges nonlinear electronics with AFM spintronics.
△ Less
Submitted 23 July, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
MiniLLM: Knowledge Distillation of Large Language Models
Authors:
Yuxian Gu,
Li Dong,
Furu Wei,
Minlie Huang
Abstract:
Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models or training small models to imitate black-box model APIs like ChatGPT. How to effectively distill the knowledge of white-box LLMs into small models is still under-explored, which becomes more important with the prosperity of open-source LLMs. In this work, we propose a KD approach that distills LLMs into smaller language models. We first replace the forward Kullback-Leibler divergence (KLD) objective in the standard KD approaches with reverse KLD, which is more suitable for KD on generative language models, to prevent the student model from overestimating the low-probability regions of the teacher distribution. Then, we derive an effective optimization approach to learn this objective. The student models are named MiniLLM. Extensive experiments in the instruction-following setting show that MiniLLM generates more precise responses with higher overall quality, lower exposure bias, better calibration, and higher long-text generation performance than the baselines. Our method is scalable for different model families with 120M to 13B parameters. Our code, data, and model checkpoints can be found in https://github.com/microsoft/LMOps/tree/main/minillm.
Submitted 9 April, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
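The contrast between forward and reverse KLD mentioned in the MiniLLM abstract can be illustrated on toy discrete distributions. This is a minimal sketch only: the helper name `kl` and all probability values are illustrative assumptions, not taken from the paper or its released code, and the paper's actual objective is defined over generated token sequences with its own optimization procedure.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy teacher: two strong modes and a low-probability tail (assumed values).
teacher = [0.49, 0.49, 0.02]

# Student A spreads its mass, putting a lot of weight on the teacher's tail.
student_spread = [0.34, 0.33, 0.33]
# Student B commits to one teacher mode and keeps the tail small.
student_peaked = [0.96, 0.02, 0.02]

for name, student in [("spread", student_spread), ("peaked", student_peaked)]:
    forward = kl(teacher, student)  # KL(teacher || student): standard KD objective
    reverse = kl(student, teacher)  # KL(student || teacher): reverse KLD
    print(f"{name}: forward={forward:.3f} reverse={reverse:.3f}")
```

On these toy numbers, forward KLD ranks the spread student as the better fit even though it overweights the teacher's low-probability region, while reverse KLD ranks the peaked student as the better fit.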
-
Uncertainty in Natural Language Processing: Sources, Quantification, and Applications
Authors:
Mengting Hu,
Zhen Zhang,
Shiwan Zhao,
Minlie Huang,
Bingzhe Wu
Abstract:
As a main field of artificial intelligence, natural language processing (NLP) has achieved remarkable success via deep neural networks. Plenty of NLP tasks have been addressed in a unified manner, with various tasks being associated with each other through sharing the same paradigm. However, neural networks are black boxes and rely on probability computation, so making mistakes is inevitable. Estimating the reliability and trustworthiness (in other words, the uncertainty) of neural networks has therefore become a key research direction, which plays a crucial role in reducing models' risks and making better decisions. In this survey, we provide a comprehensive review of uncertainty-relevant works in the NLP field. Considering the characteristics of the data and paradigms, we first categorize the sources of uncertainty in natural language into three types: input, system, and output. Then, we systematically review uncertainty quantification approaches and the main applications. Finally, we discuss the challenges of uncertainty estimation in NLP and potential future directions, taking into account recent trends in the field. Though there have been a few surveys about uncertainty estimation, our work is the first to review uncertainty from the NLP perspective.
Submitted 5 June, 2023;
originally announced June 2023.
-
Functional renormalization group study of neutral and charged pion under magnetic fields in the quark-meson model
Authors:
Rui Wen,
Shi Yin,
Wei-jie Fu,
Mei Huang
Abstract:
We calculated the masses of neutral and charged pions and the pion decay constants under an external magnetic field at zero temperature. The quantum fluctuations are integrated through the functional renormalization group. We consider the quark and meson propagators in the Landau level representation and weak-field expansion, respectively. The neutral pion mass monotonically decreases with the magnetic field, while the charged pion mass monotonically increases with the magnetic field. The pion decay constant and the quark mass show magnetic catalysis behavior at vanishing temperature. The neutral pion mass and pion decay constant are quantitatively in agreement with the lattice QCD results in the region of $eB < 1.2 {\rm GeV}^2$, and no non-monotonic mass behavior for the charged pion has been observed in this framework.
Submitted 6 June, 2023;
originally announced June 2023.
-
Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning
Authors:
Chujie Zheng,
Pei Ke,
Zheng Zhang,
Minlie Huang
Abstract:
It has always been an important yet challenging problem to control language models to avoid generating texts with undesirable attributes, such as toxic language and unnatural repetition. We introduce Click for controllable text generation, which needs no modification to the model architecture and facilitates out-of-the-box use of trained models. It employs a contrastive loss on sequence likelihood, which fundamentally decreases the generation probability of negative samples (i.e., generations with undesirable attributes). It also adopts a novel likelihood ranking-based strategy to construct contrastive samples from model generations. On the tasks of language detoxification, sentiment steering, and repetition reduction, we show that Click outperforms strong baselines of controllable text generation and demonstrate the superiority of Click's sample construction strategy.
Submitted 5 June, 2023;
originally announced June 2023.
-
Entanglement and Teleportation in a 1-D Network with Repeaters
Authors:
Ganesh Mylavarapu,
Indranil Chakrabarty,
Kaushiki Mukherjee,
Minyi Huang,
Junde Wu
Abstract:
The simplest form of quantum network is a one-dimensional quantum network with a single player at each node. In remote entanglement distribution, each of the players carries out measurements at the intermediate nodes to produce an entangled state between the initial and final nodes, which are remotely separated. The flow of information, as well as the percolation of entanglement between the source and target nodes of a network, is an important area of study. It helps us understand the limits of the resource states as well as of the measurements that are carried out in the process of remote entanglement distribution. In this article we investigate how the concurrence of the final entangled state is connected with the concurrences of the initial entangled states present in a 1-D chain. We extend earlier work on pure entangled states to mixed entangled states such as Werner states, Bell diagonal states, and general mixed states. We do not limit ourselves to the situation where the measurements are performed perfectly; we also investigate how these relations change when we consider imperfect swapping. We obtain limits on the number of swappings, as well as on the success probability of the measurements, that ensure the final state remains entangled after swapping. In addition, we investigate how much quantum information can be sent from the initial node to the final node (by computing the teleportation fidelity) when the measurement is perfect and when it is imperfect, using the same set of examples. Here too we obtain limits on the number of swappings and on the success probability of measurement that ensure the final state is capable of transferring the information. These results have tremendous future applications in sending quantum information between two quantum processors in remote entanglement distribution.
Submitted 26 June, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
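The kind of limit on the number of swappings described in the abstract above can be sketched for the simplest textbook case. The code below is not the paper's derivation: it assumes identical two-qubit Werner links with singlet weight p and ideal Bell-state measurements, and uses three standard relations (swapping multiplies the Werner weights, the concurrence is max(0, (3p - 1)/2), and the standard teleportation fidelity is (1 + p)/2). All function names are hypothetical.

```python
def werner_concurrence(p):
    """Concurrence of a two-qubit Werner state with singlet weight p."""
    return max(0.0, (3.0 * p - 1.0) / 2.0)

def werner_teleportation_fidelity(p):
    """Teleportation fidelity of a Werner state under the standard protocol."""
    return (1.0 + p) / 2.0

def final_weight(p, swaps):
    """Werner weight after `swaps` ideal swaps joining swaps + 1 identical links."""
    return p ** (swaps + 1)

def max_swaps(p, limit=1000):
    """Largest swap count (up to `limit`) that still leaves the end-to-end state
    entangled; None if even a single link is separable."""
    best = None
    for n in range(limit + 1):
        if werner_concurrence(final_weight(p, n)) > 0:
            best = n
        else:
            break
    return best

if __name__ == "__main__":
    p = 0.9  # assumed link quality, for illustration only
    n = max_swaps(p)
    print("max swaps:", n)
    print("final concurrence:", werner_concurrence(final_weight(p, n)))
    print("final fidelity:", werner_teleportation_fidelity(final_weight(p, n)))
```

In this simplified model the entanglement threshold p > 1/3 coincides with the point where the teleportation fidelity exceeds the classical value of 2/3; imperfect measurements, which the paper also treats, are not modeled here.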
-
Federated Domain Generalization: A Survey
Authors:
Ying Li,
Xingwei Wang,
Rongfei Zeng,
Praveen Kumar Donta,
Ilir Murturi,
Min Huang,
Schahram Dustdar
Abstract:
Machine learning typically relies on the assumption that training and testing distributions are identical and that data is centrally stored for training and testing. However, in real-world scenarios, distributions may differ significantly and data is often distributed across different devices, organizations, or edge nodes. Consequently, it is imperative to develop models that can effectively generalize to unseen distributions where data is distributed across different domains. In response to this challenge, there has been a surge of interest in federated domain generalization (FDG) in recent years. FDG combines the strengths of federated learning (FL) and domain generalization (DG) techniques to enable multiple source domains to collaboratively learn a model capable of directly generalizing to unseen domains while preserving data privacy. However, generalizing the federated model under domain shifts is a technically challenging problem that has received scant attention in the research area so far. This paper presents the first survey of recent advances in this area. Initially, we discuss the development process from traditional machine learning to domain adaptation and domain generalization, leading to FDG, and provide the corresponding formal definition. Then, we categorize recent methodologies into four classes: federated domain alignment, data manipulation, learning strategies, and aggregation optimization, and present suitable algorithms in detail for each category. Next, we introduce commonly used datasets, applications, evaluations, and benchmarks. Finally, we conclude this survey by providing some potential research topics for the future.
Submitted 1 March, 2024; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Uncertainty-Aware Unlikelihood Learning Improves Generative Aspect Sentiment Quad Prediction
Authors:
Mengting Hu,
Yinhao Bai,
Yike Wu,
Zhen Zhang,
Liqi Zhang,
Hang Gao,
Shiwan Zhao,
Minlie Huang
Abstract:
Recently, aspect sentiment quad prediction has received widespread attention in the field of aspect-based sentiment analysis. Existing studies extract quadruplets via pre-trained generative language models to paraphrase the original sentence into a templated target sequence. However, previous works only focus on what to generate but ignore what not to generate. We argue that considering the negative samples also leads to potential benefits. In this work, we propose a template-agnostic method to control the token-level generation, which boosts original learning and reduces mistakes simultaneously. Specifically, we introduce Monte Carlo dropout to understand the built-in uncertainty of pre-trained language models, acquiring the noises and errors. We further propose marginalized unlikelihood learning to suppress the uncertainty-aware mistake tokens. Finally, we introduce minimization entropy to balance the effects of marginalized unlikelihood learning. Extensive experiments on four public datasets demonstrate the effectiveness of our approach on various generation templates.
Submitted 3 June, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
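The idea of penalizing "what not to generate" in the abstract above can be illustrated with the standard token-level unlikelihood term. This sketch is not the paper's marginalized unlikelihood learning, and it does not use Monte Carlo dropout to pick the negative tokens; the negatives, the token names, and the probabilities below are hand-picked assumptions for illustration.

```python
import math

def likelihood_loss(probs, target):
    """Usual negative log-likelihood of the target token."""
    return -math.log(probs[target])

def unlikelihood_loss(probs, negatives):
    """Standard unlikelihood term: penalize probability mass on negative tokens."""
    return -sum(math.log(1.0 - probs[token]) for token in negatives)

# Hypothetical next-token distributions over a tiny sentiment vocabulary.
before = {"positive": 0.55, "negative": 0.35, "neutral": 0.10}
after = {"positive": 0.80, "negative": 0.05, "neutral": 0.15}

target = "positive"
negatives = ["negative"]  # assumed mistake token; the paper selects these differently

for name, probs in [("before", before), ("after", after)]:
    total = likelihood_loss(probs, target) + unlikelihood_loss(probs, negatives)
    print(f"{name}: total loss = {total:.3f}")
```

The combined loss falls both when the target token gains probability and when the designated mistake token loses it, which is the behavior the two terms are meant to encourage together.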
-
Magnetic field regression using artificial neural networks for cold atom experiments
Authors:
Ziting Chen,
Kin To Wong,
Bojeong Seo,
Mingchen Huang,
Mithilesh K. Parit,
Haoting Zhen,
Jensen Li,
Gyu-Boong Jo
Abstract:
Accurately measuring magnetic fields is essential for magnetic-field sensitive experiments in fields like atomic, molecular, and optical physics, condensed matter experiments, and other areas. However, since many experiments are conducted in an isolated vacuum environment that is inaccessible to experimentalists, it can be challenging to accurately determine the magnetic field. Here, we propose an efficient method for detecting magnetic fields with the assistance of an artificial neural network (NN). Instead of measuring the magnetic field directly at the desired location, we detect magnetic fields at several surrounding positions, and a trained NN can accurately predict the magnetic field at the target location. After training, we achieve a relative error of magnetic field magnitude (magnitude of error over the magnitude of magnetic field) below 0.3$\%$, and we successfully apply this method to our erbium quantum gas apparatus. This approach significantly simplifies the process of determining magnetic fields in isolated vacuum environments and can be applied to various research fields across a wide range of magnetic field magnitudes.
Submitted 30 May, 2023;
originally announced May 2023.
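The relative error quoted in the abstract above is defined there as the magnitude of the error over the magnitude of the magnetic field. A minimal sketch of that metric follows; the function name and the field vectors are made-up illustrations, not measurements or code from the paper.

```python
import math

def relative_error(predicted, actual):
    """|predicted - actual| / |actual| for field vectors given as (Bx, By, Bz)."""
    error = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))
    magnitude = math.sqrt(sum(a ** 2 for a in actual))
    return error / magnitude

# Hypothetical field values in arbitrary units.
actual_field = (3.0, 4.0, 0.0)
predicted_field = (3.0, 4.0, 0.01)

print(f"relative error: {100 * relative_error(predicted_field, actual_field):.2f}%")
```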
-
Tannaka-Krein duality for finite 2-groups
Authors:
Mo Huang,
Zhi-Hao Zhang
Abstract:
Let $\mathcal{G}$ be a finite 2-group. We show that the 2-category $2\mathrm{Rep}(\mathcal{G})$ of finite semisimple 2-representations is a symmetric fusion 2-category. We also relate the auto-equivalence 2-group of the symmetric monoidal forgetful 2-functor $ω: 2\mathrm{Rep}(\mathcal{G}) \to 2\mathrm{Vec}$ to the auto-equivalence 2-group of the regular algebra and show that they are equivalent to $\mathcal{G}$. This result categorifies the usual Tannaka-Krein duality for finite groups.
Submitted 27 September, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Performance Analysis of Discrete-Phase-Shifter IRS-aided Amplify-and-Forward Relay Network
Authors:
Rongen Dong,
Zhongyi Xie,
Feng Shu,
Mengxing Huang,
Jiangzhou Wang
Abstract:
As a new technology for reconfiguring the wireless communication environment through software-controlled signal reflection, the intelligent reflecting surface (IRS) has attracted a lot of attention in recent years. Compared with a conventional relay system, a relay system aided by an IRS can effectively save cost and energy consumption and significantly enhance system performance. However, the phase quantization error generated by an IRS with discrete phase shifters may degrade the performance of the receiver. To analyze the performance loss arising from the IRS phase quantization error, closed-form expressions for the signal-to-noise ratio (SNR) performance loss and achievable rate of the double IRS-aided amplify-and-forward (AF) relay network, which depend on the number of phase shifter quantization bits, are derived in Rayleigh channels in accordance with the law of large numbers and the Rayleigh distribution. In addition, approximate closed-form expressions for the performance loss are derived based on the Taylor series expansion. Simulation results show that the performance losses of SNR and achievable rate decrease gradually as the number of quantization bits increases, and increase with the number $k$ of IRS phase shift elements. The SNR and achievable rate performance losses of the system are smaller than 0.06dB and 0.03bits/s/Hz when $k$ is equal to 4 and 3, respectively.
Submitted 4 November, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
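How the loss from discrete phase shifters shrinks with the number of quantization bits can be sketched with a common large-array approximation. This is not the closed-form expression derived in the paper for the double-IRS AF relay network: it assumes a single IRS with many elements and a phase error uniformly distributed over one quantization step, so the received amplitude scales by the mean cosine of the error, sin(x)/x with x = π/2^b. The function name is hypothetical.

```python
import math

def snr_loss_db(bits):
    """Approximate SNR loss (dB) from b-bit phase quantization, assuming a phase
    error uniform in [-pi / 2**bits, pi / 2**bits] and a large element count."""
    half_width = math.pi / (2 ** bits)
    mean_cos = math.sin(half_width) / half_width  # E[cos(error)] for a uniform error
    return -20.0 * math.log10(mean_cos)           # power scales with mean_cos ** 2

for b in range(1, 5):
    print(f"{b} bit(s): about {snr_loss_db(b):.3f} dB loss")
```

Under these assumptions the loss is close to 3.9 dB for 1-bit shifters and drops rapidly with each additional bit, which matches the qualitative trend reported in the abstract.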
-
E-NER: Evidential Deep Learning for Trustworthy Named Entity Recognition
Authors:
Zhen Zhang,
Mengting Hu,
Shiwan Zhao,
Minlie Huang,
Haotian Wang,
Lemao Liu,
Zhirui Zhang,
Zhe Liu,
Bingzhe Wu
Abstract:
Most named entity recognition (NER) systems focus on improving model performance, ignoring the need to quantify model uncertainty, which is critical to the reliability of NER systems in open environments. Evidential deep learning (EDL) has recently been proposed as a promising solution to explicitly model predictive uncertainty for classification tasks. However, directly applying EDL to NER applications faces two challenges, i.e., the problems of sparse entities and OOV/OOD entities in NER tasks. To address these challenges, we propose a trustworthy NER framework named E-NER by introducing two uncertainty-guided loss terms to the conventional EDL, along with a series of uncertainty-guided training strategies. Experiments show that E-NER can be applied to multiple NER paradigms to obtain accurate uncertainty estimation. Furthermore, compared to state-of-the-art baselines, the proposed method achieves a better OOV/OOD detection performance and better generalization ability on OOV entities.
Submitted 28 May, 2023;
originally announced May 2023.
-
Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
Authors:
Zhihong Shao,
Yeyun Gong,
Yelong Shen,
Minlie Huang,
Nan Duan,
Weizhu Chen
Abstract:
Large language models are powerful text processors and reasoners, but are still subject to limitations including outdated knowledge and hallucinations, which necessitates connecting them to the world. Retrieval-augmented large language models have raised extensive attention for grounding model generation on external knowledge. However, retrievers struggle to capture relevance, especially for queries with complex information needs. Recent work has proposed to improve relevance modeling by having large language models actively involved in retrieval, i.e., to improve retrieval with generation. In this paper, we show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner. A model output shows what might be needed to finish a task, and thus provides an informative context for retrieving more relevant knowledge which in turn helps generate a better output in the next iteration. Compared with recent work which interleaves retrieval with generation when producing an output, Iter-RetGen processes all retrieved knowledge as a whole and largely preserves the flexibility in generation without structural constraints. We evaluate Iter-RetGen on multi-hop question answering, fact verification, and commonsense reasoning, and show that it can flexibly leverage parametric knowledge and non-parametric knowledge, and is superior to or competitive with state-of-the-art retrieval-augmented baselines while causing fewer overheads of retrieval and generation. We can further improve performance via generation-augmented retrieval adaptation.
Submitted 23 October, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
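The iterative loop described in the Iter-RetGen abstract, where each output is fed back as context for the next retrieval, can be sketched as follows. Everything here is a toy stand-in: `retrieve` is a word-overlap ranker, `toy_generate` is a hand-written chain follower rather than a language model, and the corpus and question are invented. None of these names come from the paper.

```python
def retrieve(query, corpus, top_k):
    """Toy retriever: rank passages by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda passage: len(words & set(passage.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def toy_generate(question, passages):
    """Toy generator: start from the question's first word and hop along passages
    whose first word matches the current entity, returning the last entity reached."""
    entity = question.split()[0]
    for _ in range(len(passages)):
        hops = [p for p in passages if p.split()[0] == entity]
        if not hops:
            break
        entity = hops[0].split()[-1]
    return entity

def iter_retgen(question, corpus, generate, iterations, top_k):
    """Alternate retrieval and generation; each query is the question plus the
    previous output, and all retrieved passages are passed to the generator."""
    output = ""
    for _ in range(iterations):
        query = f"{question} {output}".strip()
        passages = retrieve(query, corpus, top_k)
        output = generate(question, passages)
    return output

corpus = ["Bob hobby tea", "Alice birthplace Paris", "Paris capital France"]
question = "Alice birthplace country"

print(iter_retgen(question, corpus, toy_generate, iterations=1, top_k=2))
print(iter_retgen(question, corpus, toy_generate, iterations=2, top_k=2))
```

In this toy setup the first iteration cannot reach the second-hop passage because the question shares no words with it; the first output supplies the missing term, so the second retrieval finds it.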
-
The Lobster Eye Imager for Astronomy Onboard the SATech-01 Satellite
Authors:
Z. X. Ling,
X. J. Sun,
C. Zhang,
S. L. Sun,
G. **,
S. N. Zhang,
X. F. Zhang,
J. B. Chang,
F. S. Chen,
Y. F. Chen,
Z. W. Cheng,
W. Fu,
Y. X. Han,
H. Li,
J. F. Li,
Y. Li,
Z. D. Li,
P. R. Liu,
Y. H. Lv,
X. H. Ma,
Y. J. Tang,
C. B. Wang,
R. J. Xie,
Y. L. Xue,
A. L. Yan
, et al. (101 additional authors not shown)
Abstract:
The Lobster Eye Imager for Astronomy (LEIA), a pathfinder of the Wide-field X-ray Telescope of the Einstein Probe (EP) mission, was successfully launched onboard the SATech-01 satellite of the Chinese Academy of Sciences on 27 July 2022. In this paper, we introduce the design and on-ground test results of the LEIA instrument. Using state-of-the-art Micro-Pore Optics (MPO), a wide field-of-view (FoV) of 346 square degrees (18.6 degrees $\times$ 18.6 degrees) of the X-ray imager is realized. An optical assembly composed of 36 MPO chips is used to focus incident X-ray photons, and four large-format complementary metal-oxide semiconductor (CMOS) sensors, each of 6 cm $\times$ 6 cm, are used as the focal plane detectors. The instrument has an angular resolution of 4 - 8 arcmin (in FWHM) for the central focal spot of the point spread function, and an effective area of 2 - 3 cm$^2$ at 1 keV in essentially all the directions within the field of view. The detection passband is 0.5 - 4 keV in the soft X-rays and the sensitivity is (2 - 3) $\times 10^{-11}$ erg s$^{-1}$ cm$^{-2}$ (about 1 mini-Crab) at 1,000 second observation. The total weight of LEIA is 56 kg and the power is 85 W. The satellite, with a design lifetime of 2 years, operates in a Sun-synchronous orbit of 500 km with an orbital period of 95 minutes. LEIA is paving the way for future missions by verifying in flight the technologies of both novel focusing imaging optics and CMOS sensors for X-ray observation, and by optimizing the working setups of the instrumental parameters. In addition, LEIA is able to carry out scientific observations to find new transients and to monitor known sources in the soft X-ray band, albeit with limited useful observing time available.
Submitted 24 May, 2023;
originally announced May 2023.
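Two of the figures quoted in the LEIA abstract can be checked with a short calculation. The sketch below assumes a circular orbit, a mean Earth radius of 6371 km, and the standard gravitational parameter of Earth; those constants and the variable names are assumptions added here, not values from the paper.

```python
import math

# Field of view: a square of 18.6 degrees on a side.
side_deg = 18.6
fov_sq_deg = side_deg ** 2

# Orbital period from Kepler's third law for a circular orbit at 500 km altitude.
MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter
EARTH_RADIUS_KM = 6371.0       # mean Earth radius (assumed)
altitude_km = 500.0

semi_major_axis_km = EARTH_RADIUS_KM + altitude_km
period_min = 2 * math.pi * math.sqrt(semi_major_axis_km ** 3 / MU_EARTH_KM3_S2) / 60
orbits_per_day = 24 * 60 / period_min

print(f"FoV: {fov_sq_deg:.1f} square degrees")
print(f"Orbital period: {period_min:.1f} minutes, about {orbits_per_day:.1f} orbits per day")
```

Both results agree with the abstract after rounding: roughly 346 square degrees and an orbital period of about 95 minutes.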
-
Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation
Authors:
Mengqi Huang,
Zhendong Mao,
Quan Wang,
Yongdong Zhang
Abstract:
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook. However, existing codebook learning simply models all local region information of images without distinguishing their different perceptual importance, which brings redundancy in the learned codebook that not only limits the next stage's autoregressive model's ability to model important structure but also results in high training cost and slow generation speed. In this study, we borrow the idea of importance perception from classical image coding theory and propose a novel two-stage framework, which consists of Masked Quantization VAE (MQ-VAE) and Stackformer, to relieve the model from modeling redundancy. Specifically, MQ-VAE incorporates an adaptive mask module for masking redundant region features before quantization and an adaptive de-mask module for recovering the original grid image feature map to faithfully reconstruct the original images after quantization. Then, Stackformer learns to predict the combination of the next code and its position in the feature map. Comprehensive experiments on various image generation tasks validate the effectiveness and efficiency of our approach. Code will be released at https://github.com/CrossmodalGroup/MaskedVectorQuantization.
Submitted 22 May, 2023;
originally announced May 2023.
-
Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization
Authors:
Mengqi Huang,
Zhendong Mao,
Zhuowei Chen,
Yongdong Zhang
Abstract:
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm that first learns a codebook to encode images as discrete codes, and then completes generation based on the learned codebook. However, they encode fixed-size image regions into fixed-length codes and ignore their naturally different information densities, which results in insufficiency in important regions and redundancy in unimportant ones, and ultimately degrades the generation quality and speed. Moreover, the fixed-length coding leads to an unnatural raster-scan autoregressive generation. To address these problems, we propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for an accurate and compact code representation; (2) DQ-Transformer, which generates images autoregressively from coarse-grained (smooth regions with fewer codes) to fine-grained (detailed regions with more codes) by modeling the position and content of codes in each granularity alternately, through a novel stacked-transformer architecture and shared-content, non-shared-position input layer designs. Comprehensive experiments on various generation tasks validate the superiority of our approach in both effectiveness and efficiency. Code will be released at https://github.com/CrossmodalGroup/DynamicVectorQuantization.
Submitted 19 May, 2023;
originally announced May 2023.