-
On Completeness of SDP-Based Barrier Certificate Synthesis over Unbounded Domains
Authors:
Hao Wu,
Shenghua Feng,
Ting Gan,
Jie Wang,
Bican Xia,
Naijun Zhan
Abstract:
Barrier certificates, serving as differential invariants that witness system safety, play a crucial role in the verification of cyber-physical systems (CPS). Prevailing computational methods for synthesizing barrier certificates are based on semidefinite programming (SDP) and exploit Putinar's Positivstellensatz. Consequently, these approaches are limited by the Archimedean condition, which requires all variables to be bounded, i.e., systems are defined over bounded domains. For systems over unbounded domains, unfortunately, existing methods become incomplete and may fail to identify potential barrier certificates.
In this paper, we address this limitation for the unbounded cases. We first give a complete characterization of polynomial barrier certificates by using homogenization, a recent technique in the optimization community to reduce an unbounded optimization problem to a bounded one. Furthermore, motivated by this formulation, we introduce the definition of homogenized systems and propose a complete characterization of a family of non-polynomial barrier certificates with more expressive power. Experimental results demonstrate that our two approaches are more effective while maintaining a comparable level of efficiency.
Submitted 30 June, 2024; v1 submitted 24 December, 2023;
originally announced December 2023.
-
Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data
Authors:
Yiwei Li,
Peiwen Yuan,
Shaoxiong Feng,
Boyuan Pan,
Bin Sun,
Xinglin Wang,
Heda Wang,
Kan Li
Abstract:
Large Language Models (LLMs) have performed well on various reasoning tasks, but their inaccessibility and numerous parameters hinder wide application in practice. One promising way is distilling the reasoning ability from LLMs to small models by the generated chain-of-thought reasoning paths. In some cases, however, LLMs may produce incorrect reasoning chains, especially when facing complex mathematical problems. Previous studies only transfer knowledge from positive samples and drop the synthesized data with wrong answers. In this work, we illustrate the merit of negative data and propose a model specialization framework to distill LLMs with negative samples in addition to positive ones. The framework consists of three progressive steps, spanning the training and inference stages, to absorb knowledge from negative data. We conduct extensive experiments across arithmetic reasoning tasks to demonstrate the role of negative data in distillation from LLMs.
Submitted 20 December, 2023;
originally announced December 2023.
-
Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking
Authors:
Shihao Feng,
Pengpeng Liang,
Jin Gao,
Erkang Cheng
Abstract:
Point cloud-based 3D object tracking is an important task in autonomous driving. Though great advances regarding Siamese-based 3D tracking have been made recently, it remains challenging to learn the correlation between the template and search branches effectively with the sparse LIDAR point cloud data. Instead of performing correlation of the two branches at just one point in the network, in this paper, we present a multi-correlation Siamese Transformer network that has multiple stages and carries out feature correlation at the end of each stage based on sparse pillars. More specifically, in each stage, self-attention is first applied to each branch separately to capture the non-local context information. Then, cross-attention is used to inject the template information into the search area. This strategy allows the feature learning of the search area to be aware of the template while keeping the individual characteristics of the template intact. To enable the network to easily preserve the information learned at different stages and ease the optimization, for the search area, we densely connect the initial input sparse pillars and the output of each stage to all subsequent stages and the target localization network, which converts pillars to bird's eye view (BEV) feature maps and predicts the state of the target with a small densely connected convolution network. Deep supervision is added to each stage to further boost the performance as well. The proposed algorithm is evaluated on the popular KITTI, nuScenes, and Waymo datasets, and the experimental results show that our method achieves promising performance compared with the state-of-the-art. An ablation study that shows the effectiveness of each component is provided as well. Code is available at https://github.com/liangp/MCSTN-3DSOT.
Submitted 18 December, 2023;
originally announced December 2023.
-
Test-Time Domain Adaptation by Learning Domain-Aware Batch Normalization
Authors:
Yanan Wu,
Zhixiang Chi,
Yang Wang,
Konstantinos N. Plataniotis,
Songhe Feng
Abstract:
Test-time domain adaptation aims to adapt the model trained on source domains to unseen target domains using a few unlabeled images. Emerging research has shown that the label and domain information is separately embedded in the weight matrix and batch normalization (BN) layer. Previous works normally update the whole network naively without explicitly decoupling the knowledge between label and domain. As a result, it leads to knowledge interference and defective distribution adaptation. In this work, we propose to reduce such learning interference and elevate the domain knowledge learning by only manipulating the BN layer. However, the normalization step in BN is intrinsically unstable when the statistics are re-estimated from a few samples. We find that ambiguities can be greatly reduced when only updating the two affine parameters in BN while keeping the source domain statistics. To further enhance the domain knowledge extraction from unlabeled data, we construct an auxiliary branch with label-independent self-supervised learning (SSL) to provide supervision. Moreover, we propose a bi-level optimization based on meta-learning to enforce the alignment of the two learning objectives of the auxiliary and main branches. The goal is to use the auxiliary branch to adapt the domain and benefit the main task for subsequent inference. Our method keeps the same computational cost at inference as the auxiliary branch can be thoroughly discarded after adaptation. Extensive experiments show that our method outperforms the prior works on five WILDS real-world domain shift datasets. Our method can also be integrated with methods with label-dependent optimization to further push the performance boundary. Our code is available at https://github.com/ynanwu/MABN.
Submitted 16 January, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
Heterogenous Network Analytics of Small Group Teamwork: Using Multimodal Data to Uncover Individual Behavioral Engagement Strategies
Authors:
Shihui Feng,
Lixiang Yan,
Linxuan Zhao,
Roberto Martinez Maldonado,
Dragan Gašević
Abstract:
Individual behavioral engagement is an important indicator of active learning in collaborative settings, encompassing multidimensional behaviors mediated through various interaction modes. Little existing work has explored the use of multimodal process data to understand individual behavioral engagement in face-to-face collaborative learning settings. In this study we bridge this gap, for the first time, introducing a heterogeneous tripartite network approach to analyze the interconnections among multimodal process data in collaborative learning. Students' behavioral engagement strategies are analyzed based on their interaction patterns with various spatial locations and verbal communication types using a heterogeneous tripartite network. The multimodal collaborative learning process data were collected from 15 teams of four students. We conducted stochastic blockmodeling on a projection of the heterogeneous tripartite network to cluster students into groups that shared similar spatial and oral engagement patterns. We found two distinct clusters of students, whose characteristic behavioral engagement strategies were identified by extracting interaction patterns that were statistically significant relative to a multinomial null model. The two identified clusters also exhibited a statistically significant difference regarding students' perceived collaboration satisfaction and teacher-assessed team performance level. This study advances collaboration analytics methodology and provides new insights into personalized support in collaborative learning.
Submitted 14 December, 2023;
originally announced December 2023.
-
Designing with Language: Wireframing UI Design Intent with Generative Large Language Models
Authors:
Sidong Feng,
Mingyue Yuan,
Jieshan Chen,
Zhenchang Xing,
Chunyang Chen
Abstract:
Wireframing is a critical step in the UI design process. Mid-fidelity wireframes offer more impactful and engaging visuals compared to low-fidelity versions. However, their creation can be time-consuming and labor-intensive, requiring the addition of actual content and semantic icons. In this paper, we introduce a novel solution, WireGen, to automatically generate mid-fidelity wireframes from just a brief design intent description using generative Large Language Models (LLMs). Our experiments demonstrate the effectiveness of WireGen in producing 77.5% significantly better wireframes, outperforming two widely-used in-context learning baselines. A user study with 5 designers further validates its real-world usefulness, highlighting its potential value to enhance the UI design process.
Submitted 12 December, 2023;
originally announced December 2023.
-
Rigorous and efficient diffraction modeling between arbitrary planes by angular spectrum rearrangement
Authors:
Yiwen Hu,
Xin Liu,
Shi Feng,
Xu Liu,
Xiang Hao
Abstract:
In computational optics, numerical modeling of diffraction between arbitrary planes offers unparalleled flexibility. However, existing methods suffer from the trade-off between computational accuracy and efficiency. To resolve this dilemma, we present a novel approach that rigorously and efficiently models wave propagation between two arbitrary planes. This is achieved by rearranging the angular spectrum of the source field, coupled with linear algebraic computations. Notably, our method achieves comparable computational efficiency to the control method for both scalar and vectorial diffraction modeling, while eliminating nearly all numerical errors. Furthermore, we selectively merge the angular spectrum to further enhance the efficiency at the expense of precision in a controlled manner. Thereafter, the time consumption is reduced to at most 3% of that before merging.
Submitted 11 December, 2023;
originally announced December 2023.
-
Hund's coupling driven interorbital entanglement in orbital-selective Mott phase
Authors:
Yuekun Niu,
Yu Ni,
Haishan Zhang,
Liang Qiu,
Jianli Wang,
Leiming Chen,
Yun Song,
Shiping Feng
Abstract:
We examine the orbital-selective Mott transition in the non-hybridized two-band Hubbard model using the dynamical mean-field theory. We find that the orbital-selective Mott transition could be depicted by the local quantum state fidelity. Additionally, within the orbital-selective Mott phase, the combined characteristics of the two orbitals lead to the presence of interorbital entanglement, which is characterized by the non-semi-integer values of local quantum state fidelity. It is demonstrated that this entanglement is driven by transverse Hund's coupling, and the mechanisms underlying the orbital-selective Mott transition show prominent variations depending on the presence or absence of Hund's coupling and its transverse terms.
Submitted 10 December, 2023;
originally announced December 2023.
-
Dynamic Adjustment of Matching Radii under the Broadcasting Mode: A Novel Multitask Learning Strategy and Temporal Modeling Approach
Authors:
Taijie Chen,
Zijian Shen,
Siyuan Feng,
Linchuan Yang,
Jintao Ke
Abstract:
As ride-hailing services have experienced significant growth, the majority of research has concentrated on the dispatching mode, where drivers must adhere to the platform's assigned routes. However, the broadcasting mode, in which drivers can freely choose their preferred orders from those broadcast by the platform, has received less attention. One important but challenging task in such a system is the determination of the optimal matching radius, which usually varies across space, time, and real-time supply/demand characteristics. This study develops a Transformer-Encoder-Based (TEB) model that predicts key system performance metrics for a range of matching radii, which enables the ride-hailing platform to select an optimal matching radius that maximizes overall system performance according to real-time supply and demand information. To simultaneously maximize multiple system performance metrics for matching radius determination, we devise a novel multi-task learning algorithm that enhances convergence speed of each task (corresponding to the optimization of one metric) and delivers more accurate overall predictions. We evaluate our methods in a simulation environment specifically designed for broadcasting-mode-based ride-hailing service. Our findings reveal that dynamically adjusting matching radii based on our proposed predict-then-optimize approach significantly improves system performance, e.g., increasing platform revenue by 7.55% and enhancing order fulfillment rate by 13% compared to benchmark algorithms.
Submitted 9 December, 2023;
originally announced December 2023.
-
DPI: Ensuring Strict Differential Privacy for Infinite Data Streaming
Authors:
Shuya Feng,
Meisam Mohammady,
Han Wang,
Xiaochen Li,
Zhan Qin,
Yuan Hong
Abstract:
Streaming data, crucial for applications like crowdsourcing analytics, behavior studies, and real-time monitoring, faces significant privacy risks due to the large and diverse data linked to individuals. In particular, recent efforts to release data streams, using the rigorous privacy notion of differential privacy (DP), have encountered issues with unbounded privacy leakage. This challenge limits their applicability to only a finite number of time slots ("finite data stream") or relaxation to protecting the events ("event or $w$-event DP") rather than all the records of users. A persistent challenge is managing the sensitivity of outputs to inputs in situations where users contribute many activities and data distributions evolve over time. In this paper, we present a novel technique for Differentially Private data streaming over Infinite disclosure (DPI) that effectively bounds the total privacy leakage of each user in infinite data streams while enabling accurate data collection and analysis. Furthermore, we also maximize the accuracy of DPI via a novel boosting mechanism. Finally, extensive experiments across various streaming applications and real datasets (e.g., COVID-19, Network Traffic, and USDA Production) show that DPI maintains high utility for infinite data streams in diverse settings. Code for DPI is available at https://github.com/ShuyaFeng/DPI.
Submitted 7 December, 2023;
originally announced December 2023.
-
FERGI: Automatic Annotation of User Preferences for Text-to-Image Generation from Spontaneous Facial Expression Reaction
Authors:
Shuangquan Feng,
Junhua Ma,
Virginia R. de Sa
Abstract:
Researchers have proposed to use data of human preference feedback to fine-tune text-to-image generative models. However, the scalability of human feedback collection has been limited by its reliance on manual annotation. Therefore, we develop and test a method to automatically annotate user preferences from their spontaneous facial expression reaction to the generated images. We collect a dataset of Facial Expression Reaction to Generated Images (FERGI) and show that the activations of multiple facial action units (AUs) are highly correlated with user evaluations of the generated images. Specifically, AU4 (brow lowerer) is reflective of negative evaluations of the generated image whereas AU12 (lip corner puller) is reflective of positive evaluations. These can be useful in two ways. Firstly, we can automatically annotate user preferences between image pairs with substantial difference in these AU responses with an accuracy significantly outperforming state-of-the-art scoring models. Secondly, directly integrating the AU responses with the scoring models improves their consistency with human preferences. Finally, this method of automatic annotation with facial expression analysis can be potentially generalized to other generation tasks. The code is available at https://github.com/ShuangquanFeng/FERGI, and the dataset is also available at the same link for research purposes.
Submitted 21 March, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
The measurement of masses of OB-type stars from LAMOST DR5
Authors:
Zhenyan Huo,
Zhicun Liu,
Wenyuan Cui,
Chao Liu,
Jiaming Liu,
Mingxu Sun,
Shuai Feng,
Linlin Li
Abstract:
The measurements of masses and luminosities of massive stars play an important role in understanding the formation and evolution of their host galaxies. In this work, we present the measurement of masses and luminosities of 2,946 OB-type stars, including 78 O-type stars and 2,868 B-type stars, based on their stellar parameters (effective temperature, surface gravity, and metallicity) and the PARSEC isochrone models. Our results show that the median mass and luminosity of the 2,946 OB-type stars are 5.4 M$_{\odot}$ and log(L/L$_{\odot}$)=3.2, with median relative errors of 21.4% and 71.1%, respectively. For some B-type stars, our estimates agree well with those derived in the literature from the orbital motions of binary stars. In addition, we also fit the mass-luminosity relation of B-type stars by using our derived mass and the luminosity from $Gaia$ DR3.
Submitted 29 November, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Identification of Blue Horizontal-Branch Stars From LAMOST DR5
Authors:
Jie Ju,
Wenyuan Cui,
Zhenyan Huo,
Chao Liu,
Xiangxiang Xue,
Jiaming Liu,
Shuai Feng,
Mingxu Sun,
Linlin Li
Abstract:
We construct a new catalog of the blue horizontal-branch (BHB) stars from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) DR5 dataset, which contains 5355+81 BHB stars at high Galactic latitude ($|Glat|>20^{\circ}$). We combine the spectral line indices with a set of Balmer line profile selection criteria to identify the BHB stars. During the selection process, we use the line index of Ca II K to exclude the metal-rich A-type dwarfs. We obtain their atmospheric parameters by cross-matching our BHB stars with the catalog provided by Xiang et al. (2022). The results show that our sample is consistent with the theoretical $T_{\rm eff}$-log $g$ evolutionary tracks of the BHB stars, indicating that our method is robust for identifying BHB stars from the LAMOST spectra. Their spatial distribution indicates that most of our BHB stars are located in the inner halo or the disk of the Milky Way. Combined with other BHB samples from the literature, the BHB stars can cover a large Galactic volume, which makes them a better probe for studying the kinematics, dynamics, and structural characteristics of the Milky Way.
Submitted 27 November, 2023;
originally announced November 2023.
-
Protein-ligand binding representation learning from fine-grained interactions
Authors:
Shikun Feng,
Minghao Li,
Yinjun Jia,
Weiying Ma,
Yanyan Lan
Abstract:
The binding between proteins and ligands plays a crucial role in the realm of drug discovery. Previous deep learning approaches have shown promising results over traditional computationally intensive methods, but they generalize poorly due to limited supervised data. In this paper, we propose to learn protein-ligand binding representation in a self-supervised learning manner. Different from existing pre-training approaches which treat proteins and ligands individually, we emphasize discerning the intricate binding patterns from fine-grained interactions. Specifically, this self-supervised learning problem is formulated as a prediction of the conclusive binding complex structure given a pocket and ligand with a Transformer-based interaction module, which naturally emulates the binding process. To ensure the representation of rich binding information, we introduce two pre-training tasks, i.e., atomic pairwise distance map prediction and mask ligand reconstruction, which comprehensively model the fine-grained interactions from both structure and feature space. Extensive experiments have demonstrated the superiority of our method across various binding tasks, including protein-ligand affinity prediction, virtual screening and protein-ligand docking.
Submitted 8 November, 2023;
originally announced November 2023.
-
The Clumpy Structure Of Five Star-bursting Dwarf Galaxies In The MaNGA Survey
Authors:
Mengting Ju,
Jun Yin,
Lei Hao,
Chenxu Liu,
Chao-Wei Tsai,
Junfeng Wang,
Zhengyi Shao,
Shuai Feng,
Yu Rong
Abstract:
The star-forming clumps in star-bursting dwarf galaxies provide valuable insights into the understanding of the evolution of dwarf galaxies. In this paper, we focus on five star-bursting dwarf galaxies featuring off-centered clumps in the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey. Using the stellar population synthesis software FADO, we obtain the spatially-resolved distribution of the star formation history, which allows us to construct the $g$-band images of the five galaxies at different ages. These images can help us to probe the evolution of the morphological structures of these galaxies. While images of stellar population older than 1 Gyr are typically smooth, images of stellar population younger than 1 Gyr reveal significant clumps, including multiple clumps which appear at different locations and even different ages. To study the evolutionary connections of these five galaxies to other dwarf galaxies before their star-forming clumps appear, we construct the images of the stellar populations older than three age nodes, and define them to be the images of the "host" galaxies. We find that the properties such as the central surface brightness and the effective radii of the hosts of the five galaxies are in between those of dwarf ellipticals (dEs) and dwarf irregulars (dIrrs), with two clearly more similar to dEs and one more similar to dIrrs. Among the five galaxies, 8257-3704 is particularly interesting, as it shows a previous starburst event that is not quite visible from its $gri$ image, but only visible from images of the stellar population at a few hundred million years. The star-forming clump associated with this event may have appeared at around 600 Myr and disappeared at around 40 Myr.
Submitted 27 November, 2023;
originally announced November 2023.
-
An improved method to measure $\rm ^{12}C/^{13}C$ and $\rm ^{14}N/^{15}N$ abundance ratios: revisiting CN isotopologues in the Galactic outer disk
Authors:
Yichen Sun,
Zhi-Yu Zhang,
Junzhi Wang,
Lingrui Lin,
Padelis P. Papadopoulos,
Donatella Romano,
Siyi Feng,
Yan Sun,
Bo Zhang,
Francesca Matteucci
Abstract:
The variations of elemental abundance and their ratios along the Galactocentric radius result from the chemical evolution of the Milky Way disks. The $\rm ^{12}C/^{13}C$ ratio in particular is often used as a proxy to determine other isotopic ratios, such as $\rm ^{16}O/^{18}O$ and $\rm ^{14}N/^{15}N$. Measurements of $\rm ^{12}CN$ and $\rm ^{13}CN$ (or $\rm C^{15}N$) -- with their optical depths corrected via their hyper-fine structure lines -- have traditionally been exploited to constrain the Galactocentric gradients of the CNO isotopic ratios. Such methods typically make several simplifying assumptions (e.g. a filling factor of unity, the Rayleigh-Jeans approximation, and the neglect of the cosmic microwave background) while adopting a single average gas phase. However, these simplifications introduce significant biases to the measured $\rm ^{12}C/^{13}C$ and $\rm ^{14}N/^{15}N$. We demonstrate that exploiting the optically thin satellite lines of $\rm ^{12}CN$ constitutes a more reliable new method to derive $\rm ^{12}C/^{13}C$ and $\rm ^{14}N/^{15}N$ from CN isotopologues. We apply this satellite-line method to new IRAM 30-m observations of $\rm ^{12}CN$, $\rm ^{13}CN$, and $\rm C^{15}N$ $N=1\to0$ towards 15 metal-poor molecular clouds in the Galactic outer disk ($R_{\rm gc} > $ 12 kpc), supplemented by data from the literature. After updating their Galactocentric distances, we find that $\rm ^{12}C/^{13}C$ and $\rm ^{14}N/^{15}N$ gradients are in good agreement with those derived using independent optically thin molecular tracers, even in regions with the lowest metallicities. We therefore recommend using optically thin tracers for Galactic and extragalactic CNO isotopic measurements, which avoids the biases associated with the traditional method.
Submitted 21 November, 2023;
originally announced November 2023.
-
Density distributions, magnetic field structures and fragmentation in high-mass star formation
Authors:
H. Beuther,
C. Gieser,
J. D. Soler,
Q. Zhang,
R. Rao,
D. Semenov,
Th. Henning,
R. Pudritz,
T. Peters,
P. Klaassen,
M. T. Beltran,
A. Palau,
T. Moeller,
K. G. Johnston,
H. Zinnecker,
J. Urquhart,
R. Kuiper,
A. Ahmadi,
A. Sanchez-Monge,
S. Feng,
S. Leurini,
S. E. Ragan
Abstract:
Methods: Observing the large pc-scale Stokes I mm dust continuum emission with the IRAM 30m telescope and the intermediate-scale (<0.1pc) polarized submm dust emission with the Submillimeter Array toward a sample of 20 high-mass star-forming regions allows us to quantify how the fragmentation behaviour of these regions depends on the density and magnetic field structures.
Results: We infer density distributions n~r^{-p} of the regions with typical power-law slopes p around ~1.5. There is no obvious correlation between the power-law slopes of the density structures on larger clump scales (~1pc) and the number of fragments on smaller core scales (<0.1pc). Comparing the large-scale single-dish density profiles to those derived earlier from interferometric observations at smaller spatial scales, we find that the smaller-scale power-law slopes are steeper, typically around ~2.0. The flattening toward larger scales is consistent with the star-forming regions being embedded in larger cloud structures that do not decrease in density away from a particular core. Regarding the magnetic field, for several regions it appears aligned with filamentary structures leading toward the densest central cores. Furthermore, we find different polarization structures with some regions exhibiting central polarization holes whereas other regions show polarized emission also toward the central peak positions. Nevertheless, the polarized intensities are inversely related to the Stokes I intensities. We estimate magnetic field strengths between ~0.2 and ~4.5mG, and we find no clear correlation between magnetic field strength and the fragmentation level of the regions. Comparison of the turbulent to magnetic energies shows that they are of roughly equal importance in this sample. The mass-to-flux ratios range between ~2 and ~7, consistent with collapsing star-forming regions.
Submitted 20 November, 2023;
originally announced November 2023.
-
P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models
Authors:
Yuhan Liu,
Shangbin Feng,
Xiaochuang Han,
Vidhisha Balachandran,
Chan Young Park,
Sachin Kumar,
Yulia Tsvetkov
Abstract:
In this work, we take a first step towards designing summarization systems that are faithful to the author's intent, not only the semantic content of the article. Focusing on a case study of preserving political perspectives in news summarization, we find that existing approaches alter the political opinions and stances of news articles in more than 50% of summaries, misrepresenting the intent and perspectives of the news authors. We thus propose P^3SUM, a diffusion model-based summarization approach controlled by political perspective classifiers. In P^3SUM, the political leaning of a generated summary is iteratively evaluated at each decoding step, and any drift from the article's original stance incurs a loss back-propagated to the embedding layers, steering the political stance of the summary at inference time. Extensive experiments on three news summarization datasets demonstrate that P^3SUM outperforms state-of-the-art summarization systems and large language models by up to 13.7% in terms of the success rate of stance preservation, with competitive performance on standard metrics of summarization quality. Our findings present a first analysis of preservation of pragmatic features in summarization, highlight the lacunae in existing summarization models -- that even state-of-the-art models often struggle to preserve author's intents -- and develop new summarization systems that are more faithful to author's perspectives.
Submitted 4 April, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Speech-based Slot Filling using Large Language Models
Authors:
Guangzhi Sun,
Shutong Feng,
Dongcheng Jiang,
Chao Zhang,
Milica Gašić,
Philip C. Woodland
Abstract:
Recently, advancements in large language models (LLMs) have shown an unprecedented ability across various language tasks. This paper investigates the potential application of LLMs to slot filling with noisy ASR transcriptions, via both in-context learning and task-specific fine-tuning. Dedicated prompt designs and fine-tuning approaches are proposed to improve the robustness of LLMs for slot filling with noisy ASR transcriptions. Moreover, a linearised knowledge injection (LKI) scheme is also proposed to integrate dynamic external knowledge into LLMs. Experiments were performed on SLURP to quantify the performance of LLMs, including GPT-3.5-turbo, GPT-4, LLaMA-13B and Vicuna-13B (v1.1 and v1.5) with different ASR error rates. The use of the proposed fine-tuning together with the LKI scheme for LLaMA-13B achieved an 8.3% absolute SLU-F1 improvement compared to the strong Flan-T5-base baseline system on a limited data setup.
△ Less
Submitted 13 November, 2023;
originally announced November 2023.
-
Einstein locality: An ignored core element of quantum mechanics
Authors:
Sheng Feng
Abstract:
Quantum mechanics is commonly accepted as a complete theory thanks to experimental tests of non-locality based on Bell's theorem. However, we discover that the completeness of the quantum theory practically suffered from detrimental ignorance of a core element -- Einstein locality. Without this element, important experimental results of relevance could hardly receive full understanding or were even completely misinterpreted. Here we present the discovery with a theory of Einstein locality developed to recover the completeness of quantum mechanics. The developed theory provides a unified framework to account for the results of, e.g., Bell experiments (on Bell non-locality) and double-slit experiments with entangled photons (on wave-particle duality). The theory reveals the dynamics of Bell non-locality and the principle of biased sampling in measurement in the double-slit experiments, which otherwise will be impossible tasks without introducing Einstein locality. Worse still, ignorance of this element has caused misinterpretation of observations in the double-slit experiments, leading to perplexing statements of duality violation. Einstein locality also manifests indispensability in theory by its connection to the foundations of other fundamental concepts and topics (e.g., entanglement, decoherence, and quantum measurement) and may advance quantum technology by offering a promising approach to optimizing quantum computing hardware.
△ Less
Submitted 19 March, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Sliced Denoising: A Physics-Informed Molecular Pre-Training Method
Authors:
Yuyan Ni,
Shikun Feng,
Wei-Ying Ma,
Zhi-Ming Ma,
Yanyan Lan
Abstract:
While molecular pre-training has shown great potential in enhancing drug discovery, the lack of a solid physical interpretation in current methods raises concerns about whether the learned representation truly captures the underlying explanatory factors in observed data, ultimately resulting in limited generalization and robustness. Although denoising methods offer a physical interpretation, their accuracy is often compromised by ad-hoc noise design, leading to inaccurate learned force fields. To address this limitation, this paper proposes a new method for molecular pre-training, called sliced denoising (SliDe), which is based on the classical mechanical intramolecular potential theory. SliDe utilizes a novel noise strategy that perturbs bond lengths, angles, and torsion angles to achieve better sampling over conformations. Additionally, it introduces a random slicing approach that circumvents the computationally expensive calculation of the Jacobian matrix, which is otherwise essential for estimating the force field. By aligning with physical principles, SliDe shows a 42\% improvement in the accuracy of estimated force fields compared to current state-of-the-art denoising methods, and thus outperforms traditional baselines on various molecular property prediction tasks.
△ Less
Submitted 3 November, 2023;
originally announced November 2023.
-
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Authors:
Ruihang Lai,
Junru Shao,
Siyuan Feng,
Steven S. Lyubomirsky,
Bohan Hou,
Wuwei Lin,
Zihao Ye,
Hongyi Jin,
Yuchen Jin,
Jiawei Liu,
Lesheng Jin,
Yaxing Cai,
Ziheng Jiang,
Yong Wu,
Sunghyun Park,
Prakalp Srivastava,
Jared G. Roesch,
Todd C. Mowry,
Tianqi Chen
Abstract:
Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven demand for deploying them to a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program. It also introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and library calls in a single representation to enable cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on large language models show that Relax delivers performance competitive with state-of-the-art hand-optimized systems across platforms and enables deployment of emerging dynamic models to a broader set of environments, including mobile phones, embedded devices, and web browsers.
△ Less
Submitted 1 November, 2023;
originally announced November 2023.
-
The bottleneck and ceiling effects in quantized tracking control of heterogeneous multi-agent systems under DoS attacks
Authors:
Shuai Feng,
Maopeng Ran,
Baoyong Zhang,
Lihua Xie,
Shengyuan Xu
Abstract:
In this paper, we investigate tracking control of heterogeneous multi-agent systems under Denial-of-Service (DoS) attacks and state quantization. Dynamic quantized mechanisms are designed for inter-follower communication and leader-follower communication. Zooming-in and out factors, and data rates of both mechanisms for preventing quantizer saturation are provided. Our results show that by tuning the inter-follower quantized controller, one cannot improve the resilience beyond a level determined by the data rate of leader-follower quantized communication, i.e., the ceiling effect. Otherwise, overflow of followers' state quantizer can occur. On the other hand, if one selects a "large" data rate for leader-follower quantized communication, then the inter-follower quantized communication determines the resilience, and further increasing the data rate for leader-follower quantized communication cannot improve the resilience, i.e., the bottleneck effect. Simulation examples are provided to justify the results of our paper.
△ Less
Submitted 31 October, 2023;
originally announced November 2023.
-
A Survey of Methods for Estimating Hurst Exponent of Time Sequence
Authors:
Hong-Yan Zhang,
Zhi-Qiang Feng,
Si-Yu Feng,
Yu Zhou
Abstract:
The Hurst exponent is a significant indicator for characterizing the self-similarity and long-term memory properties of time sequences. It has wide applications in physics, technologies, engineering, mathematics, statistics, economics, psychology and so on. Currently, available methods for estimating the Hurst exponent of time sequences can be divided into different categories: time-domain methods and spectrum-domain methods based on the representation of time sequence, linear regression methods and Bayesian methods based on parameter estimation methods. Although various methods are discussed in literature, there are still some deficiencies: the descriptions of the estimation algorithms are just mathematics-oriented and the pseudo-codes are missing; the effectiveness and accuracy of the estimation algorithms are not clear; the classification of estimation methods is not considered and there is a lack of guidance for selecting the estimation methods. In this work, the emphasis is put on thirteen dominant methods for estimating the Hurst exponent. For the purpose of decreasing the difficulty of implementing the estimation methods with computer programs, the mathematical principles are discussed briefly and the pseudo-codes of algorithms are presented with necessary details. It is expected that the survey could help the researchers to select, implement and apply the estimation algorithms of interest in practical situations in an easy way.
△ Less
Submitted 29 October, 2023;
originally announced October 2023.
-
Does or did the supernova remnant Cassiopeia A operate as a PeVatron?
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
For decades, supernova remnants (SNRs) have been considered the prime sources of Galactic Cosmic rays (CRs). But whether SNRs can accelerate CR protons to PeV energies and thus dominate CR flux up to the knee is currently under intensive theoretical and phenomenological debate. The direct test of the ability of SNRs to operate as CR PeVatrons can be provided by ultrahigh-energy (UHE; $E_γ\geq 100$~TeV) $γ$-rays. In this context, the historical SNR Cassiopeia A (Cas A) is considered one of the most promising targets for UHE observations. This paper presents the observation of Cas A and its vicinity by the LHAASO KM2A detector. The exceptional sensitivity of LHAASO KM2A in the UHE band, combined with the young age of Cas A, enabled us to derive stringent model-independent limits on the energy budget of UHE protons and nuclei accelerated by Cas A at any epoch after the explosion. The results challenge the prevailing paradigm that Cas A-type SNRs are major suppliers of PeV CRs in the Milky Way.
△ Less
Submitted 25 October, 2023;
originally announced October 2023.
-
An Efficient Method for Realizing Contractions of Access Structures in Cloud Storage
Authors:
Shuai Feng,
Liang Feng Zhang
Abstract:
In single-cloud storage, ciphertext-policy attribute-based encryption (CP-ABE) allows one to encrypt any data under an access structure to a cloud server, specifying what attributes are required to decrypt. In multi-cloud storage, a secret sharing scheme (SSS) allows one to split any data into multiple shares, one to a single server, and specify which subset of the servers are able to recover the data. It is an interesting problem to remove some attributes/servers but still enable the remaining attributes/servers in every authorized set to recover the data. The problem is related to the contraction problem of access structures for SSSs. In this paper, we propose a method that can efficiently transform a given SSS for an access structure to SSSs for contractions of the access structure. We show its applications in solving the attribute removal problem in the CP-ABE based single-cloud storage and the data relocating problem in multi-cloud storage. Our method results in solutions that require either less server storage or even no additional server storage.
△ Less
Submitted 24 October, 2023;
originally announced October 2023.
-
UniMAP: Universal SMILES-Graph Representation Learning
Authors:
Shikun Feng,
Lixin Yang,
Weiying Ma,
Yanyan Lan
Abstract:
Molecular representation learning is fundamental for many drug-related applications. Most existing molecular pre-training models are limited to a single molecular modality, either SMILES or graph representation. To effectively leverage both modalities, we argue that it is critical to capture the fine-grained 'semantics' between SMILES and graph, because subtle sequence/graph differences may lead to contrary molecular properties. In this paper, we propose a universal SMILES-graph representation learning model, namely UniMAP. Firstly, an embedding layer is employed to obtain the token and node/edge representation in SMILES and graph, respectively. A multi-layer Transformer is then utilized to conduct deep cross-modality fusion. Specifically, four kinds of pre-training tasks are designed for UniMAP, including Multi-Level Cross-Modality Masking (CMM), SMILES-Graph Matching (SGM), Fragment-Level Alignment (FLA), and Domain Knowledge Learning (DKL). In this way, both global (i.e. SGM and DKL) and local (i.e. CMM and FLA) alignments are integrated to achieve comprehensive cross-modality fusion. We evaluate UniMAP on various downstream tasks, i.e. molecular property prediction, drug-target affinity prediction and drug-drug interaction. Experimental results show that UniMAP outperforms current state-of-the-art pre-training methods. We also visualize the learned representations to demonstrate the effect of multi-modality integration.
△ Less
Submitted 22 October, 2023;
originally announced October 2023.
-
Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong
Authors:
Chenglei Si,
Navita Goyal,
Sherry Tongshuang Wu,
Chen Zhao,
Shi Feng,
Hal Daumé III,
Jordan Boyd-Graber
Abstract:
Large Language Models (LLMs) are increasingly used for accessing information on the web. Their truthfulness and factuality are thus of great interest. To help users make the right decisions about the information they get, LLMs should not only provide information but also help users fact-check it. Our experiments with 80 crowdworkers compare language models with search engines (information retrieval systems) at facilitating fact-checking. We prompt LLMs to validate a given claim and provide corresponding explanations. Users reading LLM explanations are significantly more efficient than those using search engines while achieving similar accuracy. However, they over-rely on the LLMs when the explanation is wrong. To reduce over-reliance on LLMs, we ask LLMs to provide contrastive information - explain both why the claim is true and false, and then we present both sides of the explanation to users. This contrastive explanation mitigates users' over-reliance on LLMs, but cannot significantly outperform search engines. Further, showing both search engine results and LLM explanations offers no complementary benefits compared to search engines alone. Taken together, our study highlights that natural language explanations by LLMs may not be a reliable replacement for reading the retrieved passages, especially in high-stakes settings where over-relying on wrong AI explanations could lead to critical consequences.
△ Less
Submitted 1 April, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Correlation between the strength of low-temperature T-linear normal-state resistivity and $T_{\rm c}$ in overdoped electron-doped cuprate superconductors
Authors:
Xingyu Ma,
Minghuan Zeng,
Huaiming Guo,
Shiping Feng
Abstract:
The recently observed intimate link between the nature of the strange metallic normal-state and superconductivity in the overdoped electron-doped cuprate superconductors calls for an explanation. Here the intrinsic correlation between the strength of the low-temperature linear-in-temperature normal-state resistivity and superconducting transition temperature $T_{\rm c}$ in the overdoped electron-doped cuprate superconductors is studied within the framework of the kinetic-energy-driven superconductivity. On the one hand, the main ingredient is identified as an electron pairing mechanism involving {\it the spin excitation}, and then $T_{\rm c}$ has a dome-like shape doping dependence with the maximal $T_{\rm c}$ that occurs at around the optimal electron doping. On the other hand, in the normal-state above $T_{\rm c}$, the low-temperature linear-in-temperature normal-state resistivity in the overdoped regime arises from the momentum relaxation due to the electron umklapp scattering mediated by {\it the same spin excitation}. This {\it same spin excitation} that governs both the electron umklapp scattering responsible for the low-temperature linear-in-temperature normal-state resistivity and electron pairing responsible for superconductivity naturally generates a correlation between the strength of the low-temperature linear-in-temperature normal-state resistivity and $T_{\rm c}$ in the overdoped regime.
△ Less
Submitted 18 October, 2023;
originally announced October 2023.
-
Overconstrained Robotic Limb with Energy-Efficient, Omni-directional Locomotion
Authors:
Ronghan Xu,
Jiayi Yin,
Shihao Feng,
Bangchao Huang,
Haoran Sun,
Jia Pan,
Fang Wan,
Chaoyang Song
Abstract:
This paper studies the design, modeling, and control of a novel quadruped, featuring overconstrained robotic limbs employing the Bennett linkage for motion and power transmission. The modular limb design allows the robot to morph into reptile- or mammal-inspired forms. In contrast to the prevailing focus on planar limbs, this research delves into the classical overconstrained linkages, which have strong theoretical foundations in advanced kinematics but limited engineering applications. The study showcases the morphological superiority of overconstrained robotic limbs that can transform into planar or spherical limbs, exemplifying the Bennett linkage. By conducting kinematic and dynamic modeling, we apply model predictive control to simulate a range of locomotion tasks, revealing that overconstrained limbs outperform planar designs in omni-directional tasks like forward trotting, lateral trotting, and turning on the spot when considering foothold distances. These findings highlight the biological distinctions in limb design between reptiles and mammals and represent the first documented instance of overconstrained robotic limbs outperforming planar designs in dynamic locomotion.
△ Less
Submitted 3 February, 2024; v1 submitted 15 October, 2023;
originally announced October 2023.
-
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models
Authors:
Yuyang Bai,
Shangbin Feng,
Vidhisha Balachandran,
Zhaoxuan Tan,
Shiqi Lou,
Tianxing He,
Yulia Tsvetkov
Abstract:
Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, besides explorations on a few probing tasks in limited knowledge domains, it is not well understood how to evaluate LLMs' knowledge systematically and how well their knowledge abilities generalize, across a spectrum of knowledge domains and progressively complex task formats. To this end, we propose KGQuiz, a knowledge-intensive benchmark to comprehensively investigate the knowledge generalization abilities of LLMs. KGQuiz is a scalable framework constructed from triplet-based knowledge, which covers three knowledge domains and consists of five tasks with increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation. To gain a better understanding of LLMs' knowledge abilities and their generalization, we evaluate 10 open-source and black-box LLMs on the KGQuiz benchmark across the five knowledge-intensive tasks and knowledge domains. Extensive experiments demonstrate that LLMs achieve impressive performance in straightforward knowledge QA tasks, while settings and contexts requiring more complex reasoning or employing domain-specific facts still present significant challenges. We envision KGQuiz as a testbed to analyze such nuanced variations in performance across domains and task formats, and ultimately to understand, evaluate, and improve LLMs' knowledge abilities across a wide spectrum of knowledge domains and tasks.
Submitted 23 March, 2024; v1 submitted 15 October, 2023;
originally announced October 2023.
-
MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks
Authors:
Xiaocui Yang,
Wenfang Wu,
Shi Feng,
Ming Wang,
Daling Wang,
Yang Li,
Qi Sun,
Yifei Zhang,
Xiaoming Fu,
Soujanya Poria
Abstract:
The popularity of multimodal large language models (MLLMs) has triggered a recent surge in research efforts dedicated to evaluating these models. Nevertheless, existing evaluation studies of MLLMs primarily focus on the comprehension and reasoning of unimodal (vision) content, neglecting performance evaluations in the domain of multimodal (vision-language) content understanding. Beyond multimodal reasoning, tasks related to multimodal content comprehension necessitate a profound understanding of multimodal contexts, achieved through the multimodal interaction to obtain a final answer. In this paper, we introduce a comprehensive assessment framework called MM-BigBench, which incorporates a diverse range of metrics to offer an extensive evaluation of the performance of various models and instructions across a wide spectrum of diverse multimodal content comprehension tasks. Consequently, our work complements research on the performance of MLLMs in multimodal comprehension tasks, achieving a more comprehensive and holistic evaluation of MLLMs. To begin, we employ the Best Performance metric to ascertain each model's performance upper bound on different datasets. Subsequently, the Mean Relative Gain metric offers an assessment of the overall performance of various models and instructions, while the Stability metric measures their sensitivity. Furthermore, previous research centers on evaluating models independently or solely assessing instructions, neglecting the adaptability between models and instructions. We propose the Adaptability metric to quantify the adaptability between models and instructions. Our paper evaluates a total of 20 language models (14 MLLMs) on 14 multimodal datasets spanning 6 tasks, with 10 instructions for each task, and derives novel insights. Our code will be released at https://github.com/declare-lab/MM-BigBench.
Submitted 13 October, 2023;
originally announced October 2023.
-
CAMELL: Confidence-based Acquisition Model for Efficient Self-supervised Active Learning with Label Validation
Authors:
Carel van Niekerk,
Christian Geishauser,
Michael Heck,
Shutong Feng,
Hsien-chin Lin,
Nurul Lubis,
Benjamin Ruppik,
Renato Vukovic,
Milica Gašić
Abstract:
Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present CAMELL (Confidence-based Acquisition Model for Efficient self-supervised active Learning with Label validation), a pool-based active learning framework tailored for sequential multi-output problems. CAMELL possesses three core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, (2) it facilitates self-supervision for the remainder of the sequence, and (3) it employs a label validation mechanism to prevent erroneous labels from contaminating the dataset and harming model performance. We evaluate CAMELL on sequential tasks, with a special emphasis on dialogue belief tracking, a task plagued by the constraints of limited and noisy datasets. Our experiments demonstrate that CAMELL outperforms the baselines in terms of efficiency. Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.
Submitted 13 October, 2023;
originally announced October 2023.
-
Very high energy gamma-ray emission beyond 10 TeV from GRB 221009A
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The highest energy gamma-rays from gamma-ray bursts (GRBs) have important implications for their radiation mechanism. Here we report for the first time the detection of gamma-rays up to 13 TeV from the brightest GRB 221009A by the Large High Altitude Air-shower Observatory (LHAASO). The LHAASO-KM2A detector registered more than 140 gamma-rays with energies above 3 TeV during 230-900 s after the trigger. The intrinsic energy spectrum of gamma-rays can be described by a power-law after correcting for extragalactic background light (EBL) absorption. Such a hard spectrum challenges the synchrotron self-Compton (SSC) scenario of relativistic electrons for the afterglow emission above several TeV. Observations of gamma-rays up to 13 TeV from a source with a measured redshift of z=0.151 hint at more transparency in intergalactic space than previously expected. Alternatively, one may invoke new physics such as Lorentz Invariance Violation (LIV) or an axion origin of very high energy (VHE) signals.
Submitted 22 November, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
A High-Mass Young Star-forming Core Escaping from Its Parental Filament
Authors:
Zhiyuan Ren,
Xi Chen,
Tie Liu,
Emma Mannfors,
Leonardo Bronfman,
Fengwei Xu,
Siyi Feng,
Hongli Liu,
Fanyi Meng,
Amelia. M. Stutz,
Shanghuo Li,
Chang Won Lee,
Ke Wang,
Jianwen Zhou,
Di Li,
Chen Wang,
Chakali Eswaraiah,
Anandmayee Tej,
Long-Fei Chen,
Hui Shi
Abstract:
We studied the unique kinematic properties in massive filament G352.63-1.07 at $10^3$-AU spatial scale with the dense molecular tracers observed with the Atacama Large Millimeter/submillimeter Array (ALMA). We find that the central massive core M1 (12 $M_\odot$) is separated from the surrounding filament with a velocity difference of $v- {v}_{sys}=-2$ km/s and a transverse separation within 3 arcsec. Meanwhile, as shown in multiple dense-gas tracers, M1 has a spatial extension closely aligned with the main filament and is connected to the filament at both of its ends. M1 thus represents a very early stage of a massive young star-forming core escaping from the parental filament, within a time scale of $\sim 4000$ years. Based on its kinetic energy ($3.5\times10^{44}$ erg), the core escape is unlikely solely due to the original filament motion or magnetic field, but requires more energetic events such as a rapid intense anisotropic collapse. The released energy also seems to noticeably increase the environmental turbulence. This may help the filament to become stabilized again.
Submitted 12 October, 2023;
originally announced October 2023.
-
Overview of Physics-Informed Machine Learning Inversion of Geophysical Data
Authors:
Gerard T. Schuster,
Shihang Feng
Abstract:
We review four types of algorithms for physics-informed machine learning (PIML) inversion of geophysical data. The unifying equation is given by the joint objective function $ε$:
\begin{eqnarray} ε^{||-PIML}&=&λ_1 \overbrace{||{\bf W}^{ML}({\bf H}_{\bf w} {\bf d}^{obs}-{\bf m})||^2}^{NN} + λ_2 \overbrace{{||{\bf W}^{FWI}({\bf L} {\bf m}-{\bf d}^{obs})||^2}}^{FWI} ~+ \nonumber\\ \nonumber\\ && + ~~Regularizer, \label{PIML.eq120} \end{eqnarray}where the optimal model ${\bf m}^*$ and weights $\bf w^*$ minimize $ε$. Here, the matrix weights are given by the boldface symbol $\bf W$, and full waveform inversion (FWI) is typically computed using a finite-difference solution of the wave equation, where $\bf L$ represents the forward modeling operation of the wave equation as a function of the model $\bf m$. Also, a fully-connected neural network (NN) is used to compute the model ${\bf H_w}{\bf d}^{obs} \approx \bf m$ from the observed input data ${\bf d}^{obs}$. The selection of weights $λ_i$ and the NN operations determine one of four different PIML algorithms.
PIML offers potential advantages over standard FWI through its enhanced ability to avoid local minima and the option to locally train the inversion operator, minimizing the requirement for extensive training data for global applicability. However, the effectiveness of PIML relies on the similarity between the test and trained data. A possible strategy to overcome this limitation involves initial pretraining of a PIML architecture with data from a broader region, followed by fine-tuning for specific data, a method reminiscent of the way large language models are pretrained and adapted for various tasks.
Submitted 12 October, 2023;
originally announced October 2023.
-
Hypergraph Analysis Based on a Compatible Tensor Product Structure
Authors:
Jiaqi Gu,
Shenghao Feng,
Yimin Wei
Abstract:
We propose a tensor product structure that is compatible with the hypergraph structure. We define the algebraic connectivity of the $(m+1)$-uniform hypergraph in this product, and prove its relationship with the vertex connectivity. We introduce some connectivity optimization problems into the hypergraph, and solve them with the algebraic connectivity. We introduce the Laplacian eigenmap algorithm to the hypergraph under our tensor product.
Submitted 6 October, 2023;
originally announced October 2023.
-
Superconductivity with Tc 116 K discovered in antimony polyhydrides
Authors:
K. Lu,
X. He,
C. L. Zhang,
Z. W. Li,
S. J. Zhang,
B. S. Min,
J. Zhang,
J. F. Zhao,
L. C. Shi,
Y. Peng,
S. M. Feng,
Q. Q. Liu,
J. Song,
R. C. Yu,
X. C. Wang,
Y. Wang,
M. Bykov,
C. Q. Jin
Abstract:
Superconductivity (SC) was experimentally observed for the first time in antimony polyhydride. A diamond anvil cell combined with a laser heating system was used to synthesize the antimony polyhydride sample under high-pressure and high-temperature conditions. In-situ high-pressure transport measurements as a function of temperature under an applied magnetic field were performed to study the SC properties. The antimony polyhydride samples show a superconducting transition with critical temperature $T_c = 116$ K at 184 GPa. Measurements of SC under magnetic field indicate a superconducting coherence length of ~40 angstroms based on the Ginzburg-Landau (GL) equation. The antimony polyhydride superconductor has the second-highest Tc, after sulfur hydride, among the polyhydrides of elements from main groups IIIA to VIIA of the periodic table.
Submitted 6 October, 2023;
originally announced October 2023.
-
Machine learning assist nyc subway navigation safer and faster
Authors:
Wencheng Bao,
Shi Feng
Abstract:
Mainstream navigation software, like Google and Apple Maps, often lacks the ability to provide routes prioritizing safety. However, safety remains a paramount concern for many. Our aim is to strike a balance between safety and efficiency. To achieve this, we're devising an Integer Programming model that takes into account both the shortest path and the safest route. We will harness machine learning to derive safety coefficients, employing methodologies such as generalized linear models, linear regression, and recurrent neural networks. Our evaluation will be based on the Root Mean Square Error (RMSE) across various subway stations, helping us identify the most accurate model for safety coefficient estimation. Furthermore, we'll conduct a comprehensive review of different shortest-path algorithms, assessing them based on time complexity and real-world data to determine their appropriateness in merging both safety and time efficiency.
Submitted 3 October, 2023;
originally announced October 2023.
-
Knowledge Crosswords: Geometric Knowledge Reasoning with Large Language Models
Authors:
Wenxuan Ding,
Shangbin Feng,
Yuhan Liu,
Zhaoxuan Tan,
Vidhisha Balachandran,
Tianxing He,
Yulia Tsvetkov
Abstract:
We propose Knowledge Crosswords, a geometric knowledge reasoning benchmark consisting of incomplete knowledge networks bounded by structured factual constraints, where LLMs are tasked with inferring the missing facts to meet all constraints. The novel setting of geometric knowledge reasoning necessitates new LM abilities beyond existing atomic/linear multi-hop QA, such as backtracking, verifying facts and constraints, reasoning with uncertainty, and more. Knowledge Crosswords contains 2,101 individual problems, covering diverse knowledge domains, and is further divided into three difficulty levels. We conduct extensive experiments to evaluate existing LLMs and approaches on Knowledge Crosswords. Results demonstrate that baseline approaches struggle with larger knowledge networks and semantically-equivalent entity distractors. In light of their limitations, we propose two new approaches, Staged Prompting and Verify-All, to augment LLMs' abilities for error-aware backtracking and constraint verification. Our Verify-All significantly outperforms prior methods and is more robust towards problems in the hard subset. Further analysis shows that geometric knowledge reasoning poses new challenges to LLMs' knowledge abilities, particularly in robustness towards varying option orders, complex structural constraints in knowledge networks, "none of the above" scenarios, and more.
Submitted 25 June, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Resolving Knowledge Conflicts in Large Language Models
Authors:
Yike Wang,
Shangbin Feng,
Heng Wang,
Weijia Shi,
Vidhisha Balachandran,
Tianxing He,
Yulia Tsvetkov
Abstract:
Large language models (LLMs) often encounter knowledge conflicts, scenarios where discrepancy arises between the internal parametric knowledge of LLMs and non-parametric information provided in the prompt context. In this work we ask what are the desiderata for LLMs when a knowledge conflict arises and whether existing LLMs fulfill them. We posit that LLMs should 1) identify knowledge conflicts, 2) pinpoint conflicting information segments, and 3) provide distinct answers or viewpoints in conflicting scenarios. To this end, we introduce KNOWLEDGE CONFLICT, an evaluation framework for simulating contextual knowledge conflicts and quantitatively evaluating to what extent LLMs achieve these goals. KNOWLEDGE CONFLICT includes diverse and complex situations of knowledge conflict, knowledge from diverse entities and domains, two synthetic conflict creation methods, and settings with progressively increasing difficulty to reflect realistic knowledge conflicts. Extensive experiments with the KNOWLEDGE CONFLICT framework reveal that while LLMs perform well in identifying the existence of knowledge conflicts, they struggle to determine the specific conflicting knowledge and produce a response with distinct answers amidst conflicting information. To address these challenges, we propose new instruction-based approaches that augment LLMs to better achieve the three goals. Further analysis shows that abilities to tackle knowledge conflicts are greatly impacted by factors such as knowledge domain and prompt text, while generating robust responses to knowledge conflict scenarios remains an open research question.
Submitted 2 October, 2023;
originally announced October 2023.
-
T-COL: Generating Counterfactual Explanations for General User Preferences on Variable Machine Learning Systems
Authors:
Ming Wang,
Daling Wang,
Wenfang Wu,
Shi Feng,
Yifei Zhang
Abstract:
To address the interpretability challenge in machine learning (ML) systems, counterfactual explanations (CEs) have emerged as a promising solution. CEs are unique as they provide workable suggestions to users, in addition to explaining why a certain outcome was predicted. The application of CEs encounters two main challenges: general user preferences and variable ML systems. User preferences tend to be general rather than specific, and CEs need to be adaptable to variable ML models while maintaining robustness even as these models change. Facing these challenges, we present a solution rooted in validated general user preferences, which are derived from thorough user research. We map these preferences to the properties of CEs. Additionally, we introduce a novel method, Tree-based Conditions Optional Links (T-COL), which incorporates two optional structures and multiple condition groups for generating CEs adaptable to general user preferences. Meanwhile, we employ T-COL to enhance the robustness of CEs with specific conditions, making them more valid even when the ML model is replaced. Our experimental comparisons under different user preferences show that T-COL outperforms all baselines, including Large Language Models which are shown to be able to generate counterfactuals.
Submitted 4 April, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
-
HPCR: Holistic Proxy-based Contrastive Replay for Online Continual Learning
Authors:
Huiwei Lin,
Shanshan Feng,
Baoquan Zhang,
Xutao Li,
Yew-soon Ong,
Yunming Ye
Abstract:
Online continual learning (OCL) aims to continuously learn new data from a single pass over the online data stream. It generally suffers from the catastrophic forgetting issue. Existing replay-based methods effectively alleviate this issue by replaying part of old data in a proxy-based or contrastive-based replay manner. In this paper, we conduct a comprehensive analysis of these two replay manners and find they can be complementary. Inspired by this finding, we propose a novel replay-based method called proxy-based contrastive replay (PCR), which replaces anchor-to-sample pairs with anchor-to-proxy pairs in the contrastive-based loss to alleviate the phenomenon of forgetting. Based on PCR, we further develop a more advanced method named holistic proxy-based contrastive replay (HPCR), which consists of three components. The contrastive component conditionally incorporates anchor-to-sample pairs to PCR, learning more fine-grained semantic information with a large training batch. The second is a temperature component that decouples the temperature coefficient into two parts based on their impacts on the gradient and sets different values for them to learn more novel knowledge. The third is a distillation component that constrains the learning process to keep more historical knowledge. Experiments on four datasets consistently demonstrate the superiority of HPCR over various state-of-the-art methods.
Submitted 26 September, 2023;
originally announced September 2023.
-
Affect Recognition in Conversations Using Large Language Models
Authors:
Shutong Feng,
Guangzhi Sun,
Nurul Lubis,
Chao Zhang,
Milica Gašić
Abstract:
Affect recognition, encompassing emotions, moods, and feelings, plays a pivotal role in human communication. In the realm of conversational artificial intelligence (AI), the ability to discern and respond to human affective cues is a critical factor for creating engaging and empathetic interactions. This study delves into the capacity of large language models (LLMs) to recognise human affect in conversations, with a focus on both open-domain chit-chat dialogues and task-oriented dialogues. Leveraging three diverse datasets, namely IEMOCAP, EmoWOZ, and DAIC-WOZ, covering a spectrum of dialogues from casual conversations to clinical interviews, we evaluated and compared LLMs' performance in affect recognition. Our investigation explores the zero-shot and few-shot capabilities of LLMs through in-context learning (ICL) as well as their model capacities through task-specific fine-tuning. Additionally, this study takes into account the potential impact of automatic speech recognition (ASR) errors on LLM predictions. With this work, we aim to shed light on the extent to which LLMs can replicate human-like affect recognition capabilities in conversations.
Submitted 22 September, 2023;
originally announced September 2023.
-
Reset Controller Synthesis by Reach-avoid Analysis for Delay Hybrid Systems
Authors:
Han Su,
Jiyu Zhu,
Shenghua Feng,
Yunjun Bai,
Bin Gu,
Jiang Liu,
Mengfei Yang,
Naijun Zhan
Abstract:
A reset controller plays a crucial role in designing hybrid systems. It restricts the initial set and redefines the reset map associated with discrete transitions, in order to guarantee that the system achieves its objective. Reset controller synthesis, together with feedback controller synthesis and switching logic controller synthesis, provides a correct-by-construction approach to designing hybrid systems. However, time-delay is an inevitable factor in hybrid systems, which can degrade control performance and render verification certificates obtained by abstracting away time-delay invalid in practice. In this paper, we investigate this issue in a practical manner by taking time-delay into account. We propose an approach that reduces the synthesis of reset controllers to the generation of reach-avoid sets for the hybrid system under consideration, which can be efficiently solved using off-the-shelf convex optimization solvers.
Submitted 27 May, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
UER: A Heuristic Bias Addressing Approach for Online Continual Learning
Authors:
Huiwei Lin,
Shanshan Feng,
Baoquan Zhang,
Hongliang Qiao,
Xutao Li,
Yunming Ye
Abstract:
Online continual learning aims to continuously train neural networks from a continuous data stream with a single pass through the data. As the most effective approach, the rehearsal-based methods replay part of previous data. Commonly used predictors in existing methods tend to generate biased dot-product logits that favor the classes of current data, which is known as a bias issue and a phenomenon of forgetting. Many approaches have been proposed to overcome the forgetting problem by correcting the bias; however, they still need to be improved in the online setting. In this paper, we try to address the bias issue with a more straightforward and more efficient method. By decomposing the dot-product logits into an angle factor and a norm factor, we empirically find that the bias problem mainly occurs in the angle factor, which can be used to learn novel knowledge as cosine logits. In contrast, the norm factor abandoned by existing methods helps remember historical knowledge. Based on this observation, we intuitively propose to leverage the norm factor to balance the new and old knowledge for addressing the bias. To this end, we develop a heuristic approach called unbias experience replay (UER). UER learns current samples only by the angle factor and further replays previous samples by both the norm and angle factors. Extensive experiments on three datasets show that UER achieves superior performance over various state-of-the-art methods. The code is available at https://github.com/FelixHuiweiLin/UER.
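The decomposition mentioned in this abstract rests on the identity that a dot product equals the product of the two vector norms and the cosine of the angle between them. The following is a minimal sketch of that identity only, not the authors' implementation: the function name `decompose_logits`, the array shapes, and the choice to fold both the feature norm and the class-weight norm into the norm factor are assumptions made for illustration.

```python
import numpy as np

def decompose_logits(features, weights, eps=1e-12):
    """Split dot-product logits into a norm factor and an angle (cosine) factor.

    features: (batch, dim) array of sample embeddings (hypothetical shape).
    weights:  (classes, dim) array of classifier weights (hypothetical shape).
    """
    # Per-sample feature norms, shape (batch, 1).
    f_norm = np.linalg.norm(features, axis=1, keepdims=True)
    # Per-class weight norms, shape (1, classes).
    w_norm = np.linalg.norm(weights, axis=1, keepdims=True).T
    # Ordinary dot-product logits, shape (batch, classes).
    logits = features @ weights.T
    # Norm factor: product of the two norms (an assumption of this sketch).
    norm_factor = f_norm * w_norm
    # Angle factor: cosine similarity, guarded against division by zero.
    angle_factor = logits / np.maximum(norm_factor, eps)
    return logits, norm_factor, angle_factor

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))
weights = rng.normal(size=(3, 8))
logits, norm_factor, angle_factor = decompose_logits(features, weights)
```

Under these assumptions, `angle_factor` alone plays the role of cosine logits, and multiplying it back by `norm_factor` recovers the original dot-product logits.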
Submitted 7 September, 2023;
originally announced September 2023.
-
Hidden subsystem symmetry protected states in competing topological orders
Authors:
Shi Feng
Abstract:
We reveal the connection between two-dimensional subsystem symmetry-protected topological (SSPT) states and two-dimensional topological orders via a self-dual frustrated toric code model. This model, an enrichment of the toric code (TC) with its dual interactions, can be mapped to a model defined on the dual lattice with subsystem symmetries and subextensive ground state degeneracy. The map connects exactly the frustrated TC to two copies of the topological plaquette Ising model (TPIM), as a strong SSPT model with linear subsystem symmetries. The membrane order parameter of the TPIM is exactly mapped to dual TC stabilizers as the order parameter of the frustrated TC model, SSPT gapless edge states of the TPIM are mapped to zero-energy dangling operators under open boundaries, and the transition from the SSPT-ordered TPIM to the trivial paramagnetic phase is mapped to the transition between two distinct topological orders. We also demonstrate that this mapping can be used to elucidate the structure of other SSPT models, reflecting the subtle linkage between SSPT order and topological order in two dimensions.
Submitted 27 February, 2024; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Pseudo-magnetic fields in square lattices
Authors:
Junsong Sun,
Xingchuan Zhu,
Tianyu Liu,
Shiping Feng,
Huaiming Guo
Abstract:
We have investigated the effects of strain on two-dimensional square lattices and examined the methods for inducing pseudo-magnetic fields. In both the columnar and staggered $π$-flux square lattices, we have found that strain only modulates Fermi velocities rather than inducing pseudo-magnetic fields. However, spatially non-uniform on-site potentials (anisotropic hoppings) can create pseudo-magnetic fields in columnar (staggered) $π$-flux square lattices. On the other hand, we demonstrate that strain does induce pseudo-magnetic fields in staggered zero-flux square lattices. By breaking a quarter of the bonds, we clarify that a staggered zero-flux square lattice is topologically equivalent to a honeycomb lattice and displays pseudo-vector potentials and pseudo-Landau levels at the Dirac points.
Submitted 15 October, 2023; v1 submitted 31 August, 2023;
originally announced September 2023.
-
Observation of Flat Band and Van Hove Singularity in Non-superconducting Nitrogen-doped Lutetium Hydride
Authors:
Xin Liang,
Zihan Lin,
Jun Zhang,
Jianfa Zhao,
Shiyu Feng,
Wenlong Lu,
Guodong Wang,
Luchuan Shi,
Ningning Wang,
Pengfei Shan,
Zao Zhang,
Muntaser Naamneh,
Runzhe Liu,
Bastien Michon,
Jinguang Cheng,
Changqing Jin,
Yang Ren,
Junzhang Ma
Abstract:
Hydrogen-rich materials offer a compelling avenue towards room temperature superconductivity, albeit under ultra-high pressure. However, the experimental investigation of the electronic band structure remains elusive, due to the inherent instability of most of the hydrogen-rich materials upon pressure release. Very recently, nitrogen-doped lutetium hydride was claimed to host room temperature superconductivity under near ambient pressure but was disproven by following works. Upon decompression, nitrogen doped lutetium hydride manifests a stable metallic phase with dark blue color. Moreover, high temperature superconductivity has been reported in lutetium hydrides Lu4H23 (~71 K) under around 200 GPa. These properties engender an unprecedented opportunity, allowing for the experimental investigation of the electronic band structure intrinsic to hydrogen-rich material. In this work, using angle resolved photoemission spectroscopy to investigate the non-superconducting nitrogen doped lutetium hydride, we observed significant flat band and Van Hove singularity marginally below the Fermi level. These salient features, identified as critical elements, proffer potential amplifiers for the realization of heightened superconductivity, as evidenced by prior research. Our results not only unveil a confluence of potent strong correlation effects and anisotropy within the Lu-H-N compound, but also provide a prospect for engineering high temperature superconductivity through the strategic manipulation of flat band and the VHS, effectively tailoring their alignment with the Fermi energy.
Submitted 8 September, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
IDVT: Interest-aware Denoising and View-guided Tuning for Social Recommendation
Authors:
Dezhao Yang,
Jianghong Ma,
Shanshan Feng,
Haijun Zhang,
Zhao Zhang
Abstract:
In the information age, recommendation systems are vital for efficiently filtering information and identifying user preferences. Online social platforms have enriched these systems by providing valuable auxiliary information. Socially connected users are assumed to share similar preferences, which enhances recommendation accuracy and helps address cold-start issues. However, empirical findings challenge this assumption, revealing that certain social connections can actually harm system performance. Our statistical analysis indicates a significant amount of noise in the social network, where many socially connected users do not share common interests. To address this issue, we propose an Interest-aware Denoising and View-guided Tuning (IDVT) method for social recommendation. The first part, ID, denoises social connections. Specifically, the denoising process considers both the social network structure and user interaction interests in a global view. Moreover, in this global view, we also integrate the denoised social information (social domain) into the propagation of user-item interactions (collaborative domain) and aggregate user representations from the two domains using a gating mechanism. To tackle potential loss of user interest and enhance model robustness within the global view, the second part, VT, introduces two additional views (a local view and a dropout-enhanced view) for fine-tuning user representations in the global view through contrastive learning. Extensive evaluations on real-world datasets with varying noise ratios demonstrate the superiority of IDVT over state-of-the-art social recommendation methods.
Submitted 17 June, 2024; v1 submitted 30 August, 2023;
originally announced August 2023.