-
ESP: Extro-Spective Prediction for Long-term Behavior Reasoning in Emergency Scenarios
Authors:
Dingrui Wang,
Zheyuan Lai,
Yuda Li,
Yi Wu,
Yuexin Ma,
Johannes Betz,
Ruigang Yang,
Wei Li
Abstract:
Emergent-scene safety is the key milestone for fully autonomous driving, and reliable on-time prediction is essential to maintain safety in emergency scenarios. However, these emergency scenarios are long-tailed and hard to collect, which restricts the system from getting reliable predictions. In this paper, we build a new dataset, which aims at the long-term prediction with the inconspicuous state variation in history for the emergency event, named the Extro-Spective Prediction (ESP) problem. Based on the proposed dataset, a flexible feature encoder for ESP is introduced to various prediction methods as a seamless plug-in, and its consistent performance improvement underscores its efficacy. Furthermore, a new metric named clamped temporal error (CTE) is proposed to give a more comprehensive evaluation of prediction performance, especially in time-sensitive emergency events of subseconds. Interestingly, as our ESP features can be described in human-readable language naturally, the application of integrating into ChatGPT also shows huge potential. The ESP-dataset and all benchmarks are released at https://dingrui-wang.github.io/ESP-Dataset/.
Submitted 7 May, 2024;
originally announced May 2024.
-
Explainable Fake News Detection With Large Language Model via Defense Among Competing Wisdom
Authors:
Bo Wang,
Jing Ma,
Hongzhan Lin,
Zhiwei Yang,
Ruichao Yang,
Yuan Tian,
Yi Chang
Abstract:
Most fake news detection methods learn latent feature representations based on neural networks, which makes them black boxes to classify a piece of news without giving any justification. Existing explainable systems generate veracity justifications from investigative journalism, which suffer from debunking delayed and low efficiency. Recent studies simply assume that the justification is equivalent to the majority opinions expressed in the wisdom of crowds. However, the opinions typically contain some inaccurate or biased information since the wisdom of crowds is uncensored. To detect fake news from a sea of diverse, crowded and even competing narratives, in this paper, we propose a novel defense-based explainable fake news detection framework. Specifically, we first propose an evidence extraction module to split the wisdom of crowds into two competing parties and respectively detect salient evidences. To gain concise insights from evidences, we then design a prompt-based module that utilizes a large language model to generate justifications by inferring reasons towards two possible veracities. Finally, we propose a defense-based inference module to determine veracity via modeling the defense among these justifications. Extensive experiments conducted on two real-world benchmarks demonstrate that our proposed method outperforms state-of-the-art baselines in terms of fake news detection and provides high-quality justifications.
Submitted 20 June, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
SlotGAT: Slot-based Message Passing for Heterogeneous Graph Neural Network
Authors:
Ziang Zhou,
Jieming Shi,
Renchi Yang,
Yuanhang Zou,
Qing Li
Abstract:
Heterogeneous graphs are ubiquitous to model complex data. There are urgent needs on powerful heterogeneous graph neural networks to effectively support important applications. We identify a potential semantic mixing issue in existing message passing processes, where the representations of the neighbors of a node $v$ are forced to be transformed to the feature space of $v$ for aggregation, though the neighbors are in different types. That is, the semantics in different node types are entangled together into node $v$'s representation. To address the issue, we propose SlotGAT with separate message passing processes in slots, one for each node type, to maintain the representations in their own node-type feature spaces. Moreover, in a slot-based message passing layer, we design an attention mechanism for effective slot-wise message aggregation. Further, we develop a slot attention technique after the last layer of SlotGAT, to learn the importance of different slots in downstream tasks. Our analysis indicates that the slots in SlotGAT can preserve different semantics in various feature spaces. The superiority of SlotGAT is evaluated against 13 baselines on 6 datasets for node classification and link prediction. Our code is at https://github.com/scottjiao/SlotGAT_ICML23/.
Submitted 3 May, 2024;
originally announced May 2024.
-
S4: Self-Supervised Sensing Across the Spectrum
Authors:
Jayanth Shenoy,
Xingjian Davis Zhang,
Shlok Mehrotra,
Bill Tao,
Rem Yang,
Han Zhao,
Deepak Vasisht
Abstract:
Satellite image time series (SITS) segmentation is crucial for many applications like environmental monitoring, land cover mapping and agricultural crop type classification. However, training models for SITS segmentation remains a challenging task due to the lack of abundant training data, which requires fine grained annotation. We propose S4 a new self-supervised pre-training approach that significantly reduces the requirement for labeled training data by utilizing two new insights: (a) Satellites capture images in different parts of the spectrum such as radio frequencies, and visible frequencies. (b) Satellite imagery is geo-registered allowing for fine-grained spatial alignment. We use these insights to formulate pre-training tasks in S4. We also curate m2s2-SITS, a large-scale dataset of unlabeled, spatially-aligned, multi-modal and geographic specific SITS that serves as representative pre-training data for S4. Finally, we evaluate S4 on multiple SITS segmentation datasets and demonstrate its efficacy against competing baselines while using limited labeled data.
Submitted 27 June, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Spectrally Pruned Gaussian Fields with Neural Compensation
Authors:
Runyi Yang,
Zhenxin Zhu,
Zhou Jiang,
Baijun Ye,
Xiaoxue Chen,
Yifei Zhang,
Yuantao Chen,
Jian Zhao,
Hao Zhao
Abstract:
Recently, 3D Gaussian Splatting, as a novel 3D representation, has garnered attention for its fast rendering speed and high rendering quality. However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory. We credit this high memory footprint to the lack of consideration for the relationship between primitives. In this paper, we propose a memory-efficient Gaussian field named SUNDAE with spectral pruning and neural compensation. On one hand, we construct a graph on the set of Gaussian primitives to model their relationship and design a spectral down-sampling module to prune out primitives while preserving desired signals. On the other hand, to compensate for the quality loss of pruning Gaussians, we exploit a lightweight neural network head to mix splatted features, which effectively compensates for quality losses while capturing the relationship between primitives in its weights. We demonstrate the performance of SUNDAE with extensive results. For example, SUNDAE can achieve 26.80 PSNR at 145 FPS using 104 MB memory while the vanilla Gaussian splatting algorithm achieves 25.60 PSNR at 160 FPS using 523 MB memory, on the Mip-NeRF360 dataset. Codes are publicly available at https://runyiyang.github.io/projects/SUNDAE/.
Submitted 1 May, 2024;
originally announced May 2024.
-
From Persona to Personalization: A Survey on Role-Playing Language Agents
Authors:
Jiangjie Chen,
Xintao Wang,
Rui Xu,
Siyu Yuan,
Yikai Zhang,
Wei Shi,
Jian Xie,
Shuang Li,
Ruihan Yang,
Tinghui Zhu,
Aili Chen,
Nianqi Li,
Lida Chen,
Caiyu Hu,
Siye Wu,
Scott Ren,
Ziquan Fu,
Yanghua Xiao
Abstract:
Recent advancements in large language models (LLMs) have significantly boosted the rise of Role-Playing Language Agents (RPLAs), i.e., specialized AI systems designed to simulate assigned personas. By harnessing multiple advanced abilities of LLMs, including in-context learning, instruction following, and social intelligence, RPLAs achieve a remarkable sense of human likeness and vivid role-playing performance. RPLAs can mimic a wide range of personas, ranging from historical figures and fictional characters to real-life individuals. Consequently, they have catalyzed numerous AI applications, such as emotional companions, interactive video games, personalized assistants and copilots, and digital clones. In this paper, we conduct a comprehensive survey of this field, illustrating the evolution and recent progress in RPLAs integrating with cutting-edge LLM technologies. We categorize personas into three types: 1) Demographic Persona, which leverages statistical stereotypes; 2) Character Persona, focused on well-established figures; and 3) Individualized Persona, customized through ongoing user interactions for personalized services. We begin by presenting a comprehensive overview of current methodologies for RPLAs, followed by the details for each persona type, covering corresponding data sourcing, agent construction, and evaluation. Afterward, we discuss the fundamental risks, existing limitations, and future prospects of RPLAs. Additionally, we provide a brief review of RPLAs in AI applications, which reflects practical user demands that shape and drive RPLA research. Through this work, we aim to establish a clear taxonomy of RPLA research and applications, and facilitate future research in this critical and ever-evolving field, and pave the way for a future where humans and RPLAs coexist in harmony.
Submitted 28 April, 2024;
originally announced April 2024.
-
BotDGT: Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers
Authors:
Buyun He,
Yingguang Yang,
Qi Wu,
Hao Liu,
Renyu Yang,
Hao Peng,
Xiang Wang,
Yong Liao,
Pengyuan Zhou
Abstract:
Detecting social bots has evolved into a pivotal yet intricate task, aimed at combating the dissemination of misinformation and preserving the authenticity of online interactions. While earlier graph-based approaches, which leverage topological structure of social networks, yielded notable outcomes, they overlooked the inherent dynamicity of social networks -- In reality, they largely depicted the social network as a static graph and solely relied on its most recent state. Due to the absence of dynamicity modeling, such approaches are vulnerable to evasion, particularly when advanced social bots interact with other users to camouflage identities and escape detection. To tackle these challenges, we propose BotDGT, a novel framework that not only considers the topological structure, but also effectively incorporates dynamic nature of social network. Specifically, we characterize a social network as a dynamic graph. A structural module is employed to acquire topological information from each historical snapshot. Additionally, a temporal module is proposed to integrate historical context and model the evolving behavior patterns exhibited by social bots and legitimate users. Experimental results demonstrate the superiority of BotDGT against the leading methods that neglected the dynamic nature of social networks in terms of accuracy, recall, and F1-score.
Submitted 24 April, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Possible signatures of higher dimension in thin accretion disk around brane world black hole
Authors:
Ailin Liu,
Tong-Yu He,
Ming Liu,
Zhan-Wen Han,
Rong-Jia Yang
Abstract:
We probe deeply into the characteristics of thin accretion disk surrounding black hole within the brane world paradigm. We investigate how model parameters affect the physical properties of the disk. Our findings indicate that as the tidal charge parameter inherited from the higher dimension increases, the energy flux, the radiation temperature, the spectral cutoff frequency, the spectral luminosity, and the conversion efficiency of the disk all increase, but the radius of the innermost stable circular orbit decreases. Compared to cases of the Kerr and Schwarzschild black holes, the disk is hotter and more luminous for positive tidal charge parameter, while it is cooler and less luminous for negative tidal charge parameter, which suggests the potential for probing possible signatures of higher dimension.
Submitted 22 April, 2024;
originally announced April 2024.
-
AED-PADA:Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation
Authors:
Heqi Peng,
Yunhong Wang,
Ruijie Yang,
Beichen Li,
Rui Wang,
Yuanfang Guo
Abstract:
Adversarial example detection, which can be conveniently applied in many scenarios, is important in the area of adversarial defense. Unfortunately, existing detection methods suffer from poor generalization performance, because their training process usually relies on the examples generated from a single known adversarial attack and there exists a large discrepancy between the training and unseen testing adversarial examples. To address this issue, we propose a novel method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). Specifically, our approach identifies the Principal Adversarial Domains (PADs), i.e., a combination of features of the adversarial examples from different attacks, which possesses large coverage of the entire adversarial feature space. Then, we pioneer to exploit multi-source domain adaptation in adversarial example detection with PADs as source domains. Experiments demonstrate the superior generalization ability of our proposed AED-PADA. Note that this superiority is particularly achieved in challenging scenarios characterized by employing the minimal magnitude constraint for the perturbations.
Submitted 19 April, 2024;
originally announced April 2024.
-
783-MHz fundamental repetition rate all-fiber ring laser mode-locked by carbon nanotubes
Authors:
Maolin Dai,
Bowen Liu,
Yifan Ma,
Takuma Shirahata,
Ruoao Yang,
Zhigang Zhang,
Sze Yun Set,
Shinji Yamashita
Abstract:
We demonstrate a 783-MHz fundamental repetition rate mode-locked Er-doped all-fiber ring laser with a pulse width of 623 fs. By using carbon nanotubes (CNT) saturable absorber (SA), a relatively low self-starting pump threshold of 108 mW is achieved. The laser has a very compact footprint less than 10 cm * 10 cm, benefiting from the all-active-fiber cavity design. The robust mode-locking is confirmed by the low relative intensity noise (RIN) and a long-term stability test. We propose a new scheme for generating high repetition rate femtosecond optical pulses from a compact and stable all-active-fiber ring oscillator.
Submitted 28 May, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
BDAN: Mitigating Temporal Difference Across Electrodes in Cross-Subject Motor Imagery Classification via Generative Bridging Domain
Authors:
Zhige Chen,
Rui Yang,
Mengjie Huang,
Chengxuan Qin,
Zidong Wang
Abstract:
Because of "the non-repeatability of the experiment settings and conditions" and "the variability of brain patterns among subjects", the data distributions across sessions and electrodes are different in cross-subject motor imagery (MI) studies, eventually reducing the performance of the classification model. Systematically summarised based on the existing studies, a novel temporal-electrode data distribution problem is investigated under both intra-subject and inter-subject scenarios in this paper. Based on the presented issue, a novel bridging domain adaptation network (BDAN) is proposed, aiming to minimise the data distribution difference across sessions in the aspect of the electrode, thus improving and enhancing model performance. In the proposed BDAN, deep features of all the EEG data are extracted via a specially designed spatial feature extractor. With the obtained spatio-temporal features, a special generative bridging domain is established, bridging the data from all the subjects across sessions. The difference across sessions and electrodes is then minimized using the customized bridging loss functions, and the known knowledge is automatically transferred through the constructed bridging domain. To show the effectiveness of the proposed BDAN, comparison experiments and ablation studies are conducted on a public EEG dataset. The overall comparison results demonstrate the superior performance of the proposed BDAN compared with the other advanced deep learning and domain adaptation methods.
Submitted 16 April, 2024;
originally announced April 2024.
-
HELLINGER-UCB: A novel algorithm for stochastic multi-armed bandit problem and cold start problem in recommender system
Authors:
Ruibo Yang,
Jiazhou Wang,
Andrew Mullhaupt
Abstract:
In this paper, we study the stochastic multi-armed bandit problem, where the reward is driven by an unknown random variable. We propose a new variant of the Upper Confidence Bound (UCB) algorithm called Hellinger-UCB, which leverages the squared Hellinger distance to build the upper confidence bound. We prove that the Hellinger-UCB reaches the theoretical lower bound. We also show that the Hellinger-UCB has a solid statistical interpretation. We show that Hellinger-UCB is effective in finite time horizons with numerical experiments between Hellinger-UCB and other variants of the UCB algorithm. As a real-world example, we apply the Hellinger-UCB algorithm to solve the cold-start problem for a content recommender system of a financial app. With reasonable assumption, the Hellinger-UCB algorithm has a convenient but important lower latency feature. The online experiment also illustrates that the Hellinger-UCB outperforms both KL-UCB and UCB1 in the sense of a higher click-through rate (CTR).
Submitted 15 April, 2024;
originally announced April 2024.
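Illustrative note on the Hellinger-UCB entry above: the abstract says the upper confidence bound is built from the squared Hellinger distance but does not give the index formula. The sketch below is therefore only a rough illustration under stated assumptions, not the authors' algorithm. It assumes Bernoulli rewards, the normalization H^2(P, Q) = 1 - sum_i sqrt(p_i q_i), and a KL-UCB-style index (the largest mean q whose distance from the empirical mean stays within an exploration budget of c * log(t) / pulls). The function names, the constant `c`, and the budget form are all hypothetical.

```python
import math


def squared_hellinger_bernoulli(p: float, q: float) -> float:
    """Squared Hellinger distance between Bernoulli(p) and Bernoulli(q),
    using the convention H^2 = 1 - Bhattacharyya coefficient."""
    bc = math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))
    return max(0.0, 1.0 - bc)  # clamp tiny negative rounding errors


def hellinger_index(p_hat: float, pulls: int, t: int,
                    c: float = 1.0, tol: float = 1e-9) -> float:
    """Hypothetical optimistic index (an assumption, not the paper's formula):
    the largest q in [p_hat, 1] with pulls * H^2(p_hat, q) <= c * log(t)."""
    budget = c * math.log(t) / pulls
    if squared_hellinger_bernoulli(p_hat, 1.0) <= budget:
        return 1.0
    lo, hi = p_hat, 1.0
    # H^2(p_hat, q) is non-decreasing in q for q >= p_hat, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if squared_hellinger_bernoulli(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # An arm pulled more often gets a tighter (lower) optimistic index.
    print(hellinger_index(0.5, pulls=100, t=100))
    print(hellinger_index(0.5, pulls=1000, t=100))
```

In a bandit loop, one would pull the arm with the largest index at each round; the exact exploration term and any constants should be taken from the paper itself.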
-
Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
Authors:
Ruixin Yang,
Dheeraj Rajagopal,
Shirley Anugrah Hayati,
Bin Hu,
Dongyeop Kang
Abstract:
Uncertainty estimation is a significant issue for current large language models (LLMs) that are generally poorly calibrated and over-confident, especially with reinforcement learning from human feedback (RLHF). Unlike humans, whose decisions and confidences not only stem from intrinsic beliefs but can also be adjusted through daily observations, existing calibration methods for LLMs focus on estimating or eliciting individual confidence without taking full advantage of the "Collective Wisdom": the interaction among multiple LLMs that can collectively improve both accuracy and calibration. In this work, we propose Collaborative Calibration, a post-hoc training-free calibration strategy that leverages the collaborative and expressive capabilities of multiple tool-augmented LLM agents in a simulated group deliberation process. We demonstrate the effectiveness of Collaborative Calibration on generative QA tasks across various domains, showing its potential in harnessing the rationalization of collectively calibrated confidence assessments and improving the reliability of model predictions.
Submitted 10 May, 2024; v1 submitted 13 April, 2024;
originally announced April 2024.
-
Fabricating Paper Circuits with Subtractive Processing
Authors:
Ruhan Yang,
Krithik Ranjan,
Ellen Yi-Luen Do
Abstract:
This paper introduces a new method of paper circuit fabrication that overcomes design barriers and increases flexibility in circuit design. Conventional circuit boards rely on thin traces, which limits the complexity and accuracy when applied to paper circuits. To address this issue, we propose a method that uses large conductive zones in paper circuits and performs subtractive processing during their fabrication. This approach eliminates design barriers and allows for more flexibility in circuit design. We introduce PaperCAD, a software tool that simplifies the design process by converting traditional circuit design to paper circuit design. We demonstrate our technique by creating two paper circuit boards. Our approach has the potential to promote the development of new applications for paper circuits.
Submitted 10 April, 2024;
originally announced April 2024.
-
Enhancing Accessibility in Soft Robotics: Exploring Magnet-Embedded Paper-Based Interactions
Authors:
Ruhan Yang,
Ellen Yi-Luen Do
Abstract:
This paper explores the implementation of embedded magnets to enhance paper-based interactions. The integration of magnets in paper-based interactions simplifies the fabrication process, making it more accessible for building soft robotics systems. We discuss various interaction patterns achievable through this approach and highlight their potential applications.
Submitted 10 April, 2024;
originally announced April 2024.
-
Personality-affected Emotion Generation in Dialog Systems
Authors:
Zhiyuan Wen,
Jiannong Cao,
Jiaxing Shen,
Ruosong Yang,
Shuaiqi Liu,
Maosong Sun
Abstract:
Generating appropriate emotions for responses is essential for dialog systems to provide human-like interaction in various application scenarios. Most previous dialog systems tried to achieve this goal by learning empathetic manners from anonymous conversational data. However, emotional responses generated by those methods may be inconsistent, which will decrease user engagement and service quality. Psychological findings suggest that the emotional expressions of humans are rooted in personality traits. Therefore, we propose a new task, Personality-affected Emotion Generation, to generate emotion based on the personality given to the dialog system and further investigate a solution through the personality-affected mood transition. Specifically, we first construct a daily dialog dataset, Personality EmotionLines Dataset (PELD), with emotion and personality annotations. Subsequently, we analyze the challenges in this task, i.e., (1) heterogeneously integrating personality and emotional factors and (2) extracting multi-granularity emotional information in the dialog context. Finally, we propose to model the personality as the transition weight by simulating the mood transition process in the dialog system and solve the challenges above. We conduct extensive experiments on PELD for evaluation. Results suggest that by adopting our method, the emotion generation performance is improved by 13% in macro-F1 and 5% in weighted-F1 from the BERT-base model.
Submitted 3 April, 2024;
originally announced April 2024.
-
Uncovering quantum characteristics of incipient evolutions at the photosynthetic oxygen evolving complex
Authors:
Pei-Ying Huo,
Wei-Zhou Jiang,
Rong-Yao Yang,
Xiu-Rong Zhang
Abstract:
Water oxidation of photosynthesis at the oxygen evolving complex (OEC) is driven by the polarization field induced by the photoelectric hole. By highlighting the role of the polarization field in reshaping the spin-orbit coupling deduced from the Dirac quantum mechanics, we reveal in this work the characteristics and underlying mechanism in the relatively simpler OEC evolutions within the states S0 - S2 prior to the water oxidation. The characteristic shifts of the density of states (DOS) of the electron donor Mn atom are observed in the vicinity of the Fermi surface to occur with the spin flips of Mn atoms and the change of the Mn oxidation states during the electron transfer. Notably, the spin flips of Mn atoms point to the resulting spin configuration of the next states. It is found that the electron transfer tend to stabilize the catalyst OEC itself, whereas the proton transfer pushes the evolution forward by preparing a new electron donor. Meanwhile, it shows that the Mn-O bonds around the candidate Mn atom of the electron donor undergo characteristic changes in the bond lengths during the electron transfer. These concomitant phenomena uncovered in first-principle calculations characterize the essential equilibrium of the OEC between the state evolution and stability that forms a ground of the dynamic OEC cycles.
Submitted 10 April, 2024;
originally announced April 2024.
-
Disguised Copyright Infringement of Latent Diffusion Models
Authors:
Yiwei Lu,
Matthew Y. R. Yang,
Zuoqiu Liu,
Gautam Kamath,
Yaoliang Yu
Abstract:
Copyright infringement may occur when a generative model produces samples substantially similar to some copyrighted data that it had access to during the training phase. The notion of access usually refers to including copyrighted samples directly in the training dataset, which one may inspect to identify an infringement. We argue that such visual auditing largely overlooks a concealed copyright infringement, where one constructs a disguise that looks drastically different from the copyrighted sample yet still induces the effect of training Latent Diffusion Models on it. Such disguises only require indirect access to the copyrighted material and cannot be visually distinguished, thus easily circumventing the current auditing tools. In this paper, we provide a better understanding of such disguised copyright infringement by uncovering the disguises generation algorithm, the revelation of the disguises, and importantly, how to detect them to augment the existing toolbox. Additionally, we introduce a broader notion of acknowledgment for comprehending such indirect access. Our code is available at https://github.com/watml/disguised_copyright_infringement.
Submitted 3 June, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Experimental Demonstration of Controllable PT and anti-PT Coupling in a non-Hermitian Metamaterial
Authors:
Chang Li,
Ruisheng Yang,
Xinchao Huang,
Quanhong Fu,
Yuancheng Fan,
Fuli Zhang
Abstract:
Non-Hermiticity has recently emerged as a rapidly developing field due to its exotic characteristics related to open systems, where the dissipation plays a critical role. In the presence of balanced energy gain and loss with the environment, the system exhibits parity-time (PT) symmetry; meanwhile, as the conjugate counterpart, anti-PT symmetry can be achieved with dissipative coupling within the system. Here, we demonstrate that the coherence of complex dissipative coupling can control the transition between PT and anti-PT symmetry in an electromagnetic metamaterial. Notably, the achievement of the anti-PT symmetric phase is independent of variations in dissipation. Furthermore, we observe phase transitions as the system crosses exceptional points in both anti-PT and PT symmetric metamaterial configurations, achieved by manipulating the frequency and dissipation of resonators. This work provides a promising metamaterial design for broader exploration of non-Hermitian physics and practical application with controllable Hamiltonian.
Submitted 8 April, 2024;
originally announced April 2024.
-
Quantum and Classical Two-photon Interference of Single Photons with Ultralong Coherence Time
Authors:
Manman Wang,
Yanfeng Li,
Hanqing Liu,
Haiqiao Ni,
Zhichuan Niu,
Xiaogang Wei,
Renfu Yang,
Chengyong Hu
Abstract:
Two-photon interference (TPI) is a fundamental phenomenon in quantum optics and plays a crucial role in quantum information science and technology. TPI is commonly considered as quantum interference with an upper bound of $100\%$ for both the TPI visibility and the beat visibility in contrast to its classical counterpart with a maximum visibility of $50\%$. However, this is not always the case. Here we report a simultaneous observation of quantum and classical TPI of single photons with ultralong coherence time which is longer than the photon correlation time by five orders of magnitude. We observe a TPI visibility of $94.3\%\pm 0.2\%$ but a beat visibility of $50\%$. Besides an anti-bunching central dip due to single-photon statistics, we observe two bunching side peaks in cross-correlation curves for indistinguishable photons. Using either classical wave superposition theory or quantum field approach, we derive the same expressions for the cross-correlation functions which reproduce and explain the experiments well. We conclude that quantum TPI with a stream of single photons is equivalent to classical TPI, both of which are the fourth-order interference arising from the second-order interference occurring on the time scale of photon coherence time.
Submitted 7 April, 2024;
originally announced April 2024.
-
LHAASO-KM2A detector simulation using Geant4
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (254 additional authors not shown)
Abstract:
KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is an important foundation for estimating detector performance and for data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.
Submitted 7 April, 2024;
originally announced April 2024.
-
Affective-NLI: Towards Accurate and Interpretable Personality Recognition in Conversation
Authors:
Zhiyuan Wen,
Jiannong Cao,
Yu Yang,
Ruosong Yang,
Shuaiqi Liu
Abstract:
Personality Recognition in Conversation (PRC) aims to identify the personality traits of speakers through textual dialogue content. It is essential for providing personalized services in various applications of Human-Computer Interaction (HCI), such as AI-based mental therapy and companion robots for the elderly. Most recent studies analyze the dialog content for personality classification yet overlook two major concerns that hinder their performance. First, crucial implicit factors contained in conversation, such as emotions that reflect the speakers' personalities, are ignored. Second, only focusing on the input dialog content disregards the semantic understanding of personality itself, which reduces the interpretability of the results. In this paper, we propose Affective Natural Language Inference (Affective-NLI) for accurate and interpretable PRC. To utilize affectivity within dialog content for accurate personality recognition, we fine-tuned a pre-trained language model specifically for emotion recognition in conversations, facilitating real-time affective annotations for utterances. For interpretability of recognition results, we formulate personality recognition as an NLI problem by determining whether the textual description of personality labels is entailed by the dialog content. Extensive experiments on two daily conversation datasets suggest that Affective-NLI significantly outperforms (by 6%-7%) state-of-the-art approaches. Additionally, our Flow experiment demonstrates that Affective-NLI can accurately recognize the speaker's personality in the early stages of conversations, surpassing state-of-the-art methods by 22%-34%.
Submitted 3 April, 2024;
originally announced April 2024.
-
MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
Authors:
Xiaolu Liu,
Song Wang,
Wentong Li,
Ruizi Yang,
Junbo Chen,
Jianke Zhu
Abstract:
Currently, high-definition (HD) map construction leans towards a lightweight online generation tendency, which aims to preserve timely and reliable road scene information. However, map elements contain strong shape priors. Subtle and sparse annotations make current detection-based frameworks ambiguous in locating relevant feature scopes and cause the loss of detailed structures in prediction. To alleviate these problems, we propose MGMap, a mask-guided approach that effectively highlights the informative regions and achieves precise map element localization by introducing the learned masks. Specifically, MGMap employs learned masks based on the enhanced multi-scale BEV features from two perspectives. At the instance level, we propose the Mask-activated instance (MAI) decoder, which incorporates global instance and structural information into instance queries by the activation of instance masks. At the point level, a novel position-guided mask patch refinement (PG-MPR) module is designed to refine point locations from a finer-grained perspective, enabling the extraction of point-specific patch information. Compared to the baselines, our proposed MGMap achieves a notable improvement of around 10 mAP for different input modalities. Extensive experiments also demonstrate that our approach showcases strong robustness and generalization capabilities. Our code can be found at https://github.com/xiaolul2/MGMap.
Submitted 31 March, 2024;
originally announced April 2024.
-
PLoc: A New Evaluation Criterion Based on Physical Location for Autonomous Driving Datasets
Authors:
Ruining Yang,
Yuqi Peng
Abstract:
Autonomous driving has garnered significant attention as a key research area within artificial intelligence. In the context of autonomous driving scenarios, the varying physical locations of objects correspond to different levels of danger. However, conventional evaluation criteria for automatic driving object detection often overlook the crucial aspect of an object's physical location, leading to evaluation results that may not accurately reflect the genuine threat posed by the object to the autonomous driving vehicle. To enhance the safety of autonomous driving, this paper introduces a novel evaluation criterion based on physical location information, termed PLoc. This criterion transcends the limitations of traditional criteria by acknowledging that the physical location of pedestrians in autonomous driving scenarios can provide valuable safety-related information. Furthermore, this paper presents a newly re-annotated dataset (ApolloScape-R) derived from ApolloScape. ApolloScape-R involves the relabeling of pedestrians based on the significance of their physical location. The dataset is utilized to assess the performance of various object detection models under the proposed PLoc criterion. Experimental results demonstrate that the average accuracy of all object detection models in identifying a person situated in the travel lane of an autonomous vehicle is lower than that for a person on a sidewalk. The dataset is publicly available at https://github.com/lnyrlyed/ApolloScape-R.git
Submitted 28 March, 2024;
originally announced March 2024.
-
GeV gamma-ray emission in the field of young massive star cluster RCW 38
Authors:
Ting-Ting Ge,
Xiao-Na Sun,
Rui-Zhi Yang,
Pak-Hin Thomas Tam,
Ming-Xuan Lu,
En-Wei Liang
Abstract:
We report the detection of gamma-ray emission by the Fermi Large Area Telescope (Fermi-LAT) towards the young massive star cluster RCW 38 in the 1-500 GeV photon energy range. We found spatially extended GeV emission towards the direction of RCW 38, which is best modelled by a Gaussian disc of $0.23^\circ$ radius, with an extension significance of $\sim 11.4\sigma$. Furthermore, the spatial correlation with the ionized and molecular gas content favors the hadronic origin of the gamma-ray emission. The gamma-ray spectrum of RCW 38 has a relatively hard photon index of $2.44 \pm 0.03$, which is similar to other young massive star clusters. We argue that the diffuse GeV gamma-ray emission in this region likely originates from the interaction of accelerated protons in the stellar cluster with the ambient gas.
Submitted 28 March, 2024;
originally announced March 2024.
-
Interpretable Machine Learning for Weather and Climate Prediction: A Survey
Authors:
Ruyi Yang,
Jingyu Hu,
Zihao Li,
Jianli Mu,
Tingzhao Yu,
Jiangjiang Xia,
Xuhong Li,
Aritra Dasgupta,
Haoyi Xiong
Abstract:
Advanced machine learning models have recently achieved high predictive accuracy for weather and climate prediction. However, these complex models often lack inherent transparency and interpretability, acting as "black boxes" that impede user trust and hinder further model improvements. As such, interpretable machine learning techniques have become crucial in enhancing the credibility and utility of weather and climate modeling. In this survey, we review current interpretable machine learning approaches applied to meteorological predictions. We categorize methods into two major paradigms: 1) Post-hoc interpretability techniques that explain pre-trained models, such as perturbation-based, game theory based, and gradient-based attribution methods. 2) Designing inherently interpretable models from scratch using architectures like tree ensembles and explainable neural networks. We summarize how each technique provides insights into the predictions, uncovering novel meteorological relationships captured by machine learning. Lastly, we discuss research challenges around achieving deeper mechanistic interpretations aligned with physical principles, developing standardized evaluation benchmarks, integrating interpretability into iterative model development workflows, and providing explainability for large foundation models.
Submitted 24 March, 2024;
originally announced March 2024.
-
Thin accretion disk and shadow of Kerr-Sen black hole in Einstein-Maxwell-dilaton-axion gravity
Authors:
Haiyuan Feng,
Rong-Jia Yang,
Wei-Qiang Chen
Abstract:
We investigate the accretion process in a thin disk surrounding a supermassive black hole in Einstein-Maxwell-dilaton-axion (EMDA) gravity by using the Novikov-Thorne model. The results reveal that as the dilaton parameter $r_2$ increases, the energy flux, radiation temperature, spectral luminosity, and radiative efficiency of the disk all increase. By narrowing down the dilaton parameter range to $0\leqslant \frac{r_2}{M}\leqslant0.4$, we discover that in the high-frequency region, the Kerr-Sen black hole demonstrates higher energy output compared to the Kerr black hole. We also investigate the Kerr-Sen black hole shadow in a uniform plasma environment. For a fixed inclination angle of $\theta_0=90^\circ$ and $\frac{r_2}{M} = \frac{a}{M} = 0.5$, the shadow grows as the homogeneous plasma parameter $k$ increases. Conversely, when $k = 0.1$ and $\frac{a}{M} = 0.5$, an increase in $r_2$ leads to a decrease in the shadow. Furthermore, by using observational data from M87* and Sgr A*, we compare the theoretical and observed shadow diameters to constrain the range of the model parameters.
Submitted 14 June, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Visual Whole-Body Control for Legged Loco-Manipulation
Authors:
Minghuan Liu,
Zixuan Chen,
Xuxin Cheng,
Yandong Ji,
Ri-Zhao Qiu,
Ruihan Yang,
Xiaolong Wang
Abstract:
We study the problem of mobile manipulation using legged robots equipped with an arm, namely legged loco-manipulation. The robot legs, while usually utilized for mobility, offer an opportunity to amplify the manipulation capabilities by conducting whole-body control. That is, the robot can control the legs and the arm at the same time to extend its workspace. We propose a framework that can conduct the whole-body control autonomously with visual observations. Our approach, namely Visual Whole-Body Control (VBC), is composed of a low-level policy using all degrees of freedom to track the body velocities along with the end-effector position, and a high-level policy proposing the velocities and end-effector position based on visual inputs. We train both levels of policies in simulation and perform Sim2Real transfer for real robot deployment. We perform extensive experiments and show significant improvements over baselines in picking up diverse objects in different configurations (heights, locations, orientations) and environments.
Submitted 14 May, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
OceanPlan: Hierarchical Planning and Replanning for Natural Language AUV Piloting in Large-scale Unexplored Ocean Environments
Authors:
Ruochu Yang,
Fumin Zhang,
Mengxue Hou
Abstract:
We develop a hierarchical LLM-task-motion planning and replanning framework to efficiently ground an abstracted human command into tangible Autonomous Underwater Vehicle (AUV) control through enhanced representations of the world. We also incorporate a holistic replanner to provide real-world feedback with all planners for robust AUV operation. While there has been extensive research in bridging the gap between LLMs and robotic missions, they are unable to guarantee success of AUV applications in the vast and unknown ocean environment. To tackle specific challenges in marine robotics, we design a hierarchical planner to compose executable motion plans, which achieves planning efficiency and solution quality by decomposing long-horizon missions into sub-tasks. At the same time, real-time data stream is obtained by a replanner to address environmental uncertainties during plan execution. Experiments validate that our proposed framework delivers successful AUV performance of long-duration missions through natural language piloting.
Submitted 22 March, 2024;
originally announced March 2024.
-
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
Authors:
Junbo Yin,
Jianbing Shen,
Runnan Chen,
Wei Li,
Ruigang Yang,
Pascal Frossard,
Wenguan Wang
Abstract:
Bird's eye view (BEV) representation has emerged as a dominant solution for describing 3D space in autonomous driving scenarios. However, objects in the BEV representation typically exhibit small sizes, and the associated point cloud context is inherently sparse, which leads to great challenges for reliable 3D perception. In this paper, we propose IS-Fusion, an innovative multimodal fusion framework that jointly captures the Instance- and Scene-level contextual information. IS-Fusion essentially differs from existing approaches that only focus on the BEV scene-level fusion by explicitly incorporating instance-level multimodal information, thus facilitating the instance-centric tasks like 3D object detection. It comprises a Hierarchical Scene Fusion (HSF) module and an Instance-Guided Fusion (IGF) module. HSF applies Point-to-Grid and Grid-to-Region transformers to capture the multimodal scene context at different granularities. IGF mines instance candidates, explores their relationships, and aggregates the local multimodal context for each instance. These instances then serve as guidance to enhance the scene feature and yield an instance-aware BEV representation. On the challenging nuScenes benchmark, IS-Fusion outperforms all the published multimodal works to date. Code is available at: https://github.com/yinjunbo/IS-Fusion.
Submitted 22 March, 2024;
originally announced March 2024.
-
ACCESS: Assurance Case Centric Engineering of Safety-critical Systems
Authors:
Ran Wei,
Simon Foster,
Haitao Mei,
Fang Yan,
Ruizhe Yang,
Ibrahim Habli,
Colin O'Halloran,
Nick Tudor,
Tim Kelly,
Yakoub Nemouchi
Abstract:
Assurance cases are used to communicate and assess confidence in critical system properties such as safety and security. Historically, assurance cases have been manually created documents, which are evaluated by system stakeholders through lengthy and complicated processes. In recent years, model-based system assurance approaches have gained popularity to improve the efficiency and quality of system assurance activities. This becomes increasingly important because, as systems become more complex, it is a challenge to manage their development life-cycles, including coordination of development, verification and validation activities, and change impact analysis in inter-connected system assurance artifacts. Moreover, there is a need for assurance cases that support evolution during the operational life of the system, to enable continuous assurance in the face of an uncertain environment, as Robotics and Autonomous Systems (RAS) are adopted into society. In this paper, we contribute ACCESS - Assurance Case Centric Engineering of Safety-critical Systems, an engineering methodology, together with its tool support, for the development of safety critical systems around evolving model-based assurance cases. We show how model-based system assurance cases can trace to heterogeneous engineering artifacts (e.g. system architectural models, system safety analysis, system behaviour models, etc.), and how formal methods can be integrated during the development process. We demonstrate how assurance cases can be automatically evaluated both at development and runtime. We apply our approach to a case study based on an Autonomous Underwater Vehicle (AUV).
Submitted 16 April, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Stable multivariate Narayana polynomials and labeled plane trees
Authors:
Harold R. L. Yang,
Philip B. Zhang
Abstract:
In this paper, we introduce stable multivariate generalizations of Narayana polynomials of type A and type B. We give an insertion algorithm for labeled plane trees and introduce the notion of improper edges. Our polynomials are multivariate generating polynomials of labeled plane trees and can be generated by a grammatical labeling based on a context-free grammar. Our proof of real stability uses a characterization of stable-preserving linear operators due to Borcea and Brändén. In particular, we get an alternative multivariate stable refinement of the second-order Eulerian polynomials, which is different from the one given by Haglund and Visontai.
Submitted 8 April, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Statistic Vectorial Complex Ray Model and its Application to Three-Dimension Scattering of a Non-spherical Particle
Authors:
Ruiping Yang,
Bing Wei,
Claude Rozé,
Saïd Idlahcen,
Kuan Fang Ren
Abstract:
A Statistic Vectorial Complex Ray Model (SVCRM) is proposed for the scattering of a plane wave by a non-spherical dielectric particle in three dimensions. This method counts the complex amplitudes of all rays arriving in a tiny box in the observation direction. It avoids the two-dimensional interpolation necessary in the Vectorial Complex Ray Model (VCRM) for the calculation of the total field. It is therefore more flexible and better suited to particles of complex shape. The algorithm has been carefully designed for the calculation of the phases due to the optical path, the reflection and the focal lines/points as well as the amplitude variation caused by the reflection, the refraction and the divergence of the wave on the particle surface. This model is then applied, as an example, to simulate the three-dimensional scattering patterns of a pendent drop. The scattering mechanism is analyzed in detail, and special attention is paid to the scattering patterns around the rainbow angles where the caustics occur. The simulated results have been compared to the experimental results for a pendent drop of two typical sizes and shapes. It is shown that the simulated and the experimental results are in good agreement. This method shows promise for the development of optical measurement techniques in fluid mechanics.
Submitted 19 March, 2024;
originally announced March 2024.
-
Giant electrode effect on tunneling magnetoresistance and electroresistance in van der Waals intrinsic multiferroic tunnel junctions using VS2
Authors:
Zhi Yan,
Ruixia Yang,
Cheng Fang,
Wentian Lu,
Xiaohong Xu
Abstract:
Van der Waals multiferroic tunnel junctions (vdW-MFTJs) with multiple nonvolatile resistive states are highly suitable for new physics and next-generation storage electronics. However, currently reported vdW-MFTJs are based on two types of materials, i.e., vdW ferromagnetic and ferroelectric materials, forming a multiferroic system. This undoubtedly introduces additional interfaces, increasing the complexity of experimental preparation. Herein, we engineer vdW intrinsic MFTJs utilizing bilayer VS$_2$. By employing the nonequilibrium Green's function combined with density functional theory, we systematically investigate the influence of three types of electrodes (including non-vdW pure metal Ag/Au, vdW metallic 1T-MoS$_2$/2H-PtTe$_2$, and vdW ferromagnetic metallic Fe$_3$GaTe$_2$/Fe$_3$GeTe$_2$) on the electronic transport properties of VS$_2$-based intrinsic MFTJs. We demonstrate that these MFTJs manifest a giant electrode-dependent electronic transport characteristic effect. Comprehensively comparing these electrode pairs, the Fe$_3$GaTe$_2$/Fe$_3$GeTe$_2$ electrode combination exhibits optimal transport properties: the maximum TMR (TER) can reach 10949\% (69\%) and the minimum resistance-area product (RA) is 0.45 $\Omega\,\mu$m$^{2}$, together with perfect spin filtering and negative differential resistance effects. More intriguingly, TMR (TER) can be further enhanced to 34000\% (380\%) by applying an external bias voltage (0.1 V), while RA can be reduced to 0.16 $\Omega\,\mu$m$^{2}$ under the influence of biaxial stress (-3\%). Our proposed concept of designing vdW-MFTJs using intrinsic multiferroic materials points towards new avenues in experimental exploration.
Submitted 7 May, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Algorithmic Complexity Attacks on Dynamic Learned Indexes
Authors:
Rui Yang,
Evgenios M. Kornaropoulos,
Yue Cheng
Abstract:
Learned Index Structures (LIS) view a sorted index as a model that learns the data distribution, takes a data element key as input, and outputs the predicted position of the key. The original LIS can only handle lookup operations with no support for updates, rendering it impractical to use for typical workloads. To address this limitation, recent studies have focused on designing efficient dynamic learned indexes. ALEX, as the pioneering dynamic learned index structure, enables dynamism by incorporating a series of design choices, including adaptive key space partitioning, dynamic model retraining, and sophisticated engineering and policies that prioritize read/write performance. While these design choices offer improved average-case performance, the emphasis on flexibility and performance increases the attack surface by allowing adversarial behaviors that maximize ALEX's memory space and time complexity in worst-case scenarios. In this work, we present the first systematic investigation of algorithmic complexity attacks (ACAs) targeting the worst-case scenarios of ALEX. We introduce new ACAs that fall into two categories, space ACAs and time ACAs, which target the memory space and time complexity, respectively. First, our space ACA on data nodes exploits ALEX's gapped array layout and uses Multiple-Choice Knapsack (MCK) to generate an optimal adversarial insertion plan for maximizing the memory consumption at the data node level. Second, our space ACA on internal nodes exploits ALEX's catastrophic cost mitigation mechanism, causing an out-of-memory error with only a few hundred adversarial insertions. Third, our time ACA generates pathological insertions to increase the disparity between the actual key distribution and the linear models of data nodes, deteriorating the runtime performance by up to 1,641X compared to ALEX operating under legitimate workloads.
Submitted 19 March, 2024;
originally announced March 2024.
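The idea of a sorted index as a model that predicts a key's position can be illustrated with a minimal sketch. This is a toy static index with a single least-squares linear model, not ALEX; the class name `LinearLearnedIndex` and the field `max_err` are hypothetical names introduced here, and the sketch assumes a non-empty set of distinct numeric keys.

```python
import bisect

class LinearLearnedIndex:
    """Toy static learned index: one linear model plus a bounded local search."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst prediction error over stored keys; lookups search within it.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or -1 if it is absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else -1
```

In this toy setting, evenly spaced keys keep `max_err` near zero, while a single far-away key (for example `list(range(100)) + [10**6]`) flattens the fitted line and widens the search window. That is only a loose analogue of the model/distribution disparity the abstract describes.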
-
Driving Style Alignment for LLM-powered Driver Agent
Authors:
Ruoxuan Yang,
Xinyue Zhang,
Anais Fernandez-Laaksonen,
Xin Ding,
Jiangtao Gong
Abstract:
Recently, LLM-powered driver agents have demonstrated considerable potential in the field of autonomous driving, showcasing human-like reasoning and decision-making abilities. However, current research on aligning driver agent behaviors with human driving styles remains limited, partly due to the scarcity of high-quality natural language data from human driving behaviors. To address this research gap, we propose a multi-alignment framework designed to align driver agents with human driving styles through demonstrations and feedback. Notably, we construct a natural language dataset of human driver behaviors through naturalistic driving experiments and post-driving interviews, offering high-quality human demonstrations for LLM alignment. The framework's effectiveness is validated through simulation experiments in the CARLA urban traffic simulator and further corroborated by human evaluations. Our research offers valuable insights into designing driving agents with diverse driving styles. The implementation of the framework and details of the dataset can be found at the link.
Submitted 17 March, 2024;
originally announced March 2024.
-
Spectrum conversion and pattern preservation of Airy beams in fractional systems with a dynamical harmonic-oscillator potential
Authors:
Xiaoqin Bai,
Juan Bai,
Boris A. Malomed,
Rongcao Yang
Abstract:
We investigate the dynamics of optical Airy beams in the one-dimensional fractional Schrödinger equation with a harmonic-oscillator (HO) potential subjected to modulation along the propagation distance. Deriving general solutions for propagating beams and particular solutions for Airy waves with/without chirp, we study analytically the spectrum conversion and pattern preservation for the chirp-free and chirped Airy beams in the fractional system including the HO potential with moiré-lattice, hyperbolic-secant, and delta-functional modulation formats. For the HO-moiré-lattice potential, it is found that the chirp-free Airy beam experiences multiple spectrum conversions between the Airy and Gaussian patterns in the momentum space, preserving the Airy pattern in the coordinate space. The chirp magnitude of the chirped Airy beam determines whether the spectrum conversion occurs in the momentum space, and the splitting and evolution direction of the beam in the coordinate space. For the HO-hyperbolic-secant potential, the chirp-free Airy beam undergoes spectrum conversion and tunneling, with the positions of the spectrum conversion and tunneling significantly depending on parameters of the hyperbolic-secant potential; however, the spectrum conversion and pattern preservation of the chirped Airy beam occurs only under a certain relation of the chirp and parameters of the potential. In the case of the HO-delta-functional potential, the chirp-free Airy beam experiences abrupt spectrum conversion and a two-step spectrum shift; however, for the chirped Airy beam, the spectrum conversion is affected by the relation between the chirp and height of the potential. Effects of the fractional Lévy index on the spectrum conversion and pattern preservation of the Airy beams under the action of the three modulation patterns considered here are also explored in detail.
Submitted 16 March, 2024;
originally announced March 2024.
-
Frequency-Reactive Power Optimization Strategy of Grid-forming Offshore Wind Farm Using DRU-HVDC Transmission
Authors:
Zhekai Li,
Kun Han,
Xu Cai,
Renxin Yang,
Haotian Yu,
Kepeng Xia,
Lulu Liu
Abstract:
The diode rectifier unit-based high voltage direct current (DRU-HVDC) transmission with grid-forming (GFM) wind turbines is becoming a promising scheme for offshore wind farm (OWF) integration due to its high reliability and low cost. In this scheme, the AC network of the OWF and the DRU have completely different synchronization mechanisms and power flow characteristics from the traditional power system. To optimize the power flow and reduce the net loss, this paper carries out power flow modeling and optimization analysis for the DRU-HVDC transmission system with grid-forming OWFs. The influence of the DRU and the GFM wind turbines on the power flow of the system is analyzed. On this basis, improved constraint conditions are proposed and an optimal power flow (OPF) method is established. This method can minimize the power loss by adjusting the reactive power output of each wind turbine and the internal network frequency. Finally, the proposed OPF model is implemented and solved in MATLAB using the YALMIP toolkit and the CPLEX solver. The results show that the proposed optimization strategy can effectively reduce the power loss of the entire OWF and the transmission system with an optimization ratio of network losses exceeding 25.3%.
Submitted 16 March, 2024;
originally announced March 2024.
-
Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen
, et al. (256 additional authors not shown)
Abstract:
We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be $-2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is $-3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of $-0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.
Submitted 26 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
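As a rough consistency check on the figures quoted in this abstract (assuming the power law applies to the mean logarithmic mass itself, taking the central index value $-0.12$, and writing $\langle \ln A \rangle$ as a symbol introduced here for illustration), a tenfold increase in energy from 0.3 PeV to 3 PeV gives:

$$
\langle \ln A \rangle(3\,\mathrm{PeV}) \approx 1.7 \times \left(\frac{3}{0.3}\right)^{-0.12} = 1.7 \times 10^{-0.12} \approx 1.7 \times 0.759 \approx 1.29,
\qquad
\frac{1.7 - 1.29}{1.7} \approx 0.24
$$

This agrees with the quoted value of 1.3 at 3 PeV and the stated 24% decline.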
-
GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping
Authors:
Yuhang Zheng,
Xiangyu Chen,
Yupeng Zheng,
Songen Gu,
Runyi Yang,
Bu Jin,
Pengfei Li,
Chengliang Zhong,
Zengmao Wang,
Lina Liu,
Chao Yang,
Dawei Wang,
Zhen Chen,
Xiaoxiao Long,
Meiqing Wang
Abstract:
Constructing a 3D scene capable of accommodating open-ended language queries is a pivotal pursuit, particularly within the domain of robotics. Such technology facilitates robots in executing object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (e.g., NeRF) encounter limitations due to the necessity of processing a large number of input views for reconstruction, coupled with their inherent inefficiencies in inference. Thus, we present GaussianGrasper, which utilizes 3D Gaussian Splatting to explicitly represent the scene as a collection of Gaussian primitives. Our approach takes a limited set of RGB-D views and employs a tile-based splatting technique to create a feature field. In particular, we propose an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently and accurately distill language embeddings derived from foundational models. With the reconstructed geometry of the Gaussian field, our method enables the pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately query and grasp objects with language instructions, providing a new solution for language-guided manipulation tasks. Data and code are available at https://github.com/MrSecant/GaussianGrasper.
Submitted 14 March, 2024;
originally announced March 2024.
-
LDPRecover: Recovering Frequencies from Poisoning Attacks against Local Differential Privacy
Authors:
Xinyue Sun,
Qingqing Ye,
Haibo Hu,
Jiawei Duan,
Tianyu Wo,
Jie Xu,
Renyu Yang
Abstract:
Local differential privacy (LDP), which enables an untrusted server to collect aggregated statistics from distributed users while protecting the privacy of those users, has been widely deployed in practice. However, LDP protocols for frequency estimation are vulnerable to poisoning attacks, in which an attacker can poison the aggregated frequencies by manipulating the data sent from malicious users. Therefore, it is an open challenge to recover the accurate aggregated frequencies from poisoned ones.
In this work, we propose LDPRecover, a method that can recover accurate aggregated frequencies from poisoning attacks, even if the server does not learn the details of the attacks. In LDPRecover, we establish a genuine frequency estimator that theoretically guides the server to recover the frequencies aggregated from genuine users' data by eliminating the impact of malicious users' data in poisoned frequencies. Since the server has no idea of the attacks, we propose an adaptive attack to unify existing attacks and learn the statistics of the malicious data within this adaptive attack by exploiting the properties of LDP protocols. By taking the estimator and the learning statistics as constraints, we formulate the problem of recovering aggregated frequencies to approach the genuine ones as a constraint inference (CI) problem. Consequently, the server can obtain accurate aggregated frequencies by solving this problem optimally. Moreover, LDPRecover can serve as a frequency recovery paradigm that recovers more accurate aggregated frequencies by integrating attack details as new constraints in the CI problem. Our evaluation on two real-world datasets, three LDP protocols, and untargeted and targeted poisoning attacks shows that LDPRecover is both accurate and widely applicable against various poisoning attacks.
Submitted 14 March, 2024;
originally announced March 2024.
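To make the setting concrete, here is a minimal sketch of LDP frequency estimation using generalized randomized response (GRR), a common LDP protocol. The abstract does not say which three protocols were evaluated, so GRR is an assumption made here, and the function names `grr_perturb` and `grr_estimate` are hypothetical. The sketch covers only the aggregation step that a poisoning attack targets, not the LDPRecover method.

```python
import math
from collections import Counter

def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p, else a uniformly chosen other value."""
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])

def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimates computed by the server from perturbed reports."""
    d = len(domain)
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    q = 1.0 / (math.exp(epsilon) + d - 1)
    counts = Counter(reports)
    # Invert the perturbation: E[count_v / n] = f_v * p + (1 - f_v) * q.
    return {v: (counts[v] / n - q) / (p - q) for v in domain}
```

If malicious users skip the perturbation and all report the same target item, the server's estimate for that item is inflated. That is the kind of poisoned aggregate the abstract refers to.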
-
Learning Generalizable Feature Fields for Mobile Manipulation
Authors:
Ri-Zhao Qiu,
Yafei Hu,
Ge Yang,
Yuchen Song,
Yang Fu,
Jianglong Ye,
Jiteng Mu,
Ruihan Yang,
Nikolay Atanasov,
Sebastian Scherer,
Xiaolong Wang
Abstract:
An open problem in mobile manipulation is how to represent objects and scenes in a unified manner, so that robots can use it both for navigating in the environment and manipulating objects. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent to an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation that performs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We evaluate GeFF's ability to generalize to open-set objects as well as running time, when performing open-vocabulary mobile manipulation in dynamic scenes.
Submitted 12 March, 2024;
originally announced March 2024.
-
KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques
Authors:
Rui Yang,
Haoran Liu,
Edison Marrese-Taylor,
Qingcheng Zeng,
Yu He Ke,
Wanxin Li,
Lechao Cheng,
Qingyu Chen,
James Caverlee,
Yutaka Matsuo,
Irene Li
Abstract:
Large Language Models (LLMs) have significantly advanced healthcare innovation on generation capabilities. However, their application in real clinical settings is challenging due to potential deviations from medical facts and inherent biases. In this work, we develop an augmented LLM framework, KG-Rank, which leverages a medical knowledge graph (KG) with ranking and re-ranking techniques, aiming to improve free-text question-answering (QA) in the medical domain. Specifically, upon receiving a question, we initially retrieve triplets from a medical KG to gather factual information. Subsequently, we innovatively apply ranking methods to refine the ordering of these triplets, aiming to yield more precise answers. To the best of our knowledge, KG-Rank is the first application of ranking models combined with KG in medical QA specifically for generating long answers. Evaluation on four selected medical QA datasets shows that KG-Rank achieves an improvement of over 18% in the ROUGE-L score. Moreover, we extend KG-Rank to open domains, where it realizes a 14% improvement in ROUGE-L, showing the effectiveness and potential of KG-Rank.
Submitted 18 March, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
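For readers unfamiliar with the metric cited above, ROUGE-L scores a generated answer by its longest common subsequence (LCS) with a reference. The following is a minimal sketch assuming whitespace tokenization and a balanced F1 combination. The abstract does not state the exact ROUGE-L configuration used, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbor.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """F1 of LCS-based precision and recall over whitespace tokens."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

Because the LCS need not be contiguous, the score rewards answers that preserve the reference's word order even when extra words are interleaved.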
-
Low-Resource Court Judgment Summarization for Common Law Systems
Authors:
Shuaiqi Liu,
Jiannong Cao,
Yicong Li,
Ruosong Yang,
Zhiyuan Wen
Abstract:
Common law courts need to refer to similar precedents' judgments to inform their current decisions. Generating high-quality summaries of court judgment documents can facilitate legal practitioners to efficiently review previous cases and assist the general public in accessing how the courts operate and how the law is applied. Previous court judgment summarization research focuses on civil law or a particular jurisdiction's judgments. However, judges can refer to the judgments from all common law jurisdictions. Current summarization datasets are insufficient to satisfy the demands of summarizing precedents across multiple jurisdictions, especially when labeled data are scarce for many jurisdictions. To address the lack of datasets, we present CLSum, the first dataset for summarizing multi-jurisdictional common law court judgment documents. Besides, this is the first court judgment summarization work adopting large language models (LLMs) in data augmentation, summary generation, and evaluation. Specifically, we design an LLM-based data augmentation method incorporating legal knowledge. We also propose a legal knowledge enhanced evaluation metric based on LLM to assess the quality of generated judgment summaries. Our experimental results verify that the LLM-based summarization methods can perform well in the few-shot and zero-shot settings. Our LLM-based data augmentation method can mitigate the impact of low data resources. Furthermore, we carry out comprehensive comparative experiments to find essential model components and settings that are capable of enhancing summarization performance.
Submitted 7 March, 2024;
originally announced March 2024.
-
Incremental Bayesian Learning for Fail-Operational Control in Autonomous Driving
Authors:
Lei Zheng,
Rui Yang,
Zengqi Peng,
Wei Yan,
Michael Yu Wang,
Jun Ma
Abstract:
Abrupt maneuvers by surrounding vehicles (SVs) can typically lead to safety concerns and affect the task efficiency of the ego vehicle (EV), especially with model uncertainties stemming from environmental disturbances. This paper presents a real-time fail-operational controller that ensures the asymptotic convergence of an uncertain EV to a safe state, while preserving task efficiency in dynamic environments. An incremental Bayesian learning approach is developed to facilitate online learning and inference of changing environmental disturbances. Leveraging disturbance quantification and constraint transformation, we develop a stochastic fail-operational barrier based on the control barrier function (CBF). With this development, the uncertain EV is able to converge asymptotically from an unsafe state to a defined safe state with probabilistic stability. Subsequently, the stochastic fail-operational barrier is integrated into an efficient fail-operational controller based on quadratic programming (QP). This controller is tailored for the EV operating under control constraints in the presence of environmental disturbances, with both safety and efficiency objectives taken into consideration. We validate the proposed framework in connected cruise control (CCC) tasks, where SVs perform aggressive driving maneuvers. The simulation results demonstrate that our method empowers the EV to swiftly return to a safe state while upholding task efficiency in real time, even under time-varying environmental disturbances.
Submitted 6 March, 2024;
originally announced March 2024.
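The control barrier function (CBF) idea behind such a controller can be sketched in its simplest deterministic form. The example assumes a scalar safety margin $h$ with dynamics $\dot{h} = u$ and no input bounds, so the QP "stay as close as possible to the nominal input subject to $\dot{h} + \alpha h \ge 0$" has a closed-form solution. The function names and parameters are hypothetical, and the sketch omits the stochastic barrier and Bayesian learning described in the abstract.

```python
def cbf_filter(u_nom, h, alpha):
    """Closest input to u_nom that satisfies dh/dt + alpha * h >= 0 when dh/dt = u."""
    return max(u_nom, -alpha * h)

def simulate(h0, u_nom, alpha=1.0, dt=0.01, steps=1000):
    """Forward-Euler rollout of the safety margin h under the filtered input."""
    h, trace = h0, [h0]
    for _ in range(steps):
        h += dt * cbf_filter(u_nom, h, alpha)
        trace.append(h)
    return trace
```

Starting from an unsafe margin (for example `simulate(-2.0, -1.0)`), the constraint forces a positive input, so the margin rises toward zero. Starting from a safe margin, the nominal input is applied until the constraint becomes active, and the margin stays positive.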
-
Enhancing Magnetocaloric Material Discovery: A Machine Learning Approach Using an Autogenerated Database by Large Language Models
Authors:
Jiaoyue Yuan,
Runqing Yang,
Lokanath Patra,
Bolin Liao
Abstract:
Magnetic cooling based on the magnetocaloric effect is a promising solid-state refrigeration technology for a wide range of applications in different temperature ranges. Previous studies have mostly focused on near room temperature (300 K) and cryogenic temperature (< 10 K) ranges, while important applications such as hydrogen liquefaction call for efficient magnetic refrigerants for the intermediate temperature range of 10 K to 100 K. For efficient use in this range, new magnetocaloric materials with matching Curie temperatures need to be discovered, while conventional experimental approaches are typically time-consuming and expensive. Here, we report a computational material discovery pipeline based on a materials database containing more than 6000 entries auto-generated by extracting reported material properties from literature using a large language model. We then use this database to train a machine learning model that can efficiently predict magnetocaloric properties of materials based on their chemical composition. We further verify the magnetocaloric properties of predicted compounds using ab initio atomistic spin dynamics simulations to close the loop for computational material discovery. Using this approach, we identify 11 new promising magnetocaloric materials for the target temperature range. Our work demonstrates the potential of combining large language models, machine learning, and ab initio simulations to efficiently discover new functional materials.
Submitted 4 March, 2024;
originally announced March 2024.
-
Observation of Quantized Klein Tunneling in a Dielectric Resonator Chain
Authors:
Rui-Jie Zhang,
Xiao-Zhen Peng,
Ri-Zhen Yang,
Rui-Hua Ni,
Yong-Yin Hu,
Hong-Ya Xu,
Liang Huang
Abstract:
We present the first experimental observation of quantized Klein tunneling in a bounded Dirac system, implemented by a dimer chain of dielectric microwave resonators. Both the unusual quantized levels and the corresponding spinor states hybridized from distinct particle and hole wavefunctions are measured. All observations are in quantitative agreement with the hitherto-untested prediction of the Dirac equation. Our results mark an important step toward realizing and understanding the particle-hole physics of Klein tunneling in bounded Dirac systems, and also shed light on potential applications for manipulating particle-hole hybridized spinor waves.
Submitted 29 June, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Enhancing Retinal Vascular Structure Segmentation in Images With a Novel Design Two-Path Interactive Fusion Module Model
Authors:
Rui Yang,
Shunpu Zhang
Abstract:
Precision in identifying and differentiating micro and macro blood vessels in the retina is crucial for the diagnosis of retinal diseases, although it poses a significant challenge. Current autoencoding-based segmentation approaches encounter limitations as they are constrained by the encoder and undergo a reduction in resolution during the encoding stage. The inability to recover lost information in the decoding phase further impedes these approaches. Consequently, their capacity to extract the retinal microvascular structure is restricted. To address this issue, we introduce Swin-Res-Net, a specialized module designed to enhance the precision of retinal vessel segmentation. Swin-Res-Net utilizes the Swin transformer which uses shifted windows with displacement for partitioning, to reduce network complexity and accelerate model convergence. Additionally, the model incorporates interactive fusion with a functional module in the Res2Net architecture. The Res2Net leverages multi-scale techniques to enlarge the receptive field of the convolutional kernel, enabling the extraction of additional semantic information from the image. This combination creates a new module that enhances the localization and separation of micro vessels in the retina. To improve the efficiency of processing vascular information, we've added a module to eliminate redundant information between the encoding and decoding steps.
Our proposed architecture produces outstanding results, either meeting or surpassing those of other published models. The AUC reflects significant enhancements, achieving values of 0.9956, 0.9931, and 0.9946 in pixel-wise segmentation of retinal vessels across three widely utilized datasets: CHASE-DB1, DRIVE, and STARE, respectively. Moreover, Swin-Res-Net outperforms alternative architectures, demonstrating superior performance in both IOU and F1 measure metrics.
Submitted 2 March, 2024;
originally announced March 2024.
-
TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision
Authors:
Yunyi Zhang,
Ruozhen Yang,
Xueqiang Xu,
Rui Li,
Jinfeng Xiao,
Jiaming Shen,
Jiawei Han
Abstract:
Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy. Most earlier works focus on fully or semi-supervised methods that require a large amount of human annotated data which is costly and time-consuming to acquire. To alleviate human efforts, in this paper, we work on hierarchical text classification with the minimal amount of supervision: using the sole class name of each node as the only supervision. Recently, large language models (LLM) show competitive performance on various tasks through zero-shot prompting, but this method performs poorly in the hierarchical setting, because it is ineffective to include the large and structured label space in a prompt. On the other hand, previous weakly-supervised hierarchical text classification methods only utilize the raw taxonomy skeleton and ignore the rich information hidden in the text corpus that can serve as additional class-indicative features. To tackle the above challenges, we propose TELEClass, Taxonomy Enrichment and LLM-Enhanced weakly-supervised hierarchical text Classification, which (1) automatically enriches the label taxonomy with class-indicative terms to facilitate classifier training and (2) utilizes LLMs for both data annotation and creation tailored for the hierarchical label space. Experiments show that TELEClass can outperform previous weakly-supervised methods and LLM-based zero-shot prompting methods on two public datasets.
Submitted 16 June, 2024; v1 submitted 29 February, 2024;
originally announced March 2024.
-
Aligning Knowledge Graph with Visual Perception for Object-goal Navigation
Authors:
Nuo Xu,
Wen Wang,
Rong Yang,
Mengjie Qin,
Zheyuan Lin,
Wei Song,
Chunlong Zhang,
Jason Gu,
Chao Li
Abstract:
Object-goal navigation is a challenging task that requires guiding an agent to specific objects based on first-person visual observations. The ability of the agent to comprehend its surroundings plays a crucial role in achieving successful object finding. However, existing knowledge-graph-based navigators often rely on discrete categorical one-hot vectors and a vote-counting strategy to construct graph representations of the scenes, which results in misalignment with visual images. To provide more accurate and coherent scene descriptions and address this misalignment issue, we propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation. Technically, our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language description with visual perception. The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability. We extensively evaluate our method using the AI2-THOR simulator and conduct a series of experiments to demonstrate the effectiveness and efficiency of our navigator. Code available: https://github.com/nuoxu/AKGVP.
Submitted 25 April, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.