Search | arXiv e-print repository

When is the Four-phonon Effect in Half-Heusler Materials more Pronounced?

Authors: Yu Wu, Shengnan Dai, Linxuan Ji, Yimin Ding, Jiong Yang, Liujiang Zhou

Abstract: Suppressed three-phonon scattering processes have been considered to be the direct cause of materials exhibiting significant higher-order four-phonon interactions. However, after calculating the phonon-phonon interactions of 128 Half-Heusler materials by high-throughput, we find that the acoustic phonon bandwidth dominates the three-phonon and four-phonon scattering channels and keeps them roughly in a co-increasing or decreasing behavior. The $aao$ and $aaa$ three-phonon scattering channels in Half-Heusler materials are weakly affected by the acoustic-optical gap and acoustic bunched features respectively only when acoustic phonon bandwidths are close. Finally, we found that Half-Heusler materials with smaller acoustic bandwidths tend to have a more pronounced four-phonon effect, although three-phonon scattering may not be significantly suppressed at this time.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00501 [pdf, other]

Aeroengine performance prediction using a physical-embedded data-driven method

Authors: Tong Mo, Shiran Dai, An Fu, Xiaomeng Zhu, Shuxiao Li

Abstract: Accurate and efficient prediction of aeroengine performance is of paramount importance for engine design, maintenance, and optimization endeavours. However, existing methodologies often struggle to strike an optimal balance among predictive accuracy, computational efficiency, modelling complexity, and data dependency. To address these challenges, we propose a strategy that synergistically combines domain knowledge from both the aeroengine and neural network realms to enable real-time prediction of engine performance parameters. Leveraging aeroengine domain knowledge, we judiciously design the network structure and regulate the internal information flow. Concurrently, drawing upon neural network domain expertise, we devise four distinct feature fusion methods and introduce an innovative loss function formulation. To rigorously evaluate the effectiveness and robustness of our proposed strategy, we conduct comprehensive validation across two distinct datasets. The empirical results demonstrate: (1) the evident advantages of our tailored loss function; (2) our model's ability to maintain equal or superior performance with a reduced parameter count; (3) our model's reduced data dependency compared to generalized neural network architectures; (4) our model's greater interpretability compared to traditional black box machine learning methods.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00308 [pdf]

The role of lattice thermal conductivity suppression by dopants from a holistic perspective

Authors: Shengnan Dai, Shijie Zhang, Ye Sheng, Erting Dong, Sheng Sun, Lili Xi, G. Jeffrey Snyder, Jinyang Xi, Jiong Yang

Abstract: Dopants play an important role in improving electrical and thermal transport. In the traditional perspective, a dopant suppresses lattice thermal conductivity kL by adding a point defect (PD) scattering term to the phonon relaxation time, which has been adopted for decades. In this study, we propose an innovative perspective to solve the kL of defective systems: the holistic approach, i.e., treating the dopant and matrix as a whole. This approach allows us to handle the influences from defects explicitly through calculations of defective systems, including their changed phonon dispersion, phonon-phonon and electron-phonon interaction, etc., due to the existence of dopants. The kL reduction between defective MxNb1-xFeSb (M=V, Ti) and NbFeSb is used as an example for the holistic approach, and comparable results with experiments are obtained. It is notable that light elemental dopants also induced the avoided-crossing behavior. It can be further rationalized by a one-dimensional atomic chain model. The mass and force constant imbalance generally generates the avoided-crossing phonons, mathematically in a similar way as the coefficients in traditional PD scattering, but along a different direction in kL reduction. Our work provides another perspective for understanding the mechanism of dopants' influence on a material's thermal transport.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.18148 [pdf, other]

Searching anomalies using nonlinear dimensionality reduction techniques

Authors: X. Yang, G. Hobbs, S. -B. Zhang, A. Zic, Lawrence Toomey, Y. Li, J. -S. Wang, S. Dai, X. -F. Wu

Abstract: We have searched for anomalous events using 2,520 hours of archival observations from Murriyang, CSIRO's Parkes radio telescope. These observations were originally undertaken to search for pulsars. We used a machine-learning algorithm based on ResNet and Uniform Manifold Approximation and Projection (UMAP) in order to identify parts of the data stream that potentially contain anomalous signals. Many of these anomalous events are radio frequency interference, which were subsequently filtered using multibeam information. We detected 202 anomalous events and provide their positions and event times. We discuss the possibility that one of the events comes from radio emission from a white dwarf star. The other events are currently of unknown type.

Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: 9 pages, 7 figures

arXiv:2406.17393 [pdf, ps, other]

Timely and Painless Breakups: Off-the-grid Blind Message Recovery and Users' Demixing

Authors: Sajad Daei, Saeed Razavikia, Mikael Skoglund, Gabor Fodor, Carlo Fischione

Abstract: In the near future, the Internet of Things will interconnect billions of devices, forming a vast network where users sporadically transmit short messages through multi-path wireless channels. These channels are characterized by the superposition of a small number of scaled and delayed copies of Dirac spikes. At the receiver, the observed signal is a sum of these convolved signals, and the task is to find the amplitudes, continuous-indexed delays, and transmitted messages from a single signal. This task is inherently ill-posed without additional assumptions on the channel or messages. In this work, we assume the channel exhibits sparsity in the delay domain and that i.i.d. random linear encoding is applied to the messages at the devices. Leveraging these assumptions, we propose a semidefinite programming optimization capable of simultaneously recovering both messages and the delay parameters of the channels from only a single received signal. Our theoretical analysis establishes that the required number of samples at the receiver scales proportionally to the sum-product of sparsity and message length of all users, aligning with the degrees of freedom in the proposed convex optimization framework. Numerical experiments confirm the efficacy of the proposed method in accurately estimating closely-spaced delay parameters and recovering messages.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.15766 [pdf, ps, other]

Continual Learning with Diffusion-based Generative Replay for Industrial Streaming Data

Authors: Jiayi He, Jiao Chen, Qianmiao Liu, Suyan Dai, Jianhua Tang, Dongpo Liu

Abstract: The Industrial Internet of Things (IIoT) integrates interconnected sensors and devices to support industrial applications, but its dynamic environments pose challenges related to data drift. Considering the limited resources and the need to effectively adapt models to new data distributions, this paper introduces a Continual Learning (CL) approach, i.e., Distillation-based Self-Guidance (DSG), to address challenges presented by industrial streaming data via a novel generative replay mechanism. DSG utilizes knowledge distillation to transfer knowledge from the previous diffusion-based generator to the updated one, improving both the stability of the generator and the quality of reproduced data, thereby enhancing the mitigation of catastrophic forgetting. Experimental results on CWRU, DSA, and WISDM datasets demonstrate the effectiveness of DSG. DSG outperforms the state-of-the-art baseline in accuracy, demonstrating improvements ranging from 2.9% to 5.0% on key datasets, showcasing its potential for practical industrial applications.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 2024 IEEE/CIC International Conference on Communications in China (ICCC)

arXiv:2406.15478 [pdf]

Impact of the Top SiO2 Interlayer Thickness on Memory Window of Si Channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) Gate Structure

Authors: Tao Hu, Xianzhou Shao, Mingkai Bai, Xinpei Jia, Saifei Dai, Xiaoqing Sun, Runhao Han, Jia Yang, Xiaoyu Ke, Fengbin Tian, Shuai Yang, Junshuai Chai, Hao Xu, Xiaolei Wang, Wenwu Wang, Tianchun Ye

Abstract: We study the impact of top SiO2 interlayer thickness on the memory window (MW) of Si channel ferroelectric field-effect transistor (FeFET) with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) gate structure. We find that the MW increases with the increasing thickness of the top SiO2 interlayer, and such an increase exhibits a two-stage linear dependence. The physical origin is the presence of the different interfacial charges trapped at the top SiO2/Hf0.5Zr0.5O2 interface. Moreover, we investigate the dependence of endurance characteristics on initial MW. We find that the endurance characteristic degrades with increasing the initial MW. By inserting a 3.4 nm SiO2 dielectric interlayer between the gate metal TiN and the ferroelectric Hf0.5Zr0.5O2, we achieve a MW of 6.3 V and retention over 10 years. Our work is helpful in the device design of FeFET.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 6 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2404.15825

arXiv:2406.11105 [pdf, other]

Exploiting Diffusion Prior for Out-of-Distribution Detection

Authors: Armando Zhu, Jiabei Liu, Keqin Li, Shuying Dai, Bo Hong, Peng Zhao, Changsong Wei

Abstract: Out-of-distribution (OOD) detection is crucial for deploying robust machine learning models, especially in areas where security is critical. However, traditional OOD detection methods often fail to capture complex data distributions from large-scale data. In this paper, we present a novel approach for OOD detection that leverages the generative ability of diffusion models and the powerful feature extraction capabilities of CLIP. By using these features as conditional inputs to a diffusion model, we can reconstruct the images after encoding them with CLIP. The difference between the original and reconstructed images is used as a signal for OOD identification. The practicality and scalability of our method are increased by the fact that it does not require class-specific labeled ID data, as is the case with many other methods. Extensive experiments on several benchmark datasets demonstrate the robustness and effectiveness of our method, which significantly improves the detection accuracy.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10744

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.

Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: The author list and contents need to be verified by all authors

arXiv:2406.08875 [pdf, other]

doi 10.1145/3658230

NICER: A New and Improved Consumed Endurance and Recovery Metric to Quantify Muscle Fatigue of Mid-Air Interactions

Authors: Yi Li, Benjamin Tag, Shaozhang Dai, Robert Crowther, Tim Dwyer, Pourang Irani, Barrett Ens

Abstract: Natural gestures are crucial for mid-air interaction, but predicting and managing muscle fatigue is challenging. Existing torque-based models are limited in their ability to model above-shoulder interactions and to account for fatigue recovery. We introduce a new hybrid model, NICER, which combines a torque-based approach with a new term derived from the empirical measurement of muscle contraction and a recovery factor to account for decreasing fatigue during rest. We evaluated NICER in a mid-air selection task using two interaction methods with different degrees of perceived fatigue. Results show that NICER can accurately model above-shoulder interactions as well as reflect fatigue recovery during rest periods. Moreover, both interaction methods show a stronger correlation with subjective fatigue measurement (r = 0.978/0.976) than a previous model, Cumulative Fatigue (r = 0.966/0.923), confirming that NICER is a powerful analytical tool to predict fatigue across a variety of gesture-based interactive applications.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08807 [pdf, other]

doi 10.1017/pasa.2024.48

Optimising an Array of Cherenkov Telescopes in Australia for the Detection of TeV Gamma-Ray Transients

Authors: Simon Lee, Sabrina Einecke, Gavin Rowell, Csaba Balazs, Jose A. Bellido, Shi Dai, Miroslav Filipović, Violet M. Harvey, Padric McGee, Peter Marinos, Nicholas Tothill, Martin White

Abstract: As TeV gamma-ray astronomy progresses into the era of the Cherenkov Telescope Array (CTA), instantaneously following up on gamma-ray transients is becoming more important than ever. To this end, a worldwide network of Imaging Atmospheric Cherenkov Telescopes has been proposed. Australia is ideally suited to provide coverage of part of the Southern Hemisphere sky inaccessible to H.E.S.S. in Namibia and the upcoming CTA-South in Chile. This study assesses the sources detectable by a small, transient-focused array in Australia based on CTA telescope designs. The TeV emission of extragalactic sources (including the majority of gamma-ray transients) can suffer significant absorption by the extragalactic background light. As such, we explored the improvements possible by implementing stereoscopic and topological triggers, as well as lowered image cleaning thresholds, to access lower energies. We modelled flaring gamma-ray sources based on past measurements from the satellite-based gamma-ray telescope Fermi-LAT. We estimate that an array of four Medium-Sized Telescopes (MSTs) would detect $\sim$24 active galactic nucleus flares >5$σ$ per year, up to a redshift of $z\approx1.5$. Two MSTs achieved $\sim$80-90% of the detections of four MSTs. The modelled Galactic transients were detectable within the observation time of one night, 11 of the 21 modelled gamma-ray bursts were detectable, as were $\sim$10% of unidentified transients. An array of MST-class telescopes would thus be a valuable complementary telescope array for transient TeV gamma-ray astronomy.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 13 pages, 13 figures, 4 tables, accepted for publication in PASA

arXiv:2405.18955 [pdf, other]

RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision

Authors: Jinzhong Wang, Xuetao Tian, Shun Dai, Tao Zhuo, Haorui Zeng, Hongjuan Liu, Jiaqi Liu, Xiuwei Zhang, Yanning Zhang

Abstract: Multispectral object detection, utilizing both visible (RGB) and thermal infrared (T) modals, has garnered significant attention for its robust performance across diverse weather and lighting conditions. However, effectively exploiting the complementarity between RGB-T modals while maintaining efficiency remains a critical challenge. In this paper, a very simple Group Shuffled Multi-receptive Attention (GSMA) module is proposed to extract and combine multi-scale RGB and thermal features. Then, the extracted multi-modal features are directly integrated with a multi-level path aggregation neck, which significantly improves the fusion effect and efficiency. Meanwhile, multi-modal object detection often adopts union annotations for both modals. This kind of supervision is insufficient and unfair, since objects observed in one modal may not be seen in the other modal. To solve this issue, Multi-modal Supervision (MS) is proposed to sufficiently supervise RGB-T object detection. Comprehensive experiments on two challenging benchmarks, KAIST and DroneVehicle, demonstrate that the proposed model achieves state-of-the-art accuracy while maintaining competitive efficiency.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18666 [pdf]

On-Chip Vectorial Structured Light Manipulation via Inverse Design

Authors: Xiaobin Lin, Maoliang Wei, Kunhao Lei, Zijia Wang, Chi Wang, Hui Ma, Yuting Ye, Qiwei Zhan, Da Li, Shixun Dai, Baile Zhang, Xiaoyong Hu, Lan Li, Erping Li, Hongtao Lin

Abstract: On-chip structured light, with potentially infinite complexity, has emerged as a linchpin in the realm of integrated photonics. However, the realization of arbitrarily tailoring a multitude of light field dimensions in complex media remains a challenge. Through associating physical light fields and mathematical function spaces by introducing a mapping operator, we proposed a data-driven inverse design method to precisely manipulate between any two structured light fields in the on-chip high-dimensional Hilbert space. To illustrate, light field conversion in on-chip topological photonics was achieved. High-performance topological coupling devices with minimal insertion loss and customizable topological routing devices were designed and realized. Our method provides a new paradigm to enable precise manipulation over the on-chip vectorial structured light and paves the way for the realization of complex photonic functions.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 50 pages, 18 figures

arXiv:2405.17998 [pdf, other]

Source Echo Chamber: Exploring the Escalation of Source Bias in User, Data, and Recommender System Feedback Loop

Authors: Yuqi Zhou, Sunhao Dai, Liang Pang, Gang Wang, Zhenhua Dong, Jun Xu, Ji-Rong Wen

Abstract: Recently, researchers have uncovered that neural retrieval models prefer AI-generated content (AIGC), called source bias. Compared to active search behavior, recommendation represents another important means of information acquisition, where users are more prone to source bias. Furthermore, delving into the recommendation scenario, as AIGC becomes integrated within the feedback loop involving users, data, and the recommender system, it progressively contaminates the candidate items, the user interaction history, and ultimately, the data used to train the recommendation models. How and to what extent the source bias affects the neural recommendation models within the feedback loop remains unknown. In this study, we extend the investigation of source bias into the realm of recommender systems, specifically examining its impact across different phases of the feedback loop. We conceptualize the progression of AIGC integration into the recommendation content ecosystem in three distinct phases (HGC dominance, HGC-AIGC coexistence, and AIGC dominance), representing past, present, and future states, respectively. Through extensive experiments across three datasets from diverse domains, we demonstrate the prevalence of source bias and reveal a potential digital echo chamber with source bias amplification throughout the feedback loop. This trend risks creating a recommender ecosystem in which a limited information source, such as AIGC, is disproportionately recommended. To counteract this bias and prevent its escalation in the feedback loop, we introduce a black-box debiasing method that maintains model impartiality towards both HGC and AIGC. Our experimental results validate the effectiveness of the proposed debiasing method, confirming its potential to disrupt the feedback loop.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17935 [pdf, other]

Tool Learning with Large Language Models: A Survey

Authors: Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, Ji-Rong Wen

Abstract: Recently, tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems. Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization, posing barriers to entry for newcomers. This gap motivates us to conduct a comprehensive survey of existing works on tool learning with LLMs. In this survey, we focus on reviewing existing literature from the two primary aspects (1) why tool learning is beneficial and (2) how tool learning is implemented, enabling a comprehensive understanding of tool learning with LLMs. We first explore the "why" by reviewing both the benefits of tool integration and the inherent benefits of the tool learning paradigm from six specific aspects. In terms of "how", we systematically review the literature according to a taxonomy of four key stages in the tool learning workflow: task planning, tool selection, tool calling, and response generation. Additionally, we provide a detailed summary of existing benchmarks and evaluation methods, categorizing them according to their relevance to different stages. Finally, we discuss current challenges and outline potential future directions, aiming to inspire both researchers and industrial developers to further explore this emerging and promising area. We also maintain a GitHub repository to continually keep track of the relevant papers and resources in this rising area at https://github.com/quchangle1/LLM-Tool-Survey.

Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17596 [pdf, other]

GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

Authors: Yansong Qu, Shaohui Dai, Xinyang Li, Jianghang Lin, Liujuan Cao, Shengchuan Zhang, Rongrong Ji

Abstract: 3D open-vocabulary scene understanding, crucial for advancing augmented reality and robotic applications, involves interpreting and locating specific regions within a 3D space as directed by natural language instructions. To this end, we introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS) and identifies 3D Gaussians of Interest using an Optimizable Semantic-space Hyperplane. Our approach includes an efficient compression method that utilizes scene priors to condense noisy high-dimensional semantic features into compact low-dimensional vectors, which are subsequently embedded in 3DGS. During the open-vocabulary querying process, we adopt a distinct approach compared to existing methods, which depend on a manually set fixed empirical threshold to select regions based on their semantic feature distance to the query text embedding. This traditional approach often lacks universal accuracy, leading to challenges in precisely identifying specific target areas. Instead, our method treats the feature selection process as a hyperplane division within the feature space, retaining only those features that are highly relevant to the query. We leverage off-the-shelf 2D Referring Expression Segmentation (RES) models to fine-tune the semantic-space hyperplane, enabling a more precise distinction between target regions and others. This fine-tuning substantially improves the accuracy of open-vocabulary queries, ensuring the precise localization of pertinent 3D Gaussians. Extensive experiments demonstrate GOI's superiority over previous state-of-the-art methods. Our project page is available at https://goi-hyperplane.github.io/.

Submitted 27 May, 2024; originally announced May 2024.

Comments: Our project page is available at https://goi-hyperplane.github.io/

arXiv:2405.16550 [pdf, other]

doi 10.1145/3626772.3657936

ReCODE: Modeling Repeat Consumption with Neural ODE

Authors: Sunhao Dai, Changle Qu, Sirui Chen, Xiao Zhang, Jun Xu

Abstract: In real-world recommender systems, such as in the music domain, repeat consumption is a common phenomenon where users frequently listen to a small set of preferred songs or artists repeatedly. The key point of modeling repeat consumption is capturing the temporal patterns between a user's repeated consumption of the items. Existing studies often rely on heuristic assumptions, such as assuming an exponential distribution for the temporal gaps. However, due to the high complexity of real-world recommender systems, these pre-defined distributions may fail to capture the intricate dynamic user consumption patterns, leading to sub-optimal performance. Drawing inspiration from the flexibility of neural ordinary differential equations (ODE) in capturing the dynamics of complex systems, we propose ReCODE, a novel model-agnostic framework that utilizes neural ODE to model repeat consumption. ReCODE comprises two essential components: a user's static preference prediction module and the modeling of user dynamic repeat intention. By considering both immediate choices and historical consumption patterns, ReCODE offers comprehensive modeling of user preferences in the target context. Moreover, ReCODE seamlessly integrates with various existing recommendation models, including collaborative-based and sequential-based models, making it easily applicable in different scenarios. Experimental results on two real-world datasets consistently demonstrate that ReCODE significantly improves the performance of base models and outperforms other baseline methods.

Submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted by SIGIR 2024 (Short Paper)
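
Note: the heuristic baseline that the ReCODE abstract contrasts itself with, an exponential distribution over the temporal gaps between repeat consumptions, can be sketched in a few lines. This is only an illustration of that baseline assumption, not the ReCODE model; the function names and the sample gaps are hypothetical.

```python
import math

def fit_exponential_rate(gaps):
    """Maximum-likelihood rate of an exponential distribution: 1 / mean gap."""
    if not gaps or any(g <= 0 for g in gaps):
        raise ValueError("gaps must be a non-empty list of positive numbers")
    return len(gaps) / sum(gaps)

def repeat_probability_within(rate, horizon):
    """Probability that the next repeat happens within `horizon` time units."""
    return 1.0 - math.exp(-rate * horizon)

# Hypothetical gaps (e.g. in days) between a user's repeated listens to one song.
example_gaps = [1.0, 2.0, 3.0]
example_rate = fit_exponential_rate(example_gaps)
example_prob = repeat_probability_within(example_rate, horizon=2.0)
```

A single fitted rate fixes the whole shape of the gap distribution, which is the rigidity the abstract argues a learned dynamic model can avoid.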

arXiv:2405.16546 [pdf, other]

Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration

Authors: Sunhao Dai, Weihao Liu, Yuqi Zhou, Liang Pang, Rongju Ruan, Gang Wang, Zhenhua Dong, Jun Xu, Ji-Rong Wen

Abstract: The proliferation of Large Language Models (LLMs) has led to an influx of AI-generated content (AIGC) on the internet, transforming the corpus of Information Retrieval (IR) systems from solely human-written to a coexistence with LLM-generated content. The impact of this surge in AIGC on IR systems remains an open question, with the primary challenge being the lack of a dedicated benchmark for researchers. In this paper, we introduce Cocktail, a comprehensive benchmark tailored for evaluating IR models in this mixed-sourced data landscape of the LLM era. Cocktail consists of 16 diverse datasets with mixed human-written and LLM-generated corpora across various text retrieval tasks and domains. Additionally, to avoid the potential bias from previously included dataset information in LLMs, we also introduce an up-to-date dataset, named NQ-UTD, with queries derived from recent events. Through conducting over 1,000 experiments to assess state-of-the-art retrieval models against the benchmarked datasets in Cocktail, we uncover a clear trade-off between ranking performance and source bias in neural retrieval models, highlighting the necessity for a balanced approach in designing future IR systems. We hope Cocktail can serve as a foundational resource for IR research in the LLM era, with all data and code publicly available at https://github.com/KID-22/Cocktail.

Submitted 2 July, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted by Findings of ACL 2024; Datasets Link: https://huggingface.co/IR-Cocktail

arXiv:2405.16089 [pdf, other]

COLT: Towards Completeness-Oriented Tool Retrieval for Large Language Models

Authors: Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, Ji-Rong Wen

Abstract: Recently, the integration of external tools with Large Language Models (LLMs) has emerged as a promising approach to overcome the inherent constraints of their pre-training data. However, real-world applications often involve a diverse range of tools, making it infeasible to incorporate all tools directly into LLMs due to constraints on input length and response time. Therefore, to fully exploit the potential of tool-augmented LLMs, it is crucial to develop an effective tool retrieval system. Existing tool retrieval methods mainly rely on semantic matching between user queries and tool descriptions, which often results in the selection of redundant tools. As a result, these methods fail to provide a complete set of diverse tools necessary for addressing the multifaceted problems encountered by LLMs. In this paper, we propose a novel model-agnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools. Specifically, we first fine-tune the PLM-based retrieval models to capture the semantic relationships between queries and tools in the semantic learning stage. Subsequently, we construct three bipartite graphs among queries, scenes, and tools and introduce a dual-view graph collaborative learning framework to capture the intricate collaborative relationships among tools during the collaborative learning stage. Extensive experiments on both the open benchmark and the newly introduced ToolLens dataset show that COLT achieves superior performance. Notably, the performance of BERT-mini (11M) with our proposed model framework outperforms BERT-large (340M), which has 30 times more parameters. Additionally, we plan to publicly release the ToolLens dataset to support further research in tool retrieval.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.13303 [pdf]

Giant Real-time Strain-Induced Anisotropy Field Tuning in Suspended Yttrium Iron Garnet Thin Films

Authors: Renyuan Wang, Sudhanshu Tiwari, Yiyang Feng, Sen Dai, Sunil A. Bhave

Abstract: Yttrium Iron Garnet based tunable magnetostatic wave and spin wave devices are poised to revolutionize the fields of Magnonics, Spintronics, Microwave devices, and quantum information science. The magnetic bias required for operating and tuning these devices is traditionally achieved through large power-hungry electromagnets, which significantly restrains the integration scalability, energy efficiency and individual resonator addressability. While controlling the magnetism of YIG mediated through its magnetostrictive/magnetoelastic interaction would address this constraint and enable novel strain/stress coupled magnetostatic wave (MSW) and spin wave (SW) devices, effective real-time strain-induced magnetism change in YIG remains elusive due to its weak magnetoelastic coupling efficiency and substrate clamping effect. We demonstrate a heterogeneous YIG-on-Si MSW resonator with a suspended thin-film device structure, which allows significant straining of YIG to generate giant magnetism change in YIG. By straining the YIG thin-film in real-time up to 1.06%, we show, for the first time, a 1.837 GHz frequency-strain tuning in MSW/SW resonators, which is equivalent to an effective strain-induced magnetocrystalline anisotropy field of 642 Oe. This is significantly higher than the previous state-of-the-art of 0.27 GHz of strain tuning in YIG. The unprecedented strain tunability of these YIG resonators paves the way for novel energy-efficient integrated on-chip solutions for tunable microwave, photonic, magnonic, and spintronic devices.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.13190 [pdf, other]

Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation

Authors: Haoteng Tang, Guodong Liu, Siyuan Dai, Kai Ye, Kun Zhao, Wenlu Wang, Carl Yang, Lifang He, Alex Leow, Paul Thompson, Heng Huang, Liang Zhan

Abstract: The MRI-derived brain network serves as a pivotal instrument in elucidating both the structural and functional aspects of the brain, encompassing the ramifications of diseases and developmental processes. However, prevailing methodologies, often focusing on synchronous BOLD signals from functional MRI (fMRI), may not capture directional influences among brain regions and rarely tackle temporal functional dynamics. In this study, we first construct the brain-effective network via the dynamic causal model. Subsequently, we introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE). This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks via an ordinary differential equation (ODE) model, which characterizes spatial-temporal brain dynamics. Our framework is validated on several clinical phenotype prediction tasks using two independent publicly available datasets (HCP and OASIS). The experimental results clearly demonstrate the advantages of our model compared to several state-of-the-art methods.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.08984 [pdf]

doi 10.1063/5.0173562

Charge-Transfer Hyperbolic Polaritons in $α$-MoO$_3$/graphene heterostructures

Authors: J. Shen, M. Chen, V. Korostelev, H. Kim, P. Fathi-Hafshejani, M. Mahjouri-Samani, K. Klyukin, G-H. Lee, S. Dai

Abstract: Charge transfer is a fundamental interface process that can be harnessed for light detection, photovoltaics, and photosynthesis. Recently, charge transfer was exploited in nanophotonics to alter plasmon polaritons by involving additional non-polaritonic materials to activate the charge transfer. Yet, direct charge transfer between polaritonic materials hasn't been demonstrated. We report the direct charge transfer in pure polaritonic van der Waals (vdW) heterostructures of $α$-MoO$_3$/graphene. We extracted the Fermi energy of 0.6 eV for graphene by infrared nano-imaging of charge transfer hyperbolic polaritons in the vdW heterostructure. This unusually high Fermi energy is attributed to the charge transfer between graphene and $α$-MoO$_3$. Moreover, we have observed charge transfer hyperbolic polaritons in multiple energy-momentum dispersion branches with a wavelength elongation of up to 150%. With support from the DFT calculation, we find that the charge transfer between graphene and $α$-MoO$_3$, absent in mechanically assembled vdW heterostructures, is attributed to the relatively pristine heterointerface preserved in the epitaxially grown vdW heterostructure. The direct charge transfer and charge transfer hyperbolic polaritons demonstrated in our work hold great promise for developing nano-optical circuits, computational devices, communication systems, and light and energy manipulation devices.

Submitted 14 May, 2024; originally announced May 2024.

Journal ref: Applied Physics Reviews 11, 021409 (2024)

arXiv:2405.08709 [pdf, other]

Multi-Task Private Semantic Communication

Authors: Amirreza Zamani, Sajad Daei, Tobias J. Oechtering, Mikael Skoglund

Abstract: We study a multi-task private semantic communication problem, in which an encoder has access to an information source arbitrarily correlated with some latent private data. A user has $L$ tasks with priorities. The encoder designs a message to be revealed which is called the semantic of the information source. Due to the privacy constraints, the semantic cannot be disclosed directly and the encoder adds noise to produce disclosed data. The goal is to design the disclosed data that maximizes the weighted sum of the utilities achieved by the user while satisfying a privacy constraint on the private data. In this work, we first consider a single-task scenario and design the added noise utilizing various methods including the extended versions of the Functional Representation Lemma, Strong Functional Representation Lemma, and separation technique. We then study the multi-task scenario and derive a simple design of the source semantics. We show that in the multi-task scenario the main problem can be divided into multiple parallel single-task problems.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.07906 [pdf, other]

Improved Downlink Channel Estimation in Time-Varying FDD Massive MIMO Systems

Authors: Sajad Daei, Mikael Skoglund, Gabor Fodor

Abstract: In this work, we address the challenge of accurately obtaining channel state information at the transmitter (CSIT) for frequency division duplexing (FDD) multiple input multiple output systems. Although CSIT is vital for maximizing spatial multiplexing gains, traditional CSIT estimation methods often suffer from impracticality due to the substantial training and feedback overhead they require. To address this challenge, we leverage two sources of prior information simultaneously: the presence of limited local scatterers at the base station (BS) and the time-varying characteristics of the channel. The former results in a redundant angular sparsity of users' channels exceeding the spatial dimension (i.e., the number of BS antennas), while the latter provides a prior non-uniform distribution in the angular domain. We propose a weighted optimization framework that simultaneously reflects both of these features. The optimal weights are then obtained by minimizing the expected recovery error of the optimization problem. This establishes an analytical closed-form relationship between the optimal weights and the angular domain characteristics. Numerical experiments verify the effectiveness of our proposed approach in reducing the recovery error and consequently resulting in decreased training and feedback overhead.

Submitted 24 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07895 [pdf, other]

Optimal Transmitter Design and Pilot Spacing in MIMO Non-Stationary Aging Channels

Authors: Sajad Daei, Gabor Fodor, Mikael Skoglund

Abstract: This work considers an uplink wireless communication system where multiple users with multiple antennas transmit data frames over dynamic channels. Previous studies have shown that multiple transmit and receive antennas can substantially enhance the sum-capacity of all users when the channel is known at the transmitter and in the case of uncorrelated transmit and receive antennas. However, spatial correlations stemming from close proximity of transmit antennas and channel variation between pilot and data time slots, known as channel aging, can substantially degrade the transmission rate if they are not properly taken into account. In this work, we provide an analytical framework to concurrently exploit both of these features. Specifically, we first propose a beamforming framework to capture spatial correlations. Then, based on random matrix theory tools, we introduce a deterministic expression that approximates the average sum-capacity of all users. Subsequently, we obtain the optimal values of pilot spacing and beamforming vectors upon maximizing this expression. Simulation results show the impacts of path loss, velocity of mobile users and Rician factor on the resulting sum-capacity and underscore the efficacy of our methodology compared to prior works.

Submitted 24 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07890 [pdf, other]

Subspace-Informed Matrix Completion

Authors: Hamideh. Sadat Fazael Ardakani, Sajad Daei, Arash Amini, Mikael Skoglund, Gabor Fodor

Abstract: In this work, we consider the matrix completion problem, where the objective is to reconstruct a low-rank matrix from a few observed entries. A commonly employed approach involves nuclear norm minimization. For this method to succeed, the number of observed entries needs to scale at least proportional to both the rank of the ground-truth matrix and the coherence parameter. While the only prior information is oftentimes the low-rank nature of the ground-truth matrix, in various real-world scenarios, additional knowledge about the ground-truth low-rank matrix is available. For instance, in collaborative filtering, the Netflix problem, and dynamic channel estimation in wireless communications, we have partial or full knowledge about the signal subspace in advance. Specifically, we are aware of some subspaces that form multiple angles with the column and row spaces of the ground-truth matrix. Leveraging this valuable information has the potential to significantly reduce the required number of observations. To this end, we introduce a multi-weight nuclear norm optimization problem that concurrently promotes the low-rank property as well as the information about the available subspaces. The proposed weights are tailored to penalize each angle corresponding to each basis of the prior subspace independently. We further propose an optimal weight selection strategy by minimizing the coherence parameter of the ground-truth matrix, which is equivalent to minimizing the required number of observations. Simulation results validate the advantages of incorporating multiple weights in the completion procedure. Specifically, our proposed multi-weight optimization problem demonstrates a substantial reduction in the required number of observations compared to the state-of-the-art methods.

Submitted 24 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2111.00235

arXiv:2405.07882 [pdf, other]

Exploiting Spatial and Temporal Correlations in Massive MIMO Systems Over Non-Stationary Aging Channels

Authors: Sajad Daei, Gabor Fodor, Mikael Skoglund

Abstract: This work investigates a multi-user, multi-antenna uplink wireless system, where multiple users transmit signals to a base station. Previous research has explored the potential for linear growth in spectral efficiency by employing multiple transmit and receive antennas. This gain depends on the quality of channel state information and uncorrelated antennas. However, spatial correlations, arising from closely-spaced antennas, and channel aging effects, stemming from the difference between the channel at pilot and data time instances, can substantially counteract these benefits and degrade the transmission rate, especially in non-stationary environments. To address these challenges, this work introduces a real-time beamforming framework to compensate for the spatial correlation effect. A channel estimation scheme is then developed, leveraging temporal channel correlations and considering mobile device velocity and antenna spacing. Subsequently, an expression approximating the average spectral efficiency is obtained, dependent on pilot spacing, pilot and data powers, and beamforming vectors. By maximizing this expression, optimal parameters are identified. Numerical results reveal the effectiveness of the proposed approach compared to prior works. Moreover, optimal pilot spacing remains unaffected by interference components such as path loss and the velocity of interference users. The impact of interference components also diminishes with an increasing number of transmit antennas.

Submitted 13 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.13368 by other authors

arXiv:2405.06985 [pdf, other]

RoTHP: Rotary Position Embedding-based Transformer Hawkes Process

Authors: Anningzhe Gao, Shan Dai

Abstract: Temporal Point Processes (TPPs), especially Hawkes processes, are commonly used for modeling asynchronous event sequence data such as financial transactions and user behaviors in social networks. Due to the strong fitting ability of neural networks, various neural Temporal Point Processes have been proposed, among which the Neural Hawkes Processes based on self-attention, such as the Transformer Hawkes Process (THP), achieve distinct performance improvement. Although the THP has been studied increasingly, it still suffers from the sequence prediction issue, i.e., training on history sequences and inferencing about the future, which is a prevalent paradigm in realistic sequence analysis tasks. What's more, conventional THP and its variants simply adopt the initial sinusoid embedding in transformers, which shows performance sensitivity to temporal change or noise in sequence data analysis in our empirical study. To deal with these problems, we propose a new Rotary Position Embedding-based THP (RoTHP) architecture in this paper. Notably, we show theoretically the translation invariance property and sequence prediction flexibility of our RoTHP induced by the relative time embeddings when coupled with the Hawkes process. Furthermore, we demonstrate empirically that our RoTHP can be better generalized in sequence data scenarios with timestamp translations and in sequence prediction tasks.

Submitted 11 May, 2024; originally announced May 2024.
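
Note: the translation invariance mentioned in the RoTHP abstract follows from a general property of rotary embeddings: rotating query and key by angles proportional to their timestamps makes their dot product depend only on the time difference. The sketch below illustrates that property with plain Python; it is not the RoTHP architecture, and the frequencies, vectors, and function names are illustrative assumptions.

```python
import math

def rotate(vec, t, freqs):
    """Rotate each consecutive 2-D pair of `vec` by the angle t * freq."""
    out = []
    for k, freq in enumerate(freqs):
        x, y = vec[2 * k], vec[2 * k + 1]
        c, s = math.cos(t * freq), math.sin(t * freq)
        out.extend([x * c - y * s, x * s + y * c])
    return out

def attention_score(q, k, t_q, t_k, freqs):
    """Dot product between a query rotated to time t_q and a key rotated to time t_k."""
    rq = rotate(q, t_q, freqs)
    rk = rotate(k, t_k, freqs)
    return sum(a * b for a, b in zip(rq, rk))

freqs = [1.0, 0.1]          # hypothetical frequencies, one per 2-D pair
q = [0.3, -1.2, 0.7, 0.5]   # hypothetical query vector
k = [1.1, 0.4, -0.6, 0.9]   # hypothetical key vector

# Same time difference (3.0), timestamps shifted by 100: the scores agree
# up to floating-point rounding.
base = attention_score(q, k, t_q=5.0, t_k=2.0, freqs=freqs)
shifted = attention_score(q, k, t_q=105.0, t_k=102.0, freqs=freqs)
```

Because only the difference of timestamps enters the score, shifting every event time by a constant leaves the attention pattern unchanged.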

arXiv:2405.06984 [pdf, ps, other]

A Complete 16 $μ$m selected Galaxy Sample at $z \sim 1$. II: Morphological Analysis

Authors: Piaoran Liang, Y. Sophia Dai, Jia-Sheng Huang, Cheng Cheng, Shi Yaru

Abstract: We present morphological analysis of the 16$μ$m flux-density-limited galaxy sample at 0.8$<z<$1.3 from arXiv:2103.04585. At the targeted redshift, the 16$μ$m emission corresponds to the Polycyclic aromatic hydrocarbon (PAH) feature from intense star formation, or dust heated by AGN (Active galactic nuclei). Our sample of 479 galaxies is dominated by Luminous Infrared Galaxies (LIRGs, 67%) in three CANDELS fields (EGS, GOODS-N, and GOODS-S), and is further divided into AGN dominated, star-forming dominated, composite, and blue compact galaxies by their spectral energy distribution (SED) types. The majority of our sample (71%) have disky morphologies, with the few AGN dominated galaxies being more bulge-dominated than the star-forming dominated and composite galaxies. The distribution of our sample on the Gini vs. M$_{\text{20}}$ plane is consistent with previous studies, where the Sérsic index $n$ shows an increasing trend towards the smaller M$_{\text{20}}$ and higher Gini region below the dividing line for mergers. The subsample of ULIRGs follows a steep size-mass relation that is closer to the early-type galaxies. In addition, as the 4.5 $μ$m luminosity excess ($L_{4.5}^{Exc}$, proxy for AGN strength) increases, our sample appears to be more bulge-dominated (i.e. higher $n$). Based on the sSFR and compactness ($\log_{10}Σ_{1.5}$, $Σ_{1.5}=M_*/R_e^{1.5}$) diagram, the majority of our LIRG-dominated galaxy sample follows a secular evolution track, and their distribution can be explained without involving any merging activities. Out of the 16 ULIRGs in our sample, six are compact with strong AGN contributions, likely evolving along the fast-track from more violent activities.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 21 pages, 8 figures, 3 tables
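
Note: the compactness measure quoted in the abstract above, $\log_{10}Σ_{1.5}$ with $Σ_{1.5}=M_*/R_e^{1.5}$, is simple to evaluate. The sketch below assumes stellar mass in solar masses and effective radius in kpc; the abstract does not state units, so that choice, the function name, and the example values are all assumptions.

```python
import math

def log_compactness(stellar_mass_msun, r_e_kpc):
    """Return log10(Sigma_1.5), where Sigma_1.5 = M_* / R_e**1.5.

    Units are an assumption (solar masses and kpc); the abstract gives only
    the functional form.
    """
    if stellar_mass_msun <= 0 or r_e_kpc <= 0:
        raise ValueError("mass and radius must be positive")
    return math.log10(stellar_mass_msun) - 1.5 * math.log10(r_e_kpc)

# Hypothetical galaxies: at fixed mass, a smaller radius gives a higher compactness.
extended = log_compactness(1e11, 4.0)
compact = log_compactness(1e11, 1.0)
```

Working in logarithms avoids forming the large ratio directly and matches the axis used in an sSFR versus compactness diagram.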

arXiv:2405.01349 [pdf, other]

Position Paper: Beyond Robustness Against Single Attack Types

Authors: Sihui Dai, Chong Xiang, Tong Wu, Prateek Mittal

Abstract: Current research on defending against adversarial examples focuses primarily on achieving robustness against a single attack type such as $\ell_2$ or $\ell_{\infty}$-bounded attacks. However, the space of possible perturbations is much larger and currently cannot be modeled by a single attack type. The discrepancy between the focus of current defenses and the space of attacks of interest calls into question the practicality of existing defenses and the reliability of their evaluation. In this position paper, we argue that the research community should look beyond single attack robustness, and we draw attention to three potential directions involving robustness against multiple attacks: simultaneous multiattack robustness, unforeseen attack robustness, and a newly defined problem setting which we call continual adaptive robustness. We provide a unified framework which rigorously defines these problem settings, synthesize existing research in these fields, and outline open directions. We hope that our position paper inspires more research in simultaneous multiattack, unforeseen attack, and continual adaptive robustness.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.15825 [pdf]

Impact of Top SiO2 interlayer Thickness on Memory Window of Si Channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) Gate Structure

Authors: Tao Hu, Xianzhou Shao, Mingkai Bai, Xinpei Jia, Saifei Dai, Xiaoqing Sun, Runhao Han, Jia Yang, Xiaoyu Ke, Fengbin Tian, Shuai Yang, Junshuai Chai, Hao Xu, Xiaolei Wang, Wenwu Wang, Tianchun Ye

Abstract: We study the impact of top SiO2 interlayer thickness on memory window of Si channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) gate structure. The memory window increases with thicker top SiO2. We realize the memory window of 6.3 V for 3.4 nm top SiO2. Moreover, we find that the endurance characteristic degrades with increasing the initial memory window.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 4 pages, 7 figures

arXiv:2404.11934 [pdf]

doi 10.1103/PhysRevB.109.L161102

Quantum simulation of honeycomb lattice model by high-order moiré pattern

Authors: Qiang Wan, Chunlong Wu, Xun-Jiang Luo, Shenghao Dai, Cao Peng, Renzhe Li, Shangkun Mo, Keming Zhao, Wen-Xuan Qiu, Hao Zhong, Yiwei Li, Chendong Zhang, Fengcheng Wu, Nan Xu

Abstract: Moiré superlattices have become an emergent solid-state platform for simulating quantum lattice models. However, in a single moiré device, Hamiltonian parameters like lattice constant, hopping and interaction terms can hardly be manipulated, limiting the controllability and accessibility of moiré quantum simulators. Here, by combining angle-resolved photoemission spectroscopy and theoretical analysis, we demonstrate that high-order moiré patterns in graphene-monolayered xenon/krypton heterostructures can simulate the honeycomb model in mesoscale, with in-situ tunable Hamiltonian parameters. The length scale of the simulated lattice constant can be tuned by annealing processes, which in-situ adjusts intervalley interaction and hopping parameters in the simulated honeycomb lattice. The sign of the lattice constant can be switched by choosing xenon or krypton monolayer deposited on graphene, which controls the sublattice degree of freedom and valley arrangement of Dirac fermions. Our work establishes a novel path for experimentally simulating the honeycomb model with tunable parameters by high-order moiré patterns.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 19 pages, 5 figures

Journal ref: Phys. Rev. B 109, L161102 (2024)

arXiv:2404.11457 [pdf, other]

Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models

Authors: Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, Jun Xu

Abstract: With the rapid advancement of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly in terms of biases and unfairness, which may threaten the information ecosystem. In this paper, we present a comprehensive survey of existing works on emerging and pressing bias and unfairness issues in IR systems arising from the integration of LLMs. We first unify bias and unfairness issues as distribution mismatch problems, providing a groundwork for categorizing various mitigation strategies through distribution alignment. Subsequently, we systematically delve into the specific bias and unfairness issues arising from three critical stages of LLM integration into IR systems: data collection, model development, and result evaluation. In doing so, we meticulously review and analyze recent literature, focusing on the definitions, characteristics, and corresponding mitigation strategies associated with these issues. Finally, we identify and highlight some open problems and challenges for future work, aiming to inspire researchers and stakeholders in the IR field and beyond to better understand and mitigate bias and unfairness issues of IR in this LLM era. We also consistently maintain a GitHub repository for the relevant papers and resources in this rising direction at https://github.com/KID-22/LLM-IR-Bias-Fairness-Survey.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11447 [pdf]

doi 10.5281/zenodo.10864022

Research on emotionally intelligent dialogue generation based on automatic dialogue system

Authors: ** Wang, **Fei Wang, Shuying Dai, Jiqiang Yu, Keqin Li

Abstract: Automated dialogue systems are important applications of artificial intelligence, and traditional systems struggle to understand user emotions and provide empathetic feedback. This study integrates emotional intelligence technology into automated dialogue systems and creates a dialogue generation model with emotional intelligence through deep learning and natural language processing techniques. The model can detect and understand a wide range of emotions and specific pain signals in real time, enabling the system to provide empathetic interaction. By integrating the results of the study "Can artificial intelligence detect pain and express pain empathy?", the model's ability to understand the subtle elements of pain empathy has been enhanced, setting higher standards for emotional intelligence dialogue systems. The project aims to provide theoretical understanding and practical suggestions to integrate advanced emotional intelligence capabilities into dialogue systems, thereby improving user experience and interaction quality.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09528 [pdf, other]

Overfitting Reduction in Convex Regression

Authors: Zhiqiang Liao, Sheng Dai, Eunji Lim, Timo Kuosmanen

Abstract: Convex regression is a method for estimating an unknown function $f_0$ from a data set of $n$ noisy observations when $f_0$ is known to be convex. This method has played an important role in operations research, economics, machine learning, and many other areas. It has been empirically observed that the convex regression estimator produces inconsistent estimates of $f_0$ and extremely large subgradients near the boundary of the domain of $f_0$ as $n$ increases. In this paper, we provide theoretical evidence of this overfitting behaviour. We also prove that the penalised convex regression estimator, one of the variants of the convex regression estimator, exhibits overfitting behaviour. To eliminate this behaviour, we propose two new estimators by placing a bound on the subgradients of the estimated function. We further show that our proposed estimators do not exhibit the overfitting behaviour by proving that (a) they converge to $f_0$ and (b) their subgradients converge to the gradient of $f_0$, both uniformly over the domain of $f_0$ with probability one as $n \rightarrow \infty$. We apply the proposed methods to compute the cost frontier function for Finnish electricity distribution firms and confirm their superior performance in predictive power over some existing methods.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09098 [pdf, other]

A Millisecond Pulsar Binary Embedded in a Galactic Center Radio Filament

Authors: Marcus E. Lower, Shi Dai, Simon Johnston, Ewan D. Barr

Abstract: The Galactic Center is host to a population of extraordinary radio filaments, thin linear structures that trace out magnetic field lines running perpendicular to the Galactic plane. Using Murriyang, the 64 m Parkes radio telescope, we conducted a search for pulsars centered on the position of a compact source in the filament G359.0$-$0.2. We discovered a millisecond pulsar (MSP), PSR J1744$-$2946, with a period $P = 8.4$ ms, that is bound in a 4.8 hr circular orbit around a $M_{\rm c} > 0.05\,M_{\odot}$ companion. The pulsar dispersion measure of $673.7 \pm 0.1$ pc cm$^{-3}$ and Faraday rotation measure of $3011 \pm 3$ rad m$^{-2}$ are the largest of any known MSP. Its radio pulses are moderately scattered due to multi-path propagation through the interstellar medium, with a scattering timescale of $0.87 \pm 0.08$ ms at 2.6 GHz. Using MeerKAT, we localized the pulsar to a point source embedded in a low-luminosity radio filament, the "Sunfish", that is unrelated to G359.0$-$0.2. Our discovery of the first MSP within 1$^{\circ}$ of the Galactic Center hints at a large population of these objects detectable via high frequency surveys. The association with a filament points to pulsars as the energy source responsible for illuminating the Galactic Center radio filaments.

Submitted 7 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

Comments: 6 pages, 4 figures, 1 table. Accepted for publication in ApJ Letters

arXiv:2404.06764 [pdf]

A mid-infrared Brillouin laser using ultra-high-Q on-chip resonators

Authors: Kiyoung Ko, Daewon Suk, Dohyeong Kim, Soobong Park, Betul Sen, Dae-Gon Kim, Yingying Wang, Shixun Dai, Xunsi Wang, Rong** Wang, Byung Jae Chun, Kwang-Hoon Ko, Peter T. Rakich, Duk-Yong Choi, Hansuek Lee

Abstract: Ultra-high-Q optical resonators have facilitated recent advancements in on-chip photonics by effectively harnessing nonlinear phenomena providing useful functionalities. While these breakthroughs, primarily focused on the near-infrared region, have extended interest to longer wavelengths holding importance for monitoring and manipulating molecules, the absence of ultra-high-Q resonators in this region remains a significant challenge. Here, we have developed on-chip microresonators with a remarkable Q-factor of 38 million, surpassing previous mid-infrared records by over 30 times. Employing innovative fabrication techniques, including the spontaneous formation of light-guiding geometries during material deposition, resonators with internal multilayer structures have been seamlessly created and passivated with chalcogenide glasses within a single chamber. Major loss factors, especially airborne-chemical absorption, were thoroughly investigated and mitigated by extensive optimization of resonator geometries and fabrication procedures. This allowed us to access the fundamental loss performance offered by doubly purified chalcogenide glass sources, as demonstrated in their fiber form. Exploiting this ultra-high-Q resonator, we successfully demonstrated Brillouin lasing on a chip for the first time in the mid-infrared, with a threshold power of 91.9 μW and a theoretical Schawlow-Townes linewidth of 83.45 Hz, far surpassing carrier phase noise. Our results showcase the effective integration of cavity-enhanced optical nonlinearities into on-chip mid-infrared photonics.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 10 pages, 5 figures in main script, and 1 figure in methods

arXiv:2404.04762 [pdf, other]

WFC3 Infrared Spectroscopic Parallel (WISP) Survey: Photometric and Emission Line Data Release

Authors: A. J. Battisti, M. B. Bagley, M. Rafelski, I. Baronchelli, Y. S. Dai, A. L. Henry, H. Atek, J. Colbert, M. A. Malkan, P. J. McCarthy, C. Scarlata, B. Siana, H. I. Teplitz, A. Alavi, K. Boyett, A. J. Bunker, J. P. Gardner, N. P. Hathi, D. Masters, V. Mehta, M. Rutkowski, K. Shahinyan, B. Sunnquist, X. Wang

Abstract: We present reduced images and catalogues of photometric and emission line data ($\sim$230,000 and $\sim$8,000 sources, respectively) for the WFC3 Infrared Spectroscopic Parallel (WISP) Survey. These data are made publicly available on the Mikulski Archive for Space Telescopes (MAST) and include reduced images from various facilities: ground-based $ugri$, HST WFC3, and Spitzer IRAC (Infrared Array Camera). Coverage in at least one additional filter beyond the WFC3/IR data are available for roughly half of the fields (227 out of 483), with $\sim$20% (86) having coverage in six or more filters from $u$-band to IRAC 3.6$μ$m (0.35-3.6$μ$m). For the lower spatial resolution (and shallower) ground-based and IRAC data, we perform PSF-matched, prior-based, deconfusion photometry (i.e., forced-photometry) using the TPHOT software to optimally extract measurements or upper limits. We present the methodology and software used for the WISP emission line detection and visual inspection. The former adopts a continuous wavelet transformation that significantly reduces the number of spurious sources as candidates before the visual inspection stage. We combine both WISP catalogues and perform SED fitting on galaxies with reliable spectroscopic redshifts and multi-band photometry to measure their stellar masses. We stack WISP spectra as functions of stellar mass and redshift and measure average emission line fluxes and ratios. We find that WISP emission line sources are typically `normal' star-forming galaxies based on the Mass-Excitation diagram ([OIII]/H$β$ vs. $M_\star$; $0.74<z_\mathrm{grism}<2.31$), the galaxy main sequence (SFR vs. $M_\star$; $0.30<z_\mathrm{grism}<1.45$), $S_{32}$ ratio vs. $M_\star$ ($0.30<z_\mathrm{grism}<0.73$), and $O_{32}$ and $R_{23}$ ratios vs. $M_\star$ ($1.27<z_\mathrm{grism}<1.45$).

Submitted 6 April, 2024; originally announced April 2024.

Comments: 36 pages, 21 figures, 17 tables. Accepted for publication in MNRAS. The WISP Photometric and Emission Line catalogues and reduced images are in the process of being added as HLSPs to the WISP MAST website (https://archive.stsci.edu/prepds/wisp/). Please email the first-author (provided in paper) to request access to files prior to the MAST release

arXiv:2404.00462 [pdf, other]

Zero-shot Safety Prediction for Autonomous Robots with Foundation World Models

Authors: Zhenjiang Mao, Siqi Dai, Yuang Geng, Ivan Ruchkin

Abstract: A world model creates a surrogate world to train a controller and predict safety violations by learning the internal dynamic model of systems. However, the existing world models rely solely on statistical learning of how observations change in response to actions, lacking precise quantification of how accurate the surrogate dynamics are, which poses a significant challenge in safety-critical systems. To address this challenge, we propose foundation world models that embed observations into meaningful and causally latent representations. This enables the surrogate dynamics to directly predict causal future states by leveraging a training-free large language model. In two common benchmarks, this novel model outperforms standard world models in the safety prediction task and has a performance comparable to supervised learning despite not using any data. We evaluate its performance with a more specialized and system-relevant metric by comparing estimated states instead of aggregating observation-wide error.

Submitted 2 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Presented at the Back to the Future-Robot Learning Going Probabilistic Workshop, co-located with ICRA 2024. https://openreview.net/forum?id=gHhBNIq9Cs

arXiv:2404.00021 [pdf, other]

Evaluatology: The Science and Engineering of Evaluation

Authors: Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, Hainan Ye, Shaopeng Dai, Zhifei Zhang

Abstract: Evaluation is a crucial aspect of human existence and plays a vital role in various fields. However, it is often approached in an empirical and ad-hoc manner, lacking consensus on universal concepts, terminologies, theories, and methodologies. This lack of agreement has significant repercussions. This article aims to formally introduce the discipline of evaluatology, which encompasses the science and engineering of evaluation. We propose a universal framework for evaluation, encompassing concepts, terminologies, theories, and methodologies that can be applied across various disciplines. Our research reveals that the essence of evaluation lies in conducting experiments that intentionally apply a well-defined evaluation condition to diverse subjects and infer the impact of different subjects by measuring and/or testing. Derived from the essence of evaluation, we propose five axioms focusing on key aspects of evaluation outcomes as the foundational evaluation theory. These axioms serve as the bedrock upon which we build universal evaluation theories and methodologies. When evaluating a single subject, it is crucial to create evaluation conditions with different levels of equivalency. By applying these conditions to diverse subjects, we can establish reference evaluation models. These models allow us to alter a single independent variable at a time while keeping all other variables as controls. When evaluating complex scenarios, the key lies in establishing a series of evaluation models that maintain transitivity. Building upon the science of evaluation, we propose a formal definition of a benchmark as a simplified and sampled evaluation condition that guarantees different levels of equivalency. This concept serves as the cornerstone for a universal benchmark-based engineering approach to evaluation across various disciplines, which we refer to as benchmarkology.

Submitted 19 March, 2024; originally announced April 2024.

Comments: 29 pages, 16 figures, and 2 tables

arXiv:2403.19212 [pdf, ps, other]

Close Major-merger Pairs at $z=0$: Star-forming Galaxies with Pseudobulges

Authors: Chuan He, Cong Kevin Xu, Ute Lisenfeld, Y Sophia Dai, Taotao Fang, Jia-Sheng Huang, Wei Wang, Qingzheng Yu

Abstract: We present a study of star-forming galaxies (SFGs) with pseudobulges (bulges with Sérsic index $\rm n < 2$) in a local close major-merger galaxy pair sample (H-KPAIR). With data from new aperture photometries in the optical and near-infrared bands (aperture size of 7\;kpc) and from the literature, we find that the mean Age of central stellar populations in Spirals with pseudobulges is consistent with that of disky galaxies and is nearly constant against the bulge-to-total ratio (B/T). Paired Spirals have a slightly lower fraction of pure disk galaxies ($\rm B/T \leq 0.1$) than their counterparts in the control sample. Compared to SFGs with classical bulges, those with pseudobulges have a higher ($>2\;σ$) mean of specific star formation rate (sSFR) enhancement ($\rm sSFR_{enh} = 0.33\pm0.07$ vs $\rm sSFR_{enh} = 0.12\pm0.06$) and broader scatter (by $\sim 1$\;dex). The eight SFGs that have the highest $\rm sSFR_{enh}$ in the sample all have pseudobulges. A majority (69\%) of paired SFGs with strong enhancement (having sSFR more than 5 times the median of the control galaxies) have pseudobulges. The Spitzer data show that the pseudobulges in these galaxies are tightly linked to nuclear/circum-nuclear starbursts. Pseudobulge SFGs in S+S and in S+E pairs have significantly ($>3\;σ$) different sSFR enhancement, with the means of $\rm sSFR_{enh} = 0.45\pm0.08$ and $-0.04\pm0.11$, respectively. We find a decrease in the sSFR enhancements with the density of the environment for SFGs with pseudobulges. Since a high fraction (5/11) of pseudobulge SFGs in S+E pairs are in rich groups/clusters (local density $\rm N_{1Mpc} \geq 7$), the dense environment might be the cause for their low $\rm sSFR_{enh}$.

Submitted 28 March, 2024; originally announced March 2024.

Comments: Accepted for publication in RAA, ?? pages, 10 figures, 4 tables

arXiv:2403.15612 [pdf, other]

InterFusion: Text-Driven Generation of 3D Human-Object Interaction

Authors: Sisi Dai, Wenhao Li, Haowen Sun, Haibin Huang, Chongyang Ma, Hui Huang, Kai Xu, Ruizhen Hu

Abstract: In this study, we tackle the complex task of generating 3D human-object interactions (HOI) from textual descriptions in a zero-shot text-to-3D manner. We identify and address two key challenges: the unsatisfactory outcomes of direct text-to-3D methods in HOI, largely due to the lack of paired text-interaction data, and the inherent difficulties in simultaneously generating multiple concepts with complex spatial relationships. To effectively address these issues, we present InterFusion, a two-stage framework specifically designed for HOI generation. InterFusion involves human pose estimations derived from text as geometric priors, which simplifies the text-to-3D conversion process and introduces additional constraints for accurate object generation. At the first stage, InterFusion extracts 3D human poses from a synthesized image dataset depicting a wide range of interactions, subsequently mapping these poses to interaction descriptions. The second stage of InterFusion capitalizes on the latest developments in text-to-3D generation, enabling the production of realistic and high-quality 3D HOI scenes. This is achieved through a local-global optimization process, where the generation of human body and object is optimized separately, and jointly refined with a global optimization of the entire scene, ensuring a seamless and contextually coherent integration. Our experimental results affirm that InterFusion significantly outperforms existing state-of-the-art methods in 3D HOI generation.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.11901 [pdf, other]

Larimar: Large Language Models with Episodic Memory Control

Authors: Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen

Abstract: Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed - yielding speed-ups of 8-10x depending on the base LLM - as well as flexibility due to the proposed architecture being simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar

Submitted 11 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: ICML 2024

arXiv:2403.11416 [pdf]

doi 10.1103/PhysRevB.109.115415

Surface region band enhancement in noble gas adsorption assisted ARPES on kagome superconductor RbV3Sb5

Authors: Cao Peng, Yiwei Li, Xu Chen, Shenghao Dai, Zewen Wu, Chunlong Wu, Qiang Wan, Keming Zhao, Renzhe Li, Shangkun Mo, Dingkun Qin, Shuming Yu, Hao Zhong, Shengjun Yuan, Jiangang Guo, Nan Xu

Abstract: Electronic states near surface regions can be distinct from bulk states, which are paramount in understanding various physical phenomena occurring at surfaces and in applications in semiconductors, energy, and catalysis. Here, we report an abnormal surface region band enhancement effect in angle-resolved photoemission spectroscopy on kagome superconductor RbV3Sb5, by depositing noble gases with fine control. In contrast to conventional surface contamination, the intensity of surface region Sb band can be enhanced more than three times with noble gas adsorption. In the meantime, a hole-dope effect is observed for the enhanced surface region band, with other bands hardly changing. The doping effect is more pronounced with heavier noble gases. We propose that noble gas atoms selectively fill into alkali metal vacancy sites on the surface, which improves the surface condition, boosts surface region bands, and effectively dopes it with the Pauli repulsion mechanism. Our results provide a novel and reversible way to improve surface conditions and tune surface region bands by controlled surface noble gas deposition.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 17 pages,4 figures

Journal ref: Phys. Rev. B 109, 115415 (2024)

arXiv:2403.08191 [pdf, other]

Synchronized Dual-arm Rearrangement via Cooperative mTSP

Authors: Wenhao Li, Shishun Zhang, Sisi Dai, Hui Huang, Ruizhen Hu, Xiaohong Chen, Kai Xu

Abstract: Synchronized dual-arm rearrangement is widely studied as a common scenario in industrial applications. It often faces scalability challenges due to the computational complexity of robotic arm rearrangement and the high-dimensional nature of dual-arm planning. To address these challenges, we formulated the problem as cooperative mTSP, a variant of mTSP where agents share cooperative costs, and utilized reinforcement learning for its solution. Our approach involved representing rearrangement tasks using a task state graph that captured spatial relationships and a cooperative cost matrix that provided details about action costs. Taking these representations as observations, we designed an attention-based network to effectively combine them and provide rational task scheduling. Furthermore, a cost predictor is also introduced to directly evaluate actions during both training and planning, significantly expediting the planning process. Our experimental results demonstrate that our approach outperforms existing methods in terms of both performance and planning efficiency.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.06745 [pdf, other]

ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation

Authors: Shaojie Dai, Xin Liu, ** Luo, Yue Yu

Abstract: Large language model (LLM) has achieved promising performance in multilingual machine translation tasks through zero/few-shot prompts or prompt-tuning. However, due to the mixture of multilingual data during the pre-training of LLM, the LLM-based translation models face the off-target issue in both prompt-based methods, including a series of phenomena, namely instruction misunderstanding, translation with wrong language and over-generation. For this issue, this paper introduces an Auto-Constriction Turning mechanism for Multilingual Neural Machine Translation (ACT-MNMT), which is a novel supervised fine-tuning mechanism and orthogonal to the traditional prompt-based methods. In this method, ACT-MNMT automatically constructs a constrained template in the target side by adding trigger tokens ahead of the ground truth. Furthermore, trigger tokens can be arranged and combined freely to represent different task semantics, and they can be iteratively updated to maximize the label likelihood. Experiments are performed on WMT test sets with multiple metrics, and the experimental results demonstrate that ACT-MNMT achieves substantially improved performance across multiple translation directions and reduces the off-target phenomena in the translation.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.04857 [pdf, other]

Dark Matter Line Searches with the Cherenkov Telescope Array

Authors: S. Abe, J. Abhir, A. Abhishek, F. Acero, A. Acharyya, R. Adam, A. Aguasca-Cabot, I. Agudo, A. Aguirre-Santaella, J. Alfaro, R. Alfaro, N. Alvarez-Crespo, R. Alves Batista, J. -P. Amans, E. Amato, G. Ambrosi, L. Angel, C. Aramo, C. Arcaro, T. T. H. Arnesen, L. Arrabito, K. Asano, Y. Ascasibar, J. Aschersleben, H. Ashkar , et al. (540 additional authors not shown)

Abstract: Monochromatic gamma-ray signals constitute a potential smoking gun signature for annihilating or decaying dark matter particles that could relatively easily be distinguished from astrophysical or instrumental backgrounds. We provide an updated assessment of the sensitivity of the Cherenkov Telescope Array (CTA) to such signals, based on observations of the Galactic centre region as well as of selected dwarf spheroidal galaxies. We find that current limits and detection prospects for dark matter masses above 300 GeV will be significantly improved, by up to an order of magnitude in the multi-TeV range. This demonstrates that CTA will set a new standard for gamma-ray astronomy also in this respect, as the world's largest and most sensitive high-energy gamma-ray observatory, in particular due to its exquisite energy resolution at TeV energies and the adopted observational strategy focussing on regions with large dark matter densities. Throughout our analysis, we use up-to-date instrument response functions, and we thoroughly model the effect of instrumental systematic uncertainties in our statistical treatment. We further present results for other potential signatures with sharp spectral features, e.g. box-shaped spectra, that would likewise very clearly point to a particle dark matter origin.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 43 pages JCAP style (excluding author list and references), 19 figures

arXiv:2402.16722 [pdf]

All-optical polarization scrambler based on polarization beam splitting with amplified fiber ring

Authors: Yuanjie Yu, Shiyun Dai, Qiang Wu, Yu Long, Ai Liu, Peng Cai, Ligang Huang, Lei Gao, Tao Zhu

Abstract: Optical-fiber-based polarization scramblers can reduce the impact of polarization sensitive performance of various optical fiber systems. Here, we propose a simple and efficient polarization scrambler based on an all optical Mach-Zehnder structure by combining polarization beam splitter and amplified fiber ring. To totally decoherence one polarization splitted beam, a fiber ring together with an amplifier are incorporated. The ratio of two orthogonal beams can be controlled by varying the amplification factor, and we observe different evolution trajectories of the output state of polarizations on Poincare sphere. When the amplification factor exceeds a certain threshold, the scrambler system exhibits chaotical behavior. A commercial single wavelength laser with linewidth of 3 MHz is utilized to characterize the scrambling performance. We found that when the sampling rate is 1.6 MSa/s, a scrambling speed up to 2000 krad/s can be obtained for the average degree of polarization being less than 0.1. We also exploit these chaotic polarization fluctuations to generate random binary number, indicating that the proposed technique is a good candidate for random bit generator.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.14755 [pdf, other]

Entanglement Detection by Approximate Entanglement Witnesses

Authors: Samuel Dai, Ning Bao

Abstract: The problem of determining whether a given quantum state is separable is known to be computationally difficult. We develop an approach to this problem based on approximations of convex polytopes in high dimensions. By showing that a convex polytope constructed from a polynomial number of hyperplanes approximates the Euclidean ball arbitrarily well in high dimensions, we find evidence that a polynomial-sized set of approximate entanglement witnesses is potentially sufficient to determine the entanglement of a state with high probability.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 7 pages, 2 figures

arXiv:2402.08968 [pdf, other]

GrounDial: Human-norm Grounded Safe Dialog Response Generation

Authors: Siwon Kim, Shuyang Dai, Mohammad Kachuee, Shayan Ray, Tara Taghavi, Sungroh Yoon

Abstract: Current conversational AI systems based on large language models (LLMs) are known to generate unsafe responses, agreeing to offensive user input or including toxic content. Previous research aimed to alleviate the toxicity by fine-tuning LLMs with manually annotated safe dialogue histories. However, the dependency on additional tuning incurs substantial costs. To remove the dependency, we propose GrounDial, where response safety is achieved by grounding responses to commonsense social rules without requiring fine-tuning. A hybrid approach of in-context learning and human-norm-guided decoding of GrounDial enables the response to be quantitatively and qualitatively safer even without additional data or tuning.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Accepted to Findings of EACL 2024

Showing 1–50 of 492 results for author: Daei, S