Search | arXiv e-print repository

MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion

Authors: Jiaxin Deng, Shiyao Wang, Yuchen Wang, Jiansong Qi, Liqin Zhao, Guorui Zhou, Gaofeng Meng

Abstract: Live streaming services are becoming increasingly popular due to real-time interactions and entertainment. Viewers can chat and send comments or virtual gifts to express their preferences for the streamers. Accurately modeling the gifting interaction not only enhances users' experience but also increases streamers' revenue. Previous studies on live streaming gifting prediction treat this task as a conventional recommendation problem, and model users' preferences using categorical data and observed historical behaviors. However, it is challenging to precisely describe the real-time content changes in live streaming using limited categorical information. Moreover, due to the sparsity of gifting behaviors, capturing the preferences and intentions of users is quite difficult. In this work, we propose MMBee based on real-time Multi-Modal Fusion and Behaviour Expansion to address these issues. Specifically, we first present a Multi-modal Fusion Module with Learnable Query (MFQ) to perceive the dynamic content of streaming segments and process complex multi-modal interactions, including images, text comments and speech. To alleviate the sparsity issue of gifting behaviors, we present a novel Graph-guided Interest Expansion (GIE) approach that learns both user and streamer representations on large-scale gifting graphs with multi-modal attributes. Comprehensive experimental results show that MMBee achieves significant performance improvements on both public datasets and Kuaishou real-world streaming datasets, and its effectiveness has been further validated through online A/B experiments. MMBee has been deployed and is serving hundreds of millions of users at Kuaishou.

Submitted 15 June, 2024; originally announced July 2024.

Comments: Accepted at KDD 2024

arXiv:2406.19999 [pdf, other]

The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models

Authors: Xinyi Chen, Baohao Liao, Jirui Qi, Panagiotis Eustratiadis, Christof Monz, Arianna Bisazza, Maarten de Rijke

Abstract: Following multiple instructions is a crucial ability for large language models (LLMs). Evaluating this ability comes with significant challenges: (i) limited coherence between multiple instructions, (ii) positional bias where the order of instructions affects model performance, and (iii) a lack of objectively verifiable tasks. To address these issues, we introduce a benchmark designed to evaluate models' abilities to follow multiple instructions through sequential instruction following (SIFo) tasks. In SIFo, the successful completion of multiple instructions is verifiable by examining only the final instruction. Our benchmark evaluates instruction following using four tasks (text modification, question answering, mathematics, and security rule following), each assessing different aspects of sequential instruction following. Our evaluation of popular LLMs, both closed-source and open-source, shows that more recent and larger models significantly outperform their older and smaller counterparts on the SIFo tasks, validating the benchmark's effectiveness. All models struggle with following sequences of instructions, hinting at an important lack of robustness of today's language models.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.16824 [pdf, other]

Exploring Simple-Population and Multiple-Population Globular Clusters in the Outer Galactic Halo using the Hubble Space Telescope

Authors: E. P. Lagioia, A. P. Milone, M. V. Legnardi, G. Cordoni, E. Dondoglio, A. Renzini, M. Tailo, T. Ziliotto, M. Carlos, S. Jang, A. F. Marino, A. Mohandasan, J. Qi, G. Rangwal, E. Bortolan, F. Muratore

Abstract: The pseudo two-color diagram, known as chromosome map (ChM), is a valuable tool for identifying globular clusters (GCs) that consist of single or multiple stellar populations (MPs). Recent surveys of Galactic GCs using the ChM have provided stringent observational constraints on the formation of GCs and their stellar populations. However, these surveys have primarily focused on GCs at moderate distances from the Galactic center and composed of MPs. In this paper, we present the first detailed study of the stellar composition of four GCs in the outer halo of the Milky Way: Arp 2, Ruprecht 106, Terzan 7, and Terzan 8. Our analysis is based on high-precision photometry obtained from images collected with the Hubble Space Telescope in the F275W, F336W, F438W, F606W, and F814W bands. We find that Ruprecht 106 and Terzan 7 are composed solely of a single stellar population, whereas Arp 2 and Terzan 8 host both first- and second-population stars. In these clusters, the second population comprises about half and one-third of the total number of GC stars, respectively. The results from this paper and the literature suggest that the threshold in the initial GC mass, if present, should be smaller than approximately $10^{5}$ $M_{\odot}$. The first-population stars of Arp 2 and Terzan 8, along with the stars of the simple-population GCs Ruprecht 106 and Terzan 7, exhibit intrinsic F275W - F814W color spreads corresponding to [Fe/H] variations of approximately 0.05 - 0.30 dex. This indicates that star-to-star metallicity variations are a common feature of star clusters, regardless of the presence of MPs.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Submitted to ApJ, 17 pages, 7 figures

arXiv:2406.14709 [pdf, other]

Factual Dialogue Summarization via Learning from Large Language Models

Authors: Rongxin Zhu, Jey Han Lau, Jianzhong Qi

Abstract: Factual consistency is an important quality in dialogue summarization. Large language model (LLM)-based automatic text summarization models generate more factually consistent summaries compared to those by smaller pretrained language models, but they face deployment challenges in real-world applications due to privacy or resource constraints. In this paper, we investigate the use of symbolic knowledge distillation to improve the factual consistency of smaller pretrained models for dialogue summarization. We employ zero-shot learning to extract symbolic knowledge from LLMs, generating both factually consistent (positive) and inconsistent (negative) summaries. We then apply two contrastive learning objectives on these summaries to enhance smaller summarization models. Experiments with BART, PEGASUS, and Flan-T5 indicate that our approach surpasses strong baselines that rely on complex data augmentation strategies. Our approach achieves better factual consistency while maintaining coherence, fluency, and relevance, as confirmed by various automatic evaluation metrics. We also provide access to the data and code to facilitate future research.

Submitted 20 June, 2024; originally announced June 2024.

ACM Class: F.2.2; I.2.7

arXiv:2406.13663 [pdf, other]

Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

Authors: Jirui Qi, Gabriele Sarti, Raquel Fernández, Arianna Bisazza

Abstract: Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sources, and fail to faithfully reflect LLMs' context usage throughout the generation. In this work, we present MIRAGE (Model Internals-based RAG Explanations), a plug-and-play approach using model internals for faithful answer attribution in RAG applications. MIRAGE detects context-sensitive answer tokens and pairs them with retrieved documents contributing to their prediction via saliency methods. We evaluate our proposed approach on a multilingual extractive QA dataset, finding high agreement with human answer attribution. On open-ended QA, MIRAGE achieves citation quality and efficiency comparable to self-citation while also allowing for a finer-grained control of attribution parameters. Our qualitative evaluation highlights the faithfulness of MIRAGE's attributions and underscores the promising application of model internals for RAG answer attribution.

Submitted 1 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

Comments: Under review. Code and data released at https://github.com/Betswish/MIRAGE

arXiv:2406.13638 [pdf, other]

XENONnT WIMP Search: Signal & Background Modeling and Statistical Inference

Authors: XENON Collaboration, E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, D. Antón Martin, F. Arneodo, L. Baudis, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, K. Boese, A. Brown, G. Bruno, R. Budnik, J. M. R. Cardoso, A. P. Cimental Chávez, A. P. Colijn, J. Conrad, J. J. Cuenca-García, V. D'Andrea , et al. (139 additional authors not shown)

Abstract: The XENONnT experiment searches for weakly-interacting massive particle (WIMP) dark matter scattering off a xenon nucleus. In particular, XENONnT uses a dual-phase time projection chamber with a 5.9-tonne liquid xenon target, detecting both scintillation and ionization signals to reconstruct the energy, position, and type of recoil. A blind search for nuclear recoil WIMPs with an exposure of 1.1 tonne-years yielded no signal excess over background expectations, from which competitive exclusion limits were derived on WIMP-nucleon elastic scatter cross sections, for WIMP masses ranging from 6 GeV/$c^2$ up to the TeV/$c^2$ scale. This work details the modeling and statistical methods employed in this search. By means of calibration data, we model the detector response, which is then used to derive background and signal models. The construction and validation of these models is discussed, alongside additional purely data-driven backgrounds. We also describe the statistical inference framework, including the definition of the likelihood function and the construction of confidence intervals.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 20 pages, 10 figures

arXiv:2406.12217 [pdf]

An integrated electro-optically tunable multi-channel interference cavity laser

Authors: Junxia Zhou, Yiran Zhu, Botao Fu, **ming Chen, Huiting Song, Zhihao Zhang, Jian** Yu, Jian Liu, Min Wang, Jia Qi, Ya Cheng

Abstract: We demonstrated a continuously tunable laser system by butt coupling a reflective semiconductor optical amplifier (RSOA) chip with a thin-film lithium niobate (TFLN) based multi-channel interference (MCI) cavity chip. This hybrid integrated laser allows for fine-tuning of the laser wavelength from 1538 nm to 1560 nm with a resolution of 0.014 nm and a side-mode suppression ratio (SMSR) exceeding 30 dB. The MCI cavity chip is fabricated using the photolithography assisted chemo-mechanical etching (PLACE) technique. The developed laser has an output power of approximately 10 μW, which can be further amplified to 70 mW using a commercial erbium-doped fiber amplifier (EDFA) without significant broadening of the laser linewidth.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.08744 [pdf]

Compact low-half-wave-voltage thin film lithium niobate electro-optic phase modulator fabricated by photolithography assisted chemo-mechanical etching

Authors: Lang Gao, Youting Liang, **ming Chen, Jian** Yu, Jia Qi, Lvbin Song, Jian Liu, Zhaoxiang Liu, Hongxin Qi, Ya Cheng

Abstract: This paper presents a compact dual-arm thin film lithium niobate (TFLN) electro-optic phase modulator fabricated using the photolithography-assisted chemo-mechanical etching (PLACE) technique. The design of the device allows for complete utilization of the microwave electric field, doubling the modulation efficiency compared to single-arm modulators in theory. With a half-wave voltage of approximately 3 V and a modulation length of 1 cm, the device outperforms conventional phase modulators. Furthermore, the phase modulator exhibits low sensitivity to optical wavelengths in the range of 1510-1600 nm and offers a low insertion loss of 2.8 dB. The capability to generate multiple sideband signals for optical frequency comb applications is also demonstrated, producing 29 sideband signals at an input microwave power of 2 W.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08035 [pdf, other]

LVBench: An Extreme Long Video Understanding Benchmark

Authors: Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang

Abstract: Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the demands of real-world applications such as embodied intelligence for long-term decision-making, in-depth movie reviews and discussions, and live sports commentary, all of which require comprehension of long videos spanning several hours. To address this gap, we introduce LVBench, a benchmark specifically designed for long video understanding. Our dataset comprises publicly sourced videos and encompasses a diverse set of tasks aimed at long video comprehension and information extraction. LVBench is designed to challenge multimodal models to demonstrate long-term memory and extended comprehension capabilities. Our extensive evaluations reveal that current multimodal models still underperform on these demanding long video understanding tasks. Through LVBench, we aim to spur the development of more advanced models capable of tackling the complexities of long video comprehension. Our data and code are publicly available at: https://lvbench.github.io.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07925 [pdf, other]

FDLoRA: Personalized Federated Learning of Large Language Model via Dual LoRA Tuning

Authors: Jiaxing QI, Zhongzhi Luan, Shaohan Huang, Carol Fung, Hailong Yang, Depei Qian

Abstract: Large language models (LLMs) have emerged as important components across various fields, yet their training requires substantial computation resources and abundant labeled data. This makes it challenging to robustly train LLMs for individual users (clients). To tackle this challenge, the intuitive idea is to introduce federated learning (FL), which can collaboratively train models on distributed private data. However, existing methods suffer from the challenges of data heterogeneity, system heterogeneity, and model size, resulting in suboptimal performance and high costs. In this work, we propose a variant of the personalized federated learning (PFL) framework, namely FDLoRA, which allows the client to be a single device or a cluster and adopts low-rank adaptation (LoRA) tuning. FDLoRA sets dual LoRA modules on each client to capture personalized and global knowledge, respectively, and only the global LoRA module uploads parameters to the central server to aggregate cross-client knowledge. Finally, an adaptive fusion approach is employed to combine the parameters of the dual LoRAs. This enables FDLoRA to make effective use of private data distributed across different clients, thereby improving performance on the client without incurring high communication and computing costs. We conducted extensive experiments in two practical scenarios. The results demonstrate that FDLoRA outperforms six baselines in terms of performance, stability, robustness, computation cost, and communication cost.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.03150 [pdf, other]

Sample-specific Masks for Visual Reprogramming-based Prompting

Authors: Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu

Abstract: Visual reprogramming (VR) is a prompting technique that aims to re-purpose a pre-trained model (e.g., a classifier on ImageNet) to target tasks (e.g., medical data prediction) by learning a small-scale pattern added into input images instead of tuning considerable parameters within the model. The location of the pattern within input samples is usually determined by a pre-defined mask shared across all samples. In this paper, we show that the shared mask potentially limits VR's generalization and increases its approximation error due to the lack of sample-level adaptation. Motivated by this finding, we design a new framework for VR called sample-specific multi-channel masks (SMM). Specifically, SMM employs a lightweight ConvNet and patch-wise interpolation to generate sample-specific three-channel masks instead of a shared and pre-defined mask. Since we generate different masks for individual samples, SMM is theoretically shown to reduce approximation error for the target tasks compared with existing state-of-the-art VR methods. We also empirically demonstrate its performance gain on both ResNet and ViT. The success of SMM further highlights the broader applicability of VR in leveraging the latent knowledge of pre-trained models for various target tasks. Our code is available at https://github.com/tmlr-group/SMM.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.01418 [pdf, ps, other]

Chromatic symmetric functions of conjoined graphs

Authors: E. Y. J. Qi, D. Q. B. Tang, D. G. L. Wang

Abstract: We introduce path-conjoined graphs defined for two rooted graphs by joining their roots with a path, and investigate the chromatic symmetric functions of its two generalizations: spider-conjoined graphs and chain-conjoined graphs. By using the composition method developed by Zhou and the third author, we obtain neat positive $e_I$-expansions for the chromatic symmetric functions of clique-path-cycle graphs, path-clique-path graphs, and path-clique-clique graphs.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.16835 [pdf]

Superionic surface Li-ion transport in carbonaceous materials

Authors: Jianbin Zhou, Shen Wang, Chaoshan Wu, Ji Qi, Hongli Wan, Shen Lai, Shijie Feng, Tsz Wai Ko, Zhaohui Liang, Ke Zhou, Nimrod Harpak, Nick Solan, Mengchen Liu, Zeyu Hui, Paulina J. Ai, Kent Griffith, Chunsheng Wang, Shyue ** Ong, Yan Yao, ** Liu

Abstract: Unlike Li-ion transport in the bulk of carbonaceous materials, little is known about Li-ion diffusion on their surface. In this study, we have discovered an ultra-fast Li-ion transport phenomenon on the surface of carbonaceous materials, particularly when they have limited Li insertion capacity along with a high surface area. This is exemplified by a carbon black, Ketjen Black (KB). An ionic conductivity of 18.1 mS cm$^{-1}$ at room temperature is observed, far exceeding most solid-state ion conductors. Theoretical calculations reveal a low diffusion barrier for the surface Li species. The species is also identified as Li*, which features a partial positive charge. As a result, lithiated KB functions effectively as an interlayer between Li and solid-state electrolytes (SSE) to mitigate dendrite growth and cell shorting. This function is found to be electrolyte agnostic, effective for both sulfide and halide SSEs. Further, lithiated KB can act as a high-performance mixed ion/electron conductor that is thermodynamically stable at potentials near Li metal. A graphite anode mixed with KB instead of a solid electrolyte demonstrates full utilization with a capacity retention of ~85% over 300 cycles. The discovery of this surface-mediated ultra-fast Li-ion transport mechanism provides new directions for the design of solid-state ion conductors and solid-state batteries.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 21 pages, 6 figures

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen , et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.03989 [pdf]

A Method for Parsing and Vectorization of Semi-structured Data used in Retrieval Augmented Generation

Authors: Hang Yang, **g Guo, Jianchuan Qi, **liang Xie, Si Zhang, Siqi Yang, Nan Li, Ming Xu

Abstract: This paper presents a novel method for parsing and vectorizing semi-structured data to enhance the functionality of Retrieval-Augmented Generation (RAG) within Large Language Models (LLMs). We developed a comprehensive pipeline for converting various data formats into .docx, enabling efficient parsing and structured data extraction. The core of our methodology involves the construction of a vector database using Pinecone, which integrates seamlessly with LLMs to provide accurate, context-specific responses, particularly in environmental management and wastewater treatment operations. Through rigorous testing with both English and Chinese texts in diverse document formats, our results demonstrate a marked improvement in the precision and reliability of LLM outputs. The RAG-enhanced models displayed enhanced ability to generate contextually rich and technically accurate responses, underscoring the potential of vector knowledge bases in significantly boosting the performance of LLMs in specialized domains. This research not only illustrates the effectiveness of our method but also highlights its potential to revolutionize data processing and analysis in environmental sciences, setting a precedent for future advancements in AI-driven applications. Our code is available at https://github.com/linancn/TianGong-AI-Unstructure.git.

Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: 20 pages, 4 figures, 5 tables

arXiv:2404.09083 [pdf]

Interplay between electronic dephasing and localization in finite-sized Chern insulator

Authors: Yunhe Bai, Yuanzhao Li, Jianli Luan, Yang Chen, Zongwei Gao, Wenyu Song, Yitian Tong, **song Zhang, Yayu Wang, Junjie Qi, Chui-Zhen Chen, Hua Jiang, X. C. Xie, Ke He, Yang Feng, Xiao Feng, Qi-Kun Xue

Abstract: Anderson localization is anticipated to play a pivotal role in the manifestation of the quantum anomalous Hall effect, akin to its role in conventional quantum Hall effects. The significance of Anderson localization is particularly pronounced in elucidating the reasons behind the fragility of the observed quantum anomalous Hall state in the intrinsic magnetic topological insulator MnBi2Te4 with a large predicted magnetic gap. Here, employing varying sized MnBi2Te4 micro/nano-structures fabricated from a single molecular-beam-epitaxy-grown thin film, we have carried out a systematic size- and temperature-dependent study on the transport properties of the films regarding the quantum anomalous Hall states. The low-temperature transport properties of the finite-sized MnBi2Te4 samples can be quantitatively understood through Anderson localization, which plays an indispensable role in stabilizing the ground states. At higher temperatures, the failure of electron localization induced by an excessively short electronic dephasing length is identified as the cause of deviation from quantization. The work reveals that electronic dephasing and localization are non-negligible factors in designing high-temperature quantum anomalous Hall systems.

Submitted 13 April, 2024; originally announced April 2024.

Comments: 20 pages, 4 figures

arXiv:2404.08875 [pdf]

Layer-by-layer connection for large area single crystal boron nitride multilayer films

Authors: Hui Shi, Mingyuan Wang, Hongying Chen, Adrien Rousseau, Junpeng Shu, Ming Tian, Ruowang Chen, Juliette Plo, Pierre Valvin, Bernard Gil, Jiajie Qi, Qinghe Wang, Kaihui Liu, Mingliang Zhang, Guillaume Cassabois, Di Wu, Neng Wan

Abstract: Boron nitride (BN) is today considered as one of the most promising materials for many novel applications including bright single photon emission, deep UV opto-electronics, small sized solid-state neutron detector, and high-performance two-dimensional materials, etc. Despite the recent successful fabrication of large-area BN single-crystals (typically <= 5 atomic layers), the scalable growth of thicker single-crystalline BN films still constitutes a great challenge. In this work, we demonstrate an approach to grow large-area multilayer single-crystal BN films by chemical vapor deposition on face-centered cubic Fe-Ni (111) single crystal alloy thin films with different stoichiometric phases. We show that the BN growth is greatly tunable and improved by increasing the Fe content in single-crystal Fe-Ni (111). The formation of pyramid-shaped multilayer BN domains with aligned orientation enables a continuous connection following a layer-by-layer, 'first-meet-first-connect', mosaic stitching mechanism. By means of selected area electron diffraction, micro-photoluminescence spectroscopy in the deep UV and high-resolution transmission electron microscopy, the layer-by-layer connection mechanism is unambiguously evidenced, and the stacking order has been verified to occur as unidirectional AB and ABC stackings, i.e., in the Bernal and rhombohedral BN phase.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.05948 [pdf, other]

On the robustness of double-word addition algorithms

Authors: Yuanyuan Yang, XinYu Lyu, Sida He, Xiliang Lu, Ji Qi, Zhihao Li

Abstract: We demonstrate that, even when there are moderate overlaps in the inputs of sloppy or accurate double-word addition algorithms in the QD library, these algorithms still guarantee error bounds of $O(u^2(|a|+|b|))$ in faithful rounding. Furthermore, the accurate algorithm can achieve a relative error bound of $O(u^2)$ in the presence of moderate overlaps in the inputs when the rounding function is round-to-nearest. The relative error bound also holds in directed rounding, but certain additional conditions are required. Consequently, in double-word multiplication and addition operations, we can safely omit the normalization step of double-word multiplication and replace the accurate addition algorithm with the sloppy one. Numerical experiments confirm that this approach nearly doubles the performance of double-word multiplication and addition operations, with negligible precision costs. Moreover, in directed rounding mode, the signs of the errors of the two algorithms are consistent with the rounding direction, even in the presence of input overlap. This allows us to avoid changing the rounding mode in interval arithmetic. We also prove that the relative error bound of the sloppy addition algorithm exceeds $3u^2$ if and only if the input meets the condition of Sterbenz's Lemma when rounding to nearest. These findings suggest that the two addition algorithms are more robust than previously believed.

Submitted 10 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05091

MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification

Authors: Kai Sun, Yushi Bai, Ji Qi, Lei Hou, Juanzi Li

Abstract: To advance the evaluation of multimodal math reasoning in large multimodal models (LMMs), this paper introduces a novel benchmark, MM-MATH. MM-MATH consists of 5,929 open-ended middle school math problems with visual contexts, with fine-grained classification across difficulty, grade level, and knowledge points. Unlike existing benchmarks relying on binary answer comparison, MM-MATH incorporates both outcome and process evaluations. Process evaluation employs LMM-as-a-judge to automatically analyze solution steps, identifying and categorizing errors into specific error types. Extensive evaluation of ten models on MM-MATH reveals significant challenges for existing LMMs, highlighting their limited utilization of visual information and struggles with higher-difficulty problems. The best-performing model achieves only 31% accuracy on MM-MATH, compared to 82% for humans. This highlights the challenging nature of our benchmark for existing models and the significant gap between the multimodal reasoning capabilities of current models and humans. Our process evaluation reveals that diagram misinterpretation is the most common error, accounting for more than half of the total error cases, underscoring the need for improved image comprehension in multimodal reasoning.

Submitted 26 June, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: It has changed a lot from the previous version and needs to set up a new one

arXiv:2403.18282 [pdf, other]

SGDM: Static-Guided Dynamic Module Make Stronger Visual Models

Authors: Wenjie Xing, Zhenchao Cui, Jing Qi

Abstract: The spatial attention mechanism has been widely used to improve object detection performance. However, its operation is currently limited to static convolutions lacking content-adaptive features. This paper innovatively approaches from the perspective of dynamic convolution. We propose Razor Dynamic Convolution (RDConv) to address the two flaws in dynamic weight convolution that make it hard to implement in the spatial mechanism: 1) it is computation-heavy; 2) when generating weights, spatial information is disregarded. Firstly, by using Razor Operation to generate certain features, we vastly reduce the parameters of the entire dynamic convolution operation. Secondly, we added a spatial branch inside RDConv to generate convolutional kernel parameters with richer spatial information. Embedding dynamic convolution will also bring the problem of sensitivity to high-frequency noise. We propose the Static-Guided Dynamic Module (SGDM) to address this limitation. By using SGDM, we utilize a set of asymmetric static convolution kernel parameters to guide the construction of dynamic convolution. We introduce the mechanism of shared weights in static convolution to solve the problem of dynamic convolution being sensitive to high-frequency noise. Extensive experiments illustrate that multiple different object detection backbones equipped with SGDM achieve a highly competitive boost in performance (e.g., +4% mAP with YOLOv5n on VOC and +1.7% mAP with YOLOv8n on COCO) with negligible parameter increase (i.e., +0.33M on YOLOv5n and +0.19M on YOLOv8n).

Submitted 27 March, 2024; originally announced March 2024.

Comments: 16 pages, 4 figures

arXiv:2403.17448 [pdf, other]

Adaptive Line-Of-Sight guidance law based on vector fields path following for underactuated unmanned surface vehicle

Authors: Jie Qi, Ronghua Wanga, Nailong Wu

Abstract: The focus of this paper is to develop a methodology that enables an unmanned surface vehicle (USV) to efficiently track a planned path. The introduction of a vector field-based adaptive line-of-sight guidance law (VFALOS) for accurate trajectory tracking and minimizing the overshoot response time during USV tracking of curved paths improves the overall line-of-sight (LOS) guidance method. These improvements contribute to faster convergence to the desired path, reduce oscillations, and can mitigate the effects of persistent external disturbances. It is shown that the proposed guidance law exhibits k-exponential stability when converging to the desired path consisting of straight and curved lines. The results in the paper show that the proposed method effectively improves the accuracy of the USV tracking the desired path while ensuring the safety of the USV work.

Submitted 5 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.14878 [pdf, other]

Offline tagging of radon-induced backgrounds in XENON1T and applicability to other liquid xenon detectors

Authors: E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, D. Antón Martin, F. Arneodo, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, G. Bruno, R. Budnik, T. K. Bui, J. M. R. Cardoso, A. P. Cimental Chavez, A. P. Colijn, J. Conrad, et al. (142 additional authors not shown)

Abstract: This paper details the first application of a software tagging algorithm to reduce radon-induced backgrounds in liquid noble element time projection chambers, such as XENON1T and XENONnT. The convection velocity field in XENON1T was mapped out using $^{222}\text{Rn}$ and $^{218}\text{Po}$ events, and the root-mean-square convection speed was measured to be $0.30 \pm 0.01$ cm/s. Given this velocity field, $^{214}\text{Pb}$ background events can be tagged when they are followed by $^{214}\text{Bi}$ and $^{214}\text{Po}$ decays, or preceded by $^{218}\text{Po}$ decays. This was achieved by evolving a point cloud in the direction of a measured convection velocity field, and searching for $^{214}\text{Bi}$ and $^{214}\text{Po}$ decays or $^{218}\text{Po}$ decays within a volume defined by the point cloud. In XENON1T, this tagging system achieved a $^{214}\text{Pb}$ background reduction of $6.2^{+0.4}_{-0.9}\%$ with an exposure loss of $1.8\pm 0.2 \%$, despite the timescales of convection being smaller than the relevant decay times. We show that the performance can be improved in XENONnT, and that the performance of such a software-tagging approach can be expected to be further improved in a diffusion-limited scenario. Finally, a similar method might be useful to tag the cosmogenic $^{137}\text{Xe}$ background, which is relevant to the search for neutrinoless double-beta decay.

Submitted 19 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 17 pages, 19 figures

arXiv:2403.11066 [pdf, other]

The masses and decay widths of the $S$-wave $Λ_c\barΛ_c$ bound states

Authors: Shi-Ji Cao, Jing-Juan Qi, Xin-Heng Guo, Zhen-Yang Wang

Abstract: In this work, we investigate possible bound states of the $Λ_c\barΛ_c$ system in the Bethe-Salpeter formalism in the ladder and instantaneous approximations. By numerically solving the Bethe-Salpeter equation, we confirm the existence of $Λ_c\barΛ_c$ bound states with quantum numbers $J^{PC}=0^{-+}$ and $J^{PC}=1^{--}$. We further investigate the partial decay widths of the $Λ_c\barΛ_c$ bound states into $N\bar{N}$, $D\bar{D}$, $D\bar{D}^\ast$, $D^\ast\bar{D}^\ast$, $π\barπ$, and $K\bar{K}$. Our results indicate that the decay width of the $Λ_c\barΛ_c$ bound state with $J^{PC}=1^{--}$ is much larger than that with $J^{PC}=0^{-+}$, and among their decay channels, the $D\bar{D}^\ast$ final state is the main decay mode. We suggest experiments to search for the $Λ_c\barΛ_c$ bound states in the $D\bar{D}^\ast$ final state.

Submitted 26 June, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10309 [pdf, other]

Revolutionizing Packaging: A Robotic Bagging Pipeline with Constraint-aware Structure-of-Interest Planning

Authors: Jiaming Qi, Peng Zhou, Pai Zheng, Hongmin Wu, Chenguang Yang, David Navarro-Alarcon, Jia Pan

Abstract: Bagging operations, common in packaging and assisted living applications, are challenging due to a bag's complex deformable properties. To address this, we develop a robotic system for automated bagging tasks using an adaptive structure-of-interest (SOI) manipulation approach. Our method relies on real-time visual feedback to dynamically adjust manipulation without requiring prior knowledge of bag materials or dynamics. We present a robust pipeline featuring state estimation for SOIs using Gaussian Mixture Models (GMM), SOI generation via optimization-based bagging techniques, SOI motion planning with Constrained Bidirectional Rapidly-exploring Random Trees (CBiRRT), and dual-arm manipulation coordinated by Model Predictive Control (MPC). Experiments demonstrate the system's ability to achieve precise, stable bagging of various objects using adaptive coordination of the manipulators. The proposed framework advances the capability of dual-arm robots to perform more sophisticated automation of common tasks involving interactions with deformable objects.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.06894 [pdf, other]

Scalable multi-qubit intrinsic gates in quantum dot arrays

Authors: Jiaan Qi, Zhi-Hai Liu, Hongqi Xu

Abstract: We study multi-qubit quantum gates intrinsic to an array of semiconductor quantum dots and investigate how they can be implemented in a scalable way. The intrinsic quantum gates refer to the class of natural-forming transformations in the qubit rotating-frame under direct exchange coupling, and can be recognized as an instruction set of spin-qubit chips. Adopting perturbative treatment, we can model the intrinsic gates by first-order dynamics in the coupling strength and develop a general formalism for identifying the multi-qubit intrinsic gates under arbitrary array connectivity. The advantageous applications of the intrinsic gates in quantum computing and quantum error correction are explored. Factors influencing the fidelities of the multi-qubit intrinsic gates are also discussed. To overcome the problem of inhomogeneous coupling, we propose a theoretical scheme in which single-qubit pulses are applied to dynamically calibrate the connecting bonds. This scheme can be further combined with periodic dynamical decoupling for robust implementations of multi-qubit gates in large-scale quantum computers.

Submitted 13 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.02576 [pdf, other]

AceMap: Knowledge Discovery through Academic Graph

Authors: Xinbing Wang, Luoyi Fu, Xiaoying Gan, Ying Wen, Guanjie Zheng, Jiaxin Ding, Liyao Xiang, Nanyang Ye, Meng Jin, Shiyu Liang, Bin Lu, Haiwen Wang, Yi Xu, Cheng Deng, Shao Zhang, Huquan Kang, Xingli Wang, Qi Li, Zhixin Guo, Jiexing Qi, Pan Liu, Yuyang Ren, Lyuwen Wu, Jungang Yang, Jianping Zhou, et al. (1 additional authors not shown)

Abstract: The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publications. The representation of heterogeneous graphs and the effective measurement, analysis, and mining of such graphs pose significant challenges. To address these challenges, we present AceMap, an academic system designed for knowledge discovery through academic graph. We present advanced database construction techniques to build the comprehensive AceMap database with large-scale academic entities that contain rich visual, textual, and numerical information. AceMap also employs innovative visualization, quantification, and analysis methods to explore associations and logical relationships among academic entities. AceMap introduces large-scale academic network visualization techniques centered on nebular graphs, providing a comprehensive view of academic networks from multiple perspectives. In addition, AceMap proposes a unified metric based on structural entropy to quantitatively measure the knowledge content of different academic entities. Moreover, AceMap provides advanced analysis capabilities, including tracing the evolution of academic ideas through citation relationships and concept co-occurrence, and generating concise summaries informed by this evolutionary process. In addition, AceMap uses machine reading methods to generate potential new ideas at the intersection of different fields. Exploring the integration of large language models and knowledge graphs is a promising direction for future research in idea evolution. Please visit https://www.acemap.info for further exploration.

Submitted 14 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Technical Report for AceMap (https://www.acemap.info)

arXiv:2403.01799 [pdf, other]

Superpixel Graph Contrastive Clustering with Semantic-Invariant Augmentations for Hyperspectral Images

Authors: Jianhan Qi, Yuheng Jia, Hui Liu, Junhui Hou

Abstract: Hyperspectral image (HSI) clustering is an important but challenging task. The state-of-the-art (SOTA) methods usually rely on superpixels; however, they do not fully utilize the spatial and spectral information in the HSI 3-D structure, and their optimization targets are not clustering-oriented. In this work, we first use 3-D and 2-D hybrid convolutional neural networks to extract the high-order spatial and spectral features of HSI through pre-training, and then design a superpixel graph contrastive clustering (SPGCC) model to learn discriminative superpixel representations. Reasonable augmented views are crucial for contrastive clustering, and conventional contrastive learning may hurt the cluster structure since different samples are pushed away in the embedding space even if they belong to the same class. In SPGCC, we design two semantic-invariant data augmentations for HSI superpixels: pixel sampling augmentation and model weight augmentation. Then sample-level alignment and clustering-center-level contrast are performed for better intra-class similarity and inter-class dissimilarity of superpixel embeddings. We perform clustering and network optimization alternately. Experimental results on several HSI datasets verify the advantages of the proposed method, e.g., on Indian Pines, our model improves the clustering accuracy from 58.79% to 67.59% compared to the SOTA method.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.00799 [pdf, other]

An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning

Authors: Zui Chen, Yezeng Chen, Jiaqi Han, Zhijie Huang, Ji Qi, Yi Zhou

Abstract: Large language models (LLMs) are displaying emergent abilities for math reasoning tasks, and there is growing attention on enhancing the ability of open-source LLMs through supervised fine-tuning (SFT). In this paper, we aim to explore a general data strategy for supervised data to help optimize and expand math reasoning ability. Firstly, we determine the ability boundary of reasoning paths augmentation by identifying these paths' minimal optimal set. Secondly, we validate that different abilities of the model can be cumulatively enhanced by Mix of Minimal Optimal Sets of corresponding types of data, while our models MMOS achieve SOTA performance on series base models under much lower construction costs. Besides, we point out GSM-HARD is not really hard and today's LLMs no longer lack numerical robustness. Also, we provide an Auto Problem Generator for robustness testing and educational applications. Our code and data are publicly available at https://github.com/cyzhh/MMOS.

Submitted 23 February, 2024; originally announced March 2024.

Comments: 33 pages, 5 figures

arXiv:2402.10446 [pdf, other]

The XENONnT Dark Matter Experiment

Authors: XENON Collaboration, E. Aprile, J. Aalbers, K. Abe, S. Ahmed Maouloud, L. Althueser, B. Andrieu, E. Angelino, J. R. Angevaare, V. C. Antochi, D. Antón Martin, F. Arneodo, M. Balata, L. Baudis, A. L. Baxter, M. Bazyk, L. Bellagamba, R. Biondi, A. Bismark, E. J. Brookes, A. Brown, S. Bruenner, G. Bruno, R. Budnik, T. K. Bui, et al. (170 additional authors not shown)

Abstract: The multi-staged XENON program at INFN Laboratori Nazionali del Gran Sasso aims to detect dark matter with two-phase liquid xenon time projection chambers of increasing size and sensitivity. The XENONnT experiment is the latest detector in the program, planned to be an upgrade of its predecessor XENON1T. It features an active target of 5.9 tonnes of cryogenic liquid xenon (8.5 tonnes total mass in cryostat). The experiment is expected to extend the sensitivity to WIMP dark matter by more than an order of magnitude compared to XENON1T, thanks to the larger active mass and the significantly reduced background, improved by novel systems such as a radon removal plant and a neutron veto. This article describes the XENONnT experiment and its sub-systems in detail and reports on the detector performance during the first science run.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 32 pages, 19 figures

arXiv:2402.04798 [pdf, other]

Spiking-PhysFormer: Camera-Based Remote Photoplethysmography with Parallel Spike-driven Transformer

Authors: Mingxuan Liu, Jiankai Tang, Haoxiang Li, Jiahao Qi, Siwei Li, Kegang Wang, Yuntao Wang, Hong Chen

Abstract: Artificial neural networks (ANNs) can help camera-based remote photoplethysmography (rPPG) in measuring cardiac activity and physiological signals from facial videos, such as pulse wave, heart rate and respiration rate with better accuracy. However, most existing ANN-based methods require substantial computing resources, which poses challenges for effective deployment on mobile devices. Spiking neural networks (SNNs), on the other hand, hold immense potential for energy-efficient deep learning owing to their binary and event-driven architecture. To the best of our knowledge, we are the first to introduce SNNs into the realm of rPPG, proposing a hybrid neural network (HNN) model, the Spiking-PhysFormer, aimed at reducing power consumption. Specifically, the proposed Spiking-PhysFormer consists of an ANN-based patch embedding block, SNN-based transformer blocks, and an ANN-based predictor head. First, to simplify the transformer block while preserving its capacity to aggregate local and global spatio-temporal features, we design a parallel spike transformer block to replace sequential sub-blocks. Additionally, we propose a simplified spiking self-attention mechanism that omits the value parameter without compromising the model's performance. Experiments conducted on four datasets (PURE, UBFC-rPPG, UBFC-Phys, and MMPD) demonstrate that the proposed model achieves a 12.4% reduction in power consumption compared to PhysFormer. Additionally, the power consumption of the transformer block is reduced by a factor of 12.2, while maintaining performance comparable to PhysFormer and other ANN-based models.

Submitted 9 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: Mingxuan Liu and Jiankai Tang are co-first authors of the article

arXiv:2402.04236 [pdf, other]

CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations

Authors: Ji Qi, Ming Ding, Weihan Wang, Yushi Bai, Qingsong Lv, Wenyi Hong, Bin Xu, Lei Hou, Juanzi Li, Yuxiao Dong, Jie Tang

Abstract: Vision-Language Models (VLMs) have demonstrated their broad effectiveness thanks to extensive training in aligning visual instructions to responses. However, such training of conclusive alignment leads models to ignore essential visual reasoning, further resulting in failures in meticulous visual problems and unfaithful responses. Drawing inspiration from human cognition in solving visual problems (e.g., marking, zoom in), this paper introduces Chain of Manipulations, a mechanism that enables VLMs to solve problems step-by-step with evidence. After training, models can solve various visual problems by eliciting intrinsic manipulations (e.g., grounding, zoom in) with results (e.g., boxes, image) actively without involving external tools, while also allowing users to trace error causes. We study the roadmap to implement this mechanism, including (1) a flexible design of manipulations upon extensive analysis, (2) an efficient automated data generation pipeline, (3) a compatible VLM architecture capable of multi-turn multi-image, and (4) a model training process for versatile capabilities. With the design, we also manually annotate 6K high-quality samples for the challenging graphical mathematical problems. Our trained model, CogCoM, equipped with this mechanism with 17B parameters achieves state-of-the-art performance across 9 benchmarks from 4 categories, demonstrating the effectiveness while preserving the interpretability. Our code, model weights, and collected data are publicly available at https://github.com/THUDM/CogCoM.

Submitted 22 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: 19 pages, 9 figures

arXiv:2402.04144 [pdf, other]

Phylogenetic Trees and the Moduli Space of n Points on the Projective Line

Authors: Herwig Hauser, Jiayue Qi, Josef Schicho

Abstract: This is an expository paper. The geometry of phylogenetic trees is used to present in an accessible and pleasant fashion the results of Deligne, Mumford, and Knudsen about the moduli space of n distinct points on the projective line and its compactification, the moduli space of n-pointed stable curves of genus zero.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 60 pages, 30 figures

MSC Class: 2020: 14-02; 14D20; 14D22; 14H10; 05C05

arXiv:2402.02091 [pdf]

Feasibility of PET-enabled dual-energy CT imaging: First physical phantom and patient results

Authors: Yansong Zhu, Siqi Li, Zhaoheng Xie, Edwin K. Leung, Reimund Bayerlein, Negar Omidvari, Simon R. Cherry, Jinyi Qi, Ramsey D. Badawi, Benjamin A. Spencer, Guobao Wang

Abstract: X-ray computed tomography (CT) in PET/CT is commonly operated with a single energy, and therefore lacks tissue composition information. Dual-energy (DE) spectral CT enables material decomposition by using two different x-ray energies and may be combined with PET for improved multimodality imaging, but would either require a hardware upgrade or increase radiation dose due to the added second x-ray CT scan. The recently proposed PET-enabled DECT method allows dual-energy spectral imaging using a conventional PET/CT scanner without the need for a second x-ray CT scan. A gamma-ray CT (gCT) image at 511 keV can be generated from the existing time-of-flight PET data with the maximum-likelihood attenuation and activity (MLAA) approach and is then combined with the low-energy x-ray CT image to form dual-energy spectral imaging. To improve the image quality of gCT, a kernel MLAA method was further proposed by incorporating x-ray CT as a priori information. The concept of this PET-enabled DECT has been validated using simulation studies, but not yet with 3D real data. In this work, we developed a general open-source implementation for gCT reconstruction from PET data and used this implementation for the first real data validation with both a physical phantom study and a human subject study on a uEXPLORER total-body PET/CT system. These results have demonstrated the feasibility of this method for spectral imaging and material decomposition.

Submitted 11 April, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: 8 pages, 11 figures

arXiv:2402.00572 [pdf, other]

doi 10.1039/D4DD00039K

Developments and applications of the OPTIMADE API for materials discovery, design, and data exchange

Authors: Matthew L. Evans, Johan Bergsma, Andrius Merkys, Casper W. Andersen, Oskar B. Andersson, Daniel Beltrán, Evgeny Blokhin, Tara M. Boland, Rubén Castañeda Balderas, Kamal Choudhary, Alberto Díaz Díaz, Rodrigo Domínguez García, Hagen Eckert, Kristjan Eimre, María Elena Fuentes Montero, Adam M. Krajewski, Jens Jørgen Mortensen, José Manuel Nápoles Duarte, Jacob Pietryga, Ji Qi, Felipe de Jesús Trejo Carrillo, Antanas Vaitkus, Jusong Yu, Adam Zettel, Pedro Baptista de Castro, et al. (34 additional authors not shown)

Abstract: The Open Databases Integration for Materials Design (OPTIMADE) application programming interface (API) empowers users with holistic access to a growing federation of databases, enhancing the accessibility and discoverability of materials and chemical data. Since the first release of the OPTIMADE specification (v1.0), the API has undergone significant development, leading to the upcoming v1.2 release, and has underpinned multiple scientific studies. In this work, we highlight the latest features of the API format, accompanying software tools, and provide an update on the implementation of OPTIMADE in contributing materials databases. We end by providing several use cases that demonstrate the utility of the OPTIMADE API in materials research that continue to drive its ongoing development.

Submitted 5 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.18058 [pdf, other]

LongAlign: A Recipe for Long Context Alignment of Large Language Models

Authors: Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li

Abstract: Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length. To address this, we present LongAlign -- a recipe of the instruction data, training, and evaluation for long context alignment. First, we construct a long instruction-following dataset using Self-Instruct. To ensure the data diversity, it covers a broad range of tasks from various long context sources. Second, we adopt the packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions. Additionally, we develop a loss weighting method to balance the contribution to the loss across different sequences during packing training. Third, we introduce the LongBench-Chat benchmark for evaluating instruction-following capabilities on queries of 10k-100k in length. Experiments show that LongAlign outperforms existing recipes for LLMs in long context tasks by up to 30%, while also maintaining their proficiency in handling short, generic tasks. The code, data, and long-aligned models are open-sourced at https://github.com/THUDM/LongAlign.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.12436 [pdf, other]

Wasserstein Differential Privacy

Authors: Chengyi Yang, Jiayin Qi, Aimin Zhou

Abstract: Differential privacy (DP) has achieved remarkable results in the field of privacy-preserving machine learning. However, existing DP frameworks do not satisfy all the conditions for becoming metrics, which prevents them from deriving better basic private properties and leads to exaggerated values on privacy budgets. We propose Wasserstein differential privacy (WDP), an alternative DP framework to measure the risk of privacy leakage, which satisfies the properties of symmetry and triangle inequality. We show and prove that WDP has 13 excellent properties, which can be theoretical supports for the better performance of WDP than other DP frameworks. In addition, we derive a general privacy accounting method called Wasserstein accountant, which enables WDP to be applied in stochastic gradient descent (SGD) scenarios containing sub-sampling. Experiments on basic mechanisms, compositions and deep learning show that the privacy budgets obtained by Wasserstein accountant are relatively stable and less influenced by order. Moreover, the overestimation on privacy budgets can be effectively alleviated. The code is available at https://github.com/Hifipsysta/WDP.

Submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI 2024

arXiv:2401.11818 [pdf, other]

MInD: Improving Multimodal Sentiment Analysis via Multimodal Information Disentanglement

Authors: Weichen Dai, Xingyu Li, Pengbo Hu, Zeyu Wang, Ji Qi, Jianlin Peng, Yi Zhou

Abstract: Learning effective joint representations has been a central task in multimodal sentiment analysis. Previous methods focus on leveraging the correlations between different modalities and enhancing performance through sophisticated fusion techniques. However, challenges still exist due to the inherent heterogeneity of distinct modalities, which may lead to distributional gap, impeding the full exploitation of inter-modal information and resulting in redundancy and impurity in the information extracted from features. To address this problem, we introduce the Multimodal Information Disentanglement (MInD) approach. MInD decomposes the multimodal inputs into a modality-invariant component, a modality-specific component, and a remnant noise component for each modality through a shared encoder and multiple private encoders. The shared encoder aims to explore the shared information and commonality across modalities, while the private encoders are deployed to capture the distinctive information and characteristic features. These representations thus furnish a comprehensive perspective of the multimodal data, facilitating the fusion process instrumental for subsequent prediction tasks. Furthermore, MInD improves the learned representations by explicitly modeling the task-irrelevant noise in an adversarial manner. Experimental evaluations conducted on benchmark datasets, including CMU-MOSI, CMU-MOSEI, and UR-Funny, demonstrate MInD's superior performance over existing state-of-the-art methods in both multimodal emotion recognition and multimodal humor detection tasks.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.11432 [pdf, other]

Bimanual Deformable Bag Manipulation Using a Structure-of-Interest Based Latent Dynamics Model

Authors: Peng Zhou, Pai Zheng, Jiaming Qi, Chenxi Li, Chenguang Yang, David Navarro-Alarcon, Jia Pan

Abstract: The manipulation of deformable objects by robotic systems presents a significant challenge due to their complex and infinite-dimensional configuration spaces. This paper introduces a novel approach to Deformable Object Manipulation (DOM) by emphasizing the identification and manipulation of Structures of Interest (SOIs) in deformable fabric bags. We propose a bimanual manipulation framework that leverages a Graph Neural Network (GNN)-based latent dynamics model to succinctly represent and predict the behavior of these SOIs. Our approach involves constructing a graph representation from partial point cloud data of the object and learning the latent dynamics model that effectively captures the essential deformations of the fabric bag within a reduced computational space. By integrating this latent dynamics model with Model Predictive Control (MPC), we empower robotic manipulators to perform precise and stable manipulation tasks focused on the SOIs. We have validated our framework through various empirical experiments demonstrating its efficacy in bimanual manipulation of fabric bags. Our contributions not only address the complexities inherent in DOM but also provide new perspectives and methodologies for enhancing robotic interactions with deformable objects by concentrating on their critical structural elements. Experimental videos can be obtained from https://sites.google.com/view/bagbot.

Submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.10518 [pdf, other]

Spatial-temporal Forecasting for Regions without Observations

Authors: Xinyu Su, Jianzhong Qi, Egemen Tanin, Yanchuan Chang, Majid Sarvi

Abstract: Spatial-temporal forecasting plays an important role in many real-world applications, such as traffic forecasting, air pollutant forecasting, crowd-flow forecasting, and so on. State-of-the-art spatial-temporal forecasting models take data-driven approaches and rely heavily on data availability. Such models suffer from accuracy issues when data is incomplete, which is common in reality due to the heavy costs of deploying and maintaining sensors for data collection. A few recent studies attempted to address the issue of incomplete data. They typically assume some data availability in a region of interest either for a short period or at a few locations. In this paper, we further study spatial-temporal forecasting for a region of interest without any historical observations, to address scenarios such as unbalanced region development, progressive deployment of sensors or lack of open data. We propose a model named STSM for the task. The model takes a contrastive learning-based approach to learn spatial-temporal patterns from adjacent regions that have recorded data. Our key insight is to learn from the locations that resemble those in the region of interest, and we propose a selective masking strategy to enable the learning. As a result, our model outperforms adapted state-of-the-art models, reducing errors consistently over both traffic and air pollutant forecasting tasks. The source code is available at https://github.com/suzy0223/STSM.

Submitted 19 January, 2024; originally announced January 2024.

Comments: Accepted by EDBT2024

arXiv:2401.03526 [pdf]

One-dimensional Multiferroic Semiconductor WOI3: Unconventional Anisotropic d^1 Rule and Bulk Photovoltaic Effect

Authors: Zhihao Gong, Yechen Xun, Zhuang Qian, Kai Chang, Jingshan Qi, Hua Wang

Abstract: The pursuit of multiferroic magnetoelectrics, combining simultaneous ferroelectric and magnetic orders, remains a central focus in condensed matter physics. Here we report the centrosymmetric, one-dimensional (1D) antiferromagnetic WOI$_3$ undergoes a strain-induced ferroelectric distortion. The paraelectric-ferroelectric transition is originated from the unconventional anisotropic $d^1$ mechanism, where an unpaired d electron of each W$^{5+}$ ion contributes to magnetic orders. Employing a Heisenberg model with Dzyaloshinskii-Moriya interaction, we predict an antiferromagnetic spin configuration as the paraelectric ground state, transitioning to a ferroelectric phase with noncollinear spin arrangement under uniaxial strain. The ferroelectric polarization and noncollinear spin arrangement can be manipulated by varying the applied strain. While the energy barriers for switching ferroelectric polarizations with magnetic orders are on the order of a few dozen of meV, the shift current bulk photovoltaic effect (BPVE) exhibits remarkable differences, providing a precise and valuable tool for experimentally probing the interplay of ferroelectric and magnetic orders in 1D WOI$_3$.

Submitted 13 March, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

Comments: 19 pages, 5 figures

arXiv:2401.02992 [pdf]

Advanced Unstructured Data Processing for ESG Reports: A Methodology for Structured Transformation and Enhanced Analysis

Authors: Jiahui Peng, Jing Gao, Xin Tong, Jing Guo, Hang Yang, Jianchuan Qi, Ruiqiao Li, Nan Li, Ming Xu

Abstract: In the evolving field of corporate sustainability, analyzing unstructured Environmental, Social, and Governance (ESG) reports is a complex challenge due to their varied formats and intricate content. This study introduces an innovative methodology utilizing the "Unstructured Core Library", specifically tailored to address these challenges by transforming ESG reports into structured, analyzable formats. Our approach significantly advances the existing research by offering high-precision text cleaning, adept identification and extraction of text from images, and standardization of tables within these reports. Emphasizing its capability to handle diverse data types, including text, images, and tables, the method adeptly manages the nuances of differing page layouts and report styles across industries. This research marks a substantial contribution to the fields of industrial ecology and corporate sustainability assessment, paving the way for the application of advanced NLP technologies and large language models in the analysis of corporate governance and sustainability. Our code is available at https://github.com/linancn/TianGong-AI-Unstructure.git.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2401.01577 [pdf, other]

Test-Time Personalization with Meta Prompt for Gaze Estimation

Authors: Huan Liu, Julia Qi, Zhenhao Li, Mohammad Hassanpour, Yang Wang, Konstantinos Plataniotis, Yuanhao Yu

Abstract: Despite the recent remarkable achievement in gaze estimation, efficient and accurate personalization of gaze estimation without labels is a practical problem but rarely touched on in the literature. To achieve efficient personalization, we take inspiration from the recent advances in Natural Language Processing (NLP) by updating a negligible number of parameters, "prompts", at the test time. Specifically, the prompt is additionally attached without perturbing original network and can contain less than 1% of a ResNet-18's parameters. Our experiments show high efficiency of the prompt tuning approach. The proposed one can be 10 times faster in terms of adaptation speed than the methods compared. However, it is non-trivial to update the prompt for personalized gaze estimation without labels. At the test time, it is essential to ensure that the minimizing of particular unsupervised loss leads to the goals of minimizing gaze estimation error. To address this difficulty, we propose to meta-learn the prompt to ensure that its updates align with the goal. Our experiments show that the meta-learned prompt can be effectively adapted even with a simple symmetry loss. In addition, we experiment on four cross-dataset validations to show the remarkable advantages of the proposed method. Code is available at https://github.com/hmarkamcan/TPGaze.

Submitted 12 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI 2024

arXiv:2312.17259 [pdf]

Empowering Working Memory for Large Language Model Agents

Authors: Jing Guo, Nan Li, Jianchuan Qi, Hang Yang, Ruiqiao Li, Yuzhen Feng, Si Zhang, Ming Xu

Abstract: Large language models (LLMs) have achieved impressive linguistic capabilities. However, a key limitation persists in their lack of human-like memory faculties. LLMs exhibit constrained memory retention across sequential interactions, hindering complex reasoning. This paper explores the potential of applying cognitive psychology's working memory frameworks, to enhance LLM architecture. The limitations of traditional LLM memory designs are analyzed, including their isolation of distinct dialog episodes and lack of persistent memory links. To address this, an innovative model is proposed incorporating a centralized Working Memory Hub and Episodic Buffer access to retain memories across episodes. This architecture aims to provide greater continuity for nuanced contextual reasoning during intricate tasks and collaborative scenarios. While promising, further research is required into optimizing episodic memory encoding, storage, prioritization, retrieval, and security. Overall, this paper provides a strategic blueprint for developing LLM agents with more sophisticated, human-like memory capabilities, highlighting memory mechanisms as a vital frontier in artificial general intelligence.

Submitted 28 May, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.16355 [pdf, other]

Efficient Cost Modeling of Space-filling Curves

Authors: Guanli Liu, Lars Kulik, Christian S. Jensen, Tianyi Li, Jianzhong Qi

Abstract: A space-filling curve (SFC) maps points in a multi-dimensional space to one-dimensional points by discretizing the multi-dimensional space into cells and imposing a linear order on the cells. This way, an SFC enables the indexing of multi-dimensional data using a one-dimensional index such as a B+-tree. Choosing an appropriate SFC is crucial, as different SFCs have different effects on query performance. Currently, there are two primary strategies: 1) deterministic schemes, which are computationally efficient but often yield suboptimal query performance, and 2) dynamic schemes, which consider a broad range of candidate SFCs based on cost functions but incur significant computational overhead. Despite these strategies, existing methods cannot efficiently measure the effectiveness of SFCs under heavy query workloads and numerous SFC options. To address this problem, we propose means of constant-time cost estimations that can enhance existing SFC selection algorithms, enabling them to learn more effective SFCs. Additionally, we propose an SFC learning method that leverages reinforcement learning and our cost estimation to choose an SFC pattern efficiently. Experimental studies offer evidence of the effectiveness and efficiency of the proposed means of cost estimation and SFC learning.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.15371 [pdf, other]

New three-dimensional dispersion in the type-II Dirac semimetals PtTe$_2$ and PdTe$_2$ revealed through Angle Resolved Photoemission Spectroscopy

Authors: Ivan Pelayo, Derek Bergner, Archibald J. Williams, Jiayuwen Qi, Penghao Zhu, Mahfuzun Nabi, Warren L. B. Huey, Luca Moreschini, Ziling Deng, Jonathan Denlinger, Alessandra Lanzara, Yuan-Ming Lu, Wolfgang Windl, Joshua Goldberger, Claudia Ojeda-Aristizabal

Abstract: PtTe$_2$ and PdTe$_2$ are among the first transition metal dichalcogenides that were predicted to host type-II Dirac fermions, exotic particles prohibited in free space. These materials are layered and air-stable, which makes them top candidates for technological applications that take advantage of their anisotropic magnetotransport properties. Here, we provide a detailed characterization of the electronic structure of PtTe$_2$ and PdTe$_2$ using Angle Resolved Photoemission Spectroscopy (ARPES) and Density Functional Theory (DFT) calculations, unveiling a new three-dimensional dispersion in these materials. Through the use of circularly polarized light, we report a different behavior of such dispersion in PdTe$_2$ compared to PtTe$_2$, that we relate to a symmetry analysis of the dipole matrix element. Such analysis reveals a link between the observed circular dichroism and the different momentum-dependent terms in the dispersion of these two compounds, despite their close similarity in crystal structure. Additionally, our data shows a clear difference in the circular dichroic signal for the type-II Dirac cones characteristic of these materials, compared to their topologically protected surface states. Our work provides a useful reference for the ARPES characterization of other transition metal dichalcogenides with topological properties and illustrates the use of circular dichroism as a guide to identify the topological character of two otherwise equivalent band dispersions, and to recognize different attributes in the band structure of similar materials.

Submitted 16 May, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

Comments: 14 pages, 9 figures

arXiv:2312.05402 [pdf, other]

Towards Controlled Table-to-Text Generation with Scientific Reasoning

Authors: Zhixin Guo, Jianping Zhou, Jiexing Qi, Mingxuan Yan, Ziwei He, Guanjie Zheng, Zhouhan Lin, Xinbing Wang, Chenghu Zhou

Abstract: The sheer volume of scientific experimental results and complex technical statements, often presented in tabular formats, presents a formidable barrier to individuals acquiring preferred information. The realms of scientific reasoning and content generation that adhere to user preferences encounter distinct challenges. In this work, we present a new task for generating fluent and logical descriptions that match user preferences over scientific tabular data, aiming to automate scientific document analysis. To facilitate research in this direction, we construct a new challenging dataset CTRLSciTab consisting of table-description pairs extracted from the scientific literature, with highlighted cells and corresponding domain-specific knowledge base. We evaluated popular pre-trained language models to establish a baseline and proposed a novel architecture outperforming competing approaches. The results showed that large models struggle to produce accurate content that aligns with user preferences. As the first of its kind, our work should motivate further research in scientific domains.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.04708 [pdf]

Integrated Design of Aluminum-Containing High-entropy Refractory B2 Alloys with Synergy of High Strength and Ductility

Authors: Jie Qi, Xuesong Fan, Diego Ibarra Hoyos, Michael Widom, Peter K. Liaw, Joseph Poon

Abstract: Refractory high-entropy alloys, RHEAs, are promising high-temperature structural materials. Their large compositional space poses great design challenges for phase control and high strength-ductility synergy. The present research pioneers using integrated high-throughput machine learning with Monte Carlo simulations to effectively navigate phase-selection and mechanical-properties predictions, developing aluminum-containing RHEAs in single-phase ordered B2 alloys demonstrating both high strength and ductility. These aluminum-containing RHEAs achieve remarkable mechanical properties, including compressive yield strengths up to 1.6 GPa, fracture strains exceeding 50 percent, and significant high-temperature strength retention. They also demonstrate a tensile yield strength of 1.1 GPa with a tension ductility of 6.3 percent. Besides, we identify a valence-electron-count domain for alloy brittleness with the explanation from density-functional theory and provide crucial insights into elements' influence on atomic ordering and mechanical performance. The work sets forth a strategic blueprint for high-throughput alloy design and reveals fundamental principles that govern the mechanical properties of advanced structural alloys.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.04606 [pdf, other]

Urban Region Representation Learning with Attentive Fusion

Authors: Fengze Sun, Jianzhong Qi, Yanchuan Chang, Xiaoliang Fan, Shanika Karunasekera, Egemen Tanin

Abstract: An increasing number of related urban data sources have brought forth novel opportunities for learning urban region representations, i.e., embeddings. The embeddings describe latent features of urban regions and enable discovering similar regions for urban planning applications. Existing methods learn an embedding for a region using every different type of region feature data, and subsequently fuse all learned embeddings of a region to generate a unified region embedding. However, these studies often overlook the significance of the fusion process. The typical fusion methods rely on simple aggregation, such as summation and concatenation, thereby disregarding correlations within the fused region embeddings. To address this limitation, we propose a novel model named HAFusion. Our model is powered by a dual-feature attentive fusion module named DAFusion, which fuses embeddings from different region features to learn higher-order correlations between the regions as well as between the different types of region features. DAFusion is generic - it can be integrated into existing models to enhance their fusion process. Further, motivated by the effective fusion capability of an attentive module, we propose a hybrid attentive feature learning module named HALearning to enhance the embedding learning from each individual type of region features. Extensive experiments on three real-world datasets demonstrate that our model HAFusion outperforms state-of-the-art methods across three different prediction tasks. Using our learned region embedding leads to consistent and up to 31% improvements in the prediction accuracy.

Submitted 26 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.00331 [pdf, other]

Effects of domain walls and chiral supercurrent in quantum anomalous Hall Josephson junctions

Authors: Junjie Qi, Haiwen Liu, Jie Liu, Hua Jiang, Dong E. Liu, Chui-Zhen Chen, Ke He, X. C. Xie

Abstract: The intriguing interplay between topology and superconductivity has attracted significant attention, given its potential for realizing topological superconductivity. In this study, we investigate the transport properties of the chiral Josephson effect in the quantum anomalous Hall insulators (QAHIs)-based junction. We reveal a systematic crossover from edge-state to bulk-state dominant supercurrents, with a notable $0-π$ transition observed under non-zero magnetic flux through chemical potential adjustments. This transition underscores the competition between bulk and chiral edge transport. Furthermore, we identify an evolution among three distinct quantum interference patterns: from a $2Φ_0$-periodic oscillation pattern, to a $Φ_0$-periodic oscillation pattern, and then to an asymmetric Fraunhofer pattern ($Φ_0 = h/2e$ is the flux quantum, $h$ the Planck constant, and $e$ the electron charge). Subsequently, we examine the influence of domains on quantum interference patterns. Intriguingly, a distinctive Fraunhofer-like pattern emerges due to coexistence of chiral edge states and domain wall states, even when the chemical potential is within gap. These results not only advance the theoretical understanding but also pave the way for the experimental discovery of the chiral Josephson effect based on QAHI doped with magnetic impurities.

Submitted 14 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

Comments: 8 pages, 8 figures

arXiv:2311.15472 [pdf, other]

Unexpected Field Evaporation Sequence in $γ$-TiAl

Authors: Jiayuwen Qi, Fei Xue, Emmanuelle Marquis, Wolfgang Windl

Abstract: In atom probe tomography (APT), atoms from the surface of a needle shape specimen are evaporated under a high electric field and analyzed via time of flight mass spectrometry and position sensitive detection. 3D reconstruction of the atom positions follows a simple projection law, which can sometimes lead to artifacts due to deviation from an assumed ideal evaporation sequence. Here, we revisit the evaporation behavior of [001]-oriented $γ$-TiAl using a full-dynamics simulation approach empowered by molecular dynamics. Without any knowledge of charge states or assumptions about evaporation fields, we successfully reproduced the lack of distinct Al and Ti layers observed in reconstructions of experimental data which is traditionally attributed to the retention of Al on the evaporating surface. We further showed that a step-wise bond breaking process of Ti in contrast to the simultaneous bond breaking of Al explains the seemingly counterintuitive preferential evaporation of the strongly bonded Ti atoms.

Submitted 26 November, 2023; originally announced November 2023.

Showing 1–50 of 511 results for author: Qi, J