Search | arXiv e-print repository

arXiv:2407.02043 [pdf, other]

Concise and Precise Context Compression for Tool-Using Language Models

Authors: Yang Xu, Yunlong Feng, Honglin Mu, Yutai Hou, Yitong Li, Xinghao Wang, Wanjun Zhong, Zhongyang Li, Dandan Tu, Qingfu Zhu, Min Zhang, Wanxiang Che

Abstract: Through reading the documentation in the context, tool-using language models can dynamically extend their capability using external tools. The cost is that we have to input lengthy documentation every time the model needs to use the tool, occupying the input window as well as slowing down the decoding process. Given the progress in general-purpose compression, soft context compression is a suitable approach to alleviate the problem. However, when compressing tool documentation, existing methods suffer from the weaknesses of key information loss (specifically, tool/parameter name errors) and difficulty in adjusting the length of compressed sequences based on documentation lengths. To address these problems, we propose two strategies for compressing tool documentation into concise and precise summary sequences for tool-using language models. 1) Selective compression strategy mitigates key information loss by deliberately retaining key information as raw text tokens. 2) Block compression strategy involves dividing tool documentation into short chunks and then employing a fixed-length compression model to achieve variable-length compression. This strategy facilitates the flexible adjustment of the compression ratio. Results on API-Bank and APIBench show that our approach reaches a performance comparable to the upper-bound baseline under up to 16x compression ratio.

Submitted 2 July, 2024; originally announced July 2024.
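
The block compression idea in the abstract above (split the documentation into short chunks, then run a fixed-length compressor on each chunk) can be illustrated with a minimal sketch. Everything named here is hypothetical: `toy_compressor` merely stands in for a learned soft-compression model, and `block_size` and `summary_len` are illustrative parameters, not values from the paper.

```python
from math import ceil
from typing import Callable, List, Sequence

# A "compressor" maps one block of tokens to a fixed number of summary tokens.
Compressor = Callable[[List[str]], List[str]]


def split_into_blocks(tokens: Sequence[str], block_size: int) -> List[List[str]]:
    """Split a token sequence into consecutive blocks of at most block_size tokens."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(tokens[i:i + block_size]) for i in range(0, len(tokens), block_size)]


def block_compress(tokens: Sequence[str], block_size: int,
                   compress_block: Compressor) -> List[str]:
    """Compress each block independently and concatenate the summaries.

    The per-block output length is fixed, so the total output length grows
    with the number of blocks, i.e. with the length of the documentation.
    """
    summary: List[str] = []
    for block in split_into_blocks(tokens, block_size):
        summary.extend(compress_block(block))
    return summary


def compressed_length(n_tokens: int, block_size: int, summary_len: int) -> int:
    """Length of the compressed sequence: ceil(n / block_size) * summary_len."""
    return ceil(n_tokens / block_size) * summary_len


def toy_compressor(summary_len: int) -> Compressor:
    """Hypothetical stand-in for a fixed-length soft-compression model.

    It emits summary_len placeholder tokens per block; a real model would
    emit learned embeddings instead.
    """
    def compress(block: List[str]) -> List[str]:
        return [f"<sum:{block[0]}:{j}>" for j in range(summary_len)]
    return compress
```

Under these assumptions the compression ratio is roughly `block_size / summary_len`, so it can be tuned by changing either parameter without retraining a variable-length model.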

arXiv:2407.00569 [pdf, other]

Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

Authors: Weihong Zhong, Xiaocheng Feng, Liang Zhao, Qiming Li, Lei Huang, Yuxuan Gu, Weitao Ma, Yuan Xu, Bing Qin

Abstract: Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, will LVLMs be misled and respond incorrectly, even though the ground visual information exists? To answer this, we propose a framework called MMHalSnowball to evaluate LVLMs' behaviors when encountering generated hallucinations, where LVLMs are required to answer specific visual questions within a curated hallucinatory conversation. Crucially, our experiment shows that the performance of open-source LVLMs drops by at least $31\%$, indicating that LVLMs are prone to accept the generated hallucinations and make false claims that they would not have supported without distractions. We term this phenomenon Multimodal Hallucination Snowballing. To mitigate this, we further propose a training-free method called Residual Visual Decoding, where we revise the output distribution of LVLMs with the one derived from the residual visual input, providing models with direct access to the visual information. Experiments show that our method can mitigate more than $24\%$ of the snowballed multimodal hallucination while maintaining capabilities.

Submitted 29 June, 2024; originally announced July 2024.

Comments: Accepted to ACL 2024 Main Conference. 21 pages, 20 figures

arXiv:2406.19827 [pdf, other]

Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory

Authors: Wenliang Zhong, Haoyu Tang, Qinghai Zheng, Mingzhu Xu, Yupeng Hu, Liqiang Nie

Abstract: The rapid evolution of deep learning and large language models has led to an exponential growth in the demand for training data, prompting the development of Dataset Distillation methods to address the challenges of managing large datasets. Among these, Matching Training Trajectories (MTT) has been a prominent approach, which replicates the training trajectory of an expert network on real data with a synthetic dataset. However, our investigation found that this method suffers from three significant limitations: 1. Instability of expert trajectory generated by Stochastic Gradient Descent (SGD); 2. Low convergence speed of the distillation process; 3. High storage consumption of the expert trajectory. To address these issues, we offer a new perspective on understanding the essence of Dataset Distillation and MTT through a simple transformation of the objective function, and introduce a novel method called Matching Convexified Trajectory (MCT), which aims to provide better guidance for the student trajectory. MCT leverages insights from the linearized dynamics of Neural Tangent Kernel methods to create a convex combination of expert trajectories, guiding the student network to converge rapidly and stably. This trajectory is not only easier to store, but also enables a continuous sampling strategy during distillation, ensuring thorough learning and fitting of the entire expert trajectory. Comprehensive experiments across three public datasets validate the superiority of MCT over traditional MTT methods.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 11 pages

arXiv:2406.15796 [pdf, other]

Rethinking Entity-level Unlearning for Large Language Models

Authors: Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Bing Qin

Abstract: Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance-level unlearning, specifically aiming at forgetting predefined instances of sensitive content. However, a notable gap still exists in exploring the deletion of complete entity-related information, which is crucial in many real-world scenarios, such as copyright protection. To this end, we propose a novel task of Entity-level unlearning, where the entity-related knowledge within the target model is supposed to be entirely erased. Given the challenge of practically accessing all entity-related knowledge within a model, we begin by simulating entity-level unlearning scenarios through fine-tuning models to introduce pseudo entities. Following this, we develop baseline methods inspired by trending unlearning techniques and conduct a detailed comparison of their effectiveness in this task. Extensive experiments reveal that current unlearning algorithms struggle to achieve effective entity-level unlearning. Additionally, our analyses further indicate that entity-related knowledge injected through fine-tuning is more susceptible than original entities from pre-training during unlearning, highlighting the necessity for more thorough pseudo-entity injection methods to make them closer to pre-trained knowledge.

Submitted 22 June, 2024; originally announced June 2024.

Comments: Work in progress

arXiv:2406.12278 [pdf, other]

Persuasion and Optimal Stopping

Authors: Andrew Koh, Sivakorn Sanguanmoo, Weijie Zhong

Abstract: We provide a unified analysis of how dynamic information should be designed in optimal stopping problems: a principal controls the flow of information about a payoff-relevant state to persuade an agent to stop at the right time, in the right state, and choose the right action. We further show that for arbitrary preferences, intertemporal commitment is unnecessary: optimal dynamic information designs can always be made revision-proof.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.08790 [pdf, ps, other]

Direct generation of multi-photon hyperentanglement

Authors: Peng Zhao, Jia-Wei Ying, Meng-Ying Yang, Wei Zhong, Ming-Ming Du, Shu-Ting Shen, Yun-Xi Li, An-Lei Zhang, Lan Zhou, Yu-Bo Sheng

Abstract: Multi-photon hyperentanglement is of fundamental importance in optical quantum information processing. Existing theory and experiment producing multi-photon hyperentangled states have until now relied on outcome post-selection, a procedure where only the measurement results corresponding to the desired state are considered. Such an approach severely limits the usefulness of the resulting hyperentangled states. We present protocols for the direct production of three- and four-photon hyperentanglement and extend the approach to an arbitrary number of photons through a straightforward cascade of spontaneous parametric down-conversion (SPDC) sources. The generated multi-photon hyperentangled states are encoded in polarization-spatial modes and polarization-time bin degrees of freedom, respectively. Numerical calculation shows that if the average photon number $\mu$ is set to 1, the down-conversion efficiency is $7.6\times10^{-6}$, and the repetition frequency of the laser is $10^9$ Hz, the generation rates of three-photon and four-photon hyperentanglement after cascading can reach about $5.78\times10^{-2}$ and $4.44\times10^{-7}$ pairs per second, respectively. By eliminating the constraints of outcome post-selection, our protocols may represent important progress in multi-photon hyperentanglement generation and play a pivotal role in future multi-party and high-capacity communication networks.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.02987 [pdf, other]

Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment

Authors: Wenliang Zhong, Wenyi Wu, Qi Li, Rob Barton, Boxin Du, Shioulin Sam, Karim Bouyarmane, Ismail Tutar, Junzhou Huang

Abstract: Multimodal Large Language Models (MLLMs) have achieved SOTA performance in various visual language tasks by fusing the visual representations with LLMs leveraging some visual adapters. In this paper, we first establish that adapters using query-based Transformers such as Q-former are a simplified Multi-instance Learning method that does not consider instance heterogeneity/correlation. We then propose a general component termed Multi-instance Visual Prompt Generator (MIVPG) to incorporate enriched visual representations into LLMs by taking advantage of instance correlation between images or patches for the same sample. Quantitative evaluation on three public vision-language (VL) datasets from different scenarios shows that the proposed MIVPG improves Q-former in main VL tasks.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.20314 [pdf, ps, other]

S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs

Authors: Wei Zhong, Manasa Bharadwaj

Abstract: Speculative decoding (SD) has attracted a significant amount of research attention due to the substantial speedup it can achieve for LLM inference. However, despite the high speedups they offer, speculative decoding methods often achieve optimal performance on high-end devices or with a substantial GPU memory overhead. Given limited memory and the necessity of quantization, a high-performing model on a high-end GPU can slow down by up to 7 times. To this end, we propose Skippy Simultaneous Speculative Decoding (or S3D), a cost-effective self-speculative SD method based on simultaneous multi-token decoding and mid-layer skipping. When compared against recent effective open-source SD systems, our method has achieved one of the top performance-memory ratios while requiring minimal architecture changes and training data. Leveraging our memory efficiency, we created a smaller yet more effective SD model based on Phi-3. It is 1.4 to 2 times faster than the quantized EAGLE model and operates in half-precision while using less VRAM.

Submitted 1 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.
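
For readers unfamiliar with the draft-then-verify loop that speculative decoding builds on, a minimal greedy sketch follows. It is a generic illustration, not the S3D method: `draft_next` and `target_next` are hypothetical callables returning the next token id for a context, and verification is done with sequential calls here, whereas real systems score all drafted positions in a single batched forward pass of the target model.

```python
from typing import Callable, List

# Hypothetical interface: given the tokens so far, return the next token id.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[int]:
    """Run one draft-then-verify round and return the newly accepted tokens."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    ctx = list(prefix)
    proposal: List[int] = []
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) The target model checks the proposal position by position.
    ctx = list(prefix)
    accepted: List[int] = []
    for tok in proposal:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3) Every drafted token matched, so the target adds one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[int], draft_next: NextToken, target_next: NextToken,
             k: int, max_new_tokens: int) -> List[int]:
    """Generate max_new_tokens tokens; output equals plain greedy target decoding."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prefix) + max_new_tokens]
```

Each round yields between 1 and `k + 1` tokens, so the speedup depends on how often the draft agrees with the target; the output itself is unchanged relative to greedy decoding with the target alone.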

arXiv:2405.17132 [pdf, other]

Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

Authors: Chunjing Gan, Binbin Hu, Bo Huang, Ziqi Liu, Jian Ma, Zhiqiang Zhang, Wenliang Zhong, Jun Zhou

Abstract: Online service platforms offering a wide range of services through miniapps have become crucial for users who visit these platforms with clear intentions to find services they are interested in. Aiming at effective content delivery, cross-domain recommendation methods are introduced to learn high-quality representations by transferring behaviors from data-rich scenarios. However, these methods overlook the impact of the decision path that users take when conducting behaviors, that is, users ultimately exhibit different behaviors based on various intents. To this end, we propose HIER, a novel Hierarchical decIsion path Enhanced Representation learning for cross-domain recommendation. With the help of graph neural networks for high-order topological information of the knowledge graph between multi-source behaviors, we further adaptively learn decision paths through well-designed exemplar-level and information bottleneck based contrastive learning. Extensive experiments in online and offline environments show the superiority of HIER.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16970 [pdf, ps, other]

Memory-assisted measurement-device-independent quantum secret sharing

Authors: Cheng Zhang, Qi Zhang, Wei Zhong, Ming-Ming Du, Shu-Ting Shen, Xi-Yun Li, An-Lei Zhang, Lan Zhou, Yu-Bo Sheng

Abstract: Measurement-device-independent quantum secret sharing (MDI-QSS) can eliminate all the security loopholes associated with imperfect measurement devices and greatly enhance QSS's security under practical experimental conditions. MDI-QSS requires each communication user to send a single photon to the measurement party for the coincident measurement. However, the lack of synchronization of the transmitted photons greatly limits MDI-QSS's practical performance. In this paper, we propose a highly efficient quantum memory (QM)-assisted MDI-QSS protocol, which employs the QM-assisted synchronization of three heralded single-photon sources to efficiently generate three simultaneous single-photon states. The QM, constructed with an all-optical, polarization-insensitive storage loop, has superior performance in terms of bandwidth, storage efficiency, and noise resistance, and is feasible under current experimental conditions. Combining this with the decoy-state method, we perform a numerical simulation of the secure key rate in the symmetric model without considering the finite-size effect. The simulation results show that our QM-assisted MDI-QSS protocol exhibits a largely improved secure key rate and maximal photon transmission distance compared with all existing MDI-QSS protocols without QM. Our protocol provides a promising way of implementing highly efficient long-distance MDI-QSS in the near future.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 11 pages, 6 figures

arXiv:2405.15600 [pdf, ps, other]

Transfer Learning for Spatial Autoregressive Models

Authors: Hao Zeng, Wei Zhong, Xingbai Xu

Abstract: The spatial autoregressive (SAR) model has been widely applied in various empirical economic studies to characterize the spatial dependence among subjects. However, the precision of estimating the SAR model diminishes when the sample size of the target data is limited. In this paper, we propose a new transfer learning framework for the SAR model to borrow the information from similar source data to improve both estimation and prediction. When the informative source data sets are known, we introduce a two-stage algorithm, including a transferring stage and a debiasing stage, to estimate the unknown parameters and also establish the theoretical convergence rates for the resulting estimators. If we do not know which sources to transfer, a transferable source detection algorithm is proposed to detect informative source data based on spatial residual bootstrap to retain the necessary spatial dependence. Its detection consistency is also derived. Simulation studies demonstrate that using informative source data, our transfer learning algorithm significantly enhances the performance of the classical two-stage least squares estimator. In the empirical application, we apply our method to the election prediction in swing states in the 2020 U.S. presidential election, utilizing polling data from the 2016 U.S. presidential election along with other demographic and geographical data. The empirical results show that our method outperforms traditional estimation methods.

Submitted 19 May, 2024; originally announced May 2024.
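
For reference, the SAR model named in the abstract above is commonly written as follows. The notation is the standard textbook form and is an assumption here, since the abstract does not state the authors' own symbols: $y$ is the $n\times1$ outcome vector, $W$ a known $n\times n$ spatial weights matrix, $X$ the covariate matrix, $\rho$ the spatial autoregressive coefficient, and $\varepsilon$ the error term.

```latex
$$
y = \rho W y + X\beta + \varepsilon,
\qquad\text{so that}\qquad
y = (I_n - \rho W)^{-1}\,(X\beta + \varepsilon)
\quad\text{whenever } I_n - \rho W \text{ is invertible.}
$$
```

The reduced form on the right shows that $Wy$ depends on $\varepsilon$, so the spatial lag is endogenous; this is why instrument-based estimators such as two-stage least squares, the baseline mentioned in the abstract, are used instead of ordinary least squares.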

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen , et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.11273 [pdf, other]

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Authors: Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang

Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities. To address this, our work presents the pioneering attempt to develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities. Specifically, it features modality-specific encoders with connectors for a unified multimodal representation. We also implement a sparse MoE architecture within the LLMs to enable efficient training and inference through modality-level data parallelism and expert-level model parallelism. To enhance the multi-expert collaboration and generalization, we present a progressive training strategy: 1) Cross-modality alignment using various connectors with different cross-modality data, 2) Training modality-specific experts with cross-modality instruction data to activate experts' preferences, and 3) Tuning the Uni-MoE framework utilizing Low-Rank Adaptation (LoRA) on mixed multimodal instruction data. We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets. The extensive experimental results demonstrate Uni-MoE's principal advantage of significantly reducing performance bias in handling mixed multimodal datasets, alongside improved multi-expert collaboration and generalization. Our findings highlight the substantial potential of MoE frameworks in advancing MLLMs and the code is available at https://github.com/HITsz-TMG/UMOE-Scaling-Unified-Multimodal-LLMs.

Submitted 18 May, 2024; originally announced May 2024.

Comments: 22 pages, 13 figures. Project Website: https://uni-moe.github.io/. Work in progress

arXiv:2405.11221 [pdf, other]

Real-time equilibrium reconstruction by neural network based on HL-3 tokamak

Authors: Guohui Zheng, Songfen Liu, Zongyu Yang, Rui Ma, Xinwen Gong, Ao Wang, Shuo Wang, Wulyu Zhong

Abstract: A neural network model, EFITNN, capable of real-time magnetic equilibrium reconstruction based on HL-3 tokamak magnetic measurement signals has been developed. The model processes inputs from 68 channels of magnetic measurement data gathered from 1159 HL-3 experimental discharges, including plasma current, loop voltage, and the poloidal magnetic fields measured by equilibrium probes. The outputs of the model feature eight key plasma parameters, alongside high-resolution ($129\times129$) reconstructions of the toroidal current density $J_{\text P}$ and poloidal magnetic flux profiles $\Psi_{rz}$. Moreover, the network's architecture employs a multi-task learning structure, which enables the sharing of weights and mutual correction among different outputs, and increases the model's accuracy by up to 32%. The performance of EFITNN demonstrates remarkable consistency with the offline EFIT, achieving average $R^2 = 0.941, 0.997$ and $0.959$ for eight plasma parameters, $\Psi_{rz}$ and $J_{\text P}$, respectively. The model's robust generalization capabilities are particularly evident in its successful predictions of quasi-snowflake (QSF) divertor configurations and its adept handling of data from shot numbers or plasma current intervals not previously encountered during training. Compared to numerical methods, EFITNN significantly enhances computational efficiency with average computation time ranging from 0.08ms to 0.45ms, indicating its potential utility in real-time isoflux control and plasma profile management.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.10676 [pdf, other]

Identifying L-H transition in HL-2A through deep learning

Authors: Meihuizi He, Songfen Liu, Fan Xia, Zongyu Yang, Wulyu Zhong

Abstract: During the operation of tokamak devices, addressing the thermal load issues caused by Edge Localized Mode (ELM) eruptions is crucial. Ideally, mitigation and suppression measures for ELMs should be promptly initiated as soon as the first low-to-high confinement (L-H) transition occurs, which necessitates the real-time monitoring and accurate identification of the L-H transition process. Motivated by this, and by the recent deep learning boom, we propose a deep learning-based L-H transition identification algorithm for the HL-2A tokamak. In this work, we have constructed a neural network comprising layers of Residual Long Short-Term Memory (LSTM) and Temporal Convolutional Network (TCN). Unlike previous work based on slice-wise recognition of ELMs, this method recognizes the L-H transition process before the first ELM crash. Therefore, the mitigation techniques can be triggered in time to suppress the initial ELM bursts. In order to further explain the effectiveness of the algorithm, we developed a series of shot-based evaluation indicators, and the results show that this algorithm can provide the necessary reference for the mitigation and suppression system.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.05739 [pdf]

Preliminary Exploration on the Low-Pressure Ar-O2 Plasma Generated by Low-Frequency Alternating Current (AC) Power Supply

Authors: Niaz Wali, W. W. Xiao, Q. U. Din, N. U. Rehman, C. Y. Wang, J. T. Ma, W. J. Zhong, Q. W. Yang

Abstract: This study reports a low-frequency alternating current (AC) power supply as a novel approach for generating low-pressure capacitively coupled Ar-O2 plasma, offering advantages in cost, compactness, and operational simplicity, which are crucial for both material science and biological applications. The effectiveness of low-frequency AC-generated plasma relative to traditional RF systems is investigated by examining key plasma parameters such as electron density, electron temperature, and electron energy distribution function (EEDF). Experimental results revealed that the AC power supply could effectively produce low-pressure Ar-O2 plasma with properties comparable to those of RF systems. Most notably, the AC-generated plasma achieved a significant reduction in bacterial growth, suggesting its potential as a more economical and flexible alternative for enhancing plasma-assisted applications in sterilization and material processing.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 16 pages, 7 figures

arXiv:2404.13646 [pdf, other]

Physics-informed Mesh-independent Deep Compositional Operator Network

Authors: Weiheng Zhong, Hadi Meidani

Abstract: Solving parametric Partial Differential Equations (PDEs) for a broad range of parameters is a critical challenge in scientific computing. To this end, neural operators, which learn mappings from parameters to solutions, have been successfully used. However, the training of neural operators typically demands large training datasets, the acquisition of which can be prohibitively expensive. To address this challenge, physics-informed training can offer a cost-effective strategy. However, current physics-informed neural operators face limitations, either in handling irregular domain shapes or in generalization to various discretizations of PDE parameters with variable mesh sizes. In this research, we introduce a novel physics-informed model architecture which can generalize to parameter discretizations of variable size and irregular domain shapes. Particularly, inspired by deep operator neural networks, our model involves a discretization-independent learning of parameter embedding repeatedly, and this parameter embedding is integrated with the response embeddings through multiple compositional layers, for more expressivity. Numerical results demonstrate the accuracy and efficiency of the proposed method.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.12602 [pdf]

A visualization method for data domain changes in CNN networks and the optimization method for selecting thresholds in classification tasks

Authors: Minzhe Huang, Changwei Nie, Weihong Zhong

Abstract: In recent years, Face Anti-Spoofing (FAS) has played a crucial role in preserving the security of face recognition technology. With the rise of counterfeit face generation techniques, the challenge posed by digitally edited faces to face anti-spoofing is escalating. Existing FAS technologies primarily focus on intercepting physically forged faces and lack a robust solution for cross-domain FAS challenges. Moreover, determining an appropriate threshold to achieve optimal deployment results remains an issue for intra-domain FAS. To address these issues, we propose a visualization method that intuitively reflects the training outcomes of models by visualizing the prediction results on datasets. Additionally, we demonstrate that employing data augmentation techniques, such as downsampling and Gaussian blur, can effectively enhance performance on cross-domain tasks. Building upon our data visualization approach, we also introduce a methodology for setting threshold values based on the distribution of the training dataset. Ultimately, our methods secured us second place in both the Unified Physical-Digital Face Attack Detection competition and the Snapshot Spectral Imaging Face Anti-spoofing contest. The training code is available at https://github.com/SeaRecluse/CVPRW2024.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.03333 [pdf, other]

The Impact-driven Atmospheric Loss of Super-Earths around Different Spectral Type Host Stars

Authors: Wei Zhong, Cong Yu, Shi Jia, Shang-Fei Liu

Abstract: The planet's mass loss is important for the planet's formation and evolution. The radius valley (RV) is believed to be triggered by evaporation-induced mass loss. As an alternative mechanism for the RV, the mass loss of post-impact planets is thoroughly investigated in this work. The impact energy is converted to the planet's internal energy, enhancing its core energy and accelerating mass loss and orbital migration. As the host star changes from K-type to F-type, the planet's mass loss and orbital migration increase. When the initial gas-to-core mass ratio (GCR) is small, the migration efficiency for planets around K-type stars will increase, which helps to suppress mass loss and retain the planet's mass and radius within a specific range. On the contrary, planets around more massive F-type stars experience more substantial mass loss, potentially leading to complete mass loss, and migrate to orbits with longer periods. Our calculation shows that planets around different spectral types of host stars give rise to an RV ranging from 1.3-2.0 $R_{\oplus}$, consistent with the observed range of 1.3-2.6 $R_{\oplus}$. Despite the presence of uncertain parameters, the planetesimal impact can promote the RV establishment for planets around host stars of different spectral types.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 19 pages, 12 figures; ApJ Accepted

arXiv:2404.01735 [pdf, other]

CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling

Authors: Yunshan Ma, Yingzhi He, Wenjun Zhong, Xiang Wang, Roger Zimmermann, Tat-Seng Chua

Abstract: Product bundling has been a prevailing marketing strategy that is beneficial in the online shopping scenario. Effective product bundling methods depend on high-quality item representations, which need to capture both the individual items' semantics and cross-item relations. However, previous item representation learning methods, either feature fusion or graph learning, suffer from inadequate cross-modal alignment and struggle to capture the cross-item relations for cold-start items. Multimodal pre-train models could be the potential solutions given their promising performance on various multimodal downstream tasks. However, the cross-item relations have been under-explored in the current multimodal pre-train models. To bridge this gap, we propose a novel and simple framework Cross-Item Relational Pre-training (CIRP) for item representation learning in product bundling. Specifically, we employ a multimodal encoder to generate image and text representations. Then we leverage both the cross-item contrastive loss (CIC) and individual item's image-text contrastive loss (ITC) as the pre-train objectives. Our method seeks to integrate cross-item relation modeling capability into the multimodal encoder, while preserving the in-depth aligned multimodal semantics. Therefore, even for cold-start items that have no relations, their representations are still relation-aware. Furthermore, to eliminate the potential noise and reduce the computational cost, we harness a relation pruning module to remove the noisy and redundant relations. We apply the item representations extracted by CIRP to the product bundling model ItemKNN, and experiments on three e-commerce datasets demonstrate that CIRP outperforms various leading representation learning methods.

Submitted 2 April, 2024; originally announced April 2024.

Comments: arXiv preprint, 10 pages, 4 figures, 6 tables

ACM Class: H.3.0

arXiv:2403.17313 [pdf, other]

doi 10.1051/0004-6361/202348989

The Physical Origin of the Mass-Size Relation and Its Scatter of Disk Galaxies

Authors: Min Du, Hong-Chuan Ma, Wen-Yu Zhong, Luis C. Ho, Shihong Liao, Yingjie Peng

Abstract: Utilizing a kinematic decomposition of simulated galaxies, we focus on galaxies with tiny kinematically inferred stellar halos, indicative of weak external influences. We investigate the intricate interplay between internal (natural) and external (nurture) processes in shaping the scaling relationships of specific angular momentum ($j_\star$), stellar mass ($M_\star$), and size of disk galaxies within the IllustrisTNG simulation. The correlation among mass, size, and angular momentum of galaxies is examined by comparing simulations with observations and the theoretical predictions of the exponential hypothesis. Galaxies with tiny stellar halos exhibit a large scatter in the $j_\star$-$M_\star$ relation, which suggests that it is inherently present in their initial conditions. The analysis reveals that the disks of these galaxies adhere to the exponential hypothesis, resulting in a tight fiducial $j_\star$-$M_\star$-scale length (size) relation that is qualitatively consistent with observations. The inherent scatter in $j_\star$ provides a robust explanation for the mass-size relation and its substantial variability. Notably, galaxies that are moderately influenced by external processes closely adhere to a scaling relation akin to that of galaxies with tiny stellar halos. This result underscores the dominant role of internal processes in shaping the overall $j_\star$-$M_\star$ and mass-size relation, with external effects playing a relatively minor role in disk galaxies. Furthermore, the correlation between galaxy size and the virial radius of the dark matter halo exists but fails to provide strong evidence of the connection between galaxies and their parent dark matter halos.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 11 pages, 9 figures, accepted for publication in A&A

Journal ref: A&A 686, A168 (2024)

arXiv:2403.12524 [pdf, other]

Search for GeV gamma-ray emission from the possible TeV-bright red dwarfs with Fermi-LAT

Authors: Chen Huang, Xiao Zhang, Yang Chen, Wen-Juan Zhong

Abstract: Red dwarfs have been suggested to be among the possible astrophysical species accelerating particles and emitting TeV $γ$-rays. As an effort to search for the GeV $γ$-ray counterparts of the suggested TeV emission from eight red dwarfs, we analyse the 0.2--500 GeV $γ$-ray emission of the regions covering them exploiting the $\sim$13.6 yr Pass 8 data of the Fermi Large Area Telescope. A GeV $γ$-ray emission excess with significance of 3.8$σ$ is detected in the direction of the red dwarf V962 Tau. This emission contains V962 Tau in 1$σ$ error radius and is independent of the catalog source. However, the stellar flare scenario can hardly explain the total energy and lightcurve derived from the $γ$-ray emission in view of the spectral analysis. We also analyse the lightcurves in the positions of the eight red dwarfs and no time bin with significance $>$5$σ$ is found. Therefore, no significant emission from the red dwarfs could be concluded to be detected by Fermi-LAT.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 11 pages, 6 figures

arXiv:2403.11880 [pdf, other]

Topological edge modes and phase transition in the critical fermionic chain with long-range interaction

Authors: Wen-Hao Zhong, Wei-Lin Li, Yong-Chang Chen, Xue-Jia Yu

Abstract: The long-range interaction can fundamentally alter properties in gapped topological phases such as emergent massive edge modes. However, recent research has shifted attention to topological nontrivial critical points or phases, and it is natural to explore how long-range interaction influences them. In this work, we investigate the topological behavior and phase transition of extended Kitaev chains with long-range interactions, which can be derived from the critical Ising model via the Jordan-Wigner transformation in the short-range limit. Specifically, we analytically find the critical edge modes at the critical point remain stable against long-range interaction. More importantly, we observe these critical edge modes remain massless even when long-range interactions become substantially strong. As a byproduct, we numerically find that the critical behavior of the long-range model belongs to the free Majorana fermion universality class, which is entirely different from the long-range universality class in usual long-range spin models. Our work could shed new light on the interplay between long-range interactions (frustrated) and the gapless topological phases of matter.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 13 pages, 11 figures. Any comments or suggestions are welcome!

arXiv:2403.11211 [pdf]

RCdpia: A Renal Carcinoma Digital Pathology Image Annotation dataset based on pathologists

Authors: Qingrong Sun, Weixiang Zhong, Jie Zhou, Chong Lai, Xiaodong Teng, Maode Lai

Abstract: The annotation of digital pathological slide data for renal cell carcinoma is of paramount importance for correct diagnosis of artificial intelligence models due to the heterogeneous nature of the tumor. This process not only facilitates a deeper understanding of renal cell cancer heterogeneity but also aims to minimize noise in the data for more accurate studies. To enhance the applicability of the data, two pathologists were enlisted to meticulously curate, screen, and label a kidney cancer pathology image dataset from The Cancer Genome Atlas Program (TCGA) database. Subsequently, a Resnet model was developed to validate the annotated dataset against an additional dataset from the First Affiliated Hospital of Zhejiang University. Based on these results, we have meticulously compiled the TCGA digital pathological dataset with independent labeling of tumor regions and adjacent areas (RCdpia), which includes 109 cases of kidney chromophobe cell carcinoma, 486 cases of kidney clear cell carcinoma, and 292 cases of kidney papillary cell carcinoma. This dataset is now publicly accessible at http://39.171.241.18:8888/RCdpia/. Furthermore, model analysis has revealed significant discrepancies in predictive outcomes when applying the same model to datasets from different centers. Leveraging the RCdpia, we can now develop more precise digital pathology artificial intelligence models for tasks such as normalization, classification, and segmentation. These advancements underscore the potential for more nuanced and accurate AI applications in the field of digital pathology.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures, 1 table

arXiv:2403.10137 [pdf, ps, other]

Device-independent quantum secret sharing with noise pre-processing and post-selection

Authors: Qi Zhang, Wei Zhong, Ming-Ming Du, Shu-Ting Shen, Xi-Yun Li, An-Lei Zhang, Lan Zhou, Yu-Bo Sheng

Abstract: Quantum secret sharing (QSS) is a fundamental quantum secure communication primitive, which enables a dealer to distribute secret keys to a set of players. Device-independent (DI) QSS can relax the security assumptions about the devices' internal working, and effectively enhance QSS's security under practical experimental conditions. Here, we propose a DI-QSS protocol based on Greenberger-Horne-Zeilinger state, which guarantees the security of keys by the observation of the data conclusively violating the Svetlichny inequality. We estimate the performance of our DI-QSS protocol in practical noisy communication scenarios by simulating its key generation rate, noise tolerance threshold, and detection efficiency threshold. Moreover, some active improvement strategies, such as the noise pre-processing strategy and post-selection strategy are introduced into the DI-QSS protocol, which can increase its noise tolerance threshold from 7.148% to 8.072%, and reduce the detection efficiency threshold from 96.32% to 94.30%. It indicates that the adoption of the active improvement strategies can enhance DI-QSS's robustness against the noise and photon loss, which can reduce the experimental difficulty and promote DI-QSS's experimental realization. Our work may be a promising guidance for future DI-QSS's experiments.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 13 pages, 6 figures

arXiv:2403.09957 [pdf, other]

Suppression of Star Formation in Galaxy Pairs

Authors: Shuai Feng, Shi-Yin Shen, Fang-Ting Yuan, Wen-Xin Zhong, Wen-Yuan Cui, Lin-Lin Li

Abstract: We investigate the suppression of star formation in galaxy pairs based on the isolated galaxy pair sample derived from the SDSS survey. By comparing the star formation rate between late-type galaxies in galaxy pairs and those in the isolated environment, we detect the signal of star formation suppression in galaxy pairs at $d_p < 100$kpc and $200$kpc$ < d_p < 350$kpc. The occurrence of star formation suppression in these late-type galaxies requires their companion galaxies to have an early-type morphology ($n_s > 2.5$). Star formation suppression in wide galaxy pairs with $200$kpc$ < d_p < 350$kpc mainly occurs in massive late-type galaxies, while in close galaxy pairs with $d_p < 100$kpc, it only appears in late-type galaxies with a massive companion ( $\log M_\star > 11.0$), nearly independent of their own stellar mass. Based on these findings, we infer that star formation suppression in wide galaxy pairs is actually a result of galaxy conformity, while in close galaxy pairs, it stems from the influence of hot circum-galactic medium surrounding companion galaxies.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 10 pages, 4 figures, accepted for publication in ApJ

arXiv:2403.08967 [pdf, other]

PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning

Authors: Qifeng Zhou, Wenliang Zhong, Yuzhi Guo, Michael Xiao, Hehuan Ma, Junzhou Huang

Abstract: In the field of computational histopathology, both whole slide images (WSIs) and diagnostic captions provide valuable insights for making diagnostic decisions. However, aligning WSIs with diagnostic captions presents a significant challenge. This difficulty arises from two main factors: 1) Gigapixel WSIs are unsuitable for direct input into deep learning models, and the redundancy and correlation among the patches demand more attention; and 2) Authentic WSI diagnostic captions are extremely limited, making it difficult to train an effective model. To overcome these obstacles, we present PathM3, a multimodal, multi-task, multiple instance learning (MIL) framework for WSI classification and captioning. PathM3 adapts a query-based transformer to effectively align WSIs with diagnostic captions. Given that histopathology visual patterns are redundantly distributed across WSIs, we aggregate each patch feature with MIL method that considers the correlations among instances. Furthermore, our PathM3 overcomes data scarcity in WSI-level captions by leveraging limited WSI diagnostic caption data in the manner of multi-task joint learning. Extensive experiments with improved classification accuracy and caption generation demonstrate the effectiveness of our method on both WSI classification and captioning task.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.05890 [pdf, other]

Towards Efficient Replay in Federated Incremental Learning

Authors: Yichen Li, Qunwei Li, Haozhao Wang, Ruixuan Li, Wenliang Zhong, Guannan Zhang

Abstract: In Federated Learning (FL), the data in each client is typically assumed fixed or static. However, data often comes in an incremental manner in real-world applications, where the data domain may increase dynamically. In this work, we study catastrophic forgetting with data heterogeneity in Federated Incremental Learning (FIL) scenarios where edge clients may lack enough storage space to retain full data. We propose to employ a simple, generic framework for FIL named Re-Fed, which can coordinate each client to cache important samples for replay. More specifically, when a new task arrives, each client first caches selected previous samples based on their global and local importance. Then, the client trains the local model with both the cached samples and the samples from the new task. Theoretically, we analyze the ability of Re-Fed to discover important samples for replay thus alleviating the catastrophic forgetting problem. Moreover, we empirically show that Re-Fed achieves competitive performance compared to state-of-the-art methods.

Submitted 3 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.03514 [pdf, other]

CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models

Authors: Zexuan Qiu, Jingjing Li, Shijue Huang, Wanjun Zhong, Irwin King

Abstract: Developing Large Language Models (LLMs) with robust long-context capabilities has been the recent research focus, resulting in the emergence of long-context LLMs proficient in Chinese. However, the evaluation of these models remains underdeveloped due to a lack of benchmarks. To address this gap, we present CLongEval, a comprehensive Chinese benchmark for evaluating long-context LLMs. CLongEval is characterized by three key features: (1) Sufficient data volume, comprising 7 distinct tasks and 7,267 examples; (2) Broad applicability, accommodating models with context window sizes from 1K to 100K; (3) High quality, with over 2,000 manually annotated question-answer pairs in addition to the automatically constructed labels. With CLongEval, we undertake a comprehensive assessment of 6 open-source long-context LLMs and 2 leading commercial counterparts that feature both long-context abilities and proficiency in Chinese. We also provide in-depth analysis based on the empirical results, trying to shed light on the critical capabilities that present challenges in long-context settings. The dataset, evaluation scripts, and model outputs will be released.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 19 pages, 4 figures

arXiv:2403.01766 [pdf, other]

doi 10.1145/3610978.3640648

Improving Visual Perception of a Social Robot for Controlled and In-the-wild Human-robot Interaction

Authors: Wangjie Zhong, Leimin Tian, Duy Tho Le, Hamid Rezatofighi

Abstract: Social robots often rely on visual perception to understand their users and the environment. Recent advancements in data-driven approaches for computer vision have demonstrated great potential for applying deep-learning models to enhance a social robot's visual perception. However, the high computational demands of deep-learning methods, as opposed to the more resource-efficient shallow-learning models, bring up important questions regarding their effects on real-world interaction and user experience. It is unclear how the objective interaction performance and subjective user experience will be influenced when a social robot adopts a deep-learning based visual perception model. We employed state-of-the-art human perception and tracking models to improve the visual perception function of the Pepper robot and conducted a controlled lab study and an in-the-wild human-robot interaction study to evaluate this novel perception function for following a specific user with other people present in the scene.

Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: accepted to HRI 2024 (LBR track)

arXiv:2402.16427 [pdf]

Electronic phase transitions and superconductivity in ferroelectric Sn$_2$P$_2$Se$_6$ under pressure

Authors: He Zhang, Wei Zhong, Xiaohui Yu, Binbin Yue, Fang Hong

Abstract: Since there is strong electron-phonon coupling during both a ferroelectric/FE transition and a superconducting/SC transition, it has been an important topic to explore superconductivity from the FE instability. Sn$_2$P$_2$Se$_6$ arouses broad attention due to its unique FE properties. Here, we reported the electronic phase transitions and superconductivity in this compound based on high-pressure electrical transport measurement, optical absorption spectroscopy and Raman based structural analysis. Upon compression, the conductivity of Sn$_2$P$_2$Se$_6$ was elevated monotonically, an electronic phase transition occurred near 5.4 GPa, revealed by optical absorption spectroscopy, and the insulating state is estimated to be fully suppressed near 15 GPa. Then, it started to show the signature of superconductivity near 15.3 GPa. The zero-resistance state was presented from 19.4 GPa, and the superconductivity was enhanced with pressure continuously. The magnetic field effect further confirmed the SC behavior and this compound had a $T_c$ of 5.4 K at 41.8 GPa with a zero temperature upper critical field of 6.55 T. The Raman spectra confirmed the structural origin of the electronic transition near 5.4 GPa, which should be due to the transition from the paraelectric phase to the incommensurate phase, and suggested a possible first-order phase transition when the sample underwent the semiconductor-metal transition near 15 GPa. This work demonstrates the versatile physical properties in ferroelectrics and inspires the further investigation on the correlation between FE instability and SC in M$_2$P$_2$X$_6$ family.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 13 pages, 5 figures

arXiv:2402.16288 [pdf, other]

PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering

Authors: Yiming Du, Hongru Wang, Zhengyi Zhao, Bin Liang, Baojun Wang, Wanjun Zhong, Zezhong Wang, Kam-Fai Wong

Abstract: Long-term memory plays a critical role in personal interaction, considering long-term memory can better leverage world knowledge, historical information, and preferences in dialogues. Our research introduces PerLTQA, an innovative QA dataset that combines semantic and episodic memories, including world knowledge, profiles, social relationships, events, and dialogues. This dataset is collected to investigate the use of personalized memories, focusing on social interactions and events in the QA task. PerLTQA features two types of memory and a comprehensive benchmark of 8,593 questions for 30 characters, facilitating the exploration and application of personalized memories in Large Language Models (LLMs). Based on PerLTQA, we propose a novel framework for memory integration and generation, consisting of three main components: Memory Classification, Memory Retrieval, and Memory Synthesis. We evaluate this framework using five LLMs and three retrievers. Experimental results demonstrate that BERT-based classification models significantly outperform LLMs such as ChatGLM3 and ChatGPT in the memory classification task. Furthermore, our study highlights the importance of effective memory integration in the QA task.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.11905 [pdf, other]

Learning to Edit: Aligning LLMs with Knowledge Editing

Authors: Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, Wei Wang

Abstract: Knowledge editing techniques, aiming to efficiently modify a minor proportion of knowledge in large language models (LLMs) without negatively impacting performance across other inputs, have garnered widespread attention. However, existing methods predominantly rely on memorizing the updated knowledge, impeding LLMs from effectively combining the new knowledge with their inherent knowledge when answering questions. To this end, we propose a Learning to Edit (LTE) framework, focusing on teaching LLMs to apply updated knowledge into input questions, inspired by the philosophy of "Teach a man to fish." LTE features a two-phase process: (i) the Alignment Phase, which fine-tunes LLMs on a meticulously curated parallel dataset to make reliable, in-scope edits while preserving out-of-scope information and linguistic proficiency; and (ii) the Inference Phase, which employs a retrieval-based mechanism for real-time and mass knowledge editing. By comparing our approach with seven advanced baselines across four popular knowledge editing benchmarks and two LLM architectures, we demonstrate LTE's superiority in knowledge editing performance, robustness in both batch and sequential editing, minimal interference on general tasks, and rapid editing speeds. The data and code are available at https://github.com/YJiangcm/LTE.

Submitted 5 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 17 pages, 8 figures, 9 tables. ACL 2024 main camera-ready version

arXiv:2402.02718 [pdf, other]

Denoising Time Cycle Modeling for Recommendation

Authors: Sicong Xie, Qunwei Li, Weidi Xu, Kaiming Shen, Shaohu Chen, Wenliang Zhong

Abstract: Recently, modeling temporal patterns of user-item interactions has attracted much attention in recommender systems. We argue that existing methods ignore the variety of temporal patterns of user behaviors. We define the subset of user behaviors that are irrelevant to the target item as noise, which limits the performance of target-related time cycle modeling and affects the recommendation performance. In this paper, we propose Denoising Time Cycle Modeling (DiCycle), a novel approach to denoise user behaviors and select the subset of user behaviors that are highly related to the target item. DiCycle is able to explicitly model diverse time cycle patterns for recommendation. Extensive experiments are conducted on both public benchmarks and a real-world dataset, demonstrating the superior performance of DiCycle over the state-of-the-art recommendation methods.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.02709 [pdf, ps, other]

Passive decoy-state quantum secure direct communication with heralded single-photon source

Authors: Jia-Wei Ying, Peng Zhao, Wei Zhong, Ming-Ming Du, Xi-Yun Li, Shu-Ting Shen, An-Lei Zhang, Lan Zhou, Yu-Bo Sheng

Abstract: Quantum secure direct communication (QSDC) can directly transmit secret messages through a quantum channel without keys. The imperfect photon source is a major obstacle to QSDC's practical implementation. The unwanted vacuum state and multi-photon components emitted from an imperfect photon source largely reduce QSDC's secrecy message capacity and even threaten its security. In this paper, we propose a highly efficient passive decoy-state QSDC protocol with the heralded single-photon source (HSPS). We adopt a spontaneous parametric down-conversion source to emit entangled photon pairs in two spatial modes. By detecting the photons in one of the two correlated spatial modes, we can infer the photon number distribution of the other spatial mode. Meanwhile, our protocol allows a simple passive preparation of the signal states and decoy state. The HSPS can effectively reduce the probability of the vacuum state and increase QSDC's secrecy message capacity. Meanwhile, the passive decoy-state method can simplify the experimental operations and enhance QSDC's robustness against third-party side-channel attacks. At a communication distance of 10 km, the secrecy message capacity of our QSDC protocol can reach 81.85 times (average photon number of 0.1) and 12.79 times (average photon number of 0.01) that of the original single-photon-based QSDC protocol without the HSPS. Our QSDC protocol has a longer maximal communication distance (about 17.975 km with average photon number of 0.01). Our work serves as a major step toward the further development of practical passive decoy-state QSDC systems.

Submitted 4 February, 2024; originally announced February 2024.

Comments: 11 pages, 3 figures

arXiv:2402.00578 [pdf, other]

Discovery and timing of pulsar J2016$+$3711 in supernova remnant CTB 87 with FAST

Authors: Qian-Cheng Liu, Wen-Juan Zhong, Yang Chen, Pei Wang, ** Zhou, You-Ling Yue, Di Li

Abstract: We report on our discovery of the radio pulsar, PSR J2016$+$3711, in supernova remnant (SNR) CTB 87, with a $\sim10.8σ$ significance of pulses, which confirms the compact nature of the X-ray point source in CTB 87. It is the first pulsar discovered in SNRs using the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Its integrated radio pulse profile can be well described by a single component, with a width at 50% of the peak flux density of about 28.1$^\circ$ and an effective width of about 32.2$^\circ$. The mean flux density at 1.25 GHz is estimated to be about 15.5 $μ$Jy. Combined with the non-detection of the radio pulse at lower frequencies, the radio spectral index of the pulsar is constrained to be $\lesssim 2.3$. We also present the timing solution based on 28 follow-up FAST observations. Our results reveal a period of 50.81 ms, period derivative of $7.2\times 10^{-14}$ s s$^{-1}$, and dispersion measure of 428 pc cm$^{-3}$. The strength of the equatorial surface dipole magnetic field is inferred to be about $1.9\times10^{12}$ G. Using the ephemeris obtained from the radio observations, we searched Fermi-LAT data for gamma-ray pulsations but detected no pulsed signal. We also searched for radio pulses with FAST toward the X-ray counterpart of the gamma-ray binary HESS J1832$-$093 proximate to SNR G22.7$-$00.2 but found no signal.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 7 pages, 5 figures, accepted for publication in MNRAS

arXiv:2401.17167 [pdf, other]

Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

Authors: Shijue Huang, Wanjun Zhong, Jianqiao Lu, Qi Zhu, Jiahui Gao, Weiwen Liu, Yutai Hou, Xingshan Zeng, Yasheng Wang, Lifeng Shang, Xin Jiang, Ruifeng Xu, Qun Liu

Abstract: The recent trend of using Large Language Models (LLMs) as tool agents in real-world applications underscores the necessity for comprehensive evaluations of their capabilities, particularly in complex scenarios involving planning, creating, and using tools. However, existing benchmarks typically focus on simple synthesized queries that do not reflect real-world complexity, thereby offering limited perspectives in evaluating tool utilization. To address this issue, we present UltraTool, a novel benchmark designed to improve and evaluate LLMs' ability in tool utilization within real-world scenarios. UltraTool focuses on the entire process of using tools - from planning and creating to applying them in complex tasks. It emphasizes real-world complexities, demanding accurate, multi-step planning for effective problem-solving. A key feature of UltraTool is its independent evaluation of planning with natural language, which happens before tool usage and simplifies the task solving by mapping out the intermediate steps. Thus, unlike previous work, it eliminates the restriction of a pre-defined toolset. Through extensive experiments on various LLMs, we offer novel insights into the evaluation of capabilities of LLMs in tool utilization, thereby contributing a fresh perspective to this rapidly evolving field. The benchmark is publicly available at https://github.com/JoeYing1019/UltraTool.

Submitted 3 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted by ACL2024 Findings

arXiv:2401.15670 [pdf, other]

YODA: Teacher-Student Progressive Learning for Language Models

Authors: Jianqiao Lu, Wanjun Zhong, Yufei Wang, Zhijiang Guo, Qi Zhu, Wenyong Huang, Yanlin Wang, Fei Mi, Baojun Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu

Abstract: Although large language models (LLMs) have demonstrated adeptness in a range of tasks, they still lag behind human learning efficiency. This disparity is often linked to the inherent human capacity to learn from basic examples, gradually generalize and handle more complex problems, and refine their skills with continuous feedback. Inspired by this, this paper introduces YODA, a novel teacher-student progressive learning framework that emulates the teacher-student education process to improve the efficacy of model fine-tuning. The framework operates on an interactive basic-generalized-harder loop. The teacher agent provides tailored feedback on the student's answers, and systematically organizes the education process. This process unfolds by teaching the student basic examples, reinforcing understanding through generalized questions, and then enhancing learning by posing questions with progressively enhanced complexity. With the teacher's guidance, the student learns to iteratively refine its answer with feedback, and forms a robust and comprehensive understanding of the posed questions. The systematic procedural data, which reflects the progressive learning process of humans, is then utilized for model training. Taking math reasoning as a testbed, experiments show that training LLaMA2 with data from YODA improves SFT with significant performance gain (+17.01% on GSM8K and +9.98% on MATH). In addition, we find that training with curriculum learning further improves learning robustness.

Submitted 28 January, 2024; originally announced January 2024.

Comments: 14 pages, 4 figures, 3 tables

arXiv:2401.11657 [pdf]

A photon-level broadband dual-comb interferometer for turbulent open-air trace gases detection application

Authors: Wei Zhong, Yingyu Liu, Qin Yin, Ruocan Zhao, Yiwei Ding, Chong Wang, Tindi Chen, Xiankang Dou, Xianghui Xue

Abstract: Open-path dual-comb spectroscopy (DCS) significantly enhances our understanding of regional trace gases. However, due to technical challenges, cost considerations, and eye-safety regulations, its sensing range and flexibility remain limited. The photon-counting DCS demonstrated recently heralds potential innovations over open-path DCS. Nevertheless, a major challenge in open-air applications of this approach lies in accurately extracting information from the arrival time of photons that have traversed the turbulent atmosphere. Here, we demonstrate a photon-level dual-comb interferometer for field deployment in open-air environments, uniquely designed to counteract the impact of optical path-length variations caused by atmospheric turbulence and fiber-length wandering. Under variable optical path-length conditions, a 20 nm broadband absorption spectrum of H13C14N is acquired, with the power per comb line detected as low as 4 attowatts. Furthermore, this photon-level DCS achieves comb-line resolution with a quantum-noise-limited signal-to-noise ratio (SNR). This paves the way for novel open-path DCS applications, including non-cooperative target sensing and sensing over a hundred-kilometer range, all within a portable, fieldable, eye-safe, and low-power-consumption system.

Submitted 21 January, 2024; originally announced January 2024.

Comments: 24 pages, 10 figures

arXiv:2401.06300 [pdf, other]

Advantage of Quantum Neural Networks as Quantum Information Decoders

Authors: Weishun Zhong, Oles Shtanko, Ramis Movassagh

Abstract: A promising strategy to protect quantum information from noise-induced errors is to encode it into the low-energy states of a topological quantum memory device. However, readout errors from such memory under realistic settings are less well understood. We study the problem of decoding quantum information encoded in the groundspaces of topological stabilizer Hamiltonians in the presence of generic perturbations, such as quenched disorder. We first prove that the standard stabilizer-based error correction and decoding schemes work adequately well in such perturbed quantum codes by showing that the decoding error diminishes exponentially in the distance of the underlying unperturbed code. We then prove that Quantum Neural Network (QNN) decoders provide an almost quadratic improvement on the readout error. Thus, we demonstrate a provable advantage of using QNNs for decoding realistic quantum error-correcting codes, and our result enables the exploration of a wider range of non-stabilizer codes in near-term laboratory settings.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 25 pages, 5 figures

arXiv:2401.05975 [pdf, other]

End-to-end Learnable Clustering for Intent Learning in Recommendation

Authors: Yue Liu, Shihao Zhu, Jun Xia, Yingwei Ma, Jian Ma, Wenliang Zhong, Xinwang Liu, Guannan Zhang, Kejun Zhang

Abstract: Intent learning, which aims to learn users' intents for user understanding and item recommendation, has become a research hotspot in recent years. However, the existing methods suffer from complex and cumbersome alternating optimization, limiting the performance and scalability. To this end, we propose a novel intent learning method termed ELCRec, by unifying behavior representation learning into an End-to-end Learnable Clustering framework, for effective and efficient Recommendation. Concretely, we encode users' behavior sequences and initialize the cluster centers (latent intents) as learnable neurons. Then, we design a novel learnable clustering module to separate different cluster centers, thus decoupling users' complex intents. Meanwhile, it guides the network to learn intents from behaviors by forcing behavior embeddings close to cluster centers. This allows simultaneous optimization of recommendation and clustering via mini-batch data. Moreover, we propose intent-assisted contrastive learning by using cluster centers as self-supervision signals, further enhancing mutual promotion. Both experimental results and theoretical analyses demonstrate the superiority of ELCRec from six perspectives. Compared to the runner-up, ELCRec improves NDCG@5 by 8.9% and reduces computational costs by 22.5% on the Beauty dataset. Furthermore, owing to its scalability and universal applicability, we deploy this method on an industrial recommendation system with 130 million page views and achieve promising results.

Submitted 2 February, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: 24 pages

arXiv:2401.03906 [pdf, other]

Reconstruction of hypermatrices from subhypermatrices

Authors: Xiande Zhang, Wenjie Zhong

Abstract: For a given $n$, what is the smallest number $k$ such that every sequence of length $n$ is determined by the multiset of all its $k$-subsequences? This is called the $k$-deck problem for sequence reconstruction, and has been generalized to the two-dimensional case -- reconstruction of $n\times n$-matrices from submatrices. Previous works show that the smallest $k$ is at most $O(n^\frac{1}{2})$ for sequences and at most $O(n^\frac{2}{3})$ for matrices. We study this $k$-deck problem for general dimension $d$ and prove that the smallest $k$ is at most $O(n^\frac{d}{d+1})$ for reconstructing a $d$-dimensional hypermatrix of order $n$ from the multiset of all its subhypermatrices of order $k$.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 25 pages, 4 figures

arXiv:2401.02035 [pdf, ps, other]

Efficient Information Geometry Approach for Massive MIMO-OFDM Channel Estimation

Authors: Jiyuan Yang, Yan Chen, Mingrui Fan, An-An Lu, Wen Zhong, Xiqi Gao, Xiaohu You, Xiang-Gen Xia, Dirk Slock

Abstract: We investigate the channel estimation for massive multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. We revisit the information geometry approach (IGA) for massive MIMO-OFDM channel estimation. By using the constant magnitude property of the entries of the measurement matrix, we find that the second-order natural parameters of the distributions on all the auxiliary manifolds are equivalent to each other, and the first-order natural parameters are asymptotically equivalent to each other at the fixed point. Motivated by these results, we simplify the process of IGA and propose an efficient IGA (EIGA) for massive MIMO-OFDM channel estimation, which allows efficient implementation with the fast Fourier transform (FFT). We then establish a sufficient condition for its convergence and accordingly find a range of the damping factor for the convergence. We show that this range of the damping factor is sufficiently wide by using the specific properties of the measurement matrices. Further, we prove that at the fixed point, the a posteriori mean obtained by EIGA is asymptotically optimal. Simulations confirm that EIGA can achieve the optimal performance with low complexity in a limited number of iterations.

Submitted 3 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.17266 [pdf]

Automatic laminectomy cutting plane planning based on artificial intelligence in robot assisted laminectomy surgery

Authors: Zhuofu Li, Yonghong Zhang, Chengxia Wang, Shanshan Liu, Xiongkang Song, Xuquan Ji, Shuai Jiang, Woquan Zhong, Lei Hu, Weishi Li

Abstract: Objective: This study aims to use artificial intelligence to realize the automatic planning of laminectomy, and verify the method. Methods: We propose a two-stage approach for automatic laminectomy cutting plane planning. The first stage was the identification of key points. Seven key points were manually marked on each CT image. The Spatial Pyramid Upsampling Network (SPU-Net) algorithm developed by us was used to accurately locate the seven key points. In the second stage, based on the identification of key points, a personalized coordinate system was generated for each vertebra. Finally, the transverse and longitudinal cutting planes of laminectomy were generated under the coordinate system. The overall effect of planning was evaluated. Results: In the first stage, the average localization error of the SPU-Net algorithm for the seven key points was 0.65 mm. In the second stage, a total of 320 transverse cutting planes and 640 longitudinal cutting planes were planned by the algorithm. Among them, the numbers of horizontal plane planning effects of grade A, B, and C were 318 (99.38%), 1 (0.31%), and 1 (0.31%), respectively. The longitudinal planning effects of grade A, B, and C were 622 (97.18%), 1 (0.16%), and 17 (2.66%), respectively. Conclusions: In this study, we propose a method for automatic surgical path planning of laminectomy based on the localization of key points in CT images. The results showed that the method achieved satisfactory results. More studies are needed to confirm the reliability of this approach in the future.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.17109 [pdf, other]

MIVC: Multiple Instance Visual Component for Visual-Language Models

Authors: Wenyi Wu, Qi Li, Wenliang Zhong, Junzhou Huang

Abstract: Vision-language models have been widely explored across a wide range of tasks and achieve satisfactory performance. However, it's under-explored how to consolidate entity understanding through a varying number of images and to align it with the pre-trained language models for generative tasks. In this paper, we propose MIVC, a general multiple instance visual component to bridge the gap between various image inputs with off-the-shelf vision-language models by aggregating visual representations in a permutation-invariant fashion through a neural network. We show that MIVC could be plugged into the visual-language models to improve the model performance consistently on visual question answering, classification and captioning tasks on a publicly available e-commerce dataset with multiple images per product. Furthermore, we show that the component provides insight into the contribution of each image to the downstream tasks.

Submitted 28 December, 2023; originally announced December 2023.

Comments: Accepted at WACV 2024

arXiv:2312.16484 [pdf]

Emergence of superconductivity near 11 K by suppressing the 3-fold helical-chain structure in noncentrosymmetric HgS

Authors: He Zhang, Wei Zhong, Yanghao Meng, Bowen Tang, Binbin Yue, Xiaohui Yu, Fang Hong

Abstract: The trigonal $α$-HgS has a 3-fold helical chain structure, and takes the form of a noncentrosymmetric $P3_121$ phase, known as the cinnabar phase. However, under pressure, the helical chains gradually approach and connect with each other, finally reconstructing into a centrosymmetric NaCl structure at 21 GPa. Superconductivity emerges just after this helical-nonhelical structural transition. The maximum critical temperature ($T_c$) reaches 11 K at 25.4 GPa; $T_c$ decreases with further compression, and is still 3.5 K at 44.8 GPa. Furthermore, the $T_c$-critical magnetic field ($B_{c2}$) relation exhibits multi-band features, with a $B_{c2}$ of 5.65 T at 0 K by two-band fitting. Raman spectra analysis demonstrates that phonon softening plays a key role in the structural transition and the emergence of superconductivity. It is noted that HgS is the first reported IIB group metal sulfide superconductor and the only NaCl-type metal sulfide superconductor with a $T_c$ above 10 K. This work will inspire the exploration of superconductivity in other chiral systems and will extend our understanding of the versatile behavior in such kinds of materials.

Submitted 27 December, 2023; originally announced December 2023.

Comments: 16 pages, 6 figures

arXiv:2312.11370 [pdf, other]

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Authors: Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, Yufei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong

Abstract: Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities, which encourages extensive research on their application in mathematical problem solving. However, current work has been largely focused on text-based mathematical problems, with limited investigation in problems involving geometric information. Addressing this gap, we aim to enable LLMs to solve geometric problems by understanding image input. We first analyze the limitations of current Multimodal Large Language Models (MLLMs) in this area: they struggle to accurately comprehend basic geometric elements and their relationships. To overcome these challenges, we take advantage of the unique characteristics of geometric problems (such as unique geometric logical form, and geometric scalability) and the capacity of the textual LLMs to build an enriched multimodal geometry dataset based on existing data. The augmented dataset, Geo170K, contains more than 170K geometric image-caption and question-answer pairs. Utilizing our constructed Geo170K dataset, we develop G-LLaVA, which demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 10 pages

arXiv:2312.01916 [pdf, other]

PEACE: Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation

Authors: Chun**g Gan, Bo Huang, Binbin Hu, Jian Ma, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Guannan Zhang, Wenliang Zhong

Abstract: To help merchants/customers to provide/access a variety of services through miniapps, online service platforms have occupied a critical position in effective content delivery, in which how to recommend items in the new domain launched by the service provider for customers has become more urgent. However, the non-negligible gap between the source and diversified target domains poses a considerable challenge to cross-domain recommendation systems, which often leads to performance bottlenecks in industrial settings. While entity graphs have the potential to serve as a bridge between domains, rudimentary utilization still fails to distill useful knowledge and even induces the negative transfer issue. To this end, we propose PEACE, a Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation. For domain gap bridging, PEACE is built upon a multi-interest and entity-oriented pre-training architecture which could not only benefit the learning of generalized knowledge in a multi-granularity manner, but also help leverage more structural information in the entity graph. Then, we bring the prototype learning into the pre-training over source domains, so that representations of users and items are greatly improved by the contrastive prototype learning module and the prototype enhanced attention mechanism for adaptive knowledge utilization. To ease the pressure of online serving, PEACE is carefully deployed in a lightweight manner, and significant performance improvements are observed in both online and offline environments.

Submitted 17 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: Accepted by WSDM 2024

arXiv:2312.01700 [pdf, other]

Data Management For Large Language Models: A Survey

Authors: Zige Wang, Wanjun Zhong, Yufei Wang, Qi Zhu, Fei Mi, Baojun Wang, Lifeng Shang, Xin Jiang, Qun Liu

Abstract: Data plays a fundamental role in the training of Large Language Models (LLMs). Effective data management, particularly in the formulation of a well-suited training dataset, holds significance for enhancing model performance and improving training efficiency during pretraining and supervised fine-tuning phases. Despite the considerable importance of data management, the current research community still falls short in providing a systematic analysis of the rationale behind management strategy selection, its consequential effects, methodologies for evaluating curated datasets, and the ongoing pursuit of improved strategies. Consequently, the exploration of data management has attracted more and more attention among the research community. This survey provides a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs, covering various noteworthy aspects of data management strategy design: data quantity, data quality, domain/task composition, etc. Looking toward the future, we extrapolate existing challenges and outline promising directions for development in this field. Therefore, this survey serves as a guiding resource for practitioners aspiring to construct powerful LLMs through effective data management practices. The collection of the latest papers is available at https://github.com/ZigeW/data_management_LLM.

Submitted 25 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: Work in progress

arXiv:2312.00553 [pdf]

A Spatio-Temporal Graph Convolutional Network for Gesture Recognition from High-Density Electromyography

Authors: Wenjuan Zhong, Yuyang Zhang, Peiwen Fu, Wenxuan Xiong, Mingming Zhang

Abstract: Accurate hand gesture prediction is crucial for effective control of upper-limb prostheses. Given the high flexibility and multiple degrees of freedom of human hands, there has been a growing interest in integrating deep networks with high-density surface electromyography (HD-sEMG) grids to enhance gesture recognition capabilities. However, many existing methods fall short of fully exploiting the specific spatial topology and temporal dependencies present in HD-sEMG data. Additionally, these studies are often limited to a small number of gestures and lack generality. Hence, this study introduces a novel gesture recognition method, named STGCN-GR, which leverages spatio-temporal graph convolution networks for HD-sEMG-based human-machine interfaces. Firstly, we construct muscle networks based on functional connectivity between channels, creating a graph representation of HD-sEMG recordings. Subsequently, a temporal convolution module is applied to capture the temporal dependencies in the HD-sEMG series, and a spatial graph convolution module is employed to effectively learn the intrinsic spatial topology information among distinct HD-sEMG channels. We evaluate our proposed model on a public HD-sEMG dataset comprising a substantial number of gestures (i.e., 65). Our results demonstrate the remarkable capability of the STGCN-GR method, achieving an impressive accuracy of 91.07% in predicting gestures, which surpasses state-of-the-art deep learning methods applied to the same dataset.

Submitted 1 December, 2023; originally announced December 2023.

Showing 1–50 of 368 results for author: Zhong, W