Search | arXiv e-print repository

Generalize by Touching: Tactile Ensemble Skill Transfer for Robotic Furniture Assembly

Authors: Haohong Lin, Radu Corcodel, Ding Zhao

Abstract: Furniture assembly remains an unsolved problem in robotic manipulation due to its long task horizon and nongeneralizable operations plan. This paper presents the Tactile Ensemble Skill Transfer (TEST) framework, a pioneering offline reinforcement learning (RL) approach that incorporates tactile feedback in the control loop. TEST's core design is to learn a skill transition model for high-level planning, along with a set of adaptive intra-skill goal-reaching policies. Such design aims to solve the robotic furniture assembly problem in a more generalizable way, facilitating seamless chaining of skills for this long-horizon task. We first sample demonstration from a set of heuristic policies and trajectories consisting of a set of randomized sub-skill segments, enabling the acquisition of rich robot trajectories that capture skill stages, robot states, visual indicators, and crucially, tactile signals. Leveraging these trajectories, our offline RL method discerns skill termination conditions and coordinates skill transitions. Our evaluations highlight the proficiency of TEST on the in-distribution furniture assemblies, its adaptability to unseen furniture configurations, and its robustness against visual disturbances. Ablation studies further accentuate the pivotal role of two algorithmic components: the skill transition model and tactile ensemble policies.
Results indicate that TEST can achieve a success rate of 90% and is over 4 times more efficient than the heuristic policy in both in-distribution and generalization settings, suggesting a scalable skill transfer approach for contact-rich manipulation.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16425 [pdf, other]

Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja, et al. (170 additional authors not shown)

Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 41 pages, 8 figures, 7 tables

arXiv:2404.16299 [pdf, ps, other]

Conformal transformation of f(Q) gravity and its cosmological perturbations

Authors: Dehao Zhao

Abstract: Symmetric teleparallel gravity (STG) is a gravity theory which takes non-metricity tensor to describe gravity effects. In the STG framework, we study the conformal equivalent scalar-tensor theory of f(Q) model and calculate the cosmological linear perturbations of the conformal transformed action. We confirm the result already present in references that f(Q) gravity shows different degrees of freedom on different backgrounds at linear perturbation level. We also explain that this situation often means the linear perturbation theory breaks down and the model may suffer from strong coupling problem.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 14 pages, no figure

arXiv:2404.14934 [pdf, other]

G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition

Authors: Kaikai Deng, Dong Zhao, Wenxin Zheng, Yue Ling, Kangwen Yin, Huadong Ma

Abstract: Millimeter wave radar is gaining traction recently as a promising modality for enabling pervasive and privacy-preserving gesture recognition. However, the lack of rich and fine-grained radar datasets hinders progress in developing generalized deep learning models for gesture recognition across various user postures (e.g., standing, sitting), positions, and scenes. To remedy this, we resort to designing a software pipeline that exploits wealthy 2D videos to generate realistic radar data, but it needs to address the challenge of simulating diversified and fine-grained reflection properties of user gestures. To this end, we design G3R with three key components: (i) a gesture reflection point generator expands the arm's skeleton points to form human reflection points; (ii) a signal simulation model simulates the multipath reflection and attenuation of radar signals to output the human intensity map; (iii) an encoder-decoder model combines a sampling module and a fitting module to address the differences in number and distribution of points between generated and real-world radar data for generating realistic radar data. We implement and evaluate G3R using 2D videos from public data sources and self-collected real-world radar data, demonstrating its superiority over other state-of-the-art approaches for gesture recognition.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 18 pages, 29 figures

arXiv:2404.12738 [pdf, other]

doi 10.1109/TNET.2024.3398778

DeviceRadar: Online IoT Device Fingerprinting in ISPs using Programmable Switches

Authors: Ruoyu Li, Qing Li, Tao Lin, Qingsong Zou, Dan Zhao, Yucheng Huang, Gareth Tyson, Guorui Xie, Yong Jiang

Abstract: Device fingerprinting can be used by Internet Service Providers (ISPs) to identify vulnerable IoT devices for early prevention of threats. However, due to the wide deployment of middleboxes in ISP networks, some important data, e.g., 5-tuples and flow statistics, are often obscured, rendering many existing approaches invalid. It is further challenged by the high-speed traffic of hundreds of terabytes per day in ISP networks. This paper proposes DeviceRadar, an online IoT device fingerprinting framework that achieves accurate, real-time processing in ISPs using programmable switches. We innovatively exploit "key packets" as a basis of fingerprints only using packet sizes and directions, which appear periodically while exhibiting differences across different IoT devices. To utilize them, we propose a packet size embedding model to discover the spatial relationships between packets. Meanwhile, we design an algorithm to extract the "key packets" of each device, and propose an approach that jointly considers the spatial relationships and the key packets to produce a neighboring key packet distribution, which can serve as a feature vector for machine learning models for inference. Last, we design a model transformation method and a feature extraction process to deploy the model on a programmable data plane within its constrained arithmetic operations and memory to achieve line-speed processing.
Our experiments show that DeviceRadar can achieve state-of-the-art accuracy across 77 IoT devices with 40 Gbps throughput, and requires only 1.3% of the processing time compared to GPU-accelerated approaches.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Submitted to IEEE/ACM Transactions on Networking (ToN)

arXiv:2404.12022 [pdf, other]

Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

Authors: Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, **peng Li, **gang Wang, Xunliang Cai, Dongyan Zhao

Abstract: Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly evident when utilizing autoregressive decoding methods, which generate one token in a single forward process, thereby not fully capitalizing on the parallel computing capabilities of GPUs. In this paper, we propose a novel parallel decoding approach, namely \textit{hidden transfer}, which decodes multiple successive tokens simultaneously in a single forward pass. The idea is to transfer the intermediate hidden states of the previous context to the \textit{pseudo} hidden states of the future tokens to be generated, and then the pseudo hidden states will pass the following transformer layers thereby assimilating more semantic information and achieving superior predictive accuracy of the future tokens. Besides, we use the novel tree attention mechanism to simultaneously generate and verify multiple candidates of output sequences, which ensures the lossless generation and further improves the generation efficiency of our method. Experiments demonstrate the effectiveness of our method. We conduct a lot of analytic experiments to prove our motivation. In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10610 [pdf, other]

Shining Light into the Tunnel: Understanding and Classifying Network Traffic of Residential Proxies

Authors: Ronghong Huang, Dongfang Zhao, Xianghang Mi, Xiaofeng Wang

Abstract: Emerging in recent years, residential proxies (RESIPs) feature multiple unique characteristics when compared with traditional network proxies (e.g., commercial VPNs), particularly, the deployment in residential networks rather than data center networks, the worldwide distribution in tens of thousands of cities and ISPs, and the large scale of millions of exit nodes. All these factors allow RESIP users to effectively masquerade their traffic flows as ones from authentic residential users, which leads to the increasing adoption of RESIP services, especially in malicious online activities. However, regarding the (malicious) usage of RESIPs (i.e., what traffic is relayed by RESIPs), current understanding turns out to be insufficient. Particularly, previous works on RESIP traffic studied only the maliciousness of web traffic destinations and the suspicious patterns of visiting popular websites. Also, a general methodology is missing regarding capturing large-scale RESIP traffic and analyzing RESIP traffic for security risks. Furthermore, considering many RESIP nodes are found to be located in corporate networks and are deployed without proper authorization from device owners or network administrators, it is becoming increasingly necessary to detect and block RESIP traffic flows, which unfortunately is impeded by the scarcity of realistic RESIP traffic datasets and effective detection methodologies.
To fill in these gaps, multiple novel tools have been designed and implemented in this study, which include a general framework to deploy RESIP nodes and collect RESIP traffic in a distributed manner, a RESIP traffic analyzer to efficiently process RESIP traffic logs and surface out suspicious traffic flows, and multiple machine learning based RESIP traffic classifiers to timely and accurately detect whether a given traffic flow is RESIP traffic or not.

Submitted 30 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.10004 [pdf]

A Strategy Transfer and Decision Support Approach for Epidemic Control in Experience Shortage Scenarios

Authors: X. Xiao, P. Chen, X. Cao, K. Liu, L. Deng, D. Zhao, Z. Chen, Q. Deng, F. Yu, H. Zhang

Abstract: Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in this method: (1) The similarity evaluation indicators are determined from three dimensions, i.e., the Basis of National Epidemic Prevention & Control, Social Resilience, and Infection Situation. (2) The data related to the indicators are collected and preprocessed. (3) The first round of screening on the preprocessed dataset is conducted through an improved collaborative filtering algorithm to calculate the preliminary similarity result from the perspective of the infection situation. (4) Finally, the K-Means model is used for the second round of screening to obtain the final similarity values. The approach will be applied to decision-making support in the context of COVID-19. Our results demonstrate that the recommendations generated by the STDSA model are more accurate and aligned better with the actual situation than those produced by pure K-means models. This study will provide new insights into preventing and controlling epidemics in regions that lack experience.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 20 pages, 9 figures

arXiv:2404.05649 [pdf]

Realization of a three-dimensional photonic higher-order topological insulator

Authors: Ziyao Wang, Yan Meng, Bei Yan, Dong Zhao, Linyun Yang, **g-Ming Chen, Min-Qi Cheng, Tao Xiao, Perry ** Shum, Gui-Geng Liu, Yihao Yang, Hongsheng Chen, Xiang Xi, Zhen-Xiao Zhu, Biye Xie, Zhen Gao

Abstract: The discovery of photonic higher-order topological insulators (HOTIs) has significantly expanded our understanding of band topology and provided unprecedented lower-dimensional topological boundary states for robust photonic devices. However, due to the vectorial and leaky nature of electromagnetic waves, it is challenging to discover three-dimensional (3D) topological photonic systems and photonic HOTIs have so far still been limited to two dimensions (2D). Here, we report on the first experimental realization of a 3D Wannier-type photonic HOTI in a tight-binding-like metal-cage photonic crystal, whose band structure matches well with that of a 3D tight-binding model due to the confined Mie resonances. By microwave near-field measurements, we directly observe coexisting topological surface, hinge, and corner states in a single 3D photonic HOTI, as predicted by the tight-binding model and simulation results. Moreover, we demonstrate that all-order topological boundary states are self-guided even in the light cone continuum and can be exposed to air without ancillary cladding, making them well-suited for practical applications. Our work thus opens routes to the multi-dimensional robust manipulation of electromagnetic waves at the outer surfaces of 3D cladding-free photonic bandgap materials and may find novel applications in 3D topological integrated photonics devices.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 23 pages, 4 figures

arXiv:2404.05609 [pdf, other]

Feedback Stability Under Mixed Gain and Phase Uncertainty

Authors: Jia** Liang, Di Zhao, Li Qiu

Abstract: In this study, we investigate the robust feedback stability problem for multiple-input-multiple-output linear time-invariant systems involving sectored-disk uncertainty, namely, dynamic uncertainty subject to simultaneous gain and phase constraints. This problem is thereby called a sectored-disk problem. Employing a frequency-wise analysis approach, we derive a fundamental static matrix problem that serves as a key component in addressing the feedback stability. The study of this matrix problem heavily relies on the Davis-Wielandt (DW) shells of matrices, providing a profound insight into matrices subjected to simultaneous gain and phase constraints. This understanding is pivotal for establishing a less conservative sufficient condition for the matrix sectored-disk problem, from which we formulate several robust feedback stability conditions against sectored-disk uncertainty. Finally, several conditions based on linear matrix inequalities are developed for efficient computation and verification of feedback robust stability against sectored-disk uncertainty.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.04608 [pdf, other]

doi 10.1109/TGRS.2024.3392778

Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation

Authors: Danpei Zhao, Bo Yuan, Ziqiang Chen, Tian Li, Zhuoran Liu, Wentao Li, Yue Gao

Abstract: Current remote-sensing interpretation models often focus on a single task such as detection, segmentation, or caption. However, the task-specific designed models are unattainable to achieve the comprehensive multi-level interpretation of images. The field also lacks support for multi-task joint interpretation datasets. In this paper, we propose Panoptic Perception, a novel task and a new fine-grained dataset (FineGrip) to achieve a more thorough and universal interpretation for RSIs. The new task, 1) integrates pixel-level, instance-level, and image-level information for universal image perception, 2) captures image information from coarse to fine granularity, achieving deeper scene understanding and description, and 3) enables various independent tasks to complement and enhance each other through multi-task learning. By emphasizing multi-task interactions and the consistency of perception results, this task enables the simultaneous processing of fine-grained foreground instance segmentation, background semantic segmentation, and global fine-grained image captioning. Concretely, the FineGrip dataset includes 2,649 remote sensing images, 12,054 fine-grained instance segmentation masks belonging to 20 foreground things categories, 7,599 background semantic masks for 5 stuff classes and 13,245 captioning sentences. Furthermore, we propose a joint optimization-based panoptic perception model.
Experimental results on FineGrip demonstrate the feasibility of the panoptic perception task and the beneficial effect of multi-task joint optimization on individual tasks. The dataset will be publicly available.

Submitted 25 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2024

arXiv:2404.00064 [pdf, other]

Confronting CP symmetry of order 4 with experimental data

Authors: Igor P. Ivanov, Duanyang Zhao

Abstract: CP4 3HDM is a three-Higgs-doublet model based on a $CP$ symmetry of order 4 (CP4). It is the minimal model incorporating CP4 without leading to accidental symmetries or running into immediate conflict with experiment. Imposing CP4 on the lagrangian induces remarkably tight connections between the scalar and Yukawa sectors, including the unavoidable tree-level flavor-changing neutral couplings (FCNC). Here, we explore whether it is at all possible in the CP4 3HDM to suppress FCNC to a level compatible with the neutral meson oscillation constraints. We express the FCNC matrices in terms of physical quark observables and quark rotation parameters, and scan the Yukawa parameter space using the quark masses and mixing parameters as input. With this procedure, we find that only two out of the eight possible CP4 Yukawa sectors are compatible with the $K$, $B$, $B_s$ and, in particular, $D$-meson oscillation constraints. The results clearly indicate a way how to construct phenomenologically viable benchmark CP4 3HDMs.

Submitted 28 March, 2024; originally announced April 2024.

Comments: 14 pages, 3 figures, Proceedings of the "Workshop on the Standard Model and Beyond" within the Corfu Summer Institute 2023, Corfu, Greece, August 27 - September 7, 2023

Journal ref: PoS(CORFU2023)086

arXiv:2403.20173 [pdf, other]

MCNet: A crowd denstity estimation network based on integrating multiscale attention module

Authors: Qiang Guo, Rubo Zhang, Di Zhao

Abstract: Aiming at the metro video surveillance system has not been able to effectively solve the metro crowd density estimation problem, a Metro Crowd density estimation Network (called MCNet) is proposed to automatically classify crowd density level of passengers. Firstly, an Integrating Multi-scale Attention (IMA) module is proposed to enhance the ability of the plain classifiers to extract semantic crowd texture features to accommodate to the characteristics of the crowd texture feature. The innovation of the IMA module is to fuse the dilation convolution, multiscale feature extraction and attention mechanism to obtain multi-scale crowd feature activation from a larger receptive field with lower computational cost, and to strengthen the crowds activation state of convolutional features in top layers. Secondly, a novel lightweight crowd texture feature extraction network is proposed, which can directly process video frames and automatically extract texture features for crowd density estimation, while its faster image processing speed and fewer network parameters make it flexible to be deployed on embedded platforms with limited hardware resources.
Finally, this paper integrates IMA module and the lightweight crowd texture feature extraction network to construct the MCNet, and validate the feasibility of this network on image classification dataset: Cifar10 and four crowd density datasets: PETS2009, Mall, QUT and SH_METRO to validate the MCNet whether can be a suitable solution for crowd density estimation in metro video surveillance where there are image processing challenges such as high density, high occlusion, perspective distortion and limited hardware resources.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.19725 [pdf, other]

MUGC: Machine Generated versus User Generated Content Detection

Authors: Yaqi Xie, Anjali Rawal, Yu**g Cen, Dixuan Zhao, Sunil K Narang, Shanu Sushmita

Abstract: As advanced modern systems like deep neural networks (DNNs) and generative AI continue to enhance their capabilities in producing convincing and realistic content, the need to distinguish between user-generated and machine generated content is becoming increasingly evident. In this research, we undertake a comparative evaluation of eight traditional machine-learning algorithms to distinguish between machine-generated and human-generated data across three diverse datasets: Poems, Abstracts, and Essays. Our results indicate that traditional methods demonstrate a high level of accuracy in identifying machine-generated data, reflecting the documented effectiveness of popular pre-trained models like RoBERT. We note that machine-generated texts tend to be shorter and exhibit less word variety compared to human-generated content. While specific domain-related keywords commonly utilized by humans, albeit disregarded by current LLMs (Large Language Models), may contribute to this high detection accuracy, we show that deeper word representations like word2vec can capture subtle semantic variances. Furthermore, readability, bias, moral, and affect comparisons reveal a discernible contrast between machine-generated and human generated content. There are variations in expression styles and potentially underlying biases in the data sources (human and machine-generated). This study provides valuable insights into the advancing capacities and challenges associated with machine-generated content across various domains.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 11 pages, 16 figures

arXiv:2403.19248 [pdf, other]

Genos: General In-Network Unsupervised Intrusion Detection by Rule Extraction

Authors: Ruoyu Li, Qing Li, Yu Zhang, Dan Zhao, Xi Xiao, Yong Jiang

Abstract: Anomaly-based network intrusion detection systems (A-NIDS) use unsupervised models to detect unforeseen attacks. However, existing A-NIDS solutions suffer from low throughput, lack of interpretability, and high maintenance costs. Recent in-network intelligence (INI) exploits programmable switches to offer line-rate deployment of NIDS. Nevertheless, current in-network NIDS are either model-specific or only apply to supervised models. In this paper, we propose Genos, a general in-network framework for unsupervised A-NIDS by rule extraction, which consists of a Model Compiler, a Model Interpreter, and a Model Debugger. Specifically, observing benign data are multimodal and usually located in multiple subspaces in the feature space, we utilize a divide-and-conquer approach for model-agnostic rule extraction. In the Model Compiler, we first propose a tree-based clustering algorithm to partition the feature space into subspaces, then design a decision boundary estimation mechanism to approximate the source model in each subspace. The Model Interpreter interprets predictions by important attributes to aid network operators in understanding the predictions. The Model Debugger conducts incremental updating to rectify errors by only fine-tuning rules on affected subspaces, thus reducing maintenance costs. We implement a prototype using physical hardware, and experiments demonstrate its superior performance of 100 Gbps throughput, great interpretability, and trivial updating overhead.

Submitted 28 March, 2024; originally announced March 2024.

Comments: accepted by IEEE International Conference on Computer Communications (INFOCOM 2024)

arXiv:2403.18197 [pdf, other]

LocoMan: Advancing Versatile Quadrupedal Dexterity with Lightweight Loco-Manipulators

Authors: Changyi Lin, Xingyu Liu, Yuxiang Yang, Yaru Niu, Wenhao Yu, Tingnan Zhang, Jie Tan, Byron Boots, Ding Zhao

Abstract: Quadrupedal robots have emerged as versatile agents capable of locomoting and manipulating in complex environments. Traditional designs typically rely on the robot's inherent body parts or incorporate top-mounted arms for manipulation tasks. However, these configurations may limit the robot's operational dexterity, efficiency and adaptability, particularly in cluttered or constrained spaces. In this work, we present LocoMan, a dexterous quadrupedal robot with a novel morphology to perform versatile manipulation in diverse constrained environments. By equipping a Unitree Go1 robot with two low-cost and lightweight modular 3-DoF loco-manipulators on its front calves, LocoMan leverages the combined mobility and functionality of the legs and grippers for complex manipulation tasks that require precise 6D positioning of the end effector in a wide workspace. To harness the loco-manipulation capabilities of LocoMan, we introduce a unified control framework that extends the whole-body controller (WBC) to integrate the dynamics of loco-manipulators. Through experiments, we validate that the proposed whole-body controller can accurately and stably follow desired 6D trajectories of the end effector and torso, which, when combined with the large workspace from our design, facilitates a diverse set of challenging dexterous loco-manipulation tasks in confined spaces, such as opening doors, plugging into sockets, picking objects in narrow and low-lying spaces, and bimanual manipulation.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Project page: https://linchangyi1.github.io/LocoMan

arXiv:2403.17354 [pdf]

Large topological Hall effect arising from spin reorientation in kagome magnet Fe3Ge

Authors: Zixuan Zhang, Mingyue Zhao, Li Ma, Guoke Li, Congmian Zhen, Dewei Zhao, Denglu Hou

Abstract: Materials systems with spin chirality can provide ultra-high-density, ultra-fast, and ultralow-power information carriers for digital transformation. These material systems include magnetic skyrmions, chiral domain walls, spin reorientation, and so on. The topological Hall effect (THE) has been identified as the most convenient and effective tool for detecting the presence of spin chirality in these systems. The THE that may arise from spin reorientation, and specifically in Fe3Ge with spin reorientation, remains an unexplored area, so we conduct a systematic study of the THE in Fe3Ge. X-Ray Diffraction (XRD) results indicate that our Fe3Ge ribbon sample has a D019 structure. First-principles calculations and magnetic and electrical testing confirm spin reorientation in the Fe3Ge ribbon sample at 350 K. The Hall resistivity test results are consistent with our expectations, indicating the presence of the THE in the Fe3Ge ribbon sample. The topological Hall resistivity reaches a maximum value of 0.69 mΩ cm at 400 K. For the first time, a detailed experimental study of the THE in Fe3Ge with spin reorientation has been conducted, introducing a new member to the family of THE.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.13208 [pdf, other]

CaDRE: Controllable and Diverse Generation of Safety-Critical Driving Scenarios using Real-World Trajectories

Authors: Peide Huang, Wenhao Ding, Jonathan Francis, Bingqing Chen, Ding Zhao

Abstract: Simulation is an indispensable tool in the development and testing of autonomous vehicles (AVs), offering an efficient and safe alternative to road testing by allowing the exploration of a wide range of scenarios. Despite its advantages, a significant challenge within simulation-based testing is the generation of safety-critical scenarios, which are essential to ensure that AVs can handle rare but potentially fatal situations. This paper addresses this challenge by introducing a novel generative framework, CaDRE, which is specifically designed for generating diverse and controllable safety-critical scenarios using real-world trajectories. Our approach optimizes for both the quality and diversity of scenarios by employing a unique formulation and algorithm that integrates real-world data, domain knowledge, and black-box optimization techniques. We validate the effectiveness of our framework through extensive testing in three representative types of traffic scenarios. The results demonstrate superior performance in generating diverse and high-quality scenarios with greater sample efficiency than existing reinforcement learning and sampling-based methods.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12969 [pdf, other]

Entangling Machine Learning with Quantum Tensor Networks

Authors: Constantijn van der Poel, Dan Zhao

Abstract: This paper examines the use of tensor networks, which can efficiently represent high-dimensional quantum states, in language modeling. It is a distillation and continuation of the work done in (van der Poel, 2023). To do so, we will abstract the problem down to modeling Motzkin spin chains, which exhibit long-range correlations reminiscent of those found in language. The Matrix Product State (MPS), also known as the tensor train, has a bond dimension which scales as the length of the sequence it models. To combat this, we use the factored core MPS, whose bond dimension scales sub-linearly. We find that the tensor models reach near perfect classifying ability, and maintain a stable level of performance as the number of valid training examples is decreased.

Submitted 8 January, 2024; originally announced March 2024.

Comments: See source code at https://github.com/ConstantijnvdP/eidolon

arXiv:2403.11439 [pdf, other]

StyleChat: Learning Recitation-Augmented Memory in LLMs for Stylized Dialogue Generation

Authors: Jinpeng Li, Zekai Zhang, Quan Tu, Xin Cheng, Dongyan Zhao, Rui Yan

Abstract: Large Language Models (LLMs) demonstrate superior performance in generative scenarios and have attracted widespread attention. Among them, stylized dialogue generation is essential in the context of LLMs for building intelligent and engaging dialogue agents. However, the ability of LLMs is data-driven and limited by data bias, leading to poor performance on specific tasks. In particular, stylized dialogue generation suffers from a severe lack of supervised data. Furthermore, although many prompt-based methods have been proposed to accomplish specific tasks, their performance in complex real-world scenarios involving a wide variety of dialog styles still needs further enhancement. In this work, we first introduce a stylized dialogue dataset StyleEval with 38 styles by leveraging the generative power of LLMs comprehensively, which has been carefully constructed with rigorous human-led quality control. Based on this, we propose the stylized dialogue framework StyleChat via recitation-augmented memory strategy and multi-task style learning strategy to promote generalization ability. To evaluate the effectiveness of our approach, we created a test benchmark that included both a generation task and a choice task to comprehensively evaluate trained models and assess whether styles and preferences are remembered and understood. Experimental results show that our proposed framework StyleChat outperforms all the baselines and helps to break the style boundary of LLMs.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.10228 [pdf, other]

HawkEye: Training Video-Text LLMs for Grounding Text in Videos

Authors: Yueqian Wang, Xiaojun Meng, Jianxin Liang, Yuxuan Wang, Qun Liu, Dongyan Zhao

Abstract: Video-text Large Language Models (video-text LLMs) have shown remarkable performance in answering questions and holding conversations on simple videos. However, they perform almost the same as random on grounding text queries in long and complicated videos, having little ability to understand and reason about temporal information, which is the most fundamental difference between videos and images. In this paper, we propose HawkEye, one of the first video-text LLMs that can perform temporal video grounding in a fully text-to-text manner. To collect training data that is applicable for temporal video grounding, we construct InternVid-G, a large-scale video-text corpus with segment-level captions and negative spans, with which we introduce two new time-aware training objectives to video-text LLMs. We also propose a coarse-grained method of representing segments in videos, which is more robust and easier for LLMs to learn and follow than other alternatives. Extensive experiments show that HawkEye is better at temporal video grounding and comparable on other video-text tasks with existing video-text LLMs, which verifies its superior video-text multi-modal understanding abilities.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09971 [pdf, other]

Advancing Object Goal Navigation Through LLM-enhanced Object Affinities Transfer

Authors: Mengying Lin, Yaran Chen, Dongbin Zhao, Zhaoran Wang

Abstract: In object goal navigation, agents navigate towards objects identified by category labels using visual and spatial information. Previously, solely network-based methods typically rely on historical data for object affinities estimation, lacking adaptability to new environments and unseen targets. Simultaneously, employing Large Language Models (LLMs) for navigation as either planners or agents, though offering a broad knowledge base, is cost-inefficient and lacks targeted historical experience. Addressing these challenges, we present the LLM-enhanced Object Affinities Transfer (LOAT) framework, integrating LLM-derived object semantics with network-based approaches to leverage experiential object affinities, thus improving adaptability in unfamiliar settings. LOAT employs a dual-module strategy: a generalized affinities module for accessing LLMs' vast knowledge and an experiential affinities module for applying learned object semantic relationships, complemented by a dynamic fusion module harmonizing these information sources based on temporal context. The resulting scores activate semantic maps before feeding into downstream policies, enhancing navigation systems with context-aware inputs. Our evaluations in AI2-THOR and Habitat simulators demonstrate improvements in both navigation success rates and efficiency, validating the LOAT's efficacy in integrating LLM insights for improved object goal navigation.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.09933 [pdf, other]

Design and Control Co-Optimization for Automated Design Iteration of Dexterous Anthropomorphic Soft Robotic Hands

Authors: Pragna Mannam, Xingyu Liu, Ding Zhao, Jean Oh, Nancy Pollard

Abstract: We automate soft robotic hand design iteration by co-optimizing design and control policy for dexterous manipulation skills in simulation. Our design iteration pipeline combines genetic algorithms and policy transfer to learn control policies for nearly 400 hand designs, testing grasp quality under external force disturbances. We validate the optimized designs in the real world through teleoperation of pickup and reorient manipulation tasks. Our real world evaluation, from over 900 teleoperated tasks, shows that the trend in design performance in simulation resembles that of the real world. Furthermore, we show that optimized hand designs from our approach outperform existing soft robot hands from prior work in the real world. The results highlight the usefulness of simulation in guiding parameter choices for anthropomorphic soft robotic hand systems, and the effectiveness of our automated design iteration approach, despite the sim-to-real gap.

Submitted 25 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Journal ref: IEEE-RAS International Conference on Soft Robotics (RoboSoft) 2024

arXiv:2403.09650 [pdf, ps, other]

doi 10.7153/mia-2023-26-49

Discrete Opial type inequalities for interval-valued functions

Authors: Dafang Zhao, Xuexiao You, Delfim F. M. Torres

Abstract: We introduce the forward (backward) gH-difference operator of interval sequences, and establish some new discrete Opial type inequalities for interval-valued functions. Further, we obtain generalizations of classical discrete Opial type inequalities. Some examples are presented to illustrate our results.

Submitted 22 November, 2023; originally announced March 2024.

Comments: This is a preprint of a paper whose final and definite form is published in Mathematical Inequalities & Applications

MSC Class: 26D15; 26E50; 65G30

Journal ref: Math. Inequal. Appl. 26 (2023), no. 4, 811--826

arXiv:2403.06408 [pdf, other]

What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

Authors: Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

Abstract: Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be learned about the relationship between quantization and LLM performance. To shed light on this relationship, we propose a new perspective on quantization, viewing it as perturbations added to the weights and activations of LLMs. We call this approach "the lens of perturbation". Using this lens, we conduct experiments with various artificial perturbations to explore their impact on LLM performance. Our findings reveal several connections between the properties of perturbations and LLM performance, providing insights into the failure cases of uniform quantization and suggesting potential solutions to improve the robustness of LLM quantization. To demonstrate the significance of our findings, we implement a simple non-uniform quantization approach based on our insights. Our experiments show that this approach achieves minimal performance degradation on both 4-bit weight quantization and 8-bit quantization for weights and activations. These results validate the correctness of our approach and highlight its potential to improve the efficiency of LLMs without sacrificing performance.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.05169 [pdf, ps, other]

Bivariate $Q$-polynomial structures for the nonbinary Johnson scheme and the association scheme obtained from attenuated spaces

Authors: Eiichi Bannai, Hirotake Kurihara, Da Zhao, Yan Zhu

Abstract: The study of $P$-polynomial association schemes (distance-regular graphs) and $Q$-polynomial association schemes, and in particular $P$- and $Q$-polynomial association schemes, has been a central theme not only in the theory of association schemes but also in the whole study of algebraic combinatorics in general. Leonard's theorem (1982) says that the spherical functions (or the character tables) of $P$- and $Q$-polynomial association schemes are described by Askey-Wilson orthogonal polynomials or their relatives. These polynomials are one-variable orthogonal polynomials. It seems that the new attempt to define and study higher rank $P$- and $Q$-polynomial association schemes had been hoped for, but had gotten only limited success. The first very successful attempt was initiated recently by Bernard-Crampé-d'Andecy-Vinet-Zaimi [arXiv:2212.10824], and then followed by Bannai-Kurihara-Zhao-Zhu [arXiv:2305.00707]. The general theory and some explicit examples of families of higher rank (multivariate) $P$- and/or $Q$-polynomial association schemes have been obtained there. The main purpose of the present paper is to prove that some important families of association schemes are shown to be bivariate $Q$-polynomial. Namely, we show that all the nonbinary Johnson association schemes and all the attenuated space association schemes are bivariate $Q$-polynomial. It should be noted that the parameter restrictions needed in the previous papers are completely lifted in this paper.
Our proofs are done by explicitly calculating the Krein parameters of these association schemes. At the end, we mention some speculations and indications of what we can expect in the future study.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 31 pages, no figure

MSC Class: 05E30; 20C15

arXiv:2403.04553 [pdf, other]

Improvements & Evaluations on the MLCommons CloudMask Benchmark

Authors: Varshitha Chennamsetti, Laiba Mehnaz, Dan Zhao, Banani Ghosh, Sergey V. Samsonau

Abstract: In this paper, we report the performance benchmarking results of deep learning models on MLCommons' Science cloud-masking benchmark using a high-performance computing cluster at New York University (NYU): NYU Greene. MLCommons is a consortium that develops and maintains several scientific benchmarks that can benefit from developments in AI. We provide a description of the cloud-masking benchmark task, updated code, and the best model for this benchmark when using our selected hyperparameter settings. Our benchmarking results include the highest accuracy achieved on the NYU system as well as the average time taken for both training and inference on the benchmark across several runs/seeds. Our code can be found on GitHub. MLCommons team has been kept informed about our progress and may use the developed code for their future work.

Submitted 7 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.08636

arXiv:2403.04365 [pdf, other]

DV-Hop localization based on Distance Estimation using Multinode and Hop Loss in WSNs

Authors: Penghong Wang, Xingtao Wang, Wenrui Li, Xiaopeng Fan, Debin Zhao

Abstract: Location awareness is a critical issue in wireless sensor network applications. For more accurate location estimation, the two issues should be considered extensively: 1) how to sufficiently utilize the connection information between multiple nodes and 2) how to select a suitable solution from multiple solutions obtained by the Euclidean distance loss. In this paper, a DV-Hop localization based on the distance estimation using multinode (DEMN) and the hop loss in WSNs is proposed to address the two issues. In DEMN, when multiple anchor nodes can detect an unknown node, the distance expectation between the unknown node and an anchor node is calculated using the cross-domain information and is considered as the expected distance between them, which narrows the search space. When minimizing the traditional Euclidean distance loss, multiple solutions may exist. To select a suitable solution, the hop loss is proposed, which minimizes the difference between the real and its predicted hops. Finally, the Euclidean distance loss calculated by the DEMN and the hop loss are embedded into the multi-objective optimization algorithm. The experimental results show that the proposed method gains 86.11% location accuracy in the randomly distributed network, which is 6.05% better than the DEM-DV-Hop, while DEMN and the hop loss can contribute 2.46% and 3.41%, respectively.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.04296 [pdf, other]

Variational quantum eigensolver with linear depth problem-inspired ansatz for solving portfolio optimization in finance

Authors: Shengbin Wang, Peng Wang, Guihui Li, Shubin Zhao, Dongyi Zhao, Jing Wang, Yuan Fang, Menghan Dou, Yongjian Gu, Yu-Chun Wu, Guo-Ping Guo

Abstract: Great efforts have been dedicated in recent years to explore practical applications for noisy intermediate-scale quantum (NISQ) computers, which is a fundamental and challenging problem in quantum computing. As one of the most promising methods, the variational quantum eigensolver (VQE) has been extensively studied. In this paper, VQE is applied to solve portfolio optimization problems in finance by designing two hardware-efficient Dicke state ansatze that reach a maximum of 2n two-qubit gate depth and n^2/4 parameters, with n being the number of qubits used. Both ansatze are partitioning-friendly, allowing for the proposal of a highly scalable quantum/classical hybrid distributed computing (HDC) scheme. Combining simultaneous sampling, problem-specific measurement error mitigation, and fragment reuse techniques, we successfully implement the HDC experiments on the superconducting quantum computer Wu Kong with up to 55 qubits. The simulation and experimental results illustrate that the restricted expressibility of the ansatze, induced by the small number of parameters and limited entanglement, is advantageous for solving classical optimization problems with the cost function of the conditional value-at-risk (CVaR) for the NISQ era and beyond. Furthermore, the HDC scheme shows great potential for achieving quantum advantage in the NISQ era. We hope that the heuristic idea presented in this paper can motivate fruitful investigations in current and future quantum computing paradigms.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 21 pages, 20 figures

arXiv:2403.03788 [pdf, other]

PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion

Authors: Zekai Zhang, Yiduo Guo, Yaobo Liang, Dongyan Zhao, Nan Duan

Abstract: The growing dependence on Large Language Models (LLMs) for finishing user instructions necessitates a comprehensive understanding of their robustness to complex task completion in real-world situations. To address this critical need, we propose the PowerPoint Task Completion Robustness benchmark (PPTC-R) to measure LLMs' robustness to the user PPT task instruction and software version. Specifically, we construct adversarial user instructions by attacking user instructions at sentence, semantic, and multi-language levels. To assess the robustness of Language Models to software versions, we vary the number of provided APIs to simulate both the newest version and earlier version settings. Subsequently, we test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates these robustness settings, aiming to evaluate how deviations impact LLMs' API calls for task completion. We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark, particularly in the version update and the multilingual settings. However, we find that all LLMs lose their robustness when confronted with multiple challenges (e.g., multi-turn) simultaneously, leading to significant performance drops. We further analyze the robustness behavior and error reasons of LLMs in our benchmark, which provide valuable insights for researchers to understand the LLM's robustness in task completion and develop more robust LLMs and agents. We release the code and data at https://github.com/ZekaiGalaxy/PPTCR.

Submitted 6 March, 2024; originally announced March 2024.

Comments: LLM evaluation, Multi-turn, Multi-language, Multi-modal benchmark

arXiv:2403.03068 [pdf, other]

Enhancing single-atom loading in tightly confined dipole traps with ancillary dipole beam

Authors: Guang-Jie Chen, Zhu-Bo Wang, Chenyue Gu, Dong Zhao, Ji-Zhe Zhang, Yan-Lei Zhang, Chun-Hua Dong, Kun Huang, Guang-Can Guo, Chang-Ling Zou

Abstract: Single atoms trapped in tightly focused optical dipole traps provide an excellent experimental platform for quantum computing, precision measurement, and fundamental physics research. In this work, we propose and demonstrate a novel approach to enhancing the loading of single atoms by introducing a weak ancillary dipole beam. The loading rate of single atoms in a dipole trap can be significantly improved by only a few tens of microwatts of counter-propagating beam. It was also demonstrated that multiple atoms could be loaded with the assistance of a counter-propagating beam. By reducing the power requirements for trapping single atoms and enabling the trapping of multiple atoms, our method facilitates the extension of single-atom arrays and the investigation of collective light-atom interactions.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 10 pages, 7 figures

arXiv:2403.01451 [pdf, other]

Enhancing Data Provenance and Model Transparency in Federated Learning Systems -- A Database Approach

Authors: Michael Gu, Ramasoumya Naraparaju, Dongfang Zhao

Abstract: Federated Learning (FL) presents a promising paradigm for training machine learning models across decentralized edge devices while preserving data privacy. Ensuring the integrity and traceability of data across these distributed environments, however, remains a critical challenge. The ability to create transparent artificial intelligence, such as detailing the training process of a machine learning model, has become an increasingly prominent concern due to the large number of sensitive (hyper)parameters it utilizes; thus, it is imperative to strike a reasonable balance between openness and the need to protect sensitive information. In this paper, we propose one of the first approaches to enhance data provenance and model transparency in federated learning systems. Our methodology leverages a combination of cryptographic techniques and efficient model management to track the transformation of data throughout the FL process, and seeks to increase the reproducibility and trustworthiness of a trained FL model. We demonstrate the effectiveness of our approach through experimental evaluations on diverse FL scenarios, showcasing its ability to tackle accountability and explainability across the board. Our findings show that our system can greatly enhance data transparency in various FL environments by storing chained cryptographic hashes and client model snapshots in our proposed design for data decoupled FL.
This is made possible by also employing multiple optimization techniques which enables comprehensive data provenance without imposing substantial computational loads. Extensive experimental results suggest that integrating a database subsystem into federated learning systems can improve data provenance in an efficient manner, encouraging secure FL adoption in privacy-sensitive applications and paving the way for future advancements in FL transparency and security features.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 14 pages, 16 figures
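
The abstract above mentions storing chained cryptographic hashes alongside client model snapshots. The following is only a minimal sketch of that general idea, not the authors' implementation: the function names `chain_hashes` and `verify_chain`, the use of SHA-256, and the representation of snapshots as raw bytes are all assumptions made for illustration.

```python
import hashlib


def chain_hashes(snapshots, genesis=b""):
    """Return one SHA-256 digest per snapshot, each covering the previous digest.

    `snapshots` is a list of bytes objects (hypothetical serialized client
    models). Because every digest depends on its predecessor, altering an
    early snapshot changes every later digest in the chain.
    """
    prev = hashlib.sha256(genesis).hexdigest()
    chain = []
    for snap in snapshots:
        digest = hashlib.sha256(prev.encode("ascii") + snap).hexdigest()
        chain.append(digest)
        prev = digest
    return chain


def verify_chain(snapshots, chain, genesis=b""):
    """Recompute the chain from the snapshots and compare with the stored one."""
    return chain_hashes(snapshots, genesis) == chain


# Usage: three training rounds, then a tampered copy of round 1.
rounds = [b"round-0-weights", b"round-1-weights", b"round-2-weights"]
stored = chain_hashes(rounds)
tampered = [rounds[0], b"round-1-weights-modified", rounds[2]]
print(verify_chain(rounds, stored))    # untouched history verifies
print(verify_chain(tampered, stored))  # modified history does not
```

In a sketch like this, only the digests need to live in the database for verification; the snapshots themselves can be stored wherever is convenient.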

arXiv:2402.19111 [pdf, other]

Deep Network for Image Compressed Sensing Coding Using Local Structural Sampling

Authors: Wenxue Cui, Xingtao Wang, Xiaopeng Fan, Shaohui Liu, Xinwei Gao, Debin Zhao

Abstract: Existing image compressed sensing (CS) coding frameworks usually solve an inverse problem based on measurement coding and optimization-based image reconstruction, which still face the following two challenges: 1) The widely used random sampling matrix, such as the Gaussian Random Matrix (GRM), usually leads to low measurement coding efficiency. 2) The optimization-based reconstruction methods generally maintain a much higher computational complexity. In this paper, we propose a new CNN-based image CS coding framework using local structural sampling (dubbed CSCNet) that includes three functional modules: local structural sampling, measurement coding and Laplacian pyramid reconstruction. In the proposed framework, instead of GRM, a new local structural sampling matrix is first developed, which is able to enhance the correlation between the measurements through a local perceptual sampling strategy. Besides, the designed local structural sampling matrix can be jointly optimized with the other functional modules during the training process. After sampling, the measurements with high correlations are produced, which are then coded into final bitstreams by a third-party image codec. Finally, a Laplacian pyramid reconstruction network is proposed to efficiently recover the target image from the measurement domain to the image domain. Extensive experimental results demonstrate that the proposed scheme outperforms the existing state-of-the-art CS coding methods, while maintaining fast computational speed.

Submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted by ACM Transactions on Multimedia Computing Communications and Applications (TOMM)

arXiv:2402.18784 [pdf, other]

Brain-inspired and Self-based Artificial Intelligence

Authors: Yi Zeng, Feifei Zhao, Yuxuan Zhao, Dongcheng Zhao, Enmeng Lu, Qian Zhang, Yuwei Wang, Hui Feng, Zhuoya Zhao, Jihang Wang, Qingqun Kong, Yinqian Sun, Yang Li, Guobin Shen, Bing Han, Yiting Dong, Wenxuan Pan, Xiang He, Aorigele Bao, ** Wang

Abstract: The question "Can machines think?" and the Turing Test to assess whether machines could achieve human-level intelligence are among the roots of AI. With the philosophical argument "I think, therefore I am", this paper challenges the idea of a "thinking machine" supported by current AIs since there is no sense of self in them. Current artificial intelligence is only seemingly intelligent information processing; it neither truly understands nor is subjectively aware of itself, nor does it perceive the world through the self as human intelligence does. In this paper, we introduce a Brain-inspired and Self-based Artificial Intelligence (BriSe AI) paradigm. This BriSe AI paradigm is dedicated to coordinating various cognitive functions and learning strategies in a self-organized manner to build human-level AI models and robotic applications. Specifically, BriSe AI emphasizes the crucial role of the Self in shaping the future AI, rooted in a practical hierarchical Self framework, including Perception and Learning, Bodily Self, Autonomous Self, Social Self, and Conceptual Self. The hierarchical framework of the Self highlights self-based environment perception, self-bodily modeling, autonomous interaction with the environment, social interaction and collaboration with others, and even more abstract understanding of the Self. Furthermore, the positive mutual promotion and support among multiple levels of Self, as well as between Self and learning, enhance the BriSe AI's conscious understanding of information and flexible adaptation to complex environments, serving as a driving force propelling BriSe AI towards real Artificial General Intelligence.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.18593 [pdf, other]

doi 10.1145/3620678.3624793

Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale

Authors: Dan Zhao, Siddharth Samsi, Joseph McDonald, Baolin Li, David Bestor, Michael Jones, Devesh Tiwari, Vijay Gadepally

Abstract: As research and deployment of AI grows, the computational burden to support and sustain its progress inevitably does too. To train or fine-tune state-of-the-art models in NLP, computer vision, etc., some form of AI hardware acceleration is virtually a requirement. Recent large language models require considerable resources to train and deploy, resulting in significant energy usage, potential carbon emissions, and massive demand for GPUs and other hardware accelerators. However, this surge carries large implications for energy sustainability at the HPC/datacenter level. In this paper, we study the aggregate effect of power-capping GPUs on GPU temperature and power draw at a research supercomputing center. With the right amount of power-capping, we show significant decreases in both temperature and power draw, reducing power consumption and potentially improving hardware life-span with minimal impact on job performance. While power-capping reduces power draw by design, the aggregate system-wide effect on overall energy consumption is less clear; for instance, if users notice job performance degradation from GPU power-caps, they may request additional GPU-jobs to compensate, negating any energy savings or even worsening energy consumption. To our knowledge, our work is the first to conduct and make available a detailed analysis of the effects of GPU power-capping at the supercomputing scale. We hope our work will inspire HPCs/datacenters to further explore, evaluate, and communicate the impact of power-capping AI hardware accelerators for more sustainable AI.

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.17365 [pdf, ps, other]

doi 10.1088/0256-307X/41/2/029701

A Search for Radio Pulsars in Supernova Remnants Using FAST with One Pulsar Discovered

Authors: Zhen Zhang, Wen-Ming Yan, Jian-** Yuan, Na Wang, Jun-Tao Bai, Zhi-Gang Wen, Bao-Da Li, **-Tao Xie, De Zhao, Yu-Bin Wang, Nan-Nan Zhai

Abstract: We report on the results of a search for radio pulsars in five supernova remnants (SNRs) with FAST. The observations were made using the 19-beam receiver in the Snapshot mode. The integration time for each pointing is 10 min. We discovered a new pulsar, PSR J1845$-$0306, which has a spin period of 983.6 ms and a dispersion measure of 444.6$\pm$2.0 cm$^{-3}$ pc, in observations of SNR G29.6+0.1. To judge the association between the pulsar and the SNR, further verification is needed. We also re-detected some known pulsars in the data from SNRs G29.6+0.1 and G29.7$-$0.3. No pulsars were detected in observations of the other three SNRs.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 6 pages, 2 figures, 2 tables; published in CPL

Journal ref: Chin. Phys. Lett. 2024, 41 (2): 029701 February 2024

arXiv:2402.17304 [pdf, ps, other]

Probing Multimodal Large Language Models for Global and Local Semantic Representations

Authors: Mingxu Tao, Quzhe Huang, Kun Xu, Liwei Chen, Yansong Feng, Dongyan Zhao

Abstract: The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications in understanding integrated texts and images. Recent works leverage image-caption datasets to train MLLMs, achieving state-of-the-art performance on image-to-text tasks. However, few studies explore which layers of MLLMs contribute most to encoding global image information, which plays a vital role in multimodal comprehension and generation. In this study, we find that the intermediate layers of models, rather than the topmost layers, can encode more global semantic information, and their representation vectors perform better on visual-language entailment tasks. We further probe models regarding local semantic representations through object recognition tasks. We find that the topmost layers may excessively focus on local information, leading to a diminished ability to encode global information. Our code and data are released via https://github.com/kobayashikanna01/probing_MLLM_rep.

Submitted 26 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted by LREC-COLING 2024 as a short paper (Camera Ready)

arXiv:2402.16313 [pdf, other]

Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering

Authors: Mingxu Tao, Dongyan Zhao, Yansong Feng

Abstract: Open-ended question answering requires models to find appropriate evidence to form well-reasoned, comprehensive and helpful answers. In practical applications, models also need to engage in extended discussions on potential scenarios closely relevant to the question. When augmented with a retrieval module, open-source Large Language Models (LLMs) can produce coherent answers, often with different focuses, but are still sub-optimal in terms of reliable evidence selection and in-depth question analysis. In this paper, we propose a novel Chain-of-Discussion framework to leverage the synergy among multiple open-source LLMs, aiming to provide more correct and more comprehensive answers for open-ended QA, although they are not strong enough individually. Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers. We release our data and code at https://github.com/kobayashikanna01/Chain-of-Discussion.

Submitted 26 February, 2024; originally announced February 2024.

Comments: Under review

arXiv:2402.16050 [pdf, other]

LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding

Authors: Yuxuan Wang, Yueqian Wang, Pengfei Wu, Jianxin Liang, Dongyan Zhao, Zilong Zheng

Abstract: Despite progress in video-language modeling, the computational challenge of interpreting long-form videos in response to task-specific linguistic queries persists, largely due to the complexity of high-dimensional video data and the misalignment between language and visual cues over space and time. To tackle this issue, we introduce a novel approach called Language-guided Spatial-Temporal Prompt Learning (LSTP). This approach features two key components: a Temporal Prompt Sampler (TPS) with optical flow prior that leverages temporal information to efficiently extract relevant video content, and a Spatial Prompt Solver (SPS) that adeptly captures the intricate spatial relationships between visual and textual elements. By harmonizing TPS and SPS with a cohesive training strategy, our framework significantly enhances computational efficiency, temporal understanding, and spatial-temporal alignment. Empirical evaluations across two challenging tasks (video question answering and temporal question grounding in videos) using a variety of video-language pretrainings (VLPs) and large language models (LLMs) demonstrate the superior performance, speed, and versatility of our proposed LSTP paradigm.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.15999 [pdf]

Revelation of new magnetic domain wall category in the itinerant antiferromagnet Chromium

Authors: Yining Hu, Xu Wang, Chen Chen, Qingle Zhang, Dongming Zhao, Tianzhen Zhang, Chenxi Wang, Donglai Feng, Tong Zhang

Abstract: Conventional magnetic domain walls are characterized by the reorientation of local moments. However, what occurs at the boundary of itinerant magnets is largely unknown. Here, using spin-sensitive scanning tunneling microscopy, we investigated the microscopic boundaries of the spin-density-wave (SDW) state in Cr, a prototypical itinerant antiferromagnet. We find that at the boundary of two incommensurate SDW domains, the spins display finite-scale decay rather than reorientation. A novel double-Q SDW is generated with a second-order charge modulation. In commensurate SDW domains, a clear SDW gap is observed. Screw dislocations induce novel "half" vortices and anti-vortices that are connected by an antiphase domain wall. This domain wall is characterized by vanishing spin density, where intriguing SDW in-gap states emerge, resembling the Andreev bound states in superconductors. All these unique SDW boundary structures can be viewed as consequences of local interference of two SDWs, either with different Q or reversed phases. Therefore, our study revealed a new category of magnetic domain wall, the "interference wall", with a mechanism rooted in itinerant nature.

Submitted 25 February, 2024; originally announced February 2024.

Comments: 19 pages, 10 figures, supplementary materials included

arXiv:2402.15275 [pdf, ps, other]

doi 10.1007/s10686-024-09924-0

Simulation Studies for the First Pathfinder of the CATCH Space Mission

Authors: Yiming Huang, Juan Zhang, Lian Tao, Zhengwei Li, Donghua Zhao, Qian-Qing Yin, Xiangyang Wen, **gyu Xiao, Chen Zhang, Shuang-Nan Zhang, Shaolin Xiong, Qingcui Bu, Jirong Cang, Dezhi Cao, Wen Chen, Siran Ding, Min Gao, Yang Gao, Shu** Hou, Li** Jia, Ge **, Dalin Li, **song Li, Pan** Li, Yajun Li , et al. (20 additional authors not shown)

Abstract: The Chasing All Transients Constellation Hunters (CATCH) space mission is an intelligent constellation consisting of 126 micro-satellites in three types (A, B, and C), designed for X-ray observation with the objective of studying the dynamic universe. Currently, we are actively developing the first Pathfinder (CATCH-1) for the CATCH mission, specifically for type-A satellites. CATCH-1 is equipped with Micro Pore Optics (MPO) and a 4-pixel Silicon Drift Detector (SDD) array. To assess its scientific performance, including the effective area of the optical system, on-orbit background, and telescope sensitivity, we employ the Monte Carlo software Geant4 for simulation in this study. The MPO optics exhibit an effective area of $41$ cm$^2$ at the focal spot for 1 keV X-rays, while the entire telescope system achieves an effective area of $29$ cm$^2$ at 1 keV when taking into account the SDD detector's detection efficiency. The primary contribution to the background is found to be from the Cosmic X-ray Background. Assuming a 625 km orbit with an inclination of $29^\circ$, the total background for CATCH-1 is estimated to be $8.13\times10^{-2}$ counts s$^{-1}$ in the energy range of 0.5--4 keV. Based on the background within the central detector and assuming a Crab-like source spectrum, the estimated ideal sensitivity could achieve $1.9\times10^{-12}$ erg cm$^{-2}$ s$^{-1}$ for an exposure of 10$^4$ s in the energy band of 0.5--4 keV. Furthermore, after simulating the background caused by low-energy charged particles near the geomagnetic equator, we have determined that there is no need to install a magnetic deflector.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.13628 [pdf, other]

Improving Building Temperature Forecasting: A Data-driven Approach with System Scenario Clustering

Authors: Dafang Zhao, Zheng Chen, Zhengmao Li, Xiaolei Yuan, Ittetsu Taniguchi

Abstract: Heating, Ventilation and Air Conditioning (HVAC) systems play a critical role in maintaining a comfortable thermal environment and account for approximately 40% of primary energy usage in the building sector. For smart energy management in buildings, usage patterns and their resulting profiles allow the improvement of control systems with prediction capabilities. However, for large-scale HVAC system management, it is difficult to construct a detailed model for each subsystem. In this paper, a new data-driven room temperature prediction model is proposed based on the k-means clustering method. The proposed data-driven temperature prediction approach extracts the system operation feature through historical data analysis and further simplifies the system-level model to improve generalization and computational efficiency. We evaluate the proposed approach in the real world. The results demonstrate that our approach can significantly reduce modeling time without reducing prediction accuracy.

Submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted; to be published at IEEE PES GM 2024

arXiv:2402.12728 [pdf, other]

Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering

Authors: Junnan Dong, Qinggang Zhang, Huachi Zhou, Daochen Zha, Pai Zheng, Xiao Huang

Abstract: Knowledge-based visual question answering (KVQA) has been extensively studied to answer visual questions with external knowledge, e.g., knowledge graphs (KGs). While several attempts have been made to leverage large language models (LLMs) as an implicit knowledge source, it remains challenging since LLMs may generate hallucinations. Moreover, multiple knowledge sources, e.g., images, KGs and LLMs, cannot be readily aligned for complex scenarios. To tackle these, we present a novel modality-aware integration with LLMs for KVQA (MAIL). It carefully leverages multimodal knowledge for both image understanding and knowledge reasoning. Specifically, (i) we propose a two-stage prompting strategy with LLMs to densely embody the image into a scene graph with detailed visual features; (ii) we construct a coupled concept graph by linking the mentioned entities with external facts; (iii) a tailored pseudo-siamese graph medium fusion is designed for sufficient multimodal fusion. We utilize the shared mentioned entities in two graphs as mediums to bridge a tight inter-modal exchange, while maximally preserving insightful intra-modal learning by constraining the fusion within mediums. Extensive experiments on two benchmark datasets show the superiority of MAIL with 24x fewer resources.

Submitted 2 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: 8 pages, 3 figures and 1 page appendix; the processed graphs and code will be available

arXiv:2402.07627 [pdf]

Unveiling the GeI2-Assisted Oriented Growth of Perovskite Crystallite for High-Performance Flexible Sn Perovskite Solar Cells

Authors: Huagui Lai, Selina Olthof, Shengqiang Ren, Radha K. Kothandaraman, Matthias Diethelm, Quentin Jeangros, Roland Hany, Ayodhya N. Tiwari, Dewei Zhao, Fan Fu

Abstract: Tin perovskites are emerging as promising alternatives to their lead-based counterparts for high-performance and flexible perovskite solar cells (PSCs). However, their rapid crystallization often leads to inadequate film quality and poor device performance. In this study, the role of GeI2 as an additive is investigated for controlling the nucleation and crystallization processes of formamidinium tin triiodide (FASnI3). The findings reveal the preferential formation of a Ge-rich layer at the bottom of the perovskite film upon the introduction of GeI2. It is proposed that the initial formation of the Ge-complex acts as a crystallization regulator, promoting oriented growth of subsequent FASnI3 crystals and enhancing overall crystallinity. Through the incorporation of an optimal amount of GeI2, flexible Sn PSCs with an efficiency of 10.8% were achieved. Furthermore, it was observed that the GeI2 additive ensures a remarkable shelf-life for the devices, with the rigid cells retaining 91% of their initial performance after more than 13,800 hours of storage in an N2 gas environment. This study elucidates the mechanistic role of GeI2 in regulating the nucleation and crystallization process of tin perovskites, providing valuable insights into the significance of additive engineering for the development of high-performance flexible tin PSCs.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.03992 [pdf, other]

Space Group Constrained Crystal Generation

Authors: Rui Jiao, Wenbing Huang, Yu Liu, Deli Zhao, Yang Liu

Abstract: Crystals are the foundation of numerous scientific and industrial applications. While various learning-based approaches have been proposed for crystal generation, existing methods seldom consider the space group constraint which is crucial in describing the geometry of crystals and closely relevant to many desirable properties. However, considering space group constraint is challenging owing to its diverse and nontrivial forms. In this paper, we reduce the space group constraint into an equivalent formulation that is more tractable to be handcrafted into the generation process. In particular, we translate the space group constraint into two parts: the basis constraint of the invariant logarithmic space of the lattice matrix and the Wyckoff position constraint of the fractional coordinates. Upon the derived constraints, we then propose DiffCSP++, a novel diffusion model that has enhanced a previous work DiffCSP by further taking space group constraint into account. Experiments on several popular datasets verify the benefit of the involvement of the space group constraint, and show that our DiffCSP++ achieves promising performance on crystal structure prediction, ab initio crystal generation and controllable generation with customized space groups.

Submitted 8 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: ICLR 2024 poster

arXiv:2402.02163 [pdf]

Exceptional point-based ultrasensitive surface acoustic wave gas sensor

Authors: Xingyu Lu, Yang Yuan, Fa Chen, Xiaoxiao Hou, Yanlong Guo, Leonhard Reindl, Wei Luo, Degang Zhao

Abstract: Exceptional points (EPs) refer to degeneracies in non-Hermitian systems where two or more eigenvalues and their corresponding eigenvectors coalesce. Recently, there has been growing interest in harnessing EPs to enhance the responsivity of sensors. Significant improvements in the sensitivity of sensors in optics and electronics have been achieved. In this work, we present a novel ultrasensitive surface acoustic wave (SAW) gas sensor based on an EP. We demonstrate its ability to significantly respond to trace amounts of hydrogen sulfide (H2S) gas by tuning additional loss to approach the EP, thereby enhancing the responsivity compared to conventional delay line gas sensors. In addition to high sensitivity, our sensor is robust to temperature variation and exclusive to H2S gas. We propose an innovative method for designing a new generation of ultrasensitive gas sensors.

Submitted 3 February, 2024; originally announced February 2024.
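
The coalescence of eigenvalues at an exceptional point, described in the abstract above, can be illustrated with a generic textbook model. The sketch below is not the sensor model from the paper: it assumes a hypothetical two-mode non-Hermitian matrix H = [[w0 - i*g1, k], [k, w0 - i*g2]] with loss rates `g1`, `g2` and coupling `k`, whose eigenvalues are w0 - i(g1 + g2)/2 ± sqrt(k^2 - ((g1 - g2)/2)^2). The two eigenvalues merge when k = |g1 - g2|/2, and near that point the splitting grows with the square root of a small perturbation, which is the usual argument for enhanced responsivity.

```python
import cmath


def eigenvalues(w0, g1, g2, k):
    """Eigenvalues of the assumed 2x2 matrix [[w0 - 1j*g1, k], [k, w0 - 1j*g2]]."""
    mean = w0 - 1j * (g1 + g2) / 2
    root = cmath.sqrt(k**2 - ((g1 - g2) / 2) ** 2)
    return mean + root, mean - root


def splitting(g1, g2, k):
    """Magnitude of the eigenvalue splitting (independent of w0)."""
    lam_plus, lam_minus = eigenvalues(0.0, g1, g2, k)
    return abs(lam_plus - lam_minus)


# Illustrative numbers only: with g1 = 2, g2 = 0 the EP sits at k = 1.
eps = 1e-4
print(splitting(2.0, 0.0, 1.0))            # no splitting exactly at the EP
print(splitting(2.0, 0.0, 1.0 + eps))      # splitting far larger than eps
print(splitting(2.0, 0.0, 1.0 + 4 * eps))  # roughly twice the previous value
```

Quadrupling the perturbation roughly doubles the splitting, which is the square-root scaling that distinguishes operation near an EP from the linear response of a conventional sensor.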

arXiv:2402.01115 [pdf, other]

Interpretation of Intracardiac Electrograms Through Textual Representations

Authors: William Jongwon Han, Diana Gomez, Avi Alok, Chao**g Duan, Michael A. Rosenberg, Douglas Weber, Emerson Liu, Ding Zhao

Abstract: Understanding the irregular electrical activity of atrial fibrillation (AFib) has been a key challenge in electrocardiography. For serious cases of AFib, catheter ablations are performed to collect intracardiac electrograms (EGMs). EGMs offer intricately detailed and localized electrical activity of the heart and are an ideal modality for interpretable cardiac studies. Recent advancements in artificial intelligence (AI) have allowed some works to utilize deep learning frameworks to interpret EGMs during AFib. Additionally, language models (LMs) have shown exceptional performance in being able to generalize to unseen domains, especially in healthcare. In this study, we are the first to leverage pretrained LMs for finetuning of EGM interpolation and AFib classification via masked language modeling. We formulate the EGM as a textual sequence and present competitive performances on AFib classification compared against other representations. Lastly, we provide a comprehensive interpretability study to provide a multi-perspective intuition of the model's behavior, which could greatly benefit clinical use.

Submitted 11 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 18 pages, 9 figures; Accepted to CHIL 2024

ACM Class: I.2.7; J.3

arXiv:2402.00738 [pdf, other]

FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game

Authors: Guangzheng Hu, Yuanheng Zhu, Haoran Li, Dongbin Zhao

Abstract: Many real-world applications involve agents that fall into two teams, with payoffs that are equal within the same team but of opposite sign across the opponent team. In recent years, these so-called two-team zero-sum Markov games (2t0sMGs) have been addressed with reinforcement learning. However, existing methods are inefficient owing to insufficient consideration of intra-team credit assignment, data utilization and computational intractability. In this paper, we propose the individual-global-minimax (IGMM) principle to ensure the coherence between two-team minimax behaviors and the individual greedy behaviors through Q functions in 2t0sMGs. Based on it, we present a novel multi-agent reinforcement learning framework, Factorized Multi-Agent MiniMax Q-Learning (FM3Q), which can factorize the joint minimax Q function into individual ones and iteratively solve for the IGMM-satisfied minimax Q functions for 2t0sMGs. Moreover, an online learning algorithm with neural networks is proposed to implement FM3Q and obtain the deterministic and decentralized minimax policies for two-team players. A theoretical analysis is provided to prove the convergence of FM3Q. Empirically, we use three environments to evaluate the learning efficiency and final performance of FM3Q and show its superiority on 2t0sMGs.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.00449

Parallel Spiking Unit for Efficient Training of Spiking Neural Networks

Authors: Yang Li, Yinqian Sun, Xiang He, Yiting Dong, Dongcheng Zhao, Yi Zeng

Abstract: Efficient parallel computing has become a pivotal element in advancing artificial intelligence. Yet, the deployment of Spiking Neural Networks (SNNs) in this domain is hampered by their inherent sequential computational dependency. This constraint arises from the need for each time step's processing to rely on the preceding step's outcomes, significantly impeding the adaptability of SNN models to massively parallel computing environments. Addressing this challenge, our paper introduces the innovative Parallel Spiking Unit (PSU) and its two derivatives, the Input-aware PSU (IPSU) and Reset-aware PSU (RPSU). These variants skillfully decouple the leaky integration and firing mechanisms in spiking neurons while probabilistically managing the reset process. By preserving the fundamental computational attributes of the spiking neuron model, our approach enables the concurrent computation of all membrane potential instances within the SNN, facilitating parallel spike output generation and substantially enhancing computational efficiency. Comprehensive testing across various datasets, including static and sequential images, Dynamic Vision Sensor (DVS) data, and speech datasets, demonstrates that the PSU and its variants not only significantly boost performance and simulation speed but also augment the energy efficiency of SNNs through enhanced sparsity in neural activity. These advancements underscore the potential of our method in revolutionizing SNN deployment for high-performance parallel computing applications.

Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.00401

Higgs boson pair production and decay at NLO in QCD: the $b\bar{b}γγ$ final state

Authors: Hai Tao Li, Zong-Guo Si, Jian Wang, Xiao Zhang, Dan Zhao

Abstract: The Higgs boson pair production at the LHC provides a probe to the Higgs boson self-coupling. The higher-order QCD corrections in this process are sizable and must be taken into account in comparison with data. Due to the small cross section, it is necessary to consider at least one of the Higgs bosons decaying to bottom quarks. The QCD corrections to the decay processes would also be important in such cases. We present a full calculation of the total and differential cross sections for the $b\bar{b}γγ$ final state with next-to-leading order (NLO) QCD corrections. After applying typical kinematic cuts in the final state, we find that QCD NLO corrections in the decay decrease the LO result by $19\%$ and reduce the scale uncertainties by a factor of two. The QCD corrections to the invariant mass $m_{jjγγ}$ distribution, the transverse momentum spectra of the leading bottom quark jet and photon are significant and can not be approximated by a constant factor.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 19 pages, 4 figures

Showing 51–100 of 1,041 results for author: Zha, D