Search | arXiv e-print repository

Ultrafast Photocurrent Hysteresis in Photoferroelectric α-In2Se3

Authors: Zhen Lei, Jiawei Chang, Qiyi Zhao, Jian Zhou, Yuanyuan Huang, Qihua Xiong, Xinlong Xu

Abstract: The photon-electron interactions are generally volatile and the intricate multiphysics details of photoexcited carrier dynamics are not yet distinguished. How to nonvolatile control the physical state through all-optical means and clarify the intricate physical processes has been a long-term goal pursued in polar materials. Photoferroelectric α-In2Se3 holds the great potential for capturing multimodal nonvolatile states due to the spontaneous reversible in-plane and out-of-plane polarizations and its tunable light-matter interactions arising from the electronic degree of freedom. Here we uncover a nonvolatile zero-bias ultrafast photocurrent hysteresis response with an all-optical scheme, diagnosed by in-plane and out-of-plane terahertz waves emitted from the photoferroelectric α-In2Se3. The mechanism of such ultrafast photocurrent hysteresis emerges as a result of anomalous bulk linear and circular photovoltaic effect synchronously driven by local polarization rearrangement. Utilizing anisotropic ferroelectric kinetics-induced relative phase between the in-plane and out-of-plane directions, we further show flexibly selective chirality, tunable rotational angle, and optimizable ellipticity of terahertz wave polarizations. Our finding offers a promising avenue towards direct ultrafast nonvolatile processing of photocurrent signals through an all-optical scheme.

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2404.19287 [pdf, other]

Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective

Authors: Wanqi Zhou, Shuanghao Bai, Qibin Zhao, Badong Chen

Abstract: Pretrained vision-language models (VLMs) like CLIP have shown impressive generalization performance across various downstream tasks, yet they remain vulnerable to adversarial attacks. While prior research has primarily concentrated on improving the adversarial robustness of image encoders to guard against attacks on images, the exploration of text-based and multimodal attacks has largely been overlooked. In this work, we initiate the first known and comprehensive effort to study adapting vision-language models for adversarial robustness under the multimodal attack. Firstly, we introduce a multimodal attack strategy and investigate the impact of different attacks. We then propose a multimodal contrastive adversarial training loss, aligning the clean and adversarial text embeddings with the adversarial and clean visual features, to enhance the adversarial robustness of both image and text encoders of CLIP. Extensive experiments on 15 datasets across two tasks demonstrate that our method significantly improves the adversarial robustness of CLIP. Interestingly, we find that the model fine-tuned against multimodal adversarial attacks exhibits greater robustness than its counterpart fine-tuned solely against image-based attacks, even in the context of image attacks, which may open up new possibilities for enhancing the security of VLMs.

Submitted 30 April, 2024; originally announced April 2024.

Comments: 16 pages, 14 figures

arXiv:2404.18540 [pdf, other]

Tunable coupling of a quantum phononic resonator to a transmon qubit with flip-chip architecture

Authors: Xinhui Ruan, Li Li, Guihan Liang, Silu Zhao, Jia-heng Wang, Yizhou Bu, Bingjie Chen, Xiaohui Song, Xiang Li, He Zhang, Jinzhe Wang, Qianchuan Zhao, Kai Xu, Heng Fan, Yu-xi Liu, Jing Zhang, Zhihui Peng, Zhongcheng Xiang, Dongning Zheng

Abstract: A hybrid system with tunable coupling between phonons and qubits shows great potential for advancing quantum information processing. In this work, we demonstrate strong and tunable coupling between a surface acoustic wave (SAW) resonator and a transmon qubit based on galvanic-contact flip-chip technique. The coupling strength varies from $2π\times$7.0 MHz to -$2π\times$20.6 MHz, which is extracted from different vacuum Rabi oscillation frequencies. The phonon-induced ac Stark shift of the qubit at different coupling strengths is also shown. Our approach offers a good experimental platform for exploring quantum acoustics and hybrid systems.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18047 [pdf, other]

LIKO: LiDAR, Inertial, and Kinematic Odometry for Bipedal Robots

Authors: Qingrui Zhao, Mingyuan Li, Yongliang Shi, Xuechao Chen, Zhangguo Yu, Lianqiang Han, Zhenyuan Fu, Jintao Zhang, Chao Li, Yuanxi Zhang, Qiang Huang

Abstract: High-frequency and accurate state estimation is crucial for biped robots. This paper presents a tightly-coupled LiDAR-Inertial-Kinematic Odometry (LIKO) for biped robot state estimation based on an iterated extended Kalman filter. Beyond state estimation, the foot contact position is also modeled and estimated. This allows for both position and velocity updates from kinematic measurement. Additionally, the use of kinematic measurement results in an increased output state frequency of about 1kHz. This ensures temporal continuity of the estimated state and makes it practical for control purposes of biped robots. We also announce a biped robot dataset consisting of LiDAR, inertial measurement unit (IMU), joint encoders, force/torque (F/T) sensors, and motion capture ground truth to evaluate the proposed method. The dataset is collected during robot locomotion, and our approach reached the best quantitative result among other LIO-based methods and biped robot state estimation algorithms. The dataset and source code will be available at https://github.com/Mr-Zqr/LIKO.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.16414 [pdf, ps, other]

Validating a lutetium frequency reference

Authors: Kyle J. Arnold, Scott Bustabad, Qin Qichen, Zhao Zhang, Qi Zhao, Murray D. Barrett

Abstract: We review our progress in developing a frequency reference with singly ionized lutetium and give estimates of the levels of inaccuracy we expect to achieve in the near future with both the $^1S_0\leftrightarrow{}^3D_1$ and $^1S_0\leftrightarrow{}^3D_2$ transitions. Based on established experimental results, we show that inaccuracies at the low $10^{-19}$ level are readily achievable for the $^1S_0\leftrightarrow{}^3D_1$ transition, and the frequency ratio between the two transitions is limited almost entirely by the BBR shift. We argue that the frequency ratio measured within the one apparatus provides a well-defined metric to compare and establish the performance of remotely located systems. For the measurement of an in situ frequency ratio, relativistic shifts drop out and both transitions experience the same electromagnetic environment. Consequently, the uncertainty budget for the ratio is practically identical to the uncertainty budgets for the individual transitions. If the ratios for two or more systems disagree we can be certain at least one of the clock assessments is incorrect. If they agree, subsequent comparisons on one transition would only differ by relativistic effects. Since motional effects are easily assessed and typically small for a heavy ion, only the differential gravitational red-shift will significantly contribute and this can be confirmed by comparison on the second transition.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2404.15000 [pdf, other]

EarPass: Secure and Implicit Call Receiver Authentication Using Ear Acoustic Sensing

Authors: Xiping Sun, Jing Chen, Kun He, Zhixiang He, Ruiying Du, Yebo Feng, Qingchuan Zhao, Cong Wu

Abstract: Private voice communication often contains sensitive information, making it critical to ensure that only authorized users have access to such calls. Unfortunately, current authentication mechanisms, such as PIN-based passwords, fingerprint recognition, and face recognition, fail to authenticate the call receiver, leaving a gap in security. To fill the gap, we present EarPass, a secure and implicit call receiver authentication scheme designed for smartphones. EarPass sends inaudible acoustic signals through the earpiece speaker to actively sense the outer ear, and records echoes using the top microphone. It focuses on extracting ear-related signals from echoes and performs spectrogram analysis in the magnitude and phase domains. To overcome posture and position variability, EarPass utilizes a learning-based feature extractor for extracting representative features, and a one-class classifier for authentication. EarPass does not increase any burdens on users or change users' call answering habits. Furthermore, it does not require extra devices but only uses the speaker and microphone on the smartphone. We conducted comprehensive experiments to evaluate EarPass's effectiveness and security. Our results show that EarPass can achieve a balanced accuracy of 96.95% and an equal error rate of 1.53%. Additionally, EarPass exhibits resilience against potential attacks, including zero-effort attacks and mimicry attacks.

Submitted 23 April, 2024; originally announced April 2024.
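
Note on the metrics quoted in the EarPass abstract: balanced accuracy and equal error rate (EER) are standard authentication metrics, and the sketch below shows one common way to compute them from match scores. It is an illustration only, not the authors' evaluation code. The function names, the convention that a higher score means "more likely the genuine user", and the simple threshold sweep over observed scores are all assumptions made for this example.

```python
def far_frr(genuine, impostor, threshold):
    """Return (false accept rate, false reject rate) at a score threshold.

    Assumption: a sample is accepted when its score is >= threshold.
    """
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores
    and taking the point where FAR and FRR are closest."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def balanced_accuracy(genuine, impostor, threshold):
    """Mean of the true accept rate and the true reject rate."""
    far, frr = far_frr(genuine, impostor, threshold)
    return ((1 - frr) + (1 - far)) / 2


if __name__ == "__main__":
    # Hypothetical scores, made up purely for illustration.
    genuine_scores = [0.9, 0.8, 0.7, 0.6]
    impostor_scores = [0.1, 0.2, 0.3, 0.65]
    print(equal_error_rate(genuine_scores, impostor_scores))
    print(balanced_accuracy(genuine_scores, impostor_scores, 0.65))
```

With finite data the sweep only approximates the EER; finer estimates usually interpolate between neighbouring thresholds.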

arXiv:2404.14443 [pdf]

Evaluation of Machine Translation Based on Semantic Dependencies and Keywords

Authors: Kewei Yuan, Qiurong Zhao, Yang Xu, Xiao Zhang, Huansheng Ning

Abstract: In view of the fact that most of the existing machine translation evaluation algorithms only consider the lexical and syntactic information, but ignore the deep semantic information contained in the sentence, this paper proposes a computational method for evaluating the semantic correctness of machine translations based on reference translations and incorporating semantic dependencies and sentence keyword information. Use the language technology platform developed by the Social Computing and Information Retrieval Research Center of Harbin Institute of Technology to conduct semantic dependency analysis and keyword analysis on sentences, and obtain semantic dependency graphs, keywords, and weight information corresponding to keywords. It includes all word information with semantic dependencies in the sentence and keyword information that affects semantic information. Construct semantic association pairs including word and dependency multi-features. The key semantics of the sentence cannot be highlighted in the semantic information extracted through semantic dependence, resulting in vague semantics analysis. Therefore, the sentence keyword information is also included in the scope of machine translation semantic evaluation. To achieve a comprehensive and in-depth evaluation of the semantic correctness of sentences, the experimental results show that the accuracy of the evaluation algorithm has been improved compared with similar methods, and it can more accurately measure the semantic correctness of machine translation.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.13798 [pdf, other]

Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation

Authors: Jensen Hwa, Qingyu Zhao, Aditya Lahiri, Adnan Masood, Babak Salimi, Ehsan Adeli

Abstract: Conditional independence (CI) constraints are critical for defining and evaluating fairness in machine learning, as well as for learning unconfounded or causal representations. Traditional methods for ensuring fairness either blindly learn invariant features with respect to a protected variable (e.g., race when classifying sex from face images) or enforce CI relative to the protected attribute only on the model output (e.g., the sex label). Neither of these methods are effective in enforcing CI in high-dimensional feature spaces. In this paper, we focus on a nascent approach characterizing the CI constraint in terms of two Jensen-Shannon divergence terms, and we extend it to high-dimensional feature spaces using a novel dynamic sampling strategy. In doing so, we introduce a new training paradigm that can be applied to any encoder architecture. We are able to enforce conditional independence of the diffusion autoencoder latent representation with respect to any protected attribute under the equalized odds constraint and show that this approach enables causal image generation with controllable latent spaces. Our experimental results demonstrate that our approach can achieve high accuracy on downstream tasks while upholding equality of odds.

Submitted 21 April, 2024; originally announced April 2024.

Comments: To appear at the 2024 IEEE CVPR Workshop on Fair, Data-Efficient, and Trusted Computer Vision

arXiv:2404.13430 [pdf, other]

React-OT: Optimal Transport for Generating Transition State in Chemical Reactions

Authors: Chenru Duan, Guan-Horng Liu, Yuanqi Du, Tianrong Chen, Qiyuan Zhao, Haojun Jia, Carla P. Gomes, Evangelos A. Theodorou, Heather J. Kulik

Abstract: Transition states (TSs) are transient structures that are key in understanding reaction mechanisms and designing catalysts but challenging to be captured in experiments. Alternatively, many optimization algorithms have been developed to search for TSs computationally. Yet the cost of these algorithms driven by quantum chemistry methods (usually density functional theory) is still high, posing challenges for their applications in building large reaction networks for reaction exploration. Here we developed React-OT, an optimal transport approach for generating unique TS structures from reactants and products. React-OT generates highly accurate TS structures with a median structural root mean square deviation (RMSD) of 0.053Å and median barrier height error of 1.06 kcal/mol requiring only 0.4 second per reaction. The RMSD and barrier height error is further improved by roughly 25% through pretraining React-OT on a large reaction dataset obtained with a lower level of theory, GFN2-xTB. We envision the great accuracy and fast inference of React-OT useful in targeting TSs when exploring chemical reactions with unknown mechanisms.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 5 figures, 1 table
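
Note on the structural RMSD quoted in the React-OT abstract: the sketch below shows the basic root-mean-square deviation between two sets of atomic coordinates. It is a simplified illustration, not the authors' pipeline. It assumes both structures list the same atoms in the same order and have already been superimposed; structural RMSDs are normally reported after an optimal rigid alignment (for example with the Kabsch algorithm), which is omitted here. The function name `rmsd` and the tuple-based input format are choices made for this example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two structures.

    Each structure is a list of (x, y, z) tuples in the same atom order.
    Assumption: the structures are already aligned; no superposition is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical two-atom structures (coordinates in angstroms).
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]
    print(rmsd(predicted, reference))
```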

arXiv:2404.12976 [pdf, other]

Insights from the Gaussian Processes Method for the FRB-associated X-ray Burst of SGR 1935+2154

Authors: Ruijing Tang, Dahai Yan, Haiyun Zhang, Qingchang Zhao, Lian Tao, Chengkui Li, Mingyu Ge, Xiaobo Li, Qianqing Yin, Ce Cai

Abstract: Gaussian processes method is employed to analyze the light curves of bursts detected by Insight-HXMT, NICER, and GECAM from SGR 1935+2154 between 2020 to 2022. It is found that a stochastically driven damped simple harmonic oscillator (SHO) is necessary to capture the characteristics of the X-ray bursts. Variability timescale of the X-ray bursts, corresponding to the broken frequencies in the SHO power spectral densities (PSDs), are extracted. In particular, a high broken frequency of 35 Hz where the index of the SHO PSD changes from -4 to -2 is constrained by the HXMT-HE burst associated with FRB 200428. It is suggested that the corresponding timescale of 0.03 s could be the retarding timescale of the system driven by some energy release, and the production of the HE photon should be quasi-simultaneous with the response. The other special event is a NICER burst with a retarding timescale of 1/39 Hz (0.02 s). In the normal X-ray bursts, no retarding timescale is constrained; a long relax/equilibrium timescale (corresponding to a broken frequency of 1-10 Hz where the index of the SHO PSD changing from -4/-2 to 0 in the SHO PSD) is obtained. The results indicate that the FRB-associated HXMT-HE X-ray burst could be produced immediately when the system is responding to the energy disturbance, far before the equilibrium state.

Submitted 19 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

Comments: 13 pages,17 figures,1 table

MSC Class: 85-02

arXiv:2404.12659 [pdf, ps, other]

SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

Authors: Hongzhi Qi, Hanfei Liu, Jianqiang Li, Qing Zhao, Wei Zhai, Dan Luo, Tian Yu He, Shuo Liu, Bing Xiang Yang, Guanghui Fu

Abstract: In the social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated in two tasks: high and low suicide risk, and fine-grained suicide risk classification on a level of 0 to 10. In our experiments, deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification were still unsatisfactory, with a weighted F1 score of 50.89%. To address the issues of data imbalance and limited dataset size, we investigated both traditional and advanced, large language model based data augmentation techniques, demonstrating that data augmentation can enhance model performance by up to 4.65% points in F1-score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for automatic identification of suicidal individuals, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12235 [pdf, other]

Beyond Average: Individualized Visual Scanpath Prediction

Authors: Xianyu Chen, Ming Jiang, Qi Zhao

Abstract: Understanding how attention varies across individuals has significant scientific and societal impacts. However, existing visual scanpath models treat attention uniformly, neglecting individual differences. To bridge this gap, this paper focuses on individualized scanpath prediction (ISP), a new attention modeling task that aims to accurately predict how different individuals shift their attention in diverse visual tasks. It proposes an ISP method featuring three novel technical components: (1) an observer encoder to characterize and integrate an observer's unique attention traits, (2) an observer-centric feature integration approach that holistically combines visual features, task guidance, and observer-specific characteristics, and (3) an adaptive fixation prioritization mechanism that refines scanpath predictions by dynamically prioritizing semantic feature maps based on individual observers' attention traits. These novel components allow scanpath models to effectively address the attention variations across different observers. Our method is generally applicable to different datasets, model architectures, and visual tasks, offering a comprehensive tool for transforming general scanpath models into individualized ones. Comprehensive evaluations using value-based and ranking-based metrics verify the method's effectiveness and generalizability.

Submitted 18 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: To appear in CVPR2024

arXiv:2404.11952 [pdf, other]

Generation of Ultrarelativistic Vortex Leptons with Large Orbital Angular Momenta

Authors: Mamutjan Ababekri, Jun-Lin Zhou, Ren-Tong Guo, Yong-Zheng Ren, Yu-Han Kou, Qian Zhao, Zhong-Peng Li, Jian-Xing Li

Abstract: Ultrarelativistic vortex leptons with intrinsic orbital angular momenta (OAM) have important applications in high energy particle physics, nuclear physics, astrophysics, etc. However, unfortunately, their generation still poses a great challenge. Here, we put forward a novel method for generating ultrarelativistic vortex positrons and electrons through nonlinear Breit-Wheeler (NBW) scattering of vortex $γ$ photons. For the first time, a complete angular momentum-resolved scattering theory has been formulated, introducing the angular momentum of laser photons and vortex particles into the conventional NBW scattering framework. We find that vortex positron (electron) can be produced when the outgoing electron (positron) is generated along the collision axis. By unveiling the angular momentum transfer mechanism, we clarify that OAM of the $γ$ photon and angular momenta of multiple laser photons are entirely transferred to the generated pairs, leading to the production of ultrarelativistic vortex positrons or electrons with large OAM. Furthermore, we find that the cone opening angle and superposition state of the vortex $γ$ photon, distinct characteristics aside from its intrinsic OAM, can be determined via the angular distribution of created pairs in NBW processes. Our method paves the way for investigating strong-field quantum electrodynamics processes concerning the generation and detection of vortex particle beams in intense lasers.

Submitted 24 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: 5

arXiv:2404.11449 [pdf, other]

AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

Authors: Meng Jiang, Yi Jing Yu, Qing Zhao, Jianqiang Li, Changwei Song, Hongzhi Qi, Wei Zhai, Dan Luo, Xiaoqin Wang, Guanghui Fu, Bing Xiang Yang

Abstract: Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care. In current society, individuals frequently express negative emotions on social media on specific topics, often exhibiting cognitive distortions, including suicidal behaviors in extreme cases. Yet, there is a notable absence of methodologies for analyzing cognitive pathways that could aid psychotherapists in conducting effective interventions online. In this study, we gathered data from social media and established the task of extracting cognitive pathways, annotating the data based on a cognitive theoretical framework. We initially categorized the task of extracting cognitive pathways as a hierarchical text classification with four main categories and nineteen subcategories. Following this, we structured a text summarization task to help psychotherapists quickly grasp the essential information. Our experiments evaluate the performance of deep learning and large language models (LLMs) on these tasks. The results demonstrate that our deep learning method achieved a micro-F1 score of 62.34% in the hierarchical text classification task. Meanwhile, in the text summarization task, GPT-4 attained a Rouge-1 score of 54.92 and a Rouge-2 score of 30.86, surpassing the experimental deep learning model's performance. However, it may suffer from an issue of hallucination. We have made all models and codes publicly available to support further research in this field.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09790 [pdf, other]

NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, et al. (63 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field.

Submitted 15 April, 2024; originally announced April 2024.

Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4
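
Note on the PSNR metric named in the NTIRE abstract: the sketch below shows the standard definition, PSNR = 10 log10(peak^2 / MSE). It is a minimal illustration and not the challenge's official scoring script, which may differ in details such as border cropping or the color channel evaluated. The function name `psnr`, the flat-list input format, and the default peak value of 255 (8-bit images) are assumptions made for this example.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumption: pixel values share the same range, with `peak` as the maximum.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical 4-pixel "images", made up purely for illustration.
    ground_truth = [10, 20, 30, 40]
    restored = [11, 21, 31, 41]
    print(psnr(ground_truth, restored))
```

Because PSNR is logarithmic in the mean squared error, a gain of about 3 dB corresponds to roughly halving the MSE.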

arXiv:2404.09712 [pdf, ps, other]

1/2$^-$ $α$ cluster resonances of $^{13}$C studied by the analytic continuation in the coupling constant

Authors: Seungheon Shin, Masaaki Kimura, Bo Zhou, Qing Zhao

Abstract: The 1/2$^-$ resonant states in $^{13}{\rm C}$ are investigated to search for the Hoyle-analog state. In order to treat the resonance states located around the 3$α+n$ threshold, the analytic continuation in the coupling constant (ACCC) has been combined with the real-time evolution method (REM). The properties of the 1/2$^-$ resonance states such as the radii and monopole transition probabilities are calculated. We show the 1/2$^-_3$ and 1/2$^-_4$ states are well-developed $α$ cluster states, and the 1/2$^-_4$ state is a candidate of the Hoyle-analog state.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09149 [pdf, other]

Heuristic Solution to Joint Deployment and Beamforming Design for STAR-RIS Aided Networks

Authors: Bai Yan, Qi Zhao, ** Zhang, J. Andrew Zhang

Abstract: This paper tackles the deployment challenges of Simultaneous Transmitting and Reflecting Reconfigurable Intelligent Surface (STAR-RIS) in communication systems. Unlike existing works that use fixed deployment setups or solely optimize the location, this paper emphasizes the joint optimization of the location and orientation of STAR-RIS. This enables searching across all user grouping possibilities and fully boosting the system's performance. We consider a sum rate maximization problem with joint optimization and hybrid beamforming design. An offline heuristic solution is proposed for the problem, developed based on differential evolution and semi-definite programming methods. In particular, a point-point representation is proposed for characterizing and exploiting the user-grouping. A balanced grouping method is designed to achieve a desired user grouping with low complexity. Numerical results demonstrate the substantial performance gains achievable through optimal deployment design.

Submitted 14 April, 2024; originally announced April 2024.

Comments: 30 pages

arXiv:2404.08921 [pdf, other]

PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos

Authors: Qi Zhao, M. Salman Asif, Zhan Ma

Abstract: The primary focus of Neural Representation for Videos (NeRV) is to effectively model its spatiotemporal consistency. However, current NeRV systems often face a significant issue of spatial inconsistency, leading to decreased perceptual quality. To address this issue, we introduce the Pyramidal Neural Representation for Videos (PNeRV), which is built on a multi-scale information connection and comprises a lightweight rescaling operator, Kronecker Fully-connected layer (KFc), and a Benign Selective Memory (BSM) mechanism. The KFc, inspired by the tensor decomposition of the vanilla Fully-connected layer, facilitates low-cost rescaling and global correlation modeling. BSM merges high-level features with granular ones adaptively. Furthermore, we provide an analysis based on the Universal Approximation Theory of the NeRV system and validate the effectiveness of the proposed PNeRV. We conducted comprehensive experiments to demonstrate that PNeRV surpasses the performance of contemporary NeRV models, achieving the best results in video regression on UVG and DAVIS under various metrics (PSNR, SSIM, LPIPS, and FVD). Compared to vanilla NeRV, PNeRV achieves a +4.49 dB gain in PSNR and a 231% increase in FVD on UVG, along with a +3.28 dB PSNR and 634% FVD increase on DAVIS.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.08917 [pdf, other]

MAProtoNet: A Multi-scale Attentive Interpretable Prototypical Part Network for 3D Magnetic Resonance Imaging Brain Tumor Classification

Authors: Binghua Li, Jie Mao, Zhe Sun, Chao Li, Qibin Zhao, Toshihisa Tanaka

Abstract: Automated diagnosis with artificial intelligence has emerged as a promising area in the realm of medical imaging, while the interpretability of the introduced deep neural networks still remains an urgent concern. Although contemporary works, such as XProtoNet and MProtoNet, have sought to design interpretable prediction models for the issue, the localization precision of their resulting attribution maps can be further improved. To this end, we propose a Multi-scale Attentive Prototypical part Network, termed MAProtoNet, to provide more precise maps for attribution. Specifically, we introduce a concise multi-scale module to merge attentive features from quadruplet attention layers and produce attribution maps. The proposed quadruplet attention layers can enhance the existing online class activation mapping loss via capturing interactions between the spatial and channel dimension, while the multi-scale module then fuses both fine-grained and coarse-grained information for precise maps generation. We also apply a novel multi-scale mapping loss for supervision on the proposed multi-scale module. Compared to existing interpretable prototypical part networks in medical imaging, MAProtoNet can achieve state-of-the-art performance in localization on brain tumor segmentation (BraTS) datasets, resulting in approximately 4% overall improvement on activation precision score (with a best score of 85.8%), without using additional annotated labels of segmentation. Our code will be released in https://github.com/TUAT-Novice/maprotonet.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.07019 [pdf, other]

Chiral Chaos Enhanced Sensing

Authors: Yun-Qiu Ge, Zhe Wang, Qian-Chuan Zhao, **g Zhang, Yu-xi Liu

Abstract: Chirality refers to the property that an object and its mirror image cannot overlap each other by spatial rotation and translation, and can be found in various research fields. We here propose chiral chaos and construct a chiral chaotic device via coupled whispering gallery mode resonators, where routes to chaos exhibit pronounced chirality for two opposite pumping directions. The mechanism responsible for this phenomenon is that time-reversal symmetry of the traveling-wave light fields is broken by the Rayleigh scatterers inserted in resonators. Combining with the Lyapunov exponents, we propose metrics to measure the symmetry and chirality between different chaotic dynamics. We find that such a chiral chaotic device can be applied to achieve sensing with high sensitivity, wide detectable range, and strong robustness to the phase and orientation randomness of weak signals. Our work presents a promising candidate for on-chip sensing and may have applications in quantum networks and chaotic communications.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06477 [pdf, other]

doi 10.1145/3656403

Mechanised Hypersafety Proofs about Structured Data: Extended Version

Authors: Vladimir Gladshtein, Qiyuan Zhao, Willow Ahrens, Saman Amarasinghe, Ilya Sergey

Abstract: Arrays are a fundamental abstraction to represent collections of data. It is often possible to exploit structural properties of the data stored in an array (e.g., repetition or sparsity) to develop a specialised representation optimised for space efficiency. Formally reasoning about correctness of manipulations with such structured data is challenging, as they are often composed of multiple loops with non-trivial invariants. In this work, we observe that specifications for structured data manipulations can be phrased as hypersafety properties, i.e., predicates that relate traces of $k$ programs. To turn this observation into an effective verification methodology, we developed the Logic for Graceful Tensor Manipulation (LGTM), a new Hoare-style relational separation logic for specifying and verifying computations over structured data. The key enabling idea of LGTM is that of parametrised hypersafety specifications that allow the number $k$ of the program components to depend on the program variables. We implemented LGTM as a foundational embedding into Coq, mechanising its rules, meta-theory, and the proof of soundness. Furthermore, we developed a library of domain-specific tactics that automate computer-aided hypersafety reasoning, resulting in pleasantly short proof scripts that enjoy a high degree of reuse.
We argue for the effectiveness of relational reasoning about structured data in LGTM by specifying and mechanically proving correctness of 13 case studies including computations on compressed arrays and efficient operations over multiple kinds of sparse tensors.

Submitted 9 April, 2024; originally announced April 2024.

Comments: Extended version of the paper accepted at PLDI'24

arXiv:2404.05892 [pdf, other]

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Authors: Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao , et al. (3 additional authors not shown)

Abstract: We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license. Models at: https://huggingface.co/RWKV Training code at: https://github.com/RWKV/RWKV-LM Inference code at: https://github.com/RWKV/ChatRWKV Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer

Submitted 10 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05165 [pdf]

doi 10.1016/j.cej.2024.151111

Zincophilic armor: Phytate ammonium as a multifunctional additive for enhanced performance in aqueous zinc-ion batteries

Authors: Fangyuan Xiao, Xiaoke Wang, Kaitong Sun, Qian Zhao, Cui** Han, Hai-Feng Li

Abstract: Corrosion and the formation of by-products resulting from parasitic side reactions, as well as random dendrite growth, pose significant challenges for aqueous zinc-ion batteries (AZIBs). In this study, phytate ammonium is introduced into the traditional dilute zinc sulfate electrolyte as a multi-functional additive. Leveraging the inherent zincophilic nature of the phytic anion, a protective layer is formed on the surface of the zinc anode. This layer can effectively manipulate the deposition process, mitigate parasitic reactions, and reduce the accumulation of detrimental by-products. Additionally, the competitive deposition between dissociated ammonium ions and Zn$^{2+}$ promotes uniform deposition, thereby alleviating dendrite growth. Consequently, the modified electrolyte with a lower volume addition exhibits superior performance. The zinc symmetric battery demonstrates much more reversible plating/stripping, sustaining over 2000 hours at 5 mA cm$^{-2}$ and 1 mA h cm$^{-2}$. A high average deposition/stripping efficiency of 99.83% is achieved, indicating the significant boosting effect and practical potential of our strategy for high-performance aqueous zinc-ion batteries.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04421 [pdf, other]

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Authors: Yang Zheng, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein

Abstract: Modeling and rendering photorealistic avatars is of crucial importance in many applications. Existing methods that build a 3D avatar from visual observations, however, struggle to reconstruct clothed humans. We introduce PhysAvatar, a novel framework that combines inverse rendering with inverse physics to automatically estimate the shape and appearance of a human from multi-view video data along with the physical parameters of the fabric of their clothes. For this purpose, we adopt a mesh-aligned 4D Gaussian technique for spatio-temporal mesh tracking as well as a physically based inverse renderer to estimate the intrinsic material properties. PhysAvatar integrates a physics simulator to estimate the physical parameters of the garments using gradient-based optimization in a principled manner. These novel capabilities enable PhysAvatar to create high-quality novel-view renderings of avatars dressed in loose-fitting clothes under motions and lighting conditions not seen in the training data. This marks a significant advancement towards modeling photorealistic digital humans using physically based inverse rendering with physics in the loop. Our project website is at: https://qingqing-zhao.github.io/PhysAvatar

Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: Project Page: https://qingqing-zhao.github.io/PhysAvatar

arXiv:2404.03741 [pdf, other]

A High-Fidelity Simulation Framework for Grasping Stability Analysis in Human Casualty Manipulation

Authors: Qianwen Zhao, Rajarshi Roy, Chad Spurlock, Kevin Lister, Long Wang

Abstract: Recently, there has been a growing interest in rescue robots due to their vital role in addressing emergency scenarios and providing crucial support in challenging or hazardous situations where human intervention is difficult. However, very few of these robots are capable of actively engaging with humans and undertaking physical manipulation tasks. This limitation is largely attributed to the absence of tools that can realistically simulate physical interactions, especially the contact mechanisms between a robotic gripper and a human body. In this letter, we aim to address key limitations in current developments towards robotic casualty manipulation. Firstly, we present an integrative simulation framework for casualty manipulation. We adapt a finite element method (FEM) tool into the grasping and manipulation scenario, and the developed framework can provide accurate biomechanical reactions resulting from manipulation. Secondly, we conduct a detailed assessment of grasping stability during casualty grasping and manipulation simulations. To validate the necessity and superior performance of the proposed high-fidelity simulation framework, we conducted a qualitative and quantitative comparison of grasping stability analyses between the proposed framework and the state-of-the-art multi-body physics simulations. Through these efforts, we have taken the first step towards a feasible solution for robotic casualty manipulation.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 8 pages, revision submitted to IEEE RA-L, under review

arXiv:2404.03162 [pdf, other]

LTRDetector: Exploring Long-Term Relationship for Advanced Persistent Threats Detection

Authors: Xiaoxiao Liu, Fan Xu, Nan Wang, Qinxin Zhao, Dalin Zhang, Xibin Zhao, Jiqiang Liu

Abstract: Advanced Persistent Threat (APT) is challenging to detect due to prolonged duration, infrequent occurrence, and adept concealment techniques. Existing approaches primarily concentrate on the observable traits of attack behaviors, neglecting the intricate relationships formed throughout the persistent attack lifecycle. Thus, we present an innovative APT detection framework named LTRDetector, implementing an end-to-end holistic operation. LTRDetector employs an innovative graph embedding technique to retain comprehensive contextual information, then derives long-term features from these embedded provenance graphs. During the process, we compress the data of the system provenance graph for effective feature learning. Furthermore, in order to detect attacks conducted by using zero-day exploits, we capture the system's regular behavior and detect abnormal activities without relying on predefined attack signatures. We also conducted extensive evaluations using five prominent datasets, the efficacy evaluation of which underscores the superiority of LTRDetector compared to existing state-of-the-art techniques.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.03066 [pdf, other]

doi 10.1109/ACCESS.2024.3383436

Traffic Divergence Theory: An Analysis Formalism for Dynamic Networks

Authors: Matin Macktoobian, Zhan Shu, Qing Zhao

Abstract: Traffic dynamics is universally crucial in analyzing and designing almost any network. This article introduces a novel theoretical approach to analyzing network traffic dynamics. This theory's machinery is based on the notion of traffic divergence, which captures the flow (im)balance of network nodes and links. It features various analytical probes to investigate both spatial and temporal traffic dynamics. In particular, the maximal traffic distribution in a network can be characterized by spatial traffic divergence rate, which reveals the relative difference among node traffic divergence. To illustrate the usefulness, we apply the theory to two network-driven problems: throughput estimation of data center networks and power-optimized communication planning for robot networks, and show the merits of the proposed theory through simulations.

Submitted 3 April, 2024; originally announced April 2024.

Journal ref: IEEE Access, 2024

arXiv:2404.01754 [pdf, other]

Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments

Authors: Qianhui Zhao, Fang Liu, Li Zhang, Yang Liu, Zhen Yan, Zhenghao Chen, Yufei Zhou, **g Jiang, Ge Li

Abstract: Automated generation of feedback on programming assignments holds significant benefits for programming education, especially when it comes to advanced assignments. Automated Program Repair techniques, especially Large Language Model based approaches, have gained notable recognition for their potential to fix introductory assignments. However, the programs used for evaluation are relatively simple. It remains unclear how existing approaches perform in repairing programs from higher-level programming courses. To address these limitations, we curate a new advanced student assignment dataset named Defects4DS from a higher-level programming course. Subsequently, we identify the challenges related to fixing bugs in advanced assignments. Based on the analysis, we develop a framework called PaR that is powered by the LLM. PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair. Peer Solution Selection identifies the closely related peer programs based on lexical, semantic, and syntactic criteria. Then Multi-Source Prompt Generation adeptly combines multiple sources of information to create a comprehensive and informative prompt for the last Program Repair stage. The evaluation on Defects4DS and another well-investigated ITSP dataset reveals that PaR achieves a new state-of-the-art performance, demonstrating impressive improvements of 19.94% and 15.2% in repair rate compared to prior state-of-the-art LLM- and symbolic-based approaches, respectively.

Submitted 2 April, 2024; originally announced April 2024.

Comments: On-going work

arXiv:2403.17712 [pdf, other]

Invisible Gas Detection: An RGB-Thermal Cross Attention Network and A New Benchmark

Authors: Jue Wang, Yuxiang Lin, Qi Zhao, Dong Luo, Shuaibao Chen, Wei Chen, Xiaojiang Peng

Abstract: The widespread use of various chemical gases in industrial processes necessitates effective measures to prevent their leakage during transportation and storage, given their high toxicity. Thermal infrared-based computer vision detection techniques provide a straightforward approach to identify gas leakage areas. However, the development of high-quality algorithms has been challenging due to the low texture in thermal images and the lack of open-source datasets. In this paper, we present the RGB-Thermal Cross Attention Network (RT-CAN), which employs an RGB-assisted two-stream network architecture to integrate texture information from RGB images and gas area information from thermal images. Additionally, to facilitate the research of invisible gas detection, we introduce Gas-DB, an extensive open-source gas detection database including about 1.3K well-annotated RGB-thermal images with eight variant collection scenes. Experimental results demonstrate that our method successfully leverages the advantages of both modalities, achieving state-of-the-art (SOTA) performance among RGB-thermal methods, surpassing single-stream SOTA models in terms of accuracy, Intersection of Union (IoU), and F2 metrics by 4.86%, 5.65%, and 4.88%, respectively. The code and data will be made available soon.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.17297 [pdf, other]

InternLM2 Technical Report

Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang , et al. (75 additional authors not shown)

Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k ``Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.17235 [pdf, ps, other]

A Discrete-Time Least-Squares Adaptive State Tracking Control Scheme with A Mobile-Robot System Study

Authors: Qianhong Zhao, Gang Tao

Abstract: This paper develops an adaptive state tracking control scheme for discrete-time systems, using the least-squares algorithm, as the new solution to the long-standing discrete-time adaptive state tracking control problem to which the Lyapunov method (well-developed for the continuous-time adaptive state tracking problem) is not applicable. The new adaptive state tracking scheme is based on a recently-developed new discrete-time error model which has been used for gradient algorithm based state tracking control schemes, and uses the least-squares algorithm for parameter adaptation. The new least-squares algorithm is derived to minimize an accumulative estimation error, to ensure certain optimality for parameter estimation. The system stability and output tracking properties are studied. Technical results are presented in terms of plant-model matching, error model, adaptive law, optimality formulation, and stability and tracking analysis. The developed adaptive control scheme is applied to a discrete-time multiple mobile robot system to meet an adaptive state tracking objective. In addition, a collision avoidance mechanism is proposed to prevent collisions in the whole tracking process. Simulation results are presented, which verify the desired system state tracking properties under the developed least-squares algorithm based adaptive control scheme.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16649 [pdf, other]

CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment

Authors: Feiteng Fang, Liang Zhu, Min Yang, Xi Feng, **chang Hou, Qixuan Zhao, Chengming Li, Xi** Hu, Ruifeng Xu

Abstract: Reinforcement learning from human feedback (RLHF) is a crucial technique in aligning large language models (LLMs) with human preferences, ensuring these LLMs behave in beneficial and comprehensible ways to users. However, a longstanding challenge in human alignment techniques based on reinforcement learning lies in their inherent complexity and difficulty in training. To address this challenge, we present a simple yet effective Contrastive Learning Framework for Human Alignment (CLHA) to align LLMs with human preferences directly. CLHA employs a novel rescoring strategy to evaluate the noise within the data by considering its inherent quality and dynamically adjusting the training process. Simultaneously, CLHA utilizes pairwise contrastive loss and adaptive supervised fine-tuning loss to adaptively modify the likelihood of generating responses, ensuring enhanced alignment with human preferences. Using advanced methods, CLHA surpasses other algorithms, showcasing superior performance in terms of reward model scores, automatic evaluations, and human assessments on the widely used ``Helpful and Harmless'' dataset.

Submitted 26 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16067 [pdf, other]

Robust Diffusion Models for Adversarial Purification

Authors: Guang Lin, Zerui Tao, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao

Abstract: Diffusion models (DMs) based adversarial purification (AP) has been shown to be the most powerful alternative to adversarial training (AT). However, these methods neglect the fact that pre-trained diffusion models themselves are not robust to adversarial attacks either. Additionally, the diffusion process can easily destroy semantic information and, after the reverse process, generate a high-quality image that is totally different from the original input image, leading to degraded standard accuracy. To overcome these issues, a natural idea is to harness an adversarial training strategy to retrain or fine-tune the pre-trained diffusion model, which is computationally prohibitive. We propose a novel robust reverse process with adversarial guidance, which is independent of the given pre-trained DMs and avoids retraining or fine-tuning the DMs. This robust guidance not only ensures the generation of purified examples that retain more semantic content but also, for the first time, mitigates the accuracy-robustness trade-off of DMs, which also gives DM-based AP an efficient adaptive ability against new attacks. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate that our method achieves state-of-the-art results and generalizes against different attacks.

Submitted 24 May, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.15574 [pdf, other]

SensoryT5: Infusing Sensorimotor Norms into T5 for Enhanced Fine-grained Emotion Classification

Authors: Yuhan Xia, Qingqing Zhao, Yunfei Long, Ge Xu, Jia Wang

Abstract: Sensory perception and emotion classification have traditionally been considered separate domains. Yet, the significant influence of sensory experiences on emotional responses is undeniable. The natural language processing (NLP) community has often missed the opportunity to merge sensory knowledge with emotion classification. To address this gap, we propose SensoryT5, a neuro-cognitive approach that integrates sensory information into the T5 (Text-to-Text Transfer Transformer) model, designed specifically for fine-grained emotion classification. This methodology incorporates sensory cues into T5's attention mechanism, enabling a harmonious balance between contextual understanding and sensory awareness. The resulting model amplifies the richness of emotional representations. In rigorous tests across various detailed emotion classification datasets, SensoryT5 showcases improved performance, surpassing both the foundational T5 model and current state-of-the-art works. Notably, SensoryT5's success signifies a pivotal change in the NLP domain, highlighting the potential influence of neuro-cognitive data in refining machine learning models' emotional sensitivity.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted by CogALex 2024 conference

arXiv:2403.14474 [pdf, ps, other]

doi 10.1016/j.scib.2024.05.013

Tensor-force effects on nuclear matter in relativistic ab initio theory

Authors: Sibo Wang, Hui Tong, Chencan Wang, Qiang Zhao, Peter Ring, Jie Meng

Abstract: Within the relativistic Brueckner-Hartree-Fock theory in the full Dirac space, the tensor-force effects on infinite nuclear matter are elucidated by subtracting the matrix elements of tensor forces from the realistic nucleon-nucleon interaction. The tensor-force effects for the binding energy per particle of symmetric nuclear matter (SNM) as well as the symmetry energy are attractive and are more pronounced around the empirical saturation density, while the tensor forces have little impact on the pure neutron matter. By tuning the tensor-force strength, an infinite (negative) scattering length in the spin-triplet channel is found. This locates the dilute SNM with only the $^3S_1$-$^3D_1$ channel interaction at the unitary limit. Its ground-state energy is found proportional to the energy of a free Fermi gas with a scaling factor 0.38, revealing good universal properties. This work paves the way to study the tensor-force effects in neutron stars as well as finite nuclei from realistic nucleon-nucleon interactions, highlights the role of the tensor force on the deviation of the nuclear physics to the unitary limit, and provides valuable reference for studies of the four-component unitary Fermi gas.

Submitted 3 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 5 pages, 2 figures, discussion on four-component unitary Fermi gas is updated, accepted by Science Bulletin

arXiv:2403.11473 [pdf, other]

Word Order's Impacts: Insights from Reordering and Generation Analysis

Authors: Qinghua Zhao, Jiaang Li, Lei Li, Zenghui Zhou, Junfeng Liu

Abstract: Existing works have studied the impacts of the order of words within natural text. They usually analyze it by destroying the original order of words to create a scrambled sequence, and then comparing the models' performance between the original and scrambled sequences. The experimental results demonstrate marginal drops. Considering these findings, different hypotheses about word order have been proposed, including ``the order of words is redundant with lexical semantics'', and ``models do not rely on word order''. In this paper, we revisit the aforementioned hypotheses by adding an order reconstruction perspective, and selecting datasets of different spectrum. Specifically, we first select four different datasets, and then design order reconstruction and continuing generation tasks. Empirical findings support that ChatGPT relies on word order to infer, but cannot support or negate the redundancy relations between word order and lexical semantics.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11405 [pdf, other]

A Deep Learning Method for Beat-Level Risk Analysis and Interpretation of Atrial Fibrillation Patients during Sinus Rhythm

Authors: Jun Lei, Yuxi Zhou, Xue Tian, Qinghao Zhao, Qi Zhang, Shijia Geng, Qingbo Wu, Shenda Hong

Abstract: Atrial Fibrillation (AF) is a common cardiac arrhythmia. Many AF patients experience complications such as stroke and other cardiovascular issues. Early detection of AF is crucial. Existing algorithms can only distinguish ``AF rhythm in AF patients'' from ``sinus rhythm in normal individuals''. However, AF patients do not always exhibit AF rhythm, posing a challenge for diagnosis when the AF rhythm is absent. To address this, this paper proposes a novel artificial intelligence (AI) algorithm to distinguish ``sinus rhythm in AF patients'' from ``sinus rhythm in normal individuals'' at the beat level. We introduce beat-level risk interpreters and trend risk interpreters, addressing the interpretability issues of deep learning models and the difficulty in explaining AF risk trends. Additionally, the beat-level information fusion decision is presented to enhance model accuracy. The experimental results demonstrate that the average AUC for single beats used as testing data from the CPSC 2021 dataset is 0.7314. By employing 150 beats for the information fusion decision algorithm, the average AUC can reach 0.7591. Compared to previous segment-level algorithms, we utilize beats as input, reducing data dimensionality and making the model more lightweight, facilitating deployment on portable medical devices. Furthermore, we draw new and interesting findings through average beat analysis and subgroup analysis, considering varying risk levels.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11142 [pdf, other]

Dynamics and Resonance Fluorescence from a Superconducting Artificial Atom Doubly Driven by Quantized and Classical Fields

Authors: Xinhui Ruan, Jia-Heng Wang, Dong He, Pengtao Song, Shengyong Li, Qianchuan Zhao, L. M. Kuang, Jaw-Shen Tsai, Chang-Ling Zou, **g Zhang, Dongning Zheng, O. V. Astafiev, Yu-xi Liu, Zhihui Peng

Abstract: We report an experimental demonstration of resonance fluorescence in a two-level superconducting artificial atom under two driving fields coupled to a detuned cavity. One of the fields is classical and the other is varied from quantum (vacuum fluctuations) to a classical one by controlling the photon number inside the cavity. The device consists of a transmon qubit strongly coupled to a one-dimensional transmission line and a coplanar waveguide resonator. We observe a sideband anti-crossing and asymmetry in the emission spectra of the system through a one-dimensional transmission line, which is fundamentally different from the weak coupling case. By changing the photon number inside the cavity, the emission spectrum of our doubly driven system approaches the case when the atom is driven by two classical bichromatic fields. We also measure the dynamical evolution of the system through the transmission line and study the properties of the first-order correlation function, Rabi oscillations and energy relaxation in the system. The study of resonance fluorescence from an atom driven by two fields promotes understanding decoherence in superconducting quantum circuits and may find applications in superconducting quantum computing and quantum networks.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11101 [pdf, other]

Hierarchical Generative Network for Face Morphing Attacks

Authors: Zuyuan He, Zongyong Deng, Qiaoyun He, Qijun Zhao

Abstract: Face morphing attacks circumvent face recognition systems (FRSs) by creating a morphed image that contains multiple identities. However, existing face morphing attack methods either sacrifice image quality or compromise the identity preservation capability. Consequently, these attacks fail to bypass FRSs verification well while still managing to deceive human observers. These methods typically rely on global information from contributing images, ignoring the detailed information from effective facial regions. To address the above issues, we propose a novel morphing attack method to improve the quality of morphed images and better preserve the contributing identities. Our proposed method leverages the hierarchical generative network to capture both local detailed and global consistency information. Additionally, a mask-guided image blending module is dedicated to removing artifacts from areas outside the face to improve the image's visual quality. The proposed attack method is compared to state-of-the-art methods on three public datasets in terms of FRSs' vulnerability, attack detectability, and image quality. The results show our method's potential threat of deceiving FRSs while being capable of passing multiple morphing attack detection (MAD) scenarios.

Submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted by FG2024

arXiv:2403.10831 [pdf, other]

DUE: Dynamic Uncertainty-Aware Explanation Supervision via 3D Imputation

Authors: Qilong Zhao, Yifei Zhang, Mengdan Zhu, Siyi Gu, Yuyang Gao, Xiaofeng Yang, Liang Zhao

Abstract: Explanation supervision aims to enhance deep learning models by integrating additional signals to guide the generation of model explanations, showcasing notable improvements in both the predictability and explainability of the model. However, the application of explanation supervision to higher-dimensional data, such as 3D medical images, remains an under-explored domain. Challenges associated with supervising visual explanations in the presence of an additional dimension include: 1) changes in spatial correlation, 2) lack of direct 3D annotations, and 3) uncertainty that varies across different parts of the explanation. To address these challenges, we propose a Dynamic Uncertainty-aware Explanation supervision (DUE) framework for 3D explanation supervision that ensures uncertainty-aware explanation guidance when dealing with sparsely annotated 3D data with diffusion-based 3D interpolation. Our proposed framework is validated through comprehensive experiments on diverse real-world medical imaging datasets. The results demonstrate the effectiveness of our framework in enhancing the predictability and explainability of deep learning models in the context of medical imaging diagnosis applications.

Submitted 16 March, 2024; originally announced March 2024.

Comments: 9 pages, 6 figures

arXiv:2403.10481 [pdf, other]

Tensor Star Decomposition

Authors: Wuyang Zhou, Yu-Bang Zheng, Qibin Zhao, Danilo Mandic

Abstract: A novel tensor decomposition framework, termed Tensor Star (TS) decomposition, is proposed which represents a new type of tensor network decomposition based on tensor contractions. This is achieved by connecting the core tensors in a ring shape, whereby the core tensors act as skip connections between the factor tensors and allow for direct correlation characterisation between any two arbitrary dimensions. Uniquely, this makes it possible to decompose an order-$N$ tensor into $N$ order-$3$ factor tensors $\{\mathcal{G}_{k}\}_{k=1}^{N}$ and $N$ order-$4$ core tensors $\{\mathcal{C}_{k}\}_{k=1}^{N}$, which are arranged in a star shape. Unlike the class of Tensor Train (TT) decompositions, these factor tensors are not directly connected to one another. The so obtained core tensors also enable consecutive factor tensors to have different latent ranks. In this way, the TS decomposition alleviates the "curse of dimensionality" and controls the "curse of ranks", exhibiting a storage complexity which scales linearly with the number of dimensions and as the fourth power of the ranks.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09037 [pdf, other]

The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?

Authors: Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould

Abstract: Large vision-language models (LVLMs), designed to interpret and respond to human instructions, occasionally generate hallucinated or harmful content due to inappropriate instructions. This study uses linear probing to shed light on the hidden knowledge at the output layer of LVLMs. We demonstrate that the logit distributions of the first tokens contain sufficient information to determine whether to respond to the instructions, including recognizing unanswerable visual questions, defending against multi-modal jailbreaking attacks, and identifying deceptive questions. Such hidden knowledge is gradually lost in logits of subsequent tokens during response generation. Then, we illustrate a simple decoding strategy at the generation of the first token, effectively improving the generated content. In experiments, we find a few interesting insights: First, the CLIP model already contains a strong signal for solving these tasks, indicating potential bias in the existing datasets. Second, we observe performance improvement by utilizing the first logit distributions on three additional tasks, including indicating uncertainty in math solving, mitigating hallucination, and image classification. Last, with the same training data, simply finetuning LVLMs improves models' performance but is still inferior to linear probing on these tasks.

Submitted 13 March, 2024; originally announced March 2024.

Comments: Under review. Project page: https://github.com/Qinyu-Allen-Zhao/LVLM-LP

arXiv:2403.08334 [pdf, other]

DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping

Authors: Cheng Huang, Nannan Wang, Ziyan Wang, Siqi Sun, Lingzi Li, Junren Chen, Qianchong Zhao, Jiaxuan Han, Zhen Yang, Lei Shi

Abstract: With the growing popularity of modularity in software development comes the rise of package managers and language ecosystems. Among them, npm stands out as the most extensive package manager, hosting more than 2 million third-party open-source packages that greatly simplify the process of building code. However, this openness also brings security risks, as evidenced by numerous package poisoning incidents. In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time to give us access to more package code details. Further, we perform manual inspection and API call sequence analysis on packages collected from public datasets and security reports to build a hierarchical classification framework and behavioral knowledge base covering different sensitive behaviors. In addition, we propose DONAPI, an automatic malicious npm package detector that combines static and dynamic analysis. It makes preliminary judgments on the degree of maliciousness of packages by code reconstruction techniques and static analysis, extracts dynamic API call sequences to confirm and identify obfuscated content that static analysis cannot handle alone, and finally tags malicious software packages based on the constructed behavior knowledge base. To date, we have identified and manually confirmed 325 malicious samples and discovered 2 unusual API calls and 246 API call sequences that have not appeared in known samples.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 18 pages, accepted for publication at USENIX Security 2024

arXiv:2403.06942 [pdf, other]

Grid Monitoring and Protection with Continuous Point-on-Wave Measurements and Generative AI

Authors: Lang Tong, Xinyi Wang, Qing Zhao

Abstract: Purpose: This article presents a case for a next-generation grid monitoring and control system, leveraging recent advances in generative artificial intelligence (AI), machine learning, and statistical inference. Advancing beyond earlier generations of wide-area monitoring systems built upon supervisory control and data acquisition (SCADA) and synchrophasor technologies, we argue for a monitoring and control framework based on the streaming of continuous point-on-wave (CPOW) measurements with AI-powered data compression and fault detection. Methods and Results: The architecture of the proposed design originates from the Wiener-Kallianpur innovation representation of a random process that transforms causally a stationary random process into an innovation sequence with independent and identically distributed random variables. This work presents a generative AI approach that (i) learns an innovation autoencoder that extracts innovation sequence from CPOW time series, (ii) compresses the CPOW streaming data with innovation autoencoder and subband coding, and (iii) detects unknown faults and novel trends via nonparametric sequential hypothesis testing. Conclusion: This work argues that conventional monitoring using SCADA and phasor measurement unit (PMU) technologies is ill-suited for a future grid with deep penetration of inverter-based renewable generations and distributed energy resources. A monitoring system based on CPOW data streaming and AI data analytics should be the basic building blocks for situational awareness of a highly dynamic future grid.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.05854 [pdf, other]

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

Authors: Qihao Zhao, Yalun Dai, Hao Li, Wei Hu, Fan Zhang, Jun Liu

Abstract: Long-tail recognition is challenging because it requires the model to learn good representations from tail categories and address imbalances across all categories. In this paper, we propose a novel generative and fine-tuning framework, LTGC, to handle long-tail recognition via leveraging generated content. Firstly, inspired by the rich implicit knowledge in large-scale models (e.g., large language models, LLMs), LTGC leverages the power of these models to parse and reason over the original tail data to produce diverse tail-class content. We then propose several novel designs for LTGC to ensure the quality of the generated data and to efficiently fine-tune the model using both the generated and original data. The visualization demonstrates the effectiveness of the generation module in LTGC, which produces accurate and diverse tail data. Additionally, the experimental results demonstrate that our LTGC outperforms existing state-of-the-art methods on popular long-tailed benchmarks.

Submitted 26 May, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: CVPR 2024, Oral

arXiv:2403.05808 [pdf, other]

Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution

Authors: Junxiong Lin, Yan Wang, Zeng Tao, Boyang Wang, Qing Zhao, Haorang Wang, Xuan Tong, Xinji Mai, Yuxuan Lin, Wei Song, Jiawen Yu, Shaoqi Yan, Wenqiang Zhang

Abstract: Pre-trained diffusion models utilized for image generation encapsulate a substantial reservoir of a priori knowledge pertaining to intricate textures. Harnessing the potential of leveraging this a priori knowledge in the context of image super-resolution presents a compelling avenue. Nonetheless, prevailing diffusion-based methodologies presently overlook the constraints imposed by degradation information on the diffusion process. Furthermore, these methods fail to consider the spatial variability inherent in the estimated blur kernel, stemming from factors such as motion jitter and out-of-focus elements in open-environment scenarios. This oversight results in a notable deviation of the image super-resolution effect from fundamental realities. To address these concerns, we introduce a framework known as Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution (SSR). Within the SSR framework, we propose a Spatially Variant Kernel Refinement (SVKR) module. SVKR estimates a Depth-Informed Kernel, which takes the depth information into account and is spatially variant. Additionally, SVKR enhances the accuracy of depth information acquired from LR images, allowing for mutual enhancement between the depth map and blur kernel estimates. Finally, we introduce the Adaptive Multi-Modal Fusion (AMF) module to align the information from three modalities: low-resolution images, depth maps, and blur kernels. This alignment can constrain the diffusion model to generate more authentic SR results. Quantitative and qualitative experiments affirm the superiority of our approach, while ablation experiments corroborate the effectiveness of the modules we have proposed.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.05743 [pdf, ps, other]

Forecasting Electricity Market Signals via Generative AI

Authors: Xinyi Wang, Qing Zhao, Lang Tong

Abstract: This paper presents a generative artificial intelligence approach to probabilistic forecasting of electricity market signals, such as real-time locational marginal prices and area control error signals. Inspired by the Wiener-Kallianpur innovation representation of nonparametric time series, we propose a weak innovation autoencoder architecture and a novel deep learning algorithm that extracts the canonical independent and identically distributed innovation sequence of the time series, from which samples of future time series are generated. The validity of the proposed approach is established by proving that, under ideal training conditions, the generated samples have the same conditional probability distribution as that of the ground truth. Three applications involving highly dynamic and volatile time series in real-time market operations are considered: (i) locational marginal price forecasting for self-scheduled resources such as battery storage participants, (ii) interregional price spread forecasting for virtual bidders in interchange markets, and (iii) area control error forecasting for frequency regulations. Numerical studies based on market data from multiple independent system operators demonstrate the superior performance of the proposed generative forecaster over leading classical and modern machine learning techniques under both probabilistic and point forecasting metrics.

Submitted 27 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05444 [pdf, other]

Chlorine and zinc co-doping effects on the electronic structure and optical properties of γ-CuI

Authors: Chao Li, Meicong Li, Zhuli Zhang, Qiang Zhao, Naixin Liu, Kailei Wang, Fan Zhang, ** Ouyang

Abstract: The effects of chlorine (Cl) and zinc (Zn) co-doping on the electronic structure and optical properties of the zinc blende (γ) phase of copper iodide (γ-CuI) scintillator material are investigated by using first-principles density functional theory calculations. The band structure, density of states, dielectric function, absorption coefficients, and reflectivity were analyzed before and after doping. Results show co-doping significantly modifies the band structure, reduces the band gap, and generates impurity energy levels. Cl doping enhances absorption in the high energy region while reducing visible light absorption. Zn doping induces a redshift in absorption and n-type conductivity at high concentrations. With suitable co-doping ratios, the absorption coefficient and reflectivity of γ-CuI can be optimized in the visible range to improve scintillation light yield. The calculations provide guidance for co-doping γ-CuI scintillators to achieve superior detection performance. The n-type conductivity also makes doped γ-CuI promising for optoelectronic applications.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04299 [pdf, other]

LitSim: A Conflict-aware Policy for Long-term Interactive Traffic Simulation

Authors: Haojie Xin, Xiaodong Zhang, Renzhi Tang, Songyang Yan, Qianrui Zhao, Chunze Yang, Wen Cui, Zijiang Yang

Abstract: Simulation is pivotal in evaluating the performance of autonomous driving systems due to the advantages of high efficiency and low cost compared to on-road testing. Bridging the gap between simulation and the real world requires realistic agent behaviors. However, the existing works have the following shortcomings in achieving this goal: (1) log replay offers realistic scenarios but often leads to collisions due to the absence of dynamic interactions, and (2) both heuristic-based and data-based solutions, which are parameterized and trained on real-world datasets, encourage interactions but often deviate from real-world data over long horizons. In this work, we propose LitSim, a long-term interactive simulation approach that maximizes realism by minimizing the interventions in the log. Specifically, our approach primarily uses log replay to ensure realism and intervenes only when necessary to prevent potential conflicts. We then encourage interactions among the agents and resolve the conflicts, thereby reducing the risk of unrealistic behaviors. We train and validate our model on the real-world dataset NGSIM, and the experimental results demonstrate that LitSim outperforms the currently popular approaches in terms of realism and reactivity.

Submitted 1 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: 9 pages, 6 figures, under review

arXiv:2403.04294 [pdf, other]

A$^{3}$lign-DFER: Pioneering Comprehensive Dynamic Affective Alignment for Dynamic Facial Expression Recognition with CLIP

Authors: Zeng Tao, Yan Wang, Junxiong Lin, Haoran Wang, Xinji Mai, Jiawen Yu, Xuan Tong, Ziheng Zhou, Shaoqi Yan, Qing Zhao, Liyuan Han, Wenqiang Zhang

Abstract: The performance of CLIP in dynamic facial expression recognition (DFER) task doesn't yield exceptional results as observed in other CLIP-based classification tasks. While CLIP's primary objective is to achieve alignment between images and text in the feature space, DFER poses challenges due to the abstract nature of text and the dynamic nature of video, making label representation limited and perfect alignment difficult. To address this issue, we have designed A$^{3}$lign-DFER, which introduces a new DFER labeling paradigm to comprehensively achieve alignment, thus enhancing CLIP's suitability for the DFER task. Specifically, our A$^{3}$lign-DFER method is designed with multiple modules that work together to obtain the most suitable expanded-dimensional embeddings for classification and to achieve alignment in three key aspects: affective, dynamic, and bidirectional. We replace the input label text with a learnable Multi-Dimensional Alignment Token (MAT), enabling alignment of text to facial expression video samples in both affective and dynamic dimensions. After CLIP feature extraction, we introduce the Joint Dynamic Alignment Synchronizer (JAS), further facilitating synchronization and alignment in the temporal dimension. Additionally, we implement a Bidirectional Alignment Training Paradigm (BAP) to ensure gradual and steady training of parameters for both modalities. Our insightful and concise A$^{3}$lign-DFER method achieves state-of-the-art results on multiple DFER datasets, including DFEW, FERV39k, and MAFW. Extensive ablation experiments and visualization studies demonstrate the effectiveness of A$^{3}$lign-DFER. The code will be available in the future.

Submitted 7 March, 2024; originally announced March 2024.

Showing 51–100 of 1,678 results for author: Zhao, Q