
arXiv:2404.19243

Co-occurrence order-preserving pattern mining

Authors: Youxi Wu, Zhen Wang, Yan Li, Yingchun Guo, He Jiang, Xingquan Zhu, Xindong Wu

Abstract: Recently, order-preserving pattern (OPP) mining has been proposed to discover patterns that represent trend changes in time series. Although existing OPP mining algorithms have achieved satisfactory performance, they discover all frequent patterns, whereas in some cases users focus on a particular trend and its associated trends. To efficiently discover trend information related to a specific prefix pattern, this paper addresses the problem of co-occurrence OPP (COP) mining and proposes an algorithm named COP-Miner to discover COPs from historical time series. COP-Miner consists of three parts: extracting keypoints, a preparation stage, and iteratively calculating supports and mining frequent COPs. Keypoint extraction obtains the local extreme points of patterns and time series. The preparation stage prepares for the first round of mining and contains four steps: obtaining the suffix OPP of the keypoint sub-time series, calculating the occurrences of the suffix OPP, verifying the occurrences of the keypoint sub-time series, and calculating the occurrences of all fusion patterns of the keypoint sub-time series. To further improve the efficiency of support calculation, we propose a support calculation method with an ending strategy that uses the occurrences of prefix and suffix patterns to calculate the occurrences of superpatterns. Experimental results indicate that COP-Miner outperforms the competing algorithms in running time and scalability.
Moreover, COPs with keypoint alignment yield better prediction performance.

Submitted 30 April, 2024; originally announced April 2024.
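The keypoint and order-preserving ideas in the COP-Miner abstract can be illustrated with a minimal Python sketch. It rests on three assumptions:

- A keypoint is a series endpoint or a strict local extremum.
- An OPP is written as the rank of each element within its window, so (1, 3, 2) means low, high, middle.
- Matching is done by brute force over contiguous windows. COP-Miner's prefix/suffix ending strategy is not reproduced.

The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def keypoints(series: Sequence[float]) -> List[int]:
    """Indices of the two endpoints plus every strict local extremum (assumed keypoint rule)."""
    n = len(series)
    if n <= 2:
        return list(range(n))
    idx = [0]
    for i in range(1, n - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            idx.append(i)
    idx.append(n - 1)
    return idx

def relative_order(window: Sequence[float]) -> Tuple[int, ...]:
    """Rank (1-based) of each element within the window; ties are broken by position."""
    order = sorted(range(len(window)), key=lambda i: (window[i], i))
    ranks = [0] * len(window)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return tuple(ranks)

def occurrences(pattern: Sequence[int], series: Sequence[float]) -> List[int]:
    """End positions (0-based) of contiguous windows whose relative order equals pattern."""
    m = len(pattern)
    target = tuple(pattern)
    return [i + m - 1 for i in range(len(series) - m + 1)
            if relative_order(series[i:i + m]) == target]
```

For example, `occurrences((1, 3, 2), [1, 3, 2, 5, 4, 6])` returns `[2, 4]`. These are the end positions of the windows `[1, 3, 2]` and `[2, 5, 4]`, which both follow the low-high-middle trend.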

arXiv:2404.19225

doi 10.1140/epjc/s10052-024-12804-8

Electromagnetic field and chaotic charged-particle motion around hairy black holes in Horndeski gravity

Authors: Wenfu Cao, Xin Wu, Jun Lyu

Abstract: The Wald vector potential is an exact solution of the source-free Maxwell equations for the electromagnetic field of an uncharged vacuum black hole, such as the Kerr black hole, immersed in an asymptotically uniform magnetic field. However, it is no longer a solution if the black hole is a non-vacuum solution in a modified theory of gravity with extra fields, or a charged Kerr-Newman spacetime. To satisfy the source-free Maxwell equations in this case, the Wald vector potential must be appropriately modified and generalized. Following this idea, we derive an expression for the vector potential of an electromagnetic field surrounding a hairy black hole in Horndeski modified gravity. Explicit symplectic integrators with excellent long-term behaviour are used to simulate the motion of charged particles around the hairy black hole immersed in the external magnetic field. In the recurrence plot method based on recurrence quantification analysis, diagonal structures parallel to the main diagonal indicate regular dynamics, whereas the absence of such structures indicates chaotic dynamics. Like the Poincaré map and the fast Lyapunov indicator, this method efficiently distinguishes chaos from order in the curved spacetime.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 18 pages, 9 figures, EPJC (2024)
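As a rough illustration of the recurrence-plot criterion described above, the Python sketch below does two things:

- It builds a recurrence matrix in which R[i][j] = 1 when two phase-space points lie within a threshold eps.
- It measures determinism as the fraction of off-diagonal recurrence points that lie on diagonal lines of at least lmin points.

The threshold, the minimum line length and the Euclidean norm are assumptions chosen for illustration. This is not the authors' implementation.

```python
import math
from typing import List, Sequence

def recurrence_matrix(points: Sequence[Sequence[float]], eps: float) -> List[List[int]]:
    """R[i][j] = 1 if points i and j are within eps (Euclidean distance), else 0."""
    n = len(points)
    return [[1 if math.dist(points[i], points[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]

def determinism(R: List[List[int]], lmin: int = 2) -> float:
    """Fraction of recurrence points above the main diagonal that lie on
    diagonal lines of length >= lmin (R is symmetric, so the upper triangle suffices)."""
    n = len(R)
    total = 0      # all recurrence points above the main diagonal
    in_lines = 0   # those belonging to diagonal runs of length >= lmin
    for k in range(1, n):          # k-th diagonal above the main one
        run = 0
        for i in range(n - k):
            if R[i][i + k]:
                run += 1
                total += 1
            else:
                if run >= lmin:
                    in_lines += run
                run = 0
        if run >= lmin:            # close a run that reaches the matrix edge
            in_lines += run
    return in_lines / total if total else 0.0
```

A strictly periodic orbit such as `[(t % 4,) for t in range(20)]`, with `eps = 0.5`, gives a determinism of 1.0 because every recurrence lies on long diagonals. Irregular orbits tend to produce isolated recurrence points and a lower value.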

arXiv:2404.18670

Enhancing Uncertain Demand Prediction in Hospitals Using Simple and Advanced Machine Learning

Authors: Annie Hu, Samuel Stockman, Xun Wu, Richard Wood, Bangdong Zhi, Oliver Y. Chén

Abstract: Early and timely prediction of patient care demand affects not only effective resource allocation but also clinical decision-making and patient experience. Accurately predicting patient care demand, however, is a ubiquitous challenge for hospitals across the world, due in part to the demand's temporal variability and in part to the difficulty of modelling trends in advance. To address this issue, we develop two methods: a relatively simple time-varying linear model and a more advanced neural network model. The former forecasts patient arrivals hourly over a week based on factors such as the day of the week and arrival patterns over the previous 7 days. The latter leverages a long short-term memory (LSTM) model, capturing non-linear relationships between past data and a three-day forecasting window. We evaluate the predictive capabilities of the two proposed approaches against two naïve approaches: a reduced-rank vector autoregressive (VAR) model and the TBATS model. Using patient care demand data from Rambam Medical Center in Israel, our results show that both proposed models effectively capture hourly variations in patient demand. Additionally, the linear model is more explainable thanks to its simple architecture, whereas the LSTM model, by accurately modelling weekly seasonal trends, delivers lower prediction errors.
Taken together, our explorations suggest the utility of machine learning in predicting time-varying patient care demand; in particular, it is possible to predict patient care demand with good accuracy (around 4 patients) three days or a week in advance using machine learning.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18472

Direct observation of anisotropic Cooper pairing in kagome superconductor CsV3Sb5

Authors: Akifumi Mine, Yigui Zhong, **** Liu, Takeshi Suzuki, Sahand Najafzadeh, Takumi Uchiyama, Jia-Xin Yin, Xianxin Wu, Xun Shi, Zhiwei Wang, Yugui Yao, Kozo Okazaki

Abstract: In the recently discovered kagome superconductors AV3Sb5 (A = K, Rb, and Cs), superconductivity is intertwined with an unconventional charge density wave order. Their pairing symmetry remains elusive owing to the lack of direct measurements of the superconducting gap in momentum space. In this letter, using laser-based ultra-high-resolution, low-temperature angle-resolved photoemission spectroscopy, we observe anisotropic Cooper pairing in the kagome superconductor CsV3Sb5. We detect a highly anisotropic superconducting gap structure, with an anisotropy of over 80% and the gap maximum along the V-V bond direction, on a Fermi surface originating from the 3d-orbital electrons of the V kagome lattice. This stands in stark contrast to the isotropic superconducting gap structure on the other Fermi surface, which is occupied by Sb 5p-orbital electrons. Our observation of anisotropic Cooper pairing in pristine CsV3Sb5 is fundamental for understanding the intertwined orders in the ground state of kagome superconductors.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18426

Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images

Authors: Wenbin Guan, Zijiu Yang, Xiaohong Wu, Liqiong Chen, Feng Huang, Xiaohai He, Honggang Chen

Abstract: Few-shot object detection (FSOD) in remote sensing images (RSIs) has become a focal point of attention. Many few-shot detectors, particularly those built on two-stage detectors, struggle with the multiscale complexity inherent in RSIs. Moreover, these detectors are impractical for real-world applications, mainly because of their unwieldy number of model parameters when handling large amounts of data. In contrast, we recognize the advantages of one-stage detectors, including high detection speed and a global receptive field. Consequently, we choose the YOLOv7 one-stage detector as a baseline and subject it to a novel meta-learning training framework. This transformation allows the detector to address FSOD tasks effectively while retaining its inherent lightweight advantage. Additionally, we thoroughly investigate the samples generated by the meta-learning strategy and introduce a novel meta-sampling approach to retain samples produced by our designed meta-detection head. Coupled with our devised meta-cross loss, we deliberately use "negative samples", which are often overlooked, to extract valuable knowledge from them. This approach enhances detection accuracy and efficiently refines the overall meta-learning strategy. To validate the effectiveness of our proposed detector, we compared its performance with current state-of-the-art detectors on the DIOR and NWPU VHR-10.v2 datasets, obtaining satisfactory results.

Submitted 16 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18149

Compressed Deepfake Video Detection Based on 3D Spatiotemporal Trajectories

Authors: Zongmei Chen, Xin Liao, Xiaoshuai Wu, Yanxiang Chen

Abstract: The misuse of deepfake technology by malicious actors poses a potential threat to nations, societies, and individuals. However, existing deepfake detection methods primarily target uncompressed videos, relying on cues such as noise characteristics, local textures, or frequency statistics. When applied to compressed videos, these methods suffer reduced detection performance and are less suitable for real-world scenarios. In this paper, we propose a deepfake video detection method based on 3D spatiotemporal trajectories. Specifically, we use a robust 3D model to construct spatiotemporal motion features, integrating feature details from both 2D and 3D frames to mitigate the influence of large head rotation angles or insufficient lighting within frames. Furthermore, we separate facial expressions from head movements and design a sequential analysis method based on phase space motion trajectories to explore the feature differences between genuine and fake faces in deepfake videos. We conduct extensive experiments on several compressed deepfake benchmarks to validate the performance of the proposed method. The robustness of the designed features is verified by showing that the distribution of facial landmarks remains consistent before and after video compression. Our method yields satisfactory results and shows potential for practical applications.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18146

doi 10.1088/2053-1583/ad3b12

Tailoring coercive fields and the Curie temperature via proximity coupling in WSe$_2$/Fe$_3$GeTe$_2$ van der Waals heterostructures

Authors: Guodong Ma, Renjun Du, Fuzhuo Lian, Song Bao, Zi**g Guo, Xiaofan Cai, **gkuan Xiao, Yaqing Han, Di Zhang, Siqi Jiang, Jiabei Huang, Xinglong Wu, Alexander S. Mayorov, **sheng Wen, Lei Wang, Geliang Yu

Abstract: Hybrid structures consisting of two-dimensional (2D) magnets and semiconductors have exhibited a wide range of functionalities in spintronics and opto-spintronics. In this work, we have fabricated WSe$_2$/Fe$_3$GeTe$_2$ van der Waals (vdW) heterostructures and investigated proximity effects on 2D magnetism. Using reflective magnetic circular dichroism (RMCD), we have observed a temperature-dependent modulation of magnetic order in the heterostructure. For temperatures above $40$ K, WSe$_2$-covered Fe$_3$GeTe$_2$ exhibits a larger coercive field than bare Fe$_3$GeTe$_2$, accompanied by a noticeable enhancement of the Curie temperature by $21$ K. This strengthening suggests an increase in magnetic anisotropy in the interfacial Fe$_3$GeTe$_2$ layer, which can be attributed to the spin-orbit coupling (SOC) proximity effect induced by the adjacent WSe$_2$ layers. However, at much lower temperatures ($T<20$ K), a non-monotonic modification of the coercive field is observed, showing both reduction and enhancement depending on the thickness of the WSe$_2$ and Fe$_3$GeTe$_2$ layers. Moreover, an unconventional two-step magnetization process emerges in the heterostructure, indicating the short-range nature of SOC proximity effects. Our findings on proximity effects in 2D magnetism may inform the design of future spintronic and memory devices based on 2D magnetic heterostructures.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18136

SafePaint: Anti-forensic Image Inpainting with Domain Adaptation

Authors: Dunyun Chen, Xin Liao, Xiaoshuai Wu, Shiwei Chen

Abstract: Existing image inpainting methods have achieved remarkable success in generating visually appealing results, often with a trend toward creating more intricate structural textures. However, while these models excel at creating realistic image content, they often leave noticeable traces of tampering, posing a significant security threat. In this work, we take anti-forensic capabilities into consideration and propose, for the first time, an end-to-end training framework for anti-forensic image inpainting named SafePaint. Specifically, we formulate image inpainting as two major tasks: semantically plausible content completion and region-wise optimization. The former is similar to current inpainting methods, which aim to restore the missing regions of corrupted images. The latter, through domain adaptation, seeks to reconcile the discrepancies between the inpainted region and the unaltered area to achieve anti-forensic goals. Through comprehensive theoretical analysis, we validate the effectiveness of domain adaptation for anti-forensic performance. Furthermore, we design a region-wise separated attention (RWSA) module, which not only aligns with our anti-forensic objective but also enhances the performance of the model. Extensive qualitative and quantitative evaluations show that our approach achieves results comparable to existing image inpainting methods while offering anti-forensic capabilities not available in other methods.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18135

Dexterous Grasp Transformer

Authors: Guo-Hao Xu, Yi-Lin Wei, Dian Zheng, Xiao-Ming Wu, Wei-Shi Zheng

Abstract: In this work, we propose a novel discriminative framework for dexterous grasp generation, named Dexterous Grasp TRansformer (DGTR), which predicts a diverse set of feasible grasp poses by processing the object point cloud in a single forward pass. We formulate dexterous grasp generation as a set prediction task and design a transformer-based grasping model for it. However, we find that this set prediction paradigm encounters several optimization challenges in dexterous grasping, which restrict its performance. To address these issues, we propose progressive strategies for both the training and testing phases. First, the dynamic-static matching training (DSMT) strategy is presented to enhance optimization stability during training. Second, we introduce adversarial-balanced test-time adaptation (AB-TTA) with a pair of adversarial losses to improve grasping quality during testing. Experimental results on the DexGraspNet dataset demonstrate that DGTR predicts dexterous grasp poses with both high quality and high diversity. Notably, while maintaining high quality, the diversity of grasp poses predicted by DGTR significantly outperforms previous works on multiple metrics without any data pre-processing. Code is available at https://github.com/iSEE-Laboratory/DGTR

Submitted 28 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024
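Set-prediction models of the kind described above are typically trained by first finding a one-to-one assignment between predictions and ground-truth targets. The sketch below shows that generic idea using an exhaustive search over assignments. The cost is a squared distance on hypothetical grasp parameter vectors. It does not reproduce DGTR's dynamic-static matching, and a practical implementation would use the Hungarian algorithm rather than enumerating permutations.

```python
from itertools import permutations
from typing import Sequence, Tuple

Vector = Sequence[float]

def pair_cost(pred: Vector, target: Vector) -> float:
    """Squared Euclidean distance between two (hypothetical) grasp parameter vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def best_matching(preds: Sequence[Vector],
                  targets: Sequence[Vector]) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively find the cheapest one-to-one assignment of targets to predictions.

    Requires len(preds) >= len(targets). Entry k of the returned tuple is the
    index of the prediction matched to targets[k]. The search grows
    factorially, so this is only suitable for tiny illustrative sets.
    """
    best: Tuple[int, ...] = ()
    best_cost = float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(pair_cost(preds[p], targets[k]) for k, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost
```

With predictions `[(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]` and targets `[(1.1, 0.9), (0.1, 0.0)]`, the matching is `(1, 0)` with a total cost of about 0.03. A training loss would then be evaluated on the matched pairs only.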

arXiv:2404.18045

doi 10.1021/acsanm.4c00914

Blood Works for Graphene Production

Authors: Xiaofan Cai, Ming Li, Chao Chen, Renjun Du, Zi**g Guo, ** Wang, Guodong Ma, Xinglong Wu, Zhiyuan Wang, Yaqing Han, Fuzhuo Lian, **gkuan Xiao, Siqi Jiang, Lei Wang, Alexander S. Mayorov, Libo Gao, Kostya S. Novoselov, Geliang Yu

Abstract: Blood, a ubiquitous and fundamental carbohydrate material composed of plasma, red blood cells, white blood cells, and platelets, has played an important role in biology, life science, history, and religious studies, while graphene has garnered significant attention due to its exceptional properties and wide range of potential applications. Achieving environmentally friendly, cost-effective growth from hybrid precursors and obtaining high-quality graphene through a straightforward CVD process have traditionally been considered mutually exclusive. This study demonstrates that high-quality graphene domains with controlled thickness can be produced through a one-step growth process at atmospheric pressure using blood as a precursor. Raman spectroscopy confirms the uniformity of the blood-grown graphene films, and the observation of the half-integer quantum Hall effect in the measured devices highlights their outstanding electronic properties. This approach opens new possibilities for the application of blood and offers an unconventional route to graphene growth.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.17867

Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics

Authors: Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin

Abstract: AI-generated content has accelerated the development of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. One promising forensic solution is to inject robust watermarks into face images before they are released, in order to track their provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we propose AdvMark, in the service of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, enhancing the forensic detectability of watermarked images while the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of AdvMark, which leverages robust watermarking to fool Deepfake detectors in a way that improves the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed light on harmless proactive forensics against Deepfake.

Submitted 27 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.17735

Causal Diffusion Autoencoders: Toward Counterfactual Generation via Diffusion Probabilistic Models

Authors: Aneesh Komanduri, Chen Zhao, Feng Chen, Xintao Wu

Abstract: Diffusion probabilistic models (DPMs) have become the state of the art in high-quality image generation. However, DPMs have an arbitrary noisy latent space with no interpretable or controllable semantics. Although there has been significant research effort to improve image sample quality, there is little work on representation-controlled generation using diffusion models. Specifically, causal modeling and controllable counterfactual generation using DPMs are underexplored. In this work, we propose CausalDiffAE, a diffusion-based causal representation learning framework that enables counterfactual generation according to a specified causal model. Our key idea is to use an encoder to extract high-level, semantically meaningful causal variables from high-dimensional data and to model stochastic variation using reverse diffusion. We propose a causal encoding mechanism that maps high-dimensional data to causally related latent factors, and we parameterize the causal mechanisms among latent factors using neural networks. To enforce the disentanglement of causal variables, we formulate a variational objective and leverage auxiliary label information in a prior to regularize the latent space. We propose a DDIM-based counterfactual generation procedure subject to do-interventions. Finally, to address the limited-label-supervision scenario, we also study the application of CausalDiffAE when part of the training data is unlabeled, which also enables granular control over the strength of interventions when generating counterfactuals during inference.
We empirically show that CausalDiffAE learns a disentangled latent space and is capable of generating high-quality counterfactual images.

Submitted 8 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: Short version accepted to CVPR 2024 Workshop on Generative Models for Computer Vision
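Two ingredients named in the abstract can be sketched in a few lines of Python: a do-intervention on causal latent variables, and a deterministic DDIM update. The following are illustrative assumptions:

- the two-variable linear mechanism z2 = w * z1 + u2;
- the function names;
- the scalar inputs.

CausalDiffAE parameterizes its mechanisms with neural networks. The sketch also does not show how the intervened latent conditions the denoiser.

```python
import math
from typing import Tuple

def counterfactual_do_z1(z: Tuple[float, float], w: float, a: float) -> Tuple[float, float]:
    """Counterfactual for the toy SCM z1 = u1, z2 = w*z1 + u2 under do(z1 = a)."""
    z1, z2 = z
    u2 = z2 - w * z1          # abduction: recover the exogenous noise of z2
    return a, w * a + u2      # action + prediction: set z1, recompute its child

def ddim_step(x_t: float, eps_pred: float, abar_t: float, abar_prev: float) -> float:
    """One deterministic DDIM update (eta = 0) from step t to the previous step."""
    # Predicted clean sample from the current noisy sample and the predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - abar_t) * eps_pred) / math.sqrt(abar_t)
    # Re-noise the prediction to the previous noise level along the same noise direction.
    return math.sqrt(abar_prev) * x0_pred + math.sqrt(1.0 - abar_prev) * eps_pred
```

`ddim_step(1.9, 0.5, 0.64, 1.0)` recovers the clean value 2.0 when the noise prediction is exact, since 1.9 = 0.8 * 2.0 + 0.6 * 0.5. `counterfactual_do_z1((1.0, 3.0), 2.0, 4.0)` keeps the inferred noise u2 = 1 and returns `(4.0, 9.0)`.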

arXiv:2404.17611

MetaSD: A Unified Framework for Scalable Downscaling of Meteorological Variables in Diverse Situations

Authors: **g Hu, Honghu Zhang, Peng Zheng, Jialin Mu, Xiaomeng Huang, Xi Wu

Abstract: Addressing complex meteorological processes at fine spatial resolution requires substantial computational resources. To accelerate meteorological simulations, researchers have used neural networks to downscale meteorological variables from low-resolution simulations. Despite notable advances, contemporary cutting-edge downscaling algorithms are tailored to specific variables. Treating meteorological variables in isolation overlooks their interconnectedness, leading to an incomplete understanding of atmospheric dynamics. Additionally, the laborious processes of data collection and annotation, together with the computational resources required to downscale each variable individually, are significant hurdles. Given the limited versatility of existing models across different meteorological variables and their failure to account for inter-variable relationships, this paper proposes a unified downscaling approach based on meta-learning. This framework aims to facilitate the downscaling of diverse meteorological variables derived from various numerical models and spatiotemporal scales. Trained on temperature, wind, surface pressure and total precipitation from ERA5 and GFS, the proposed method can be extended to downscale convective precipitation, potential energy, height, humidity and ozone from CFS, S2S and CMIP6 at different spatiotemporal scales, demonstrating its capability to capture the interconnections among diverse variables. Our approach represents the first effort to create a generalized downscaling model.
Experimental evidence demonstrates that the proposed model outperforms existing leading downscaling methods in both quantitative and qualitative assessments.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17227

Trust Dynamics and Market Behavior in Cryptocurrency: A Comparative Study of Centralized and Decentralized Exchanges

Authors: Xintong Wu, Wanling Deng, Yuotng Quan, Luyao Zhang

Abstract: In the evolving landscape of digital finance, the transition from centralized to decentralized trust mechanisms, driven primarily by blockchain technology, plays a critical role in shaping the cryptocurrency ecosystem. This paradigm shift raises questions about the traditional reliance on centralized trust and introduces a novel, decentralized trust framework built on distributed networks. Our research examines the consequences of this shift, focusing in particular on how incidents influence trust within cryptocurrency markets and thereby affect trading behavior on centralized (CEXs) and decentralized exchanges (DEXs). We conduct a comprehensive analysis of various events, assessing their effects on market dynamics, including token valuation and trading volumes on both CEXs and DEXs. Our findings highlight the pivotal role of trust in directing user preferences and the fluidity of trust transfer between centralized and decentralized platforms. Despite certain anomalies, the results largely align with our initial hypotheses, revealing the intricate nature of user trust in cryptocurrency markets. This study contributes to interdisciplinary research bridging distributed systems, behavioral finance, and Decentralized Finance (DeFi). It offers insights for the distributed computing community, particularly in understanding and applying distributed trust mechanisms in digital economies, and paves the way for future research that could further explore the socio-economic dimensions and leverage blockchain data in this dynamic domain.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16687

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. The challenge addresses a major problem in image and video processing, namely Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into an image track and a video track. The image track uses AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track had a total of 318 registered participants. A total of 1,646 submissions were received in the development phase and 221 in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants registered in the video track. A total of 991 submissions were received in the development phase and 185 in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods achieved better results than the baseline methods, and the winning methods in both tracks demonstrated superior prediction performance on AIGC.

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16425 [pdf, other]

Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja, et al. (170 additional authors not shown)

Abstract: Long gamma-ray bursts (GRBs) are believed to originate from the core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field of view ($\sim$3600 deg$^2$) and high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential of EP to study the early universe via GRBs.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 41 pages, 8 figures, 7 tables

arXiv:2404.16359 [pdf, other]

An Improved Graph Pooling Network for Skeleton-Based Action Recognition

Authors: Cong Wu, Xiao-Jun Wu, Tianyang Xu, Josef Kittler

Abstract: Pooling is a crucial operation in computer vision, yet the unique structure of skeletons hinders the application of existing pooling strategies to skeleton graph modelling. In this paper, we propose an Improved Graph Pooling Network, referred to as IGPN. The main innovations are as follows. Our method incorporates a region-awareness pooling strategy based on structural partitioning. The correlation matrix of the original feature is used to adaptively adjust the weight of information in different regions of the newly generated features, resulting in more flexible and effective processing. To prevent the irreversible loss of discriminative information, we propose a cross fusion module and an information supplement module to provide block-level and input-level information, respectively. As a plug-and-play structure, the proposed operation can be seamlessly combined with existing GCN-based models. We conducted extensive evaluations on several challenging benchmarks, and the experimental results indicate the effectiveness of our proposed solutions. For example, in the cross-subject evaluation of the NTU-RGB+D 60 dataset, IGPN achieves a significant improvement in accuracy compared to the baseline while reducing FLOPs by nearly 70%; a heavier version has also been introduced to further boost accuracy.
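A minimal sketch of region-aware pooling over a skeleton, assuming a hypothetical toy partition of five joints into three body regions and plain averaging within each region. It illustrates only the partition-then-pool step; IGPN's actual partition of the NTU-RGB+D joints and its correlation-based reweighting of regions are not reproduced here.

```python
from typing import Dict, List

# Hypothetical toy partition: 5 joints grouped into 3 body regions.
# IGPN's real partition of the NTU-RGB+D skeleton is not reproduced here.
TOY_REGIONS: Dict[str, List[int]] = {
    "torso": [0, 1],
    "left_arm": [2],
    "right_arm": [3, 4],
}


def region_pool(features: List[List[float]],
                regions: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Average the per-joint feature vectors inside each region (one vector per region)."""
    pooled: Dict[str, List[float]] = {}
    for name, joints in regions.items():
        if not joints:
            raise ValueError(f"region {name!r} has no joints")
        dim = len(features[joints[0]])
        totals = [0.0] * dim
        for j in joints:
            for d in range(dim):
                totals[d] += features[j][d]
        pooled[name] = [t / len(joints) for t in totals]
    return pooled


if __name__ == "__main__":
    # 5 joints, 2 feature channels each (toy values).
    joint_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    print(region_pool(joint_features, TOY_REGIONS))
```

Pooling reduces the graph from five joint nodes to three region nodes; in a GCN pipeline the pooled region features would feed the subsequent, coarser graph layers.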

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16134 [pdf, ps, other]

doi 10.1109/SmartGridComm57358.2023.10333943

Power Failure Cascade Prediction using Graph Neural Networks

Authors: Sathwik Chadaga, Xinyu Wu, Eytan Modiano

Abstract: We consider the problem of predicting power failure cascades due to branch failures. We propose a flow-free model based on graph neural networks that predicts grid states at every generation of a cascade process given an initial contingency and power injection values. We train the proposed model using a cascade sequence data pool generated from simulations. We then evaluate our model at various levels of granularity. We present several error metrics that gauge the model's ability to predict the failure size, the final grid state, and the failure time steps of each branch within the cascade. We benchmark the graph neural network model against influence models. We show that, in addition to being generic over randomly scaled power injection values, the graph neural network model outperforms multiple influence models that are built specifically for their corresponding loading profiles. Finally, we show that the proposed model reduces the computational time by almost two orders of magnitude.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). Oct. 31, 2023. See implementations at https://github.com/sathwikchadaga/failure-cascade

arXiv:2404.15815 [pdf, other]

Single-View Scene Point Cloud Human Grasp Generation

Authors: Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei, Xiao-Ming Wu, Wei-Shi Zheng

Abstract: In this work, we explore a novel task of generating human grasps based on single-view scene point clouds, which more accurately mirrors the typical real-world situation of observing objects from a single viewpoint. Due to the incompleteness of object point clouds and the presence of numerous scene points, the generated hand is prone to penetrating into the invisible parts of the object and the model is easily affected by scene points. Thus, we introduce S2HGrasp, a framework composed of two key modules: the Global Perception module that globally perceives partial object point clouds, and the DiffuGrasp module designed to generate high-quality human grasps based on complex inputs that include scene points. Additionally, we introduce the S2HGD dataset, which comprises approximately 99,000 single-object single-view scene point clouds of 1,668 unique objects, each annotated with one human grasp. Our extensive experiments demonstrate that S2HGrasp can not only generate natural human grasps regardless of scene points, but also effectively prevent penetration between the hand and invisible parts of the object. Moreover, our model showcases strong generalization capability when applied to unseen objects. Our code and dataset are available at https://github.com/iSEE-Laboratory/S2HGrasp.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15413 [pdf, other]

A Spatially Resolved [CII] Survey of 31 $z\sim7$ Massive Galaxies Hosting Luminous Quasars

Authors: Feige Wang, Jinyi Yang, Xiaohui Fan, Bram Venemans, Roberto Decarli, Eduardo Bañados, Fabian Walter, Aaron J. Barth, Fuyan Bian, Frederick B. Davies, Anna-Christina Eilers, Emanuele Paolo Farina, Joseph F. Hennawi, Jiang-Tao Li, Chiara Mazzucchelli, Ran Wang, Xue-Bing Wu, Minghao Yue

Abstract: The [CII] 158 $\mu$m emission line and the underlying far-infrared (FIR) dust continuum are important tracers for studying the star formation and kinematic properties of early galaxies. We present a survey of the [CII] emission lines and FIR continua of 31 luminous quasars at $z>6.5$ using the Atacama Large Millimeter Array (ALMA) and the NOrthern Extended Millimeter Array (NOEMA) at sub-arcsec resolution. This survey more than doubles the number of quasars with [CII] and FIR observations at these redshifts and enables statistical studies of quasar host galaxies deep into the epoch of reionization. We detect [CII] emission in 27 quasar hosts with a luminosity range of $L_{\rm [CII]}=(0.3-5.5)\times10^9~L_\odot$ and detect the FIR continuum of 28 quasar hosts with a luminosity range of $L_{\rm FIR}=(0.5-13.0)\times10^{12}~L_\odot$. Both $L_{\rm [CII]}$ and $L_{\rm FIR}$ are correlated ($\rho\simeq0.4$) with the quasar bolometric luminosity, albeit with substantial scatter. The quasar hosts detected by ALMA are clearly resolved, with a median diameter of $\sim$5 kpc. About 40% of the quasar host galaxies show a velocity gradient in [CII] emission, while the rest show either dispersion-dominated or disturbed kinematics. Basic estimates of the dynamical masses of the rotation-dominated host galaxies yield $M_{\rm dyn}=(0.1-7.5)\times10^{11}~M_\odot$.
Considering our findings alongside literature studies, we find that the ratio between $M_{\rm BH}$ and $M_{\rm dyn}$ is on average about ten times higher than that of the local $M_{\rm BH}-M_{\rm dyn}$ relation, but with substantial scatter (the ratio difference ranging from $\sim$0.6 to 60) and large uncertainties.

Submitted 23 April, 2024; originally announced April 2024.

Comments: accepted for publication in ApJ

arXiv:2404.15174 [pdf, other]

Fourier-enhanced Implicit Neural Fusion Network for Multispectral and Hyperspectral Image Fusion

Authors: Yu-Jie Liang, Zihan Cao, Liang-Jian Deng, Xiao Wu

Abstract: Recently, implicit neural representations (INR) have made significant strides in various vision-related domains, providing a novel solution for Multispectral and Hyperspectral Image Fusion (MHIF) tasks. However, INR is prone to losing high-frequency information and lacks global perceptual capabilities. To address these issues, this paper introduces a Fourier-enhanced Implicit Neural Fusion Network (FeINFN) specifically designed for the MHIF task, motivated by the following observation: the Fourier amplitudes of the HR-HSI latent code and the LR-HSI are remarkably similar, whereas their phases exhibit different patterns. In FeINFN, we propose a spatial and frequency implicit fusion function (Spa-Fre IFF), helping INR capture high-frequency information and expand the receptive field. In addition, a new decoder employing a complex Gabor wavelet activation function, called the Spatial-Frequency Interactive Decoder (SFID), is introduced to enhance the interaction of INR features. In particular, we theoretically prove that the Gabor wavelet activation possesses a time-frequency tightness property that favors learning the optimal bandwidths in the decoder. Experiments on two benchmark MHIF datasets verify the state-of-the-art (SOTA) performance of the proposed method, both visually and quantitatively. Ablation studies further validate the contributions described above. The code will be available on Anonymous GitHub (https://anonymous.4open.science/r/FeINFN-15C9/) after possible acceptance.
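To illustrate the kind of activation SFID uses, the sketch below evaluates a complex Gabor wavelet of the widely used form $\psi(x) = e^{j\omega_0 x} e^{-(s_0 x)^2}$: a complex sinusoid (the carrier) modulated by a Gaussian envelope. That FeINFN uses exactly this parameterization is an assumption, and the default values of `omega0` and `s0` are hypothetical.

```python
import cmath
import math
from typing import List


def complex_gabor(x: float, omega0: float = 10.0, s0: float = 10.0) -> complex:
    """Complex Gabor wavelet: carrier exp(j*omega0*x) times Gaussian envelope exp(-(s0*x)^2).

    omega0 (carrier frequency) and s0 (envelope sharpness) are hypothetical
    defaults, not values taken from FeINFN.
    """
    carrier = cmath.exp(1j * omega0 * x)
    envelope = math.exp(-((s0 * x) ** 2))
    return carrier * envelope


def gabor_activation(values: List[float], omega0: float = 10.0, s0: float = 10.0) -> List[complex]:
    """Apply the wavelet element-wise, as an activation layer would."""
    return [complex_gabor(v, omega0, s0) for v in values]


if __name__ == "__main__":
    for z in gabor_activation([0.0, 0.05, 0.1, 0.3]):
        # Magnitude follows the Gaussian envelope; phase follows the carrier.
        print(f"|psi| = {abs(z):.4f}, phase = {cmath.phase(z):+.4f}")
```

Because the magnitude is set by the envelope and the phase by the carrier, the activation is localized in both the input domain and frequency, which is the intuition behind the time-frequency tightness property the abstract mentions.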

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.15100 [pdf, other]

Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

Authors: Xun Wu, Shaohan Huang, Furu Wei

Abstract: Recent studies have demonstrated the exceptional potential of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts. Despite these advances, current human preference datasets are either prohibitively expensive to construct or lack diversity in preference dimensions, which limits their applicability for instruction tuning of open-source text-to-image generative models and hinders further exploration. To address these challenges and promote the alignment of generative models through instruction tuning, we leverage multimodal large language models to create VisionPrefer, a high-quality and fine-grained preference dataset that captures multiple preference aspects. We aggregate feedback from AI annotators across four aspects (prompt-following, aesthetics, fidelity, and harmlessness) to construct VisionPrefer. To validate the effectiveness of VisionPrefer, we train a reward model, VP-Score, over VisionPrefer to guide the training of text-to-image generative models; the preference prediction accuracy of VP-Score is comparable to that of human annotators.
Furthermore, we use two reinforcement learning methods to fine-tune generative models and evaluate the performance of VisionPrefer, and extensive experimental results demonstrate that VisionPrefer significantly improves text-image alignment in compositional image generation across diverse aspects (e.g., aesthetics) and generalizes better than previous human-preference metrics across various image distributions. Moreover, VisionPrefer indicates that integrating AI-generated synthetic data as a supervisory signal is a promising avenue for achieving improved alignment with human preferences in vision generative models.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.15045 [pdf, other]

Multi-Head Mixture-of-Experts

Authors: Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei

Abstract: Sparse Mixtures of Experts (SMoE) scale model capacity without significant increases in training and inference costs, but exhibit two issues: (1) low expert activation, where only a small subset of experts is activated for optimization; and (2) a lack of fine-grained analytical capabilities for multiple semantic concepts within individual tokens. We propose Multi-Head Mixture-of-Experts (MH-MoE), which employs a multi-head mechanism to split each token into multiple sub-tokens. These sub-tokens are then assigned to and processed by a diverse set of experts in parallel, and seamlessly reintegrated into the original token form. The multi-head mechanism enables the model to collectively attend to information from various representation spaces within different experts, while significantly enhancing expert activation, thereby deepening context understanding and alleviating overfitting. Moreover, MH-MoE is straightforward to implement and is decoupled from other SMoE optimization methods, making it easy to integrate with other SMoE models for enhanced performance. Extensive experimental results across three tasks (English-focused language modeling, multilingual language modeling, and masked multi-modality modeling) demonstrate the effectiveness of MH-MoE.
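A minimal sketch of the split-route-merge idea described above, in plain Python for readability. Assumptions: each sub-token is routed to a single expert by a hypothetical dot-product gate (top-1 routing), and the experts are toy functions; the multi-head and merge projection layers of the actual model are omitted.

```python
from typing import Callable, List

Vector = List[float]


def split_heads(token: Vector, num_heads: int) -> List[Vector]:
    """Split a d-dimensional token into num_heads sub-tokens of size d / num_heads."""
    if len(token) % num_heads != 0:
        raise ValueError("token dimension must be divisible by num_heads")
    size = len(token) // num_heads
    return [token[i * size:(i + 1) * size] for i in range(num_heads)]


def merge_heads(sub_tokens: List[Vector]) -> Vector:
    """Concatenate sub-tokens back into the original token layout."""
    return [x for sub in sub_tokens for x in sub]


def route_top1(sub: Vector, gate: List[Vector]) -> int:
    """Hypothetical gate: pick the expert whose gating vector best matches the sub-token."""
    scores = [sum(a * b for a, b in zip(sub, g)) for g in gate]
    return max(range(len(scores)), key=scores.__getitem__)


def mh_moe_layer(token: Vector, num_heads: int, gate: List[Vector],
                 experts: List[Callable[[Vector], Vector]]) -> Vector:
    """Split the token, send each sub-token to its own expert, then merge."""
    subs = split_heads(token, num_heads)
    outputs = [experts[route_top1(s, gate)](s) for s in subs]
    return merge_heads(outputs)


if __name__ == "__main__":
    token = [1.0, 0.0, 0.0, 1.0]
    gate = [[1.0, 0.0], [0.0, 1.0]]            # one gating vector per expert
    experts = [lambda v: [2.0 * x for x in v],  # toy expert 0
               lambda v: [-x for x in v]]       # toy expert 1
    print([route_top1(s, gate) for s in split_heads(token, 2)])  # experts used
    print(mh_moe_layer(token, 2, gate, experts))
```

In this toy run a single token activates two different experts, one per sub-token, which is the mechanism the abstract credits for higher expert activation.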

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.15002 [pdf, other]

doi 10.1063/5.0203421

Nanoscale single-electron box with a floating lead for quantum sensing: modelling and device characterization

Authors: Nikolaos Petropoulos, Xutong Wu, Andrii Sokolov, Panagiotis Giounanlis, Imran Bashir, Mike Asker, Dirk Leipold, Andrew K. Mitchell, Robert B. Staszewski, Elena Blokhina

Abstract: We present an in-depth analysis of a single-electron box (SEB) biased through a floating node technique that is common in charge-coupled devices (CCDs). The device is analyzed and characterized in the context of single-electron charge-sensing techniques for integrated silicon quantum dots (QD). The unique aspect of our SEB design is the incorporation of a metallic floating node, strategically employed for sensing and precise injection of electrons into an electrostatically formed QD. To analyse the SEB, we propose an extended multi-orbital Anderson impurity model (MOAIM), adapted to our nanoscale SEB system, that is used to theoretically predict the behaviour of the SEB in the context of a charge-sensing application. The model and the sensing technique were validated on a QD fabricated in a fully depleted silicon-on-insulator (FDSOI) process at the 22-nm technology node. We demonstrate the MOAIM's efficacy in predicting the observed electronic behavior and elucidating the complex electron dynamics and correlations in the SEB. The results of our study reinforce the versatility and precision of the model in the realm of nanoelectronics and highlight the practical utility of the metallic floating node as a mechanism for charge injection and detection in integrated QDs. Finally, we identify the limitations of our model in capturing higher-order effects observed in our measurements and propose future outlooks to reconcile some of these discrepancies.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 7 pages, 3 figures

Journal ref: Appl. Phys. Lett. 124, 173503 (2024)

arXiv:2404.14775 [pdf, ps, other]

Properties of quark stars based on the density-dependent MIT bag model

Authors: Min Ju, Pengcheng Chu, Xuhao Wu, He Liu

Abstract: In this study, we extend the MIT bag model by incorporating the vector interaction among quarks and introducing a density-dependent bag pressure. We then investigate the thermodynamic properties of strange quark matter (SQM) and pure up-down quark matter (udQM) in quark stars. The results demonstrate that the vector interaction among quarks and the density-dependent bag pressure have significant impacts on the equation of state for both SQM and udQM. The inclusion of $G_V$, which represents the strength of the vector interaction, stiffens the equation of state while maintaining causality. This allows massive compact stars, such as those observed in GW190814 and PSR J0740+6620, to be described as quark stars. Finally, we use the vMIT bag model to derive a series of mass-radius relations of quark stars (QSs) that are consistent with the astronomical observations of HESS J1731-347, 4U 1702-429, PSR J0740+6620, GW170817, and GW190814.
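For orientation, the standard MIT bag model that this work extends gives a simple linear equation of state. A short derivation, assuming non-interacting massless quarks (an ultra-relativistic gas) inside the bag; the vector-interaction terms and the density-dependent form of $B$ used in the paper are not reproduced here:

$$
\varepsilon = \varepsilon_{\rm kin} + B, \qquad p = p_{\rm kin} - B, \qquad p_{\rm kin} = \tfrac{1}{3}\,\varepsilon_{\rm kin}
\quad\Longrightarrow\quad
p = \tfrac{1}{3}\,(\varepsilon - B) - B = \tfrac{1}{3}\,(\varepsilon - 4B).
$$

The stellar surface, where $p = 0$, then lies at $\varepsilon = 4B$. A stiffer equation of state, as the abstract reports for nonzero $G_V$, means a larger pressure at a given energy density than this baseline.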

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14368 [pdf, other]

Graphic Design with Large Multimodal Model

Authors: Yutao Cheng, Zhao Zhang, Maoke Yang, Hui Nie, Chunyuan Li, Xinglong Wu, Jie Shao

Abstract: In the field of graphic design, automating the integration of design elements into a cohesive multi-layered artwork not only boosts productivity but also paves the way for the democratization of graphic design. One existing practice is Graphic Layout Generation (GLG), which aims to lay out sequential design elements. It has been constrained by the necessity for a predefined correct sequence of layers, thus limiting creative potential and increasing user workload. In this paper, we present Hierarchical Layout Generation (HLG) as a more flexible and pragmatic setup, which creates graphic compositions from unordered sets of design elements. To tackle the HLG task, we introduce Graphist, the first layout generation model based on large multimodal models. Graphist efficiently reframes HLG as a sequence generation problem: it takes RGB-A images as input and outputs a JSON draft protocol indicating the coordinates, size, and order of each element. We develop new evaluation metrics for HLG. Graphist outperforms prior art and establishes a strong baseline for this field. Project homepage: https://github.com/graphic-design-ai/graphist

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13840 [pdf, other]

Study of $e^+e^-\to\omega X(3872)$ and $\gamma X(3872)$ from 4.66 to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

Abstract: Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes $e^+e^-\to\omega X(3872)$ and $e^+e^-\to\gamma X(3872)$. With the $e^+e^-\to\omega X(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\to\gamma J/\psi)}{\mathcal{B}(X(3872)\to\pi^+\pi^- J/\psi)}$ is measured to be $0.38\pm0.20_\text{stat.}\pm0.01_\text{syst.}$ ($R< 0.83$ at 90\% confidence level). In addition, we measure the ratio of the average cross section of $e^+e^-\to\omega X(3872)$ to that of $e^+e^-\to\omega\chi_{c1}$ ($\omega\chi_{c2}$) to be $\sigma_{\omega X(3872)}/\sigma_{\omega\chi_{c1}}~(\sigma_{\omega X(3872)}/\sigma_{\omega\chi_{c2}})=5.2\pm1.0_\text{stat.}\pm1.9_\text{syst.}~(5.5\pm1.1_\text{stat.}\pm2.4_\text{syst.})$. Finally, we search for the process $e^+e^-\to\gamma X(3872)$, and no obvious signal is observed. The upper limit on the ratio of the average cross section of $e^+e^-\to\gamma X(3872)$ to that of $e^+e^-\to\omega X(3872)$ is set at $\sigma_{\gamma X(3872)}/\sigma_{\omega X(3872)}<0.23$ at 90\% confidence level.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 19 pages, 10 figures

arXiv:2404.13628 [pdf, other]

Mixture of LoRA Experts

Authors: Xun Wu, Shaohan Huang, Furu Wei

Abstract: LoRA has gained widespread acceptance in the fine-tuning of large pre-trained models for a diverse array of downstream tasks, showing notable effectiveness and efficiency and thereby establishing itself as one of the most prevalent fine-tuning techniques. Owing to the modular, plug-and-play nature of LoRA, researchers have explored combining multiple LoRAs to enable models to excel across various downstream tasks. Nonetheless, existing approaches to LoRA fusion face inherent challenges. Direct arithmetic merging may cause the loss of the original pre-trained model's generative capabilities or of the distinct identity of the LoRAs, yielding suboptimal outcomes. On the other hand, reference tuning-based fusion lacks the flexibility required to combine multiple LoRAs effectively. In response to these challenges, this paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection. MoLE not only achieves superior LoRA fusion performance compared with direct arithmetic merging but also retains the crucial flexibility for combining LoRAs effectively. Extensive experimental evaluations in both the Natural Language Processing (NLP) and Vision & Language (V&L) domains substantiate the efficacy of MoLE.
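A minimal sketch of gated LoRA fusion at a single layer, contrasting with direct arithmetic merging. Assumptions: each LoRA branch contributes the standard low-rank update $B A x$ (scaling factor omitted), and the branches are mixed by softmax weights over hypothetical per-layer gate logits; MoLE's actual gating function and hierarchical control are not reproduced here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lora_delta(a: Matrix, b: Matrix, x: Vector) -> Vector:
    """Low-rank update B(Ax) of one LoRA branch (scaling factor omitted)."""
    return matvec(b, matvec(a, x))


def mole_forward(w0: Matrix, loras: List[Tuple[Matrix, Matrix]],
                 gate_logits: Vector, x: Vector) -> Vector:
    """Frozen base output W0 x plus a softmax-weighted mix of LoRA branch outputs."""
    weights = softmax(gate_logits)  # hypothetical learned gate for this layer
    out = matvec(w0, x)
    for w, (a, b) in zip(weights, loras):
        delta = lora_delta(a, b, x)
        out = [o + w * d for o, d in zip(out, delta)]
    return out


if __name__ == "__main__":
    w0 = [[1.0, 0.0], [0.0, 1.0]]                  # frozen base weight
    loras = [([[1.0, 0.0]], [[1.0], [0.0]]),      # rank-1 branch 1: (A, B)
             ([[0.0, 1.0]], [[0.0], [1.0]])]      # rank-1 branch 2: (A, B)
    print(mole_forward(w0, loras, [0.0, 0.0], [1.0, 2.0]))
```

With equal gate logits both branches contribute half their update; shifting the logits lets the layer favor one branch without editing the frozen weights or the LoRA parameters themselves.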

Submitted 21 April, 2024; originally announced April 2024.

Comments: 17 pages, 11 figures

arXiv:2404.12739 [pdf, other]

The Solution for the CVPR2024 NICE Image Captioning Challenge

Authors: Longfei Huang, Shupeng Zhong, Xiangyu Wu, Ruoxuan Li

Abstract: This report introduces a solution to Topic 1 (Zero-shot Image Captioning) of the 2024 NICE challenge (New frontiers for zero-shot Image Captioning Evaluation). In contrast to the NICE 2023 datasets, this challenge involves new human annotations with significant differences in caption style and content. We therefore enhance image captions through retrieval augmentation and caption grading methods. At the data level, we use high-quality captions generated by image captioning models as training data to close the gap in text styles. At the model level, we employ OFA (a large-scale visual-language pre-training model based on handcrafted templates) to perform the image captioning task. We then propose a caption-level strategy for the high-quality caption data generated by the image captioning models and integrate it, together with a retrieval augmentation strategy, into the template, compelling the model to generate higher-quality, better-matching, and semantically richer captions based on the retrieval augmentation prompts. Our approach achieves a CIDEr score of 234.11.

Submitted 29 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12270 [pdf, other]

doi 10.1093/pasj/psae037

Efficient Identification of Broad Absorption Line Quasars using Dimensionality Reduction and Machine Learning

Authors: Wei-Bo Kao, Yanxia Zhang, Xue-Bing Wu

Abstract: Broad Absorption Line Quasars (BALQSOs) display distinct blue-shifted broad absorption lines. They serve as invaluable probes for unraveling the intricate structure and evolution of quasars, shedding light on the profound influence exerted by supermassive black holes on galaxy formation. The proliferation of large-scale spectroscopic surveys such as LAMOST, SDSS, and DESI has exponentially expanded the repository of quasar spectra at our disposal. In this study, we present an innovative approach to streamline the identification of BALQSOs, leveraging dimensionality reduction and machine learning algorithms. Our dataset is curated from SDSS DR16, combining quasar spectra with classification labels sourced from the DR16Q quasar catalog. We employ a diverse array of dimensionality reduction techniques, including Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), Locally Linear Embedding (LLE), and Isometric Mapping (ISOMAP), to distill the essence of the original spectral data. The resulting low-dimensional representations serve as inputs for a suite of machine learning classifiers, including XGBoost and Random Forest models. Through experimentation, we find PCA to be the most effective dimensionality reduction method, striking the best balance between reducing dimensionality and preserving vital spectral information.
Notably, combining PCA with the XGBoost classifier proves the most effective for BALQSO classification, achieving accuracies of 97.60% in 10-fold cross-validation and 96.92% on the outer test sample. This study not only introduces a novel machine learning-based paradigm for quasar classification but also offers insights transferable to many spectral classification challenges in astronomy.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 17 pages, 6 figures, accepted for publication in PASJ

arXiv:2404.11576 [pdf, other]

State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend

Authors: Fei Cui, Jiaojiao Fang, Xiaojiang Wu, Zelong Lai, Mengke Yang, Menghan Jia, Guizhong Liu

Abstract: Stochastic video prediction enables the consideration of uncertainty in future motion, thereby better reflecting the dynamic nature of the environment. Stochastic video prediction methods based on image auto-regressive recurrent models need to feed their predictions back into the latent space. Conversely, state-space models, which decouple frame synthesis and temporal prediction, prove to be more efficient. However, inferring long-term temporal information about motion and generalizing to dynamic scenarios under non-stationary assumptions remain unresolved challenges. In this paper, we propose a state-space decomposition stochastic video prediction model that decomposes overall video frame generation into deterministic appearance prediction and stochastic motion prediction. Through adaptive decomposition, the model's generalization capability to dynamic scenarios is enhanced. In motion prediction, obtaining a prior on the long-term trend of future motion is crucial. Thus, in the stochastic motion prediction branch, we infer the long-term motion trend from conditional frames to guide the generation of future frames that exhibit high consistency with the conditional frames. Experimental results demonstrate that our model outperforms baselines on multiple datasets.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11537 [pdf, other]

SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening

Authors: Yu Zhong, Xiao Wu, Liang-Jian Deng, Zihan Cao

Abstract: Pansharpening is a significant image fusion technique that merges the spatial content and spectral characteristics of remote sensing images to generate high-resolution multispectral images. Recently, denoising diffusion probabilistic models have been gradually applied to visual tasks, enhancing controllable image generation through low-rank adaptation (LoRA). In this paper, we introduce a spatial-spectral integrated diffusion model for the remote sensing pansharpening task, called SSDiff, which considers the pansharpening process as the fusion process of spatial and spectral components from the perspective of subspace decomposition. Specifically, SSDiff utilizes spatial and spectral branches to learn spatial details and spectral features separately, then employs a designed alternating projection fusion module (APFM) to accomplish the fusion. Furthermore, we propose a frequency modulation inter-branch module (FMIM) to modulate the frequency distribution between branches. The two components of SSDiff can perform favorably against the APFM when utilizing a LoRA-like branch-wise alternative fine-tuning method. This fine-tuning refines SSDiff to capture component-discriminating features more fully. Finally, extensive experiments on four commonly used datasets, i.e., WorldView-3, WorldView-2, GaoFen-2, and QuickBird, demonstrate the superiority of SSDiff both visually and quantitatively. The code will be made open source after possible acceptance.
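For readers unfamiliar with LoRA, the sketch below shows the standard low-rank adaptation idea (Hu et al.) on a single linear layer: a frozen weight plus a trainable rank-r update that is zero at initialization. The layer size, rank, and `alpha` are hypothetical, and SSDiff's branch-wise alternating fine-tuning built on this idea is not reproduced here.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, w, rank, alpha, rng):
        out_dim, in_dim = w.shape
        self.w = w                                             # frozen pretrained weight
        self.a = rng.normal(scale=0.01, size=(rank, in_dim))   # trainable, small random init
        self.b = np.zeros((out_dim, rank))                     # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Because b starts at zero, the adapted layer initially equals the frozen one.
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))          # hypothetical 64x64 pretrained layer
layer = LoRALinear(w, rank=4, alpha=8.0, rng=rng)
x = rng.standard_normal((2, 64))
y = layer(x)
```

Only `a` and `b` (512 numbers here) would be trained, versus 4096 entries in `w`.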

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11495 [pdf, other]

H$α$ Reverberation Mapping from Broad-band Photometry of Dwarf Seyfert 1 Galaxy NGC 4395

Authors: Huapeng Gu, Xue-Bing Wu, Yuhan Wen, Qinchun Ma, Hengxiao Guo

Abstract: NGC 4395 is a dwarf Seyfert 1 galaxy with a possible intermediate-mass black hole of several $10^4$ solar masses in its center. As a well-studied object, its broad line region size has been measured via the H$α$ time lag in numerous spectroscopic reverberation mapping (SRM) and narrow-band photometric reverberation mapping (PRM) campaigns. Here we present its H$α$ time lag measurement using broad-band photometric data, with the application of our newly developed ICCF-Cut method as well as the JAVELIN and $χ^2$ methods. Utilizing the minute-cadence multi-band light curves obtained from the 2 m FTN and 10.4 m GTC telescopes in recent works, we measured its H$α$ lag as approximately 40 to 90 minutes from broad-band PRM. With the H$α$ emission line velocity dispersion, we calculated its central black hole mass as $M_{\rm BH} = (8\pm4) \times 10^3\, M_{\odot}$. These results are comparable with previous results obtained by narrow-band PRM and SRM, providing further support for an intermediate-mass black hole in NGC 4395. In addition, our study validates ICCF-Cut as an effective method for broad-band PRM, which holds potential for widespread application in the era of large multi-epoch, high-cadence photometric surveys.
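As background on lag measurement, the following is a minimal sketch of a plain interpolated cross-correlation function (ICCF) lag search, not the authors' ICCF-Cut method (the "Cut" step is not reproduced). The light curves are synthetic and hypothetical: the "line" curve is the continuum delayed by 60 minutes.

```python
import numpy as np

def iccf_lag(t, cont, line, lags):
    """Return the trial lag maximizing the correlation between the continuum,
    linearly interpolated at (t - lag), and the line light curve."""
    r = np.full(len(lags), np.nan)
    for k, lag in enumerate(lags):
        shifted = t - lag
        ok = (shifted >= t[0]) & (shifted <= t[-1])  # stay inside observed range
        if ok.sum() < 3:
            continue
        c = np.interp(shifted[ok], t, cont)
        r[k] = np.corrcoef(c, line[ok])[0, 1]
    return lags[int(np.nanargmax(r))], r

def drive(x):
    # Hypothetical smooth driving signal (two sinusoids, periods in minutes).
    return np.sin(2 * np.pi * x / 180.0) + 0.5 * np.sin(2 * np.pi * x / 77.0)

t = np.arange(0.0, 600.0, 2.0)        # minutes, 2-minute cadence
cont = drive(t)
line = drive(t - 60.0)                # line responds 60 minutes later
lags = np.arange(0.0, 150.0, 1.0)
lag, r = iccf_lag(t, cont, line, lags)
```

Real data would add noise, uneven sampling, and Monte Carlo error estimation on top of this core search.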

Submitted 17 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: 11 pages, 9 figures, accepted for publication in MNRAS

arXiv:2404.11416 [pdf, other]

Neural Schrödinger Bridge Matching for Pansharpening

Authors: Zihan Cao, Xiao Wu, Liang-Jian Deng

Abstract: Recent diffusion probabilistic models (DPMs) in the field of pansharpening have been gradually gaining attention and have achieved state-of-the-art (SOTA) performance. In this paper, we identify shortcomings in directly applying DPMs to the task of pansharpening as an inverse problem: 1) initiating sampling directly from Gaussian noise neglects the low-resolution multispectral image (LRMS) as a prior; 2) low sampling efficiency often necessitates a higher number of sampling steps. We first reformulate pansharpening into the stochastic differential equation (SDE) form of an inverse problem. Building upon this, we propose a Schrödinger bridge (SB) matching method that addresses both issues. We design an efficient deep neural network architecture tailored for the proposed SB matching. In comparison to the well-established deep learning (DL) regression-based framework and the recent DPM framework, our method demonstrates SOTA performance with fewer sampling steps. Moreover, we discuss the relationship between SB matching and other methods based on SDEs and ordinary differential equations (ODEs), as well as its connection with optimal transport. Code will be available.
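To illustrate how a bridge can start from the LRMS prior instead of pure noise, here is a minimal sketch of Brownian-bridge sampling and the conditional drift target used in generic bridge matching. It assumes a Brownian bridge with constant noise scale `sigma` between an upsampled LRMS image (t = 0) and the HRMS image (t = 1); the paper's actual SDE, schedule, and network are not specified here, and the arrays are hypothetical.

```python
import numpy as np

def brownian_bridge_sample(x0, x1, t, sigma, rng):
    """Sample x_t of a Brownian bridge pinned at x0 (t=0) and x1 (t=1).
    The variance sigma^2 * t * (1 - t) vanishes at both endpoints."""
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x0.shape)

def bridge_target(x_t, x1, t):
    """Drift (x1 - x_t) / (1 - t) of the bridge; a network would regress onto it (t < 1)."""
    return (x1 - x_t) / (1.0 - t)

rng = np.random.default_rng(0)
lrms_up = rng.random((4, 8, 8))   # hypothetical upsampled LRMS (bands, H, W)
hrms = rng.random((4, 8, 8))      # hypothetical ground-truth HRMS
t = 0.3
x_t = brownian_bridge_sample(lrms_up, hrms, t, sigma=0.1, rng=rng)
target = bridge_target(x_t, hrms, t)
```

Following the drift from any intermediate state leads back toward the HRMS endpoint, which is why sampling can be shortened.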

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11133 [pdf, ps, other]

Improved analysis of the decay width of $t\to Wb$ up to N$^{3}$LO QCD corrections

Authors: Jiang Yan, Xing-Gang Wu, Hua Zhou, Hong-Tai Li, **g-Hao Shan

Abstract: In this paper, we analyze the top-quark decay $t\to Wb$ up to next-to-next-to-next-to-leading order (N$^{3}$LO) QCD corrections. For this purpose, we first adopt the principle of maximum conformality (PMC) to deal with the initial pQCD series. Then we adopt the Bayesian analysis approach, which quantifies the unknown higher-order terms' contributions in terms of a probability distribution, to estimate the possible magnitude of the uncalculated N$^{4}$LO terms. In our calculation, an effective strong coupling constant $α_{s}(Q_{*})$ is determined by using all non-conformal $\{β_{i}\}$ terms associated with the renormalization group equation. This leads to a next-to-leading-log PMC scale $Q_{*}^{(\rm NLL)}=10.3048$ GeV, which can be regarded as the correct momentum flow of the process. Consequently, we obtain an improved scale-invariant pQCD prediction for the top-quark decay width, $Γ_{t}^{\rm tot} = 1.3120 \pm 0.0038$ GeV, whose error is the squared average of the uncertainties from the decay width of the $W$ boson, $ΔΓ_{W} = \pm 0.042$ GeV, the coupling constant, $Δα_{s}(m_{Z}) = \pm 0.0009$, and the predicted N$^{4}$LO terms. The magnitude of the top-quark pole mass greatly affects the total decay width. By further taking into consideration the PDG top-quark pole mass error from cross-section measurements, $Δm_{t} = \pm 0.7$ GeV, we obtain $Γ_{t}^{\rm tot} = 1.3120 ^{+0.0194}_{-0.0192}$ GeV.

Submitted 29 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: 9 pages, 4 figures

arXiv:2404.09814 [pdf, other]

A Novel HARQ-CC Assisted SCMA Scheme

Authors: Man Wang, Zheng Shi, Yunfei Li, Xianda Wu, Weiqiang Tan, Xinrong Ye

Abstract: This letter proposes a novel hybrid automatic repeat request with chase combining assisted sparse code multiple access (HARQ-CC-SCMA) scheme. Depending on whether the same superimposed packets are retransmitted, synchronous and asynchronous modes are considered for retransmissions. Moreover, factor graph aggregation (FGA) and log-likelihood ratio combination (LLRC) are proposed for multi-user detection. For FGA, a large-scale factor graph is constructed by combining all the received superimposed signals, and the message passing algorithm (MPA) is applied to calculate log-likelihood ratios (LLRs). In contrast, since the same unsuccessful messages must be retransmitted, LLRC adds the LLRs of erroneously received packets from previous HARQ rounds to those of the currently received packets for channel decoding, and saves the LLRs for failed users. Finally, Monte Carlo simulations are performed to show that FGA surpasses LLRC and HARQ with incremental redundancy (HARQ-IR) in synchronous mode. However, LLRC performs better than FGA at low signal-to-noise ratio (SNR) in asynchronous mode. This is because failed messages after the maximum allowable HARQ rounds in this mode can yield significant error propagation in the low-SNR regime.
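The LLR-summation step behind chase combining can be illustrated with a single-user simplification. The sketch below assumes BPSK over real AWGN with independent noise per round, where the per-bit LLR is $2y/\sigma^2$. It is not the letter's SCMA multi-user detector; it only shows why adding LLRs across retransmissions lowers the error count.

```python
import numpy as np

def bpsk_llr(y, noise_var):
    """LLR log P(b=0|y)/P(b=1|y) for BPSK (0 -> +1, 1 -> -1) over real AWGN."""
    return 2.0 * y / noise_var

def chase_combine(received_rounds, noise_var):
    """Chase combining: the same packet is resent, so per-bit LLRs are summed."""
    total = np.zeros_like(received_rounds[0], dtype=float)
    for y in received_rounds:
        total += bpsk_llr(y, noise_var)
    return total

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=2000)
symbols = 1.0 - 2.0 * bits
noise_var = 1.0                                   # 0 dB per round (hypothetical)
rounds = [symbols + rng.normal(0.0, np.sqrt(noise_var), bits.size) for _ in range(4)]

def bit_errors(llr):
    # Negative LLR -> decide bit 1.
    return int(np.sum((llr < 0).astype(int) != bits))

errors_1 = bit_errors(chase_combine(rounds[:1], noise_var))
errors_4 = bit_errors(chase_combine(rounds, noise_var))
```

With four rounds the effective SNR roughly quadruples, so `errors_4` should be far below `errors_1`.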

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09613 [pdf, other]

Efficient and accurate neural field reconstruction using resistive memory

Authors: Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

Abstract: Human beings construct perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering superior parallelism and efficiency. Replicating this capability in AI finds wide applications in medical imaging, AR/VR, and embodied AI, where input data is often sparse and computing resources are limited. However, traditional signal reconstruction methods on digital computers face both software and hardware challenges. On the software front, difficulties arise from storage inefficiencies in conventional explicit signal representation. Hardware obstacles include the von Neumann bottleneck, which limits data transfer between the CPU and memory, and the limitations of CMOS circuits in supporting parallel processing. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. Software-wise, we employ neural fields to implicitly represent signals via neural networks, which are further compressed using low-rank decomposition and structured pruning. Hardware-wise, we design a resistive memory-based computing-in-memory (CIM) platform, featuring a Gaussian Encoder (GE) and an MLP Processing Engine (PE). The GE harnesses the intrinsic stochasticity of resistive memory for efficient input encoding, while the PE achieves precise weight mapping through a Hardware-Aware Quantization (HAQ) circuit.
We demonstrate the system's efficacy on a 40 nm, 256 Kb resistive memory-based in-memory computing macro, achieving substantial improvements in energy efficiency and parallelism without compromising reconstruction quality in tasks like 3D CT sparse reconstruction, novel view synthesis, and novel view synthesis for dynamic scenes. This work advances AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
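Low-rank decomposition of a neural-field MLP layer can be sketched with a truncated SVD, which gives the best rank-r approximation in the Frobenius norm. The layer shape and rank below are hypothetical, and the paper's exact decomposition, rank selection, and structured pruning are not shown.

```python
import numpy as np

def low_rank_compress(w, rank):
    """Factor w (out x in) into u_r (out x rank) and v_r (rank x in) via truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    u_r = u[:, :rank] * s[:rank]   # fold singular values into the left factor
    v_r = vt[:rank, :]
    return u_r, v_r

rng = np.random.default_rng(0)
# Hypothetical 256x256 layer that happens to have rank 8.
w = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 256))
u_r, v_r = low_rank_compress(w, rank=8)
stored = u_r.size + v_r.size       # 4096 numbers instead of 65536
```

On hardware, the two smaller factors map to two smaller crossbar matrix-vector products instead of one large one.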

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09458 [pdf, other]

CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting

Authors: Xiangrui Liu, Xinju Wu, **** Zhang, Shiqi Wang, Zhu Li, Sam Kwong

Abstract: Gaussian splatting, renowned for its exceptional rendering quality and efficiency, has emerged as a prominent technique in 3D scene representation. However, the substantial data volume of Gaussian splatting impedes its practical utility in real-world applications. Herein, we propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS), which harnesses compact Gaussian primitives for faithful 3D scene modeling with a remarkably reduced data size. To ensure the compactness of Gaussian primitives, we devise a hybrid primitive structure that captures predictive relationships among primitives. Then, we exploit a small set of anchor primitives for prediction, allowing the majority of primitives to be encapsulated into highly compact residual forms. Moreover, we develop a rate-constrained optimization scheme to eliminate redundancies within such hybrid primitives, steering CompGS towards an optimal trade-off between bitrate consumption and representation efficacy. Experimental results show that the proposed CompGS significantly outperforms existing methods, achieving superior compactness in 3D scene representation without compromising model accuracy and rendering quality. Our code will be released on GitHub for further research.
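The anchor-plus-residual idea can be sketched as simple predictive coding: each primitive's attributes are predicted from an assigned anchor, and only a quantized residual is kept. The additive prediction, uniform quantizer, and random data are assumptions for illustration; CompGS's learned prediction, rate-constrained optimization, and entropy coding are not reproduced.

```python
import numpy as np

def encode_residuals(attrs, anchors, assign, step):
    """Predict each primitive from its anchor; keep integer quantized residuals."""
    residual = attrs - anchors[assign]
    return np.round(residual / step).astype(np.int32)

def decode_residuals(q, anchors, assign, step):
    """Reconstruct attributes as anchor prediction plus dequantized residual."""
    return anchors[assign] + q * step

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 3))           # 4 hypothetical anchor primitives
assign = rng.integers(0, 4, size=100)       # anchor index for each of 100 primitives
attrs = anchors[assign] + rng.normal(scale=0.05, size=(100, 3))
step = 0.02
q = encode_residuals(attrs, anchors, assign, step)
rec = decode_residuals(q, anchors, assign, step)
```

Because residuals are small, the integers in `q` cluster near zero and compress well; reconstruction error is bounded by half the quantization step.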

Submitted 15 April, 2024; originally announced April 2024.

Comments: Submitted to a conference

arXiv:2404.09452 [pdf, other]

Python-Based Quantum Chemistry Calculations with GPU Acceleration

Authors: Xiaojie Wu, Qiming Sun, Zhichen Pu, Tianze Zheng, Wenzhi Ma, Wen Yan, Xia Yu, Zhengxiao Wu, Mian Huo, Xiang Li, Weiluo Ren, Sheng Gong, Yumin Zhang, Weihao Gao

Abstract: To meet the increasing demand for quantum chemistry calculations in data-driven chemical research, the collaboration between industrial stakeholders and the quantum chemistry community has led to the development of GPU4PySCF, a GPU-accelerated Python package. This open-source project is accessible via its public GitHub repository at https://github.com/pyscf/gpu4pyscf. This paper outlines the primary features, innovations, and advantages of this package. When performing Density Functional Theory (DFT) calculations on modern GPU platforms, GPU4PySCF delivers a 30-fold speedup over a 32-core CPU node, resulting in approximately 90% cost savings for most DFT tasks. The performance advantages and productivity improvements have been demonstrated in multiple industrial applications, such as generating potential energy surfaces, analyzing molecular properties, calculating solvation free energy, identifying chemical reactions in lithium-ion batteries, and accelerating neural-network methods. To make the package easy to extend and integrate with other Python packages, it is designed with PySCF-compatible interfaces and Pythonic implementations. This design choice enhances its coordination with the Python ecosystem.
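The speedup and cost figures can be related under one assumption not stated in the abstract: that cost equals hourly node price times wall time. With $p$ the hourly price and $T$ the run time, a 30-fold speedup and 90% savings together imply that the GPU node is priced at about three times the CPU node.

```latex
\frac{C_{\mathrm{GPU}}}{C_{\mathrm{CPU}}}
  = \frac{p_{\mathrm{GPU}}\,T_{\mathrm{GPU}}}{p_{\mathrm{CPU}}\,T_{\mathrm{CPU}}}
  = \frac{p_{\mathrm{GPU}}}{p_{\mathrm{CPU}}}\cdot\frac{1}{30},
\qquad
1 - \frac{C_{\mathrm{GPU}}}{C_{\mathrm{CPU}}} = 0.9
\;\Longrightarrow\;
\frac{p_{\mathrm{GPU}}}{p_{\mathrm{CPU}}} = 3 .
```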

Submitted 15 April, 2024; originally announced April 2024.

Comments: 32 pages, 14 figures

arXiv:2404.09432 [pdf, other]

The 8th AI City Challenge

Authors: Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, **-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

Abstract: The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, highlighting significant enhancements in camera count, number of characters, 3D annotation, and camera matrices, alongside new rules for 3D tracking and incentives for online tracking algorithms. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents using multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on motorcycle helmet rule violation detection. The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks, some surpassing existing state-of-the-art achievements.

Submitted 14 April, 2024; originally announced April 2024.

Comments: Summary of the 8th AI City Challenge Workshop in conjunction with CVPR 2024

arXiv:2404.09293 [pdf, other]

A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion

Authors: Zihan Cao, Xiao Wu, Liang-Jian Deng, Yu Zhong

Abstract: In image fusion tasks, images from different sources possess distinct characteristics. This has driven the development of numerous methods to explore better ways of fusing them while preserving their respective characteristics. Mamba, a state space model, emerged in the field of natural language processing. Recently, many studies have attempted to extend Mamba to vision tasks. However, because images differ in nature from causal language sequences, the limited state capacity of Mamba weakens its ability to model image information. Additionally, Mamba's sequence modeling captures only spatial information and cannot effectively capture the rich spectral information in images. Motivated by these challenges, we customize and improve the vision Mamba network for the image fusion task. Specifically, we propose the local-enhanced vision Mamba block, dubbed LEVM. The LEVM block can improve local information perception of the network and simultaneously learn local and global spatial information. Furthermore, we propose the state sharing technique to enhance spatial details and integrate spatial and spectral information. Finally, the overall network is a multi-scale structure based on vision Mamba, called LE-Mamba. Extensive experiments show that the proposed methods achieve state-of-the-art results on multispectral pansharpening and multispectral and hyperspectral image fusion datasets, demonstrating the effectiveness of the proposed approach. Code will be made available.
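The "limited state capacity" point is easier to see in the underlying recurrence. Below is a minimal fixed-parameter diagonal linear state space scan over a flattened image. It is a simplification: Mamba makes `b`, `c`, and the step size input-dependent (selective scan) and uses a parallel scan, and neither LEVM nor state sharing is reproduced. All parameter values are hypothetical.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Causal diagonal linear SSM over a 1-D sequence:
    h_t = a * h_{t-1} + b * x_t (elementwise), y_t = c . h_t."""
    h = np.zeros_like(a)
    ys = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(float(c @ h))
    return np.array(ys)

a = np.array([0.9, 0.5])   # N = 2 state channels: everything seen so far is squeezed into 2 numbers
b = np.array([1.0, 1.0])
c = np.array([1.0, -1.0])

# A 2-D image must be flattened (here row-major) into a causal 1-D sequence,
# so a pixel's lower neighbours are never seen when its output is computed.
img = np.arange(12, dtype=float).reshape(3, 4)
y = ssm_scan(img.reshape(-1), a, b, c)
impulse = ssm_scan(np.array([1.0, 0.0, 0.0, 0.0]), a, b, c)
```

The impulse response is $\sum_i c_i a_i^k b_i$, i.e. `[0.0, 0.4, 0.56, 0.604]` for the first four steps.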

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.09219 [pdf, ps, other]

Observation of $D \to a_{0}(980)π$ in the decays $D^{0} \rightarrow π^{+}π^{-}η$ and $D^{+} \rightarrow π^{+}π^{0}η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

Abstract: We report the first amplitude analysis of the decays $D^{0} \to π^{+} π^{-} η$ and $D^{+} \rightarrow π^{+}π^{0}η$ using a data sample taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 7.9 ${\rm fb}^{-1}$. The contribution from the process $D^{0(+)} \to a_{0}(980)^{+} π^{-(0)}$ is significantly larger than the $D^{0(+)} \to a_{0}(980)^{-(0)} π^{+}$ contribution. The ratios $\mathcal{B}(D^{0} \rightarrow a_{0}(980)^{+}π^{-})/\mathcal{B}(D^{0} \rightarrow a_{0}(980)^{-}π^{+})$ and $\mathcal{B}(D^{+} \rightarrow a_{0}(980)^{+}π^{0})/\mathcal{B}(D^{+} \rightarrow a_{0}(980)^{0}π^{+})$ are measured to be $7.5^{+2.5}_{-0.8\,\mathrm{stat.}}\pm1.7_{\mathrm{syst.}}$ and $2.6\pm0.6_{\mathrm{stat.}}\pm0.3_{\mathrm{syst.}}$, respectively. The measured $D^{0}$ ratio disagrees with the theoretical predictions by orders of magnitude, thus implying a substantial contribution from final-state interactions.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.07777 [pdf, ps, other]

doi 10.1140/epjc/s10052-024-12887-3

Improved analysis of double $J/ψ$ production in $Z$-boson decay

Authors: Guang-Yu Wang, Xing-Gang Wu, Xu-Chang Zheng, Jiang Yan, Jia-Wei Zhang

Abstract: In this paper, we present an improved calculation for the decay rate of the rare $Z$-boson decay into $J/ψ+ J/ψ$. This decay is dominated by the photon fragmentation mechanism, i.e., the transition $Z\to J/ψ+ γ^{*}$ followed by the fragmentation $γ^{*}\to J/ψ$. In our calculation, the amplitude of $γ^{*}\to J/ψ$ is extracted from the measured value of $Γ(J/ψ\to e^+ e^-)$, and the amplitude of $Z\to J/ψ+ γ^{*}$ is calculated through the light-cone approach. The higher-order QCD and relativistic corrections in the amplitude of $γ^{*}\to J/ψ$ and the large logarithms of $m_{_Z}^2/m_c^2$ that appear in the amplitude of $Z\to J/ψ+ γ^{*}$ are resummed in our calculation. In addition, the non-fragmentation amplitude is calculated based on NRQCD factorization, and the next-to-leading order QCD and relativistic corrections are included. The obtained branching fraction for this $Z$ decay channel is $8.66 ^{+1.48} _{-0.69}\times 10^{-11}$.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 10 pages, 2 figures

Journal ref: Eur. Phys. J. C 84, 544 (2024)

arXiv:2404.07773 [pdf, other]

ConsistencyDet: A Robust Object Detector with a Denoising Paradigm of Consistency Model

Authors: Lifan Jiang, Zhihui Wang, Changmiao Wang, Ming Li, Jiaxu Leng, Xindong Wu

Abstract: Object detection, a quintessential task in the realm of perceptual computing, can be tackled using a generative methodology. In the present study, we introduce a novel framework designed to articulate object detection as a denoising diffusion process, which operates on the perturbed bounding boxes of annotated entities. This framework, termed ConsistencyDet, leverages an innovative denoising concept known as the Consistency Model. The hallmark of this model is its self-consistency feature, which empowers the model to map distorted information from any temporal stage back to its pristine state, thereby realizing a "one-step denoising" mechanism. Such an attribute markedly elevates the operational efficiency of the model, setting it apart from the conventional Diffusion Model. Throughout the training phase, ConsistencyDet initiates the diffusion sequence with noise-infused boxes derived from the ground-truth annotations and conditions the model to perform the denoising task. Subsequently, in the inference stage, the model employs a denoising sampling strategy that commences with bounding boxes randomly sampled from a normal distribution. Through iterative refinement, the model transforms an assortment of arbitrarily generated boxes into definitive detections. Comprehensive evaluations employing standard benchmarks, such as MS-COCO and LVIS, corroborate that ConsistencyDet surpasses other leading-edge detectors in performance metrics. Our code is available at https://github.com/Tankowa/ConsistencyDet.
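The "one-step denoising" property comes from how a consistency function is parameterized. The sketch below uses the skip-connection form from the original consistency-models paper (Song et al., 2023), with its usual constants `SIGMA_MIN = 0.002` and `SIGMA_DATA = 0.5`; whether ConsistencyDet uses the same constants and box scaling is an assumption, and `dummy_net` is a hypothetical stand-in for the detection head.

```python
import numpy as np

SIGMA_MIN = 0.002
SIGMA_DATA = 0.5

def c_skip(sigma):
    return SIGMA_DATA**2 / ((sigma - SIGMA_MIN) ** 2 + SIGMA_DATA**2)

def c_out(sigma):
    return (sigma - SIGMA_MIN) * SIGMA_DATA / np.sqrt(SIGMA_DATA**2 + sigma**2)

def consistency_fn(x, sigma, net):
    """f(x, sigma) = c_skip * x + c_out * F(x, sigma); reduces to x at sigma_min."""
    return c_skip(sigma) * x + c_out(sigma) * net(x, sigma)

def noise_boxes(boxes, sigma, rng):
    """Perturb normalized (cx, cy, w, h) boxes with Gaussian noise of scale sigma."""
    return boxes + sigma * rng.standard_normal(boxes.shape)

def dummy_net(x, sigma):
    # Hypothetical placeholder for the learned head F; predicts zeros.
    return np.zeros_like(x)

rng = np.random.default_rng(0)
boxes = rng.random((5, 4))                 # hypothetical ground-truth boxes
x_T = noise_boxes(boxes, 80.0, rng)        # heavily noised boxes
one_step = consistency_fn(x_T, 80.0, dummy_net)
```

At large noise `c_skip` is nearly zero, so a trained `F` supplies almost the whole prediction; at `SIGMA_MIN` the boundary condition $f(x, \sigma_{\min}) = x$ holds exactly.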

Submitted 14 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07707 [pdf, ps, other]

Tree Splitting Based Rounding Scheme for Weighted Proportional Allocations with Subsidy

Authors: Xiaowei Wu, Shengwei Zhou

Abstract: We consider the problem of allocating $m$ indivisible items to a set of $n$ heterogeneous agents, aiming to compute a proportional allocation by introducing subsidy (money). Wu et al. (WINE 2023) showed that when agents are unweighted, a total subsidy of $n/4$ suffices to ensure proportionality, assuming that each item has value/cost at most $1$ to every agent. When agents have general weights, they proposed an algorithm that guarantees a weighted proportional allocation requiring a total subsidy of $(n-1)/2$, by rounding the fractional bid-and-take algorithm. In this work, we revisit the problem and the fractional bid-and-take algorithm. We show that by formulating the fractional allocation returned by the algorithm as a directed tree connecting the agents and splitting the tree into canonical components, there is a rounding scheme that requires a total subsidy of at most $n/3 - 1/6$.
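To make "total subsidy" concrete, the sketch below computes the minimum payment each agent needs for a given allocation of goods to be weighted-proportional, assuming additive valuations, weights summing to 1, and the condition $v_i(X_i) + s_i \ge w_i\, v_i(M)$. It only evaluates an allocation; it is neither the bid-and-take algorithm nor the tree-splitting rounding scheme.

```python
from fractions import Fraction

def min_subsidies(values, weights, allocation):
    """Smallest per-agent payments making a goods allocation weighted-proportional.

    values[i][g]  : agent i's additive value for item g (at most 1)
    weights[i]    : agent i's weight (weights sum to 1)
    allocation[i] : set of item indices given to agent i
    """
    subsidies = []
    for i, bundle in enumerate(allocation):
        share = weights[i] * sum(values[i])               # w_i * v_i(M)
        received = sum(values[i][g] for g in bundle)      # v_i(X_i)
        subsidies.append(max(Fraction(0), share - received))
    return subsidies

# Hypothetical instance: 2 unweighted agents, 3 items each worth 1 to both.
values = [[1, 1, 1], [1, 1, 1]]
weights = [Fraction(1, 2), Fraction(1, 2)]
s = min_subsidies(values, weights, [{0, 1}, {2}])
```

Here agent 1 needs $1/2$, so the total subsidy equals $n/4$ for $n = 2$. Exact `Fraction` arithmetic avoids rounding in the comparison.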

Submitted 11 April, 2024; originally announced April 2024.

Comments: 30 pages, 11 figures

arXiv:2404.07609 [pdf, other]

Achieving violation-free distributed optimization under coupling constraints

Authors: Changxin Liu, Xiao Tan, Xuyang Wu, Dimos V. Dimarogonas, Karl H. Johansson

Abstract: Constraint satisfaction is a critical component in a wide range of engineering applications, including but not limited to safe multi-agent control and economic dispatch in power systems. This study explores violation-free distributed optimization techniques for problems characterized by separable objective functions and coupling constraints. First, we incorporate auxiliary decision variables together with a network-dependent linear mapping into each coupling constraint. For the reformulated problem, we show that the projection of its feasible set onto the space of primal variables is identical to that of the original problem, which is the key to achieving all-time constraint satisfaction. Upon treating the reformulated problem as a min-min optimization problem with respect to auxiliary and primal variables, we demonstrate that the gradients in the outer minimization problem have a locally computable closed form. Then, two violation-free distributed optimization algorithms are developed and their convergence under reasonable assumptions is analyzed. Finally, the proposed algorithm is applied to implement a control barrier function-based controller in a distributed manner, and the results verify its effectiveness.
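Why the projection of the feasible set can be unchanged is shown by a classic construction, assumed here for illustration: a scalar coupling constraint, with the "network-dependent linear mapping" taken to be the Laplacian $L$ of a connected undirected communication graph. Since $\mathbf 1^\top L = 0$ and $\operatorname{range}(L) = \{v : \mathbf 1^\top v = 0\}$, both directions follow.

```latex
\begin{aligned}
&\text{Original: } \sum_{i=1}^{n} g_i(x_i) \le 0, \qquad
 \text{Reformulated: } g(x) + L y \le 0,\ \ (L y)_i = \sum_{j\in\mathcal N_i}(y_i - y_j).\\
&(\Rightarrow)\ \ \mathbf 1^\top L = 0 \ \Rightarrow\ \sum_{i=1}^{n} g_i(x_i) = \mathbf 1^\top\big(g(x) + L y\big) \le 0.\\
&(\Leftarrow)\ \ \bar g := \tfrac1n \textstyle\sum_i g_i(x_i) \le 0,\ \ \mathbf 1^\top\big(\bar g\,\mathbf 1 - g(x)\big) = 0
 \ \Rightarrow\ \exists\, y:\ L y = \bar g\,\mathbf 1 - g(x)
 \ \Rightarrow\ g(x) + L y = \bar g\,\mathbf 1 \le 0.
\end{aligned}
```

Each row $i$ of the reformulated constraint involves only agent $i$ and its neighbours, which is what makes it locally checkable.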

Submitted 11 April, 2024; originally announced April 2024.

Comments: 13 pages, 6 figures

arXiv:2404.07543 [pdf, other]

Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening

Authors: Yule Duan, Xiao Wu, Haoyu Deng, Liang-Jian Deng

Abstract: Currently, machine learning-based methods for remote sensing pansharpening have progressed rapidly. However, existing pansharpening methods often do not fully exploit region-specific information in non-local spaces, thereby limiting the effectiveness of the methods and resulting in redundant learning parameters. In this paper, we introduce a so-called content-adaptive non-local convolution (CANConv), a novel method tailored for remote sensing image pansharpening. Specifically, CANConv employs adaptive convolution, ensuring spatial adaptability, and incorporates non-local self-similarity through the similarity relationship partition (SRP) and the partition-wise adaptive convolution (PWAC) sub-modules. Furthermore, we propose a corresponding network architecture, called CANNet, which mainly utilizes multi-scale self-similarity. Extensive experiments demonstrate the superior performance of CANConv compared with recent promising fusion methods. In addition, we substantiate the method's effectiveness through visualization, ablation experiments, and comparison with existing methods on multiple test sets. The source code is publicly available at https://github.com/duanyll/CANConv.

Submitted 11 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024
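The general idea of partitioning positions by similarity and then filtering each partition with its own kernel can be sketched on a 1-D signal. This is a toy illustration under stated assumptions, not CANConv itself: the partition here is a single nearest-centroid assignment over zero-padded local patches with hand-picked centroids, and the per-partition kernels are fixed rather than generated by a network. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def local_patches(signal: Sequence[float], width: int) -> List[List[float]]:
    """Zero-padded sliding windows of odd length `width`, one per position."""
    r = width // 2
    padded = [0.0] * r + [float(v) for v in signal] + [0.0] * r
    return [padded[i:i + width] for i in range(len(signal))]


def assign_partitions(patches: List[List[float]],
                      centroids: Sequence[Sequence[float]]) -> List[int]:
    """Label each patch with the index of its nearest centroid (squared L2)."""
    def sq_dist(p: Sequence[float], c: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in patches]


def partition_wise_conv(signal: Sequence[float],
                        centroids: Sequence[Sequence[float]],
                        kernels: Sequence[Sequence[float]]) -> Tuple[List[float], List[int]]:
    """Apply, at each position, the kernel of the partition its patch falls in
    (cross-correlation, as in CNN layers). Returns (outputs, partition labels)."""
    patches = local_patches(signal, len(kernels[0]))
    labels = assign_partitions(patches, centroids)
    out = [sum(w * v for w, v in zip(kernels[k], p)) for p, k in zip(patches, labels)]
    return out, labels


if __name__ == "__main__":
    out, labels = partition_wise_conv(
        [0, 0, 0, 10, 10, 10],
        centroids=[[0, 0, 0], [10, 10, 10]],          # hypothetical "low" / "high" prototypes
        kernels=[[1 / 3, 1 / 3, 1 / 3], [0.0, 1.0, 0.0]],  # smoothing vs. identity
    )
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(out)
```

Positions that share a partition share a kernel regardless of where they sit in the signal, which is the non-local aspect; a standard convolution would instead apply one kernel everywhere.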

arXiv:2404.07436 [pdf, other]

Measurement of $e^{+}e^{-}\to ωη^{\prime}$ cross sections at $\sqrt{s}=$ 2.000 to 3.080 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (599 additional authors not shown)

Abstract: The Born cross sections for the process $e^{+}e^{-}\to ωη^{\prime}$ are measured at 22 center-of-mass energies from 2.000 to 3.080 GeV using data collected with the BESIII detector at the BEPCII collider. A resonant structure is observed with a statistical significance of 9.6$σ$. A Breit-Wigner fit determines its mass to be $M_R=(2153\pm30\pm31)~{\rm{MeV}}/c^{2}$ and its width to be $Γ_{R}=(167\pm77\pm7)~\rm{MeV}$, where the first uncertainties are statistical and the second are systematic.

Submitted 10 April, 2024; originally announced April 2024.
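For readers unfamiliar with the Breit-Wigner parametrization, a textbook relativistic Breit-Wigner line shape, |1/(s - M^2 + i M Γ)|^2, can be evaluated with the central values quoted above (M_R = 2.153 GeV/c^2, Γ_R = 0.167 GeV, in natural units). This is only the generic shape for illustration; the fit function actually used in the analysis (for example any phase-space factor, continuum term, or interference) is not given in the abstract and is not reproduced here.

```python
M_R = 2.153      # GeV (natural units, c = 1), central value from the abstract
GAMMA_R = 0.167  # GeV, central value from the abstract


def bw_intensity(sqrt_s: float, mass: float = M_R, width: float = GAMMA_R) -> float:
    """Relativistic Breit-Wigner |1/(s - M^2 + i M Gamma)|^2, normalized to 1 at s = M^2."""
    s = sqrt_s ** 2
    amp = 1.0 / complex(s - mass ** 2, mass * width)
    return abs(amp) ** 2 * (mass * width) ** 2


if __name__ == "__main__":
    for e in (2.000, 2.100, 2.153, 2.200, 2.300):
        print(f"sqrt(s) = {e:.3f} GeV -> relative intensity {bw_intensity(e):.3f}")
```

In this parametrization the intensity drops to half its peak where |s - M_R^2| = M_R Γ_R, so for a narrow resonance the width is approximately the full width at half maximum in sqrt(s).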

arXiv:2404.07415 [pdf, other]

doi 10.1109/TPWRS.2024.3393866

Grouping of $N-1$ Contingencies for Controller Synthesis: A Study for Power Line Failures

Authors: Neelay Junnarkar, Emily Jensen, Xiaofan Wu, Suat Gumussoy, Murat Arcak

Abstract: The problem of maintaining power system stability and performance after the failure of any single line in a power system (an "N-1 contingency") is investigated. Due to the large number of possible N-1 contingencies for a power network, it is impractical to optimize controller parameters for each possible contingency a priori. A method to partition a set of contingencies into groups of contingencies that are similar to each other from a control perspective is presented. Design of a single controller for each group, rather than for each contingency, provides a computationally tractable method for maintaining stability and performance after element failures. The choice of number of groups tunes a trade-off between computation time and controller performance for a given set of contingencies. Results are simulated on the IEEE 39-bus and 68-bus systems, illustrating that, with controllers designed for a relatively small number of groups, power system stability may be significantly improved after an N-1 contingency compared to continued use of the nominal controller. Furthermore, performance is comparable to that of controllers designed for each contingency individually.

Submitted 10 April, 2024; originally announced April 2024.

Comments: Submitted to the journal IEEE Transactions on Power Systems, 12 pages, 11 figures, 1 table
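The grouping step can be pictured as clustering contingencies under some dissimilarity measure and then designing one controller per cluster. The sketch below uses farthest-first traversal, a simple generic clustering heuristic, on a hypothetical precomputed dissimilarity matrix; it is not the partitioning method of the paper, whose control-oriented similarity measure is not described in the abstract. The parameter k plays the role of the number of groups, trading computation (one controller design per group) against how well each group's controller fits its members.

```python
from typing import Dict, List, Sequence


def farthest_first_groups(dist: Sequence[Sequence[float]], k: int) -> Dict[int, List[int]]:
    """Partition items 0..n-1 into k groups keyed by a representative item.

    dist[i][j] is a hypothetical dissimilarity between contingencies i and j.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of contingencies")
    centers = [0]  # arbitrary first representative
    while len(centers) < k:
        # next representative: the contingency farthest from its nearest representative
        candidates = [i for i in range(n) if i not in centers]
        centers.append(max(candidates, key=lambda i: min(dist[i][c] for c in centers)))
    groups: Dict[int, List[int]] = {c: [] for c in centers}
    for i in range(n):
        # each contingency joins the group of its nearest representative
        groups[min(centers, key=lambda c: dist[i][c])].append(i)
    return groups


if __name__ == "__main__":
    # Hypothetical 1-D "signatures" for six line outages; dissimilarity = |difference|.
    sig = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
    d = [[abs(a - b) for b in sig] for a in sig]
    print(farthest_first_groups(d, k=3))  # {0: [0, 1, 2], 5: [5], 3: [3, 4]}
```

With k equal to the number of contingencies every contingency gets its own controller, recovering the per-contingency design; smaller k reduces the number of controller syntheses.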

arXiv:2404.07181 [pdf, other]

BAMBOO: a predictive and transferable machine learning force field framework for liquid electrolyte development

Authors: Sheng Gong, Yumin Zhang, Zhenliang Mu, Zhichen Pu, Hongyi Wang, Zhiao Yu, Mengyi Chen, Tianze Zheng, Zhi Wang, Lifei Chen, Xiaojie Wu, Shaochen Shi, Weihao Gao, Wen Yan, Liang Xiang

Abstract: Despite the widespread application of machine learning force fields (MLFFs) to solids and small molecules, there is a notable gap in applying MLFFs to complex liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular Simulation Booster), a novel framework for molecular dynamics (MD) simulations, with a demonstration of its capabilities in the context of liquid electrolytes for lithium batteries. We design a physics-inspired graph equivariant transformer architecture as the backbone of BAMBOO to learn from quantum mechanical simulations. Additionally, we pioneer an ensemble knowledge distillation approach and apply it to MLFFs to improve the stability of MD simulations. Finally, we propose a density alignment algorithm to align BAMBOO with experimental measurements. BAMBOO demonstrates state-of-the-art accuracy in predicting key electrolyte properties such as density, viscosity, and ionic conductivity across various solvent and salt combinations. Our current model, trained on more than 15 chemical species, achieves an average density error of 0.01 g/cm$^3$ on various compositions compared with experimental data. Moreover, our model demonstrates transferability to molecules not included in the quantum mechanical dataset. We envision this work as paving the way to a "universal MLFF" capable of simulating properties of common organic liquids.

Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.
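Ensemble knowledge distillation, in its generic form, trains a single student model on the averaged predictions of several teacher models, so that the student approximates the ensemble while costing only one model at inference time. The toy sketch below shows only that generic idea: three hypothetical linear "teachers" stand in for MLFFs, and a linear student is fitted to their mean prediction by closed-form least squares. How BAMBOO actually builds its ensemble, what quantities it distills, and its loss function are not stated in the abstract and are not modeled here.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def ensemble_targets(teachers: Sequence[Model], xs: Sequence[float]) -> List[float]:
    """Soft targets: the mean teacher prediction at each input."""
    return [fmean(t(x) for t in teachers) for x in xs]


def fit_linear_student(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit of y ~ a * x + b; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


if __name__ == "__main__":
    # Hypothetical teachers that disagree slightly, as independently trained models do.
    teachers: List[Model] = [
        lambda x: 2.0 * x + 1.1,
        lambda x: 2.0 * x + 0.9,
        lambda x: 2.1 * x + 1.0,
    ]
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    a, b = fit_linear_student(xs, ensemble_targets(teachers, xs))
    print(f"student: y = {a:.4f} * x + {b:.4f}")  # slope ~ 2.0333, intercept ~ 1.0
```

In a real MLFF setting the inputs would be atomic configurations and the student a neural network trained by gradient descent; the closed-form fit here only keeps the example self-contained.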

Showing 201–250 of 4,623 results for author: Wu, X