Search | arXiv e-print repository

arXiv:2404.19055 [pdf, other]

Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models

Abstract: While language models (LMs) offer significant capability in zero-shot reasoning tasks across a wide range of domains, they do not perform satisfactorily in problems that require multi-step reasoning. Previous approaches to mitigate this involve breaking a larger, multi-step task into sub-tasks, asking the language model to generate proposals ("thoughts") for each sub-task, and using exhaustive planning approaches such as DFS to compose a solution. In this work, we leverage this idea to introduce two new contributions: first, we formalize a planning-based approach to perform multi-step problem solving with LMs via Partially Observable Markov Decision Processes (POMDPs), with the LM's own reflections about the value of a state used as a search heuristic; second, leveraging the online POMDP solver POMCP, we demonstrate a superior success rate of 89.4% on the Game of 24 task as compared to existing approaches while also offering better anytime performance characteristics than the fixed tree search used previously. Taken together, these contributions allow modern LMs to decompose and solve larger-scale reasoning tasks more effectively.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 7 pages, 2 figures

arXiv:2404.18415 [pdf, other]

Photo-dynamical Analysis of Circumbinary Multi-planet system TOI-1338: a Fully Coplanar Configuration with a Puffy Planet

Authors: Mu-Tian Wang, Hui-Gen Liu

Abstract: TOI-1338 is the first circumbinary planet system discovered by TESS. It has one transiting planet at P$\sim$95 days and an outer non-transiting planet at P$\sim$215 days complemented by RV observation. Here we present a global photo-dynamical modeling of the TOI-1338 system that self-consistently accounts for the mutual gravitational interactions between all known bodies in the system. As a result, the three-dimensional architecture of the system can be established by comparing the model with additional data from TESS Extended Mission and published HARPS/ESPRESSO radial velocity data. We report an inconsistency of binary RV signal between HARPS and ESPRESSO, which could be due to the contamination of the secondary star. According to stability analysis, the RV data via ESPRESSO is preferred. Our results are summarized as follows: (1) the inner transiting planet is extremely coplanar with the binary plane ($ΔI_b \sim 0.12 ^\circ$), making it a permanently transiting circumbinary planet at any nodal precession phase. We updated the future transit ephemerides with improved precision. (2) The outer planet, despite its non-transiting nature, is also coplanar with the binary plane by $ΔI_c=9.1^{+6.0 \circ}_{-4.8}$ (22$^\circ$ for 99\% upper limit). (3) The inner planet could have a density of $0.137 \pm 0.026$ g cm$^{-3}$. With a TESS magnitude of 11.45, TOI-1338 b is an optimal circumbinary planet for ground-based follow-up and transit spectroscopy.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 13 Figures, 3 Tables; Accepted by AJ

arXiv:2404.18412 [pdf]

Uncovering an Interfacial Band Resulting from Orbital Hybridization in Nickelate Heterostructures

Authors: Mingyao Chen, Huimin Liu, Xu He, Minjuan Li, Chi Sin Tang, Mengxia Sun, Krishna Prasad Koirala, Mark E. Bowden, Yangyang Li, Xiongfang Liu, Difan Zhou, Shuo Sun, Mark B. H. Breese, Chuanbing Cai, Yingge Du, Andrew T. S. Wee, Le Wang, Xinmao Yin

Abstract: The interaction of atomic orbitals at the interface of perovskite oxide heterostructures has been investigated for its profound impact on the band structures and electronic properties, giving rise to unique electronic states and a variety of tunable functionalities. In this study, we conducted an extensive investigation of the optical and electronic properties of epitaxial NdNiO3 thin films grown on a series of single crystal substrates. Unlike films synthesized on other substrates, NdNiO3 on SrTiO3 (NNO/STO) gives rise to a unique band structure which features an additional unoccupied band situated above the Fermi level. Our comprehensive investigation, which incorporated a wide array of experimental techniques and density functional theory calculations, revealed that the emergence of the interfacial band structure is primarily driven by the orbital hybridization between Ti 3d orbitals of the STO substrate and O 2p orbitals of the NNO thin film. Furthermore, exciton peaks have been detected in the optical spectra of the NNO/STO film, attributable to the pronounced electron-electron (e-e) and electron-hole (e-h) interactions propagating from the STO substrate into the NNO film. These findings underscore the substantial influence of interfacial orbital hybridization on the electronic structure of oxide thin-films, thereby offering key insights into tuning their interfacial properties.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 26 pages, 4 figures

arXiv:2404.18411 [pdf, other]

Multi-modal Perception Dataset of In-water Objects for Autonomous Surface Vehicles

Authors: Mingi Jeong, Arihant Chadda, Ziang Ren, Luyang Zhao, Haowen Liu, Monika Roznere, Aiwei Zhang, Yitao Jiang, Sabriel Achong, Samuel Lensgraf, Alberto Quattrini Li

Abstract: This paper introduces the first publicly accessible multi-modal perception dataset for autonomous maritime navigation, focusing on in-water obstacles within the aquatic environment to enhance situational awareness for Autonomous Surface Vehicles (ASVs). This dataset, consisting of diverse objects encountered under varying environmental conditions, aims to bridge the research gap in marine robotics by providing a multi-modal, annotated, and ego-centric perception dataset, for object detection and classification. We also show the applicability of the proposed dataset's framework using deep learning-based open-source perception algorithms that have shown success. We expect that our dataset will contribute to development of the marine autonomy pipeline and marine (field) robotics. Please note this is a work-in-progress paper about our on-going research that we plan to release in full via future publication.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

arXiv:2404.18344 [pdf, ps, other]

Some Computational Results on Koszul-Vinberg Cochain Complexes

Authors: Hanwen Liu, Jun Zhang

Abstract: An affine connection is said to be flat if its curvature tensor vanishes identically. Koszul-Vinberg (KV for abbreviation) cohomology has been invoked to study the deformation theory of flat and torsion-free affine connections on the tangent bundle. In this Note, we compute explicitly the differentials of various specific KV cochains, and study their relation to classical objects in information geometry, including deformations associated with projective and dual-projective transformations of a flat and torsion-free affine connection. As an application, we also give a simple yet non-trivial example of a KV algebra whose second cohomology group does not vanish.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 9 pages, 0 figures

arXiv:2404.18304 [pdf, other]

Retrieval-Oriented Knowledge for Click-Through Rate Prediction

Authors: Huanshuo Liu, Bo Chen, Menghui Zhu, Jianghao Lin, Jiarui Qin, Yang Yang, Hao Zhang, Ruiming Tang

Abstract: Click-through rate (CTR) prediction plays an important role in personalized recommendations. Recently, sample-level retrieval-based models (e.g., RIM) have achieved remarkable performance by retrieving and aggregating relevant samples. However, their inefficiency at the inference stage makes them impractical for industrial applications. To overcome this issue, this paper proposes a universal plug-and-play Retrieval-Oriented Knowledge (ROK) framework. Specifically, a knowledge base, consisting of a retrieval-oriented embedding layer and a knowledge encoder, is designed to preserve and imitate the retrieved & aggregated representations in a decomposition-reconstruction paradigm. Knowledge distillation and contrastive learning methods are utilized to optimize the knowledge base, and the learned retrieval-enhanced representations can be integrated with arbitrary CTR models in both instance-wise and feature-wise manners. Extensive experiments on three large-scale datasets show that ROK achieves competitive performance with the retrieval-based CTR models while retaining superior inference efficiency and model compatibility.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18255 [pdf, other]

PatentGPT: A Large Language Model for Intellectual Property

Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

Abstract: In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, and processing of extremely long text in this field. In this technical report, we present for the first time a low-cost, standardized procedure for training IP-oriented LLMs, meeting the unique requirements of the IP domain. Using this standard process, we have trained the PatentGPT series models based on open-source pretrained models. By evaluating them on the open-source IP-oriented benchmark MOZIP, our domain-specific LLMs outperform GPT-4, indicating the effectiveness of the proposed training procedure and the expertise of the PatentGPT models in the IP domain. Remarkably, our model surpassed GPT-4 on the 2019 China Patent Agent Qualification Examination, scoring 65 and matching human expert levels. Additionally, the PatentGPT model, which utilizes the SMoE architecture, achieves performance comparable to that of GPT-4 in the IP domain and demonstrates a better cost-performance ratio on long-text tasks, potentially serving as an alternative to GPT-4 within the IP domain.

Submitted 4 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: 19 pages, 9 figures

ACM Class: I.2.7

arXiv:2404.18225 [pdf, other]

Quadruped robot traversing 3D complex environments with limited perception

Authors: Yi Cheng, Hang Liu, Guoping Pan, Linqi Ye, Houde Liu, Bin Liang

Abstract: Traversing 3-D complex environments has always been a significant challenge for legged locomotion. Existing methods typically rely on external sensors such as vision and lidar to preemptively react to obstacles by acquiring environmental information. However, in scenarios like nighttime or dense forests, external sensors often fail to function properly, necessitating robots to rely on proprioceptive sensors to perceive diverse obstacles in the environment and respond promptly. This task is undeniably challenging. Our research finds that methods based on collision detection can enhance a robot's perception of environmental obstacles. In this work, we propose an end-to-end learning-based quadruped robot motion controller that relies solely on proprioceptive sensing. This controller can accurately detect, localize, and agilely respond to collisions in unknown and complex 3D environments, thereby improving the robot's traversability in complex environments. We demonstrate in both simulation and real-world experiments that our method enables quadruped robots to successfully traverse challenging obstacles in various complex environments.

Submitted 29 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: 10 pages, 8 figures, submitted to IROS 2024

arXiv:2404.18201 [pdf, other]

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

Authors: Dingzhe Li, Yixiang Jin, Yong A, Hongze Yu, Jun Shi, Xiaoshuai Hao, Peng Hao, Huaping Liu, Fuchun Sun, Bin Fang

Abstract: The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different tasks. The learning-based approach is considered an effective way to address generalization. The impressive performance of foundation models in the fields of computer vision and natural language suggests the potential of embedding foundation models into manipulation tasks as a viable path toward achieving general manipulation capability. However, we believe achieving general manipulation capability requires an overarching framework akin to autonomous driving. This framework should encompass multiple functional modules, with different foundation models assuming distinct roles in facilitating general manipulation capability. This survey focuses on the contributions of foundation models to robot learning for manipulation. We propose a comprehensive framework and detail how foundation models can address challenges in each module of the framework. Furthermore, we examine current approaches, outline challenges, suggest future research directions, and identify potential risks associated with integrating foundation models into this domain.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18186 [pdf, other]

doi 10.1145/3660772

Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We?

Authors: Kaixuan Li, Yue Xue, Sen Chen, Han Liu, Kairan Sun, Ming Hu, Haijun Wang, Yang Liu, Yixiang Chen

Abstract: In recent years, the importance of smart contract security has been heightened by the increasing number of attacks against them. To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts. However, objectively comparing these tools to determine their effectiveness remains challenging. Existing studies often fall short due to the taxonomies and benchmarks only covering a coarse and potentially outdated set of vulnerability types, which leads to evaluations that are not entirely comprehensive and may display bias. In this paper, we fill this gap by proposing an up-to-date and fine-grained taxonomy that includes 45 unique vulnerability types for smart contracts. Taking it as a baseline, we develop an extensive benchmark that covers 40 distinct types and includes a diverse range of code characteristics, vulnerability patterns, and application scenarios. Based on them, we evaluated 8 SAST tools using this benchmark, which comprises 788 smart contract files and 10,394 vulnerabilities. Our results reveal that the existing SAST tools fail to detect around 50% of vulnerabilities in our benchmark and suffer from high false positives, with precision not surpassing 10%. We also discover that by combining the results of multiple tools, the false negative rate can be reduced effectively, at the expense of flagging 36.77 percentage points more functions. Nevertheless, many vulnerabilities, especially those beyond Access Control and Reentrancy vulnerabilities, remain undetected. We finally highlight the valuable insights from our study, hoping to provide guidance on tool development, enhancement, evaluation, and selection for developers, researchers, and practitioners.

Submitted 29 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: to appear at FSE 2024

arXiv:2404.18130 [pdf, other]

Logic Agent: Enhancing Validity with Logic Rule Invocation

Authors: Hanmeng Liu, Zhiyang Teng, Chaoli Zhang, Yue Zhang

Abstract: Chain-of-Thought (CoT) prompting has emerged as a pivotal technique for augmenting the inferential capabilities of language models during reasoning tasks. Despite its advancements, CoT often grapples with challenges in validating reasoning validity and ensuring informativeness. Addressing these limitations, this paper introduces the Logic Agent (LA), an agent-based framework aimed at enhancing the validity of reasoning processes in Large Language Models (LLMs) through strategic logic rule invocation. Unlike conventional approaches, LA transforms LLMs into logic agents that dynamically apply propositional logic rules, initiating the reasoning process by converting natural language inputs into structured logic forms. The logic agent leverages a comprehensive set of predefined functions to systematically navigate the reasoning process. This methodology not only promotes the structured and coherent generation of reasoning constructs but also significantly improves their interpretability and logical coherence. Through extensive experimentation, we demonstrate LA's capacity to scale effectively across various model sizes, markedly improving the precision of complex reasoning across diverse tasks.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.17806 [pdf, other]

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Authors: Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

Abstract: Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introduce T-CLAP, a temporal-enhanced CLAP model. We use Large Language Models (LLMs) and mixed-up strategies to generate temporal-contrastive captions for audio clips from extensive audio-text datasets. Subsequently, a new temporal-focused contrastive loss is designed to fine-tune the CLAP model by incorporating these synthetic data. We conduct comprehensive experiments and analysis in multiple downstream tasks. T-CLAP shows improved capability in capturing the temporal relationship of sound events and outperforms state-of-the-art models by a significant margin.

Submitted 27 April, 2024; originally announced April 2024.

Comments: Preprint submitted to IEEE MLSP 2024

arXiv:2404.17685 [pdf]

Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses

Authors: Yi Shen, Hao Liu, Xinxin Liu, Wenjing Zhou, Chang Zhou, Yizhou Chen

Abstract: The reduced cost and computational and calibration requirements of monocular cameras make them ideal positioning sensors for mobile robots, albeit at the expense of any meaningful depth measurement. Solutions proposed by some scholars to this localization problem involve fusing pose estimates from convolutional neural networks (CNNs) with pose estimates from geometric constraints on motion to generate accurate predictions of robot trajectories. However, the distribution of attitude estimation based on CNN is not uniform, resulting in certain translation problems in the prediction of robot trajectories. This paper proposes improving these CNN-based pose estimates by propagating an SE(3) uniform distribution driven by a particle filter. The particles utilize the same motion model used by the CNN, while updating their weights using CNN-based estimates. The results show that while the rotational component of pose estimation does not consistently improve relative to CNN-based estimation, the translational component is significantly more accurate. This factor combined with the superior smoothness of the filtered trajectories shows that the use of particle filters significantly improves the performance of CNN-based localization algorithms.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17379 [pdf]

Adaptive speed planning for Unmanned Vehicle Based on Deep Reinforcement Learning

Authors: Hao Liu, Yi Shen, Wenjing Zhou, Yuelin Zou, Chang Zhou, Shuyao He

Abstract: In order to solve the problem of frequent deceleration of unmanned vehicles when approaching obstacles, this article uses a Deep Q-Network (DQN) and its extension, the Double Deep Q-Network (DDQN), to develop a local navigation system that adapts to obstacles while maintaining optimal speed planning. By integrating improved reward functions and obstacle angle determination methods, the system demonstrates significant enhancements in maneuvering capabilities without frequent decelerations. Experiments conducted in simulated environments with varying obstacle densities confirm the effectiveness of the proposed method in achieving more stable and efficient path planning.

Submitted 26 April, 2024; originally announced April 2024.
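
Note on arXiv:2404.17379: the abstract names DQN and its extension DDQN without detailing them. The sketch below illustrates only the textbook difference between the two bootstrap targets; it is not the authors' implementation, and the function names, the discount factor `gamma`, and the sample Q-values are illustrative assumptions.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    # Index of the greedy action according to the online network.
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical Q-values for two actions in the next state.
    online_q = [3.0, 1.0]
    target_q = [0.5, 2.0]
    print(dqn_target(1.0, target_q, gamma=0.5))
    print(ddqn_target(1.0, online_q, target_q, gamma=0.5))
```

Decoupling action selection from action evaluation is what lets Double DQN reduce the overestimation that the max operator introduces in plain DQN.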

arXiv:2404.17286 [pdf, other]

Black Hole Singularity from OPE

Authors: Nejc Čeplak, Hong Liu, Andrei Parnachev, Samuel Valach

Abstract: Eternal asymptotically AdS black holes are dual to thermofield double states in the boundary CFT. It has long been known that black hole singularities have certain signatures in boundary thermal two-point functions related to null geodesics bouncing off the singularities (bouncing geodesics). In this paper we shed light on the manifestations of black hole singularities in the dual CFT. We decompose the boundary CFT correlator of scalar operators using the Operator Product Expansion (OPE) and focus on the contributions from the identity, the stress tensor, and its products. We show that this part of the correlator develops singularities precisely at the points that are connected by bulk bouncing geodesics. Black hole singularities are thus encoded in the analytic behavior of the boundary correlators determined by multiple stress tensor exchanges. Furthermore, we show that in the limit where the conformal dimension of the operators is large, the sum of multi-stress-tensor contributions develops a branch point singularity as predicted by the geodesic analysis. We also argue that the appearance of complexified geodesics, which play an important role in computing the full correlator, is related to the contributions of the double-trace operators in the boundary CFT.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 36+29 pages, 18 figures

arXiv:2404.17147 [pdf, other]

On the Federated Learning Framework for Cooperative Perception

Authors: Zhenrong Zhang, Jianan Liu, Xi Zhou, Tao Huang, Qing-Long Han, Jingxin Liu, Hongbin Liu

Abstract: Cooperative perception (CP) is essential to enhance the efficiency and safety of future transportation systems, requiring extensive data sharing among vehicles on the road, which raises significant privacy concerns. Federated learning offers a promising solution by enabling data privacy-preserving collaborative enhancements in perception, decision-making, and planning among connected and autonomous vehicles (CAVs). However, federated learning is impeded by significant challenges arising from data heterogeneity across diverse clients, potentially diminishing model accuracy and prolonging convergence periods. This study introduces a specialized federated learning framework for CP, termed the federated dynamic weighted aggregation (FedDWA) algorithm, facilitated by a dynamic adjusting loss (DALoss) function. This framework employs dynamic client weighting to direct model convergence and integrates a novel loss function that utilizes Kullback-Leibler divergence (KLD) to counteract the detrimental effects of non-independently and identically distributed (Non-IID) and unbalanced data. Utilizing the BEV transformer as the primary model, our rigorous testing on the OpenV2V dataset, augmented with FedBEVT data, demonstrates significant improvements in the average intersection over union (IoU). These results highlight the substantial potential of our federated learning framework to address data heterogeneity challenges in CP, thereby enhancing the accuracy of environmental perception models and facilitating more robust and efficient collaborative learning solutions in the transportation sector.

Submitted 26 April, 2024; originally announced April 2024.
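
Note on arXiv:2404.17147: the abstract mentions Kullback-Leibler divergence as the quantity used to counter Non-IID data but does not say how FedDWA or DALoss apply it. The sketch below shows only the standard discrete KL divergence, used here to compare a client's label distribution with a reference one. The helper names and the sample label counts are hypothetical, and nothing here reproduces the paper's loss.

```python
import math


def label_distribution(counts):
    """Normalize raw per-class sample counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) in nats.

    Terms with p_i == 0 contribute zero; if q_i == 0 while p_i > 0 the
    divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


if __name__ == "__main__":
    # Hypothetical per-class counts: one skewed client versus a balanced reference.
    client = label_distribution([90, 10])
    reference = label_distribution([50, 50])
    print(kl_divergence(client, reference))
```

A larger value indicates that the client's data departs further from the reference distribution, which is one common way to quantify the heterogeneity the abstract describes.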

arXiv:2404.16776 [pdf, other]

Modeling Selective Feature Attention for Representation-based Siamese Text Matching

Authors: Jianxiang Zang, Hui Liu

Abstract: Representation-based Siamese networks have risen to popularity in lightweight text matching due to their low deployment and inference costs. While word-level attention mechanisms have been implemented within Siamese networks to improve performance, we propose Feature Attention (FA), a novel downstream block designed to enrich the modeling of dependencies among embedding features. Employing "squeeze-and-excitation" techniques, the FA block dynamically adjusts the emphasis on individual features, enabling the network to concentrate more on features that significantly contribute to the final classification. Building upon FA, we introduce a dynamic "selection" mechanism called Selective Feature Attention (SFA), which leverages a stacked BiGRU Inception structure. The SFA block facilitates multi-scale semantic extraction by traversing different stacked BiGRU layers, encouraging the network to selectively concentrate on semantic information and embedding features across varying levels of abstraction. Both the FA and SFA blocks offer a seamless integration capability with various Siamese networks, showcasing a plug-and-play characteristic. Experimental evaluations conducted across diverse text matching baselines and benchmarks underscore the indispensability of modeling feature attention and the superiority of the "selection" mechanism.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI2024

arXiv:2404.16555 [pdf, other]

MMGRec: Multimodal Generative Recommendation with Transformer Model

Authors: Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie

Abstract: Multimodal recommendation aims to recommend user-preferred candidates based on her/his historically interacted items and associated multimodal information. Previous studies commonly employ an embed-and-retrieve paradigm: learning user and item representations in the same embedding space, then retrieving similar candidate items for a user via embedding inner product. However, this paradigm suffers from inference cost, interaction modeling, and false-negative issues. Toward this end, we propose a new MMGRec model to introduce a generative paradigm into multimodal recommendation. Specifically, we first devise a hierarchical quantization method Graph RQ-VAE to assign Rec-ID for each item from its multimodal and CF information. Consisting of a tuple of semantically meaningful tokens, Rec-ID serves as the unique identifier of each item. Afterward, we train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences. The generative paradigm is qualified since this model systematically predicts the tuple of tokens identifying the recommended item in an autoregressive manner. Moreover, a relation-aware self-attention mechanism is devised for the Transformer to handle non-sequential interaction sequences, which explores the element pairwise relation to replace absolute positional encoding. Extensive experiments evaluate MMGRec's effectiveness compared with state-of-the-art methods.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16425 [pdf, other]

Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja , et al. (170 additional authors not shown)

Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 41 pages, 8 figures, 7 tables

arXiv:2404.16235 [pdf, other]

Inclusive studies of two- and three-nucleon short-range correlations in $^3$H and $^3$He

Authors: S. Li, S. N. Santiesteban, J. Arrington, R. Cruz-Torres, L. Kurbany, D. Abrams, S. Alsalmi, D. Androic, K. Aniol, T. Averett, C. Ayerbe Gayoso, J. Bane, S. Barcus, J. Barrow, A. Beck, V. Bellini, H. Bhatt, D. Bhetuwal, D. Biswas, D. Bulumulla, A. Camsonne, J. Castellanos, J. Chen, J-P. Chen, D. Chrisman , et al. (91 additional authors not shown)

Abstract: Inclusive electron scattering at carefully chosen kinematics can isolate scattering from short-range correlations (SRCs), produced through hard, short-distance interactions of nucleons in the nucleus. Because the two-nucleon (2N) SRCs arise from the same N-N interaction in all nuclei, the cross section in the SRC-dominated regime is identical up to an overall scaling factor, and the A/2H cross section ratio is constant in this region. This scaling behavior has been used to identify SRC dominance and to map out the contribution of SRCs for a wide range of nuclei. We examine this scaling behavior at lower momentum transfers using new data on $^2$H, $^3$H, and $^3$He which show that the scaling region is larger than in heavy nuclei. Based on the improved scaling, especially for $^3$H/$^3$He, we examine the ratios at kinematics where three-nucleon SRCs may play an important role. The data for the largest initial nucleon momenta are consistent with isolation of scattering from 3N-SRCs, and suggest that the very-highest momentum nucleons in $^3$He have a nearly isospin-independent momentum configuration, or a small enhancement of the proton distribution.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15657 [pdf, other]

FedSI: Federated Subnetwork Inference for Efficient Uncertainty Quantification

Authors: Hui Chen, Hengyu Liu, Zhangkai Wu, Xuhui Fan, Longbing Cao

Abstract: While deep neural networks (DNNs) based personalized federated learning (PFL) is demanding for addressing data heterogeneity and shows promising performance, existing methods for federated learning (FL) suffer from efficient systematic uncertainty quantification. The Bayesian DNNs-based PFL is usually questioned of either over-simplified model structures or high computational and memory costs. In this paper, we introduce FedSI, a novel Bayesian DNNs-based subnetwork inference PFL framework. FedSI is simple and scalable by leveraging Bayesian methods to incorporate systematic uncertainties effectively. It implements a client-specific subnetwork inference mechanism, selects network parameters with large variance to be inferred through posterior distributions, and fixes the rest as deterministic ones. FedSI achieves fast and scalable inference while preserving the systematic uncertainties to the fullest extent. Extensive experiments on three different benchmark datasets demonstrate that FedSI outperforms existing Bayesian and non-Bayesian FL baselines in heterogeneous FL scenarios.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15284 [pdf, other]

Global 4D Ionospheric STEC Prediction based on DeepONet for GNSS Rays

Authors: Dijia Cai, Zenghui Shi, Haiyang Fu, Huan Liu, Hongyi Qian, Yun Sui, Feng Xu, Ya-Qiu Jin

Abstract: The ionosphere is a vitally dynamic charged particle region in the Earth's upper atmosphere, playing a crucial role in applications such as radio communication and satellite navigation. The Slant Total Electron Contents (STEC) is an important parameter for characterizing wave propagation, representing the integrated electron density along the ray of radio signals passing through the ionosphere. The accurate prediction of STEC is essential for mitigating the ionospheric impact particularly on Global Navigation Satellite Systems (GNSS). In this work, we propose a high-precision STEC prediction model named DeepONet-STEC, which learns nonlinear operators to predict the 4D temporal-spatial integrated parameter for specified ground station - satellite ray path globally. As a demonstration, we validate the performance of the model based on GNSS observation data for global and US-CORS regimes under ionospheric quiet and storm conditions. The DeepONet-STEC model results show that the three-day 72 hour prediction in quiet periods could achieve high accuracy using observation data by the Precise Point Positioning (PPP) with temporal resolution 30s. Under active solar magnetic storm periods, the DeepONet-STEC also demonstrated its robustness and superiority than traditional deep learning methods. This work presents a neural operator regression architecture for predicting the 4D temporal-spatial ionospheric parameter for satellite navigation system performance, which may be further extended for various space applications and beyond.

Submitted 12 March, 2024; originally announced April 2024.

arXiv:2404.15070 [pdf, other]

BotDGT: Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers

Authors: Buyun He, Yingguang Yang, Qi Wu, Hao Liu, Renyu Yang, Hao Peng, Xiang Wang, Yong Liao, Pengyuan Zhou

Abstract: Detecting social bots has evolved into a pivotal yet intricate task, aimed at combating the dissemination of misinformation and preserving the authenticity of online interactions. While earlier graph-based approaches, which leverage topological structure of social networks, yielded notable outcomes, they overlooked the inherent dynamicity of social networks -- In reality, they largely depicted the social network as a static graph and solely relied on its most recent state. Due to the absence of dynamicity modeling, such approaches are vulnerable to evasion, particularly when advanced social bots interact with other users to camouflage identities and escape detection. To tackle these challenges, we propose BotDGT, a novel framework that not only considers the topological structure, but also effectively incorporates dynamic nature of social network. Specifically, we characterize a social network as a dynamic graph. A structural module is employed to acquire topological information from each historical snapshot. Additionally, a temporal module is proposed to integrate historical context and model the evolving behavior patterns exhibited by social bots and legitimate users. Experimental results demonstrate the superiority of BotDGT against the leading methods that neglected the dynamic nature of social networks in terms of accuracy, recall, and F1-score.

Submitted 24 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: IJCAI 2024

arXiv:2404.15028 [pdf, other]

PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts

Authors: Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

Abstract: In this paper, we present PRISM, a Promptable and Robust Interactive Segmentation Model, aiming for precise segmentation of 3D medical images. PRISM accepts various visual inputs, including points, boxes, and scribbles as sparse prompts, as well as masks as dense prompts. Specifically, PRISM is designed with four principles to achieve robustness: (1) Iterative learning. The model produces segmentations by using visual prompts from previous iterations to achieve progressive improvement. (2) Confidence learning. PRISM employs multiple segmentation heads per input image, each generating a continuous map and a confidence score to optimize predictions. (3) Corrective learning. Following each segmentation iteration, PRISM employs a shallow corrective refinement network to reassign mislabeled voxels. (4) Hybrid design. PRISM integrates hybrid encoders to better capture both the local and global information. Comprehensive validation of PRISM is conducted using four public datasets for tumor segmentation in the colon, pancreas, liver, and kidney, highlighting challenges caused by anatomical variations and ambiguous boundaries in accurate tumor identification. Compared to state-of-the-art methods, both with and without prompt engineering, PRISM significantly improves performance, achieving results that are close to human levels. The code is publicly available at https://github.com/MedICL-VU/PRISM.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14928 [pdf, other]

Graph Machine Learning in the Era of Large Language Models (LLMs)

Authors: Wenqi Fan, Shijie Wang, Jiani Huang, Zhikai Chen, Yu Song, Wenzhuo Tang, Haitao Mao, Hui Liu, Xiaorui Liu, Dawei Yin, Qing Li

Abstract: Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML's generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterogeneity and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.

Submitted 3 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14775 [pdf, ps, other]

Properties of quark stars based on the density-dependent MIT bag model

Authors: Min Ju, Pengcheng Chu, Xuhao Wu, He Liu

Abstract: In this study, we extend the MIT bag model by incorporating the vector interaction among quarks and introducing a density-dependent bag pressure. Then we proceed to investigate the thermodynamic properties of strange quark matter (SQM) and pure up-down quark matter (udQM) in quark stars. The results demonstrate that the vector interaction among quarks and the density-dependent bag pressure have significant impacts on the equation of state for both SQM and udQM. The inclusion of $G_V$, which represents the strength of vector interactions, results in a stiffening of equation of state while maintaining causality. This allows for the description of massive compact stars such as those observed in GW190814 and PSR J0740+6620 as quark stars. Ultimately, we utilize the vMIT bag model to derive a series of mass-radius relations of quark stars (QSs) which is consistent with the astronomical observations from HESS J1731-347, 4U 1702-429, PSR J0740+6620, GW170817 and GW190814.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14720 [pdf, other]

Incorporating Gradients to Rules: Towards Lightweight, Adaptive Provenance-based Intrusion Detection

Authors: Lingzhi Wang, Xiangmin Shen, Weijian Li, Zhenyuan Li, R. Sekar, Han Liu, Yan Chen

Abstract: As cyber-attacks become increasingly sophisticated and stealthy, it becomes more imperative and challenging to detect intrusion from normal behaviors. Through fine-grained causality analysis, provenance-based intrusion detection systems (PIDS) demonstrated a promising capacity to distinguish benign and malicious behaviors, attracting widespread attention from both industry and academia. Among diverse approaches, rule-based PIDS stands out due to its lightweight overhead, real-time capabilities, and explainability. However, existing rule-based systems suffer low detection accuracy, especially the high false alarms, due to the lack of fine-grained rules and environment-specific configurations. In this paper, we propose CAPTAIN, a rule-based PIDS capable of automatically adapting to diverse environments. Specifically, we propose three adaptive parameters to adjust the detection configuration with respect to nodes, edges, and alarm generation thresholds. We build a differentiable tag propagation framework and utilize the gradient descent algorithm to optimize these adaptive parameters based on the training data. We evaluate our system based on data from DARPA Engagement and simulated environments. The evaluation results demonstrate that CAPTAIN offers better detection accuracy, less detection latency, lower runtime overhead, and more interpretable detection alarms and knowledge compared to the SOTA PIDS.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.14700 [pdf, other]

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Authors: Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue

Abstract: Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system with approximately 5\% of the inference time compared with previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation. Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable performance in terms of voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling. Audio samples can be found in https://flashspeech.github.io/.

Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Efficient zero-shot speech synthesis

arXiv:2404.14467 [pdf, other]

Integrating Chemistry Knowledge in Large Language Models via Prompt Engineering

Authors: Hongxuan Liu, Haoyu Yin, Zhiyao Luo, Xiaonan Wang

Abstract: This paper presents a study on the integration of domain-specific knowledge in prompt engineering to enhance the performance of large language models (LLMs) in scientific domains. A benchmark dataset is curated to encapsulate the intricate physical-chemical properties of small molecules, their drugability for pharmacology, alongside the functional attributes of enzymes and crystal materials, underscoring the relevance and applicability across biological and chemical domains. The proposed domain-knowledge embedded prompt engineering method outperforms traditional prompt engineering strategies on various metrics, including capability, accuracy, F1 score, and hallucination drop. The effectiveness of the method is demonstrated through case studies on complex materials including the MacMillan catalyst, paclitaxel, and lithium cobalt oxide. The results suggest that domain-knowledge prompts can guide LLMs to generate more accurate and relevant responses, highlighting the potential of LLMs as powerful tools for scientific discovery and innovation when equipped with domain-specific prompts. The study also discusses limitations and future directions for domain-specific prompt engineering development.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 43 pages, 17 figures

arXiv:2404.14013 [pdf, ps, other]

A characterization of compactness via bilinear $T1$ theorem

Authors: Mingming Cao, Honghai Liu, Zengyan Si, Kôzô Yabuta

Abstract: We establish a bilinear $T1$ theorem to characterize the weighted compactness of bilinear Calderón--Zygmund operators. Let $T$ be a bilinear operator associated with a standard bilinear Calderón--Zygmund kernel. We demonstrate that $T$ can be extended to a compact bilinear operator from $L^{p_1}(w_1^{p_1}) \times L^{p_2}(w_2^{p_2})$ to $L^p(w^p)$ for all exponents $\frac{1}{p} = \frac{1}{p_1} + \frac{1}{p_2}$ with $1<p_1, p_2< \infty$ and for all weights $(w_1, w_2) \in A_{(p_1, p_2)}$ if and only if the following conditions hold: (i) $T$ is associated with a compact bilinear Calderón--Zygmund kernel, (ii) $T$ satisfies the weak compactness property, and (iii) $T(1,1), T^{*1}(1,1), T^{*2}(1,1) \in \mathrm{CMO}(\mathbb{R}^n)$.

Submitted 22 April, 2024; originally announced April 2024.

Comments: This is just a draft, but we post the file in its current form now, in response to several queries about the result and method. Eventually, these results will be a part of a more extensive work about compactness of bilinear singular integrals

MSC Class: 42B20; 42B35

arXiv:2404.13840 [pdf, other]

Study of $e^+e^-\toωX(3872)$ and $γX(3872)$ from 4.66 to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes of $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be $0.38\pm0.20_\text{stat.}\pm0.01_\text{syst.}$ ($R< 0.83$ at 90\% confidence level). In addition, we measure the ratio of the average cross section of $e^+e^-\toωX(3872)$ to $e^+e^-\toωχ_{c1}(ωχ_{c2})$ to be $σ_{ωX(3872)}/σ_{ωχ_{c1}}~(σ_{ωX(3872)}/σ_{ωχ_{c2}})=5.2\pm1.0_\text{stat.}\pm1.9_\text{syst.}~ (5.5\pm1.1_\text{stat.}\pm2.4_\text{syst.})$. Finally, we search for the process of $e^+e^-\toγX(3872)$, and no obvious signal is observed. The upper limit on the ratio of the average cross section of $e^+e^-\toγX(3872)$ to $e^+e^-\toωX(3872)$ is set as $σ_{γX(3872)}/σ_{ωX(3872)}<0.23$ at 90\% confidence level.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 19 pages, 10 figures

arXiv:2404.13677 [pdf, other]

A Dataset and Model for Realistic License Plate Deblurring

Authors: Haoyan Gong, Yuzheng Feng, Zhenrong Zhang, Xianxu Hou, Jingxin Liu, Siqi Huang, Hongbin Liu

Abstract: Vehicle license plate recognition is a crucial task in intelligent traffic management systems. However, the challenge of achieving accurate recognition persists due to motion blur from fast-moving vehicles. Despite the widespread use of image synthesis approaches in existing deblurring and recognition algorithms, their effectiveness in real-world scenarios remains unproven. To address this, we introduce the first large-scale license plate deblurring dataset named License Plate Blur (LPBlur), captured by a dual-camera system and processed through a post-processing pipeline to avoid misalignment issues. Then, we propose a License Plate Deblurring Generative Adversarial Network (LPDGAN) to tackle the license plate deblurring: 1) a Feature Fusion Module to integrate multi-scale latent codes; 2) a Text Reconstruction Module to restore structure through textual modality; 3) a Partition Discriminator Module to enhance the model's perception of details in each letter. Extensive experiments validate the reliability of the LPBlur dataset for both model training and testing, showcasing that our proposed model outperforms other state-of-the-art motion deblurring methods in realistic license plate deblurring scenarios. The dataset and code are available at https://github.com/haoyGONG/LPDGAN.

Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.13657 [pdf, other]

MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions

Authors: Sheng Yan, Mengyuan Liu, Yong Wang, Yang Liu, Chen Chen, Hong Liu

Abstract: In this paper, we address the unexplored question of temporal sentence localization in human motions (TSLM), aiming to locate a target moment from a 3D human motion that semantically corresponds to a text query. Considering that 3D human motions are captured using specialized motion capture devices, motions with only a few joints lack complex scene information like objects and lighting. Due to this character, motion data has low contextual richness and semantic ambiguity between frames, which limits the accuracy of predictions made by current video localization frameworks extended to TSLM to only a rough level. To refine this, we devise two novel label-prior-assisted training schemes: one embed prior knowledge of foreground and background to highlight the localization chances of target moments, and the other forces the originally rough predictions to overlap with the more accurate predictions obtained from the flipped start/end prior label sequences during recovery training. We show that injecting label-prior knowledge into the model is crucial for improving performance at high IoU. In our constructed TSLM benchmark, our model termed MLP achieves a recall of 44.13 at IoU@0.5 on the BABEL dataset and 71.17 on HumanML3D (Restore), outperforming prior works. Finally, we showcase the potential of our approach in corpus-level moment retrieval. Our source code is openly accessible at https://github.com/eanson023/mlp.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 13 pages, 9 figures

arXiv:2404.13472 [pdf, other]

Foundry manufacturing of octave-spanning microcombs

Authors: Jizhao Zang, Haixin Liu, Travis C. Briles, Scott B. Papp

Abstract: Soliton microcombs provide a chip-based, octave-spanning source for self-referencing and optical metrology. We explore use of a silicon-nitride integrated photonics foundry to manufacture octave-spanning microcombs. By group-velocity dispersion engineering with the waveguide cross-section, we shape the soliton spectrum for dispersive-wave spectral enhancements at the frequencies for f-2f self-referencing. With the optimized waveguide geometry, we control the carrier-envelope offset frequency by adjusting the resonator radius. Moreover, we demonstrate the other considerations for octave microcombs, including models for soliton spectrum design, ultra-broadband resonator external coupling, low-loss edge couplers, and the nonlinear self-interactions of few-cycle solitons. This design process permits highly repeatable creation of soliton microcombs optimized for pump operation less than 100 mW, an electronically detectable offset frequency, and high comb mode power for f-2f detection. However, these design aspects must also be made compatible with the foundry fabrication tolerance of octave microcomb devices. Our experiments highlight the potential to manufacture a single-chip solution for an octave-spanning microcomb, which is the central component of a compact microsystem for optical metrology.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.13390 [pdf, other]

Explanation based Bias Decoupling Regularization for Natural Language Inference

Authors: Jianxiang Zang, Hui Liu

Abstract: The robustness of Transformer-based Natural Language Inference encoders is frequently compromised as they tend to rely more on dataset biases than on the intended task-relevant features. Recent studies have attempted to mitigate this by reducing the weight of biased samples during the training process. However, these debiasing methods primarily focus on identifying which samples are biased without explicitly determining the biased components within each case. This limitation restricts those methods' capability in out-of-distribution inference. To address this issue, we aim to train models to adopt the logic humans use in explaining causality. We propose a simple, comprehensive, and interpretable method: Explanation based Bias Decoupling Regularization (EBD-Reg). EBD-Reg employs human explanations as criteria, guiding the encoder to establish a tripartite parallel supervision of Distinguishing, Decoupling and Aligning. This method enables encoders to identify and focus on keywords that represent the task-relevant features during inference, while discarding the residual elements acting as biases. Empirical evidence underscores that EBD-Reg effectively guides various Transformer-based encoders to decouple biases through a human-centric lens, significantly surpassing other methods in terms of out-of-distribution inference capabilities.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.13195 [pdf, ps, other]

doi 10.1145/3626203.3670561

Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper

Authors: Junjie Li, Yinzhi Wang, Xiao Liang, Hang Liu

Abstract: Porting codes to GPU often requires major effort. While several tools exist for automatically offloading numerical libraries such as BLAS and LAPACK, they often prove impractical due to the high cost of mandatory data transfer. The new unified memory architecture in NVIDIA Grace-Hopper allows high-bandwidth, cache-coherent access to all memory from both CPU and GPU, potentially eliminating the bottleneck faced in conventional architectures. This breakthrough opens up new avenues for application development and porting strategies. In this study, we introduce a new tool for automatic BLAS offload. The tool leverages the high-speed, cache-coherent NVLink C2C interconnect in Grace-Hopper and enables performant GPU offload for BLAS-heavy applications with no code changes or recompilation. The tool was tested on two quantum chemistry or physics codes, and great performance benefits were observed.

Submitted 5 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12817 [pdf, other]

Determination of the CKM angle $φ_{3}$ from a combination of Belle and Belle II results

Authors: Belle, Belle II Collaborations: I. Adachi, L. Aggarwal, H. Aihara, N. Akopov, A. Aloisio, S. Al Said, N. Anh Ky, D. M. Asner, H. Atmacan, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, et al. (377 additional authors not shown)

Abstract: We report a determination of the CKM angle $φ_{3}$, also known as $γ$, from a combination of measurements using samples of up to 711~fb$^{-1}$ from the Belle experiment and up to 362~fb$^{-1}$ from the Belle II experiment. We combine results from analyses of $B^+\to DK^+, B^+\to Dπ^+$, and $B^+ \to D^{*}K^+$ decays, where $D$ is an admixture of $D^0$ and $\overline{D}{}^{0}$ mesons, in a likelihood fit to obtain $φ_{3} = (78.6^{+7.2}_{-7.3})^{\circ}$. We also briefly discuss the interpretation of this result.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 31 pages, 4 figures

Report number: Belle II Preprint 2023-015, KEK Preprint 2023-31

arXiv:2404.12803 [pdf, other]

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Authors: **gqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang

Abstract: Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data. To this end, we introduce a new approach for creating a massive, high-quality instruction-tuning dataset, Square-10M, which is generated using closed-source MLLMs. The data construction process, termed Square, consists of four steps: Self-Questioning, Answering, Reasoning, and Evaluation. Our experiments with Square-10M led to three key findings: 1) Our model, TextSquare, considerably surpasses open-source previous state-of-the-art Text-centric MLLMs and sets a new standard on OCRBench (62.2%). It even outperforms top-tier models like GPT4V and Gemini in 6 of 10 text-centric benchmarks. 2) Additionally, we demonstrate the critical role of VQA reasoning data in offering comprehensive contextual insights for specific questions. This not only improves accuracy but also significantly mitigates hallucinations. Specifically, TextSquare scores an average of 75.1% across four general VQA and hallucination evaluation datasets, outperforming previous state-of-the-art models. 3) Notably, the phenomenon observed in scaling text-centric VQA datasets reveals a vivid pattern: the exponential increase of instruction tuning data volume is directly proportional to the improvement in model performance, thereby validating the necessity of the dataset scale and the high quality of Square-10M.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12659 [pdf, ps, other]

SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

Authors: Hongzhi Qi, Hanfei Liu, Jianqiang Li, Qing Zhao, Wei Zhai, Dan Luo, Tian Yu He, Shuo Liu, Bing Xiang Yang, Guanghui Fu

Abstract: On social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated in two tasks: high and low suicide risk, and fine-grained suicide risk classification on a level of 0 to 10. In our experiments, deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification were still unsatisfactory, with a weighted F1 score of 50.89%. To address the issues of data imbalance and limited dataset size, we investigated both traditional and advanced, large language model based data augmentation techniques, demonstrating that data augmentation can enhance model performance by up to 4.65 percentage points in F1-score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for automatic identification of suicidal individuals, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12582 [pdf]

Effective Sorting of Fractional Optical Vortex Modes

Authors: Zhengyang Mao, Haigang Liu, Xianfeng Chen

Abstract: The mode sorter is a crucial component of communication systems based on orbital angular momentum (OAM). However, schemes proposed so far can only effectively sort integer OAM (IOAM) modes. Here, we demonstrate the effective sorting of fractional OAM (FOAM) modes by utilizing the coordinate transformation method, which can convert FOAM modes to IOAM modes. The transformed IOAM modes are subsequently sorted by using a mode conversion method called topological charge matching. Our scheme is validated by implementing two sorting processes and corresponding mode purity analyses, both theoretically and experimentally. This new sorting method exhibits huge potential for implementing a highly confidential and high-capacity FOAM-based communication system, which may inspire further applications in both classical and quantum regimes.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures

arXiv:2404.12578 [pdf]

Mixed polytype/polymorph formation and its effects on the electronic properties in InSe films grown by molecular beam epitaxy on GaAs(111)B

Authors: Maria Hilse, Justin Rodriguez, Jennifer Gray, **yuan Yao, Shaoqing Ding, Derrick Shao Heng Liu, Mo Li, Joshua Young, Ying Liu, Roman Engel-Herbert

Abstract: The top-down synthesis of inherently ferroelectric semiconductors and their integration with traditional material platforms have the potential to enable new low power logic devices, and to harness the bulk photoelectric effect for more efficient photovoltaic cells. InSe is a layered van der Waals compound exhibiting multiple polytypes, with semiconducting gamma-InSe revealing a non-centrosymmetric space group and showing a high carrier mobility at room temperature. Here we report the growth of InSe films on close to lattice matched semi-insulating GaAs(111)B substrates by molecular beam epitaxy (MBE). Excellent nucleation behavior resulted in the growth of smooth, single phase InSe films. The dominant polytype determined from X-ray diffraction was the targeted gamma-InSe, however Raman spectroscopy revealed spatial variations in the overall low-intensity non-centrosymmetric vibration modes. Transmission electron microscopy uncovered the presence of the three bulk polytypes beta, gamma, and epsilon-InSe coexisting in the films arranging in nanosized domains. The different polytypes can be interpreted as sequences of stacking faults and rotational twin boundaries of gamma-InSe made from individual non-centrosymmetric Se-In-In-Se layers with P-6m2 symmetry. A second, centrosymmetric Se-In-In-Se layer polymorph was identified with P-3m symmetry, which is typically not present in InSe bulk phases. First principles calculations revealed small formation energy differences between the InSe polymorphs and polytypes, yet sizeable differences in their electronic properties. Nanoscale domain sizes of varying polytypes thus resulted in sizeable electronic disorder in the grown films that dominated the electronic transport properties. Our results indicate that bottom-up thin film synthesis is a viable synthesis route towards stabilization of InSe polytypes not present in the bulk.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12374 [pdf]

Tunable Kondo physics in a van der Waals kagome antiferromagnet

Authors: Boqin Song, Yuyang Xie, Wei-Jian Li, Hui Liu, Qinghua Zhang, Jian-gang Guo, Lin Zhao, Shun-Li Yu, Xingjiang Zhou, Xiaolong Chen, Tian** Ying

Abstract: The Kondo lattice physics, describing the hybridization of localized spin matrix with dispersive conduction electrons, breeds numerous discoveries in the realm of strongly correlated quantum matter. Generally observed in lanthanide and actinide compounds, increasing attention has been directed towards alternative pathways for achieving flat band structures, such as moiré superlattices and kagome metals. However, fine control of the Kondo interaction outside of heterostructures remains elusive. Here we report the discovery of a van der Waals (vdW) kagome antiferromagnet CsCr6Sb6. Angle-resolved photoemission spectra and theoretical analysis show clear flat bands, consisting of half-filled 3dxz and 3dyz orbitals of Cr, situated 50 meV below the Fermi level. Importantly, we observe the emergence of an anomalous Hall effect with remarkable tunability by simply reducing the sample thickness. The effective control of the Kondo interaction in CsCr6Sb6 renders it an ideal platform for exploring unprecedented phenomena using the vast toolkit of vdW structures.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12322 [pdf, other]

Generalizable Face Landmarking Guided by Conditional Face Warping

Authors: Jiayi Liang, Haotian Liu, Hongteng Xu, Dixin Luo

Abstract: As a significant step for human face modeling, editing, and generation, face landmarking aims at extracting facial keypoints from images. A generalizable face landmarker is required in practice because real-world facial images, e.g., the avatars in animations and games, are often stylized in various ways. However, achieving generalizable face landmarking is challenging due to the diversity of facial styles and the scarcity of labeled stylized faces. In this study, we propose a simple but effective paradigm to learn a generalizable face landmarker based on labeled real human faces and unlabeled stylized faces. Our method learns the face landmarker as the key module of a conditional face warper. Given a pair of real and stylized facial images, the conditional face warper predicts a warping field from the real face to the stylized one, in which the face landmarker predicts the ending points of the warping field and provides us with high-quality pseudo landmarks for the corresponding stylized facial images. Applying an alternating optimization strategy, we learn the face landmarker to minimize $i)$ the discrepancy between the stylized faces and the warped real ones and $ii)$ the prediction errors of both real and pseudo landmarks. Experiments on various datasets show that our method outperforms existing state-of-the-art domain adaptation methods in face landmarking tasks, leading to a face landmarker with better generalizability. Code is available at https://plustwo0.github.io/project-face-landmarker.

Submitted 21 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted in CVPR 2024

arXiv:2404.12161 [pdf, other]

doi 10.3847/1538-4357/ad49a1

The 2018 outburst of MAXI J1820+070 as seen by Insight-HXMT

Authors: Ningyue Fan, Songyu Li, Rui Zhan, Honghui Liu, Zuobin Zhang, Cosimo Bambi, Long Ji, Xiang Ma, James F. Steiner, Shuang-Nan Zhang, Menglei Zhou

Abstract: We present an analysis of the whole 2018 outburst of the black hole X-ray binary MAXI J1820+070 with Insight-HXMT data. We focus our study on the temporal evolution of the parameters of the source. We employ two different models to fit the disk's thermal spectrum: the Newtonian model DISKBB and the relativistic model NKBB. These two models provide different pictures of the source in the soft state. With DISKBB, we find that the inner edge of the disk is close to the innermost stable circular orbit of a fast-rotating black hole and the corona changes geometry from the hard to the soft state. With NKBB, we find that the disk is truncated in the soft state and that the coronal geometry does not change significantly during the whole outburst. However, the model with NKBB can predict an untruncated disk around a fast-rotating black hole if we assume that the disk inclination angle is around $30^\circ$ (instead of $\sim 60^\circ$, which is the inclination angle of the jet and is usually adopted as the disk inclination angle in the literature) and we employ a high-density reflection model. In such a case, we measure a high value of the black hole spin parameter with observations in the soft state, in agreement with the high spin value found from the analysis of the reflection features and in disagreement with the low spin value found by previous continuum-fitting method measurements with the disk inclination angle set to the value of the jet inclination angle.

Submitted 1 July, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: 14 pages, 8 figures. v2: refereed version

Journal ref: Astrophys.J. 969: 61 (2024)

arXiv:2404.12139 [pdf, other]

Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models

Authors: Shouwei Ruan, Yinpeng Dong, Hanqing Liu, Yao Huang, Hang Su, Xingxing Wei

Abstract: Vision-Language Pre-training (VLP) models like CLIP have achieved remarkable success in computer vision and particularly demonstrated superior robustness to distribution shifts of 2D images. However, their robustness under 3D viewpoint variations is still limited, which can hinder the development for real-world applications. This paper successfully addresses this concern while keeping VLPs' original performance by breaking through two primary obstacles: 1) the scarcity of training data and 2) the suboptimal fine-tuning paradigms. To combat data scarcity, we build the Multi-View Caption (MVCap) dataset -- a comprehensive collection of over four million multi-view image-text pairs across more than 100K objects, providing more potential for VLP models to develop generalizable viewpoint-invariant representations. To address the limitations of existing paradigms in performance trade-offs and training efficiency, we design a novel fine-tuning framework named Omniview-Tuning (OVT). Specifically, OVT introduces a Cross-Viewpoint Alignment objective through a minimax-like optimization strategy, which effectively aligns representations of identical objects from diverse viewpoints without causing overfitting. Additionally, OVT fine-tunes VLP models in a parameter-efficient manner, leading to minimal computational cost. Extensive experiments on various VLP models with different architectures validate that OVT significantly improves the models' resilience to viewpoint shifts and keeps the original performance, establishing a pioneering standard for boosting the viewpoint invariance of VLP models.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 20 pages

arXiv:2404.11890 [pdf, other]

FCNCP: A Coupled Nonnegative CANDECOMP/PARAFAC Decomposition Based on Federated Learning

Authors: Yukai Cai, Hang Liu, Xiulin Wang, Hong** Li, Ziyi Wang, Chuanshuai Yang, Fengyu Cong

Abstract: In the field of brain science, data sharing across servers is becoming increasingly challenging due to issues such as industry competition, privacy security, and administrative procedure policies and regulations. Therefore, there is an urgent need to develop new methods for data analysis and processing that enable scientific collaboration without data sharing. In view of this, this study proposes to study and develop a series of efficient non-negative coupled tensor decomposition algorithm frameworks based on federated learning called FCNCP for the EEG data arranged on different servers. It combines the good discriminative performance of tensor decomposition in high-dimensional data representation and decomposition, the advantages of coupled tensor decomposition in cross-sample tensor data analysis, and the features of federated learning for joint modelling in distributed servers. The algorithm utilises federated learning to establish coupling constraints for data distributed across different servers. In the experiments, firstly, simulation experiments are carried out using simulated data, and stable and consistent decomposition results are obtained, which verify the effectiveness of the proposed algorithms in this study. Then the FCNCP algorithm was utilised to decompose the fifth-order event-related potential (ERP) tensor data collected by applying proprioceptive stimuli on the left and right hands. It was found that contralateral stimulation induced more symmetrical components in the activation areas of the left and right hemispheres. The conclusions drawn are consistent with the interpretations of related studies in cognitive neuroscience, demonstrating that the method can efficiently process higher-order EEG data and that some key hidden information can be preserved.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11884 [pdf, other]

Seeing Motion at Nighttime with an Event Camera

Authors: Haoyue Liu, Shihan Peng, Lin Zhu, Yi Chang, Hanyu Zhou, Luxin Yan

Abstract: We focus on a very challenging task: imaging nighttime dynamic scenes. Most previous methods rely on the low-light enhancement of a conventional RGB camera. However, they would inevitably face a dilemma between the long exposure time of nighttime and the motion blur of dynamic scenes. Event cameras react to dynamic changes with higher temporal resolution (microsecond) and higher dynamic range (120dB), offering an alternative solution. In this work, we present a novel nighttime dynamic imaging method with an event camera. Specifically, we discover that the event at nighttime exhibits temporal trailing characteristics and spatial non-stationary distribution. Consequently, we propose a nighttime event reconstruction network (NER-Net) which mainly includes a learnable event timestamps calibration module (LETC) to align the temporal trailing events and a non-uniform illumination aware module (NIAM) to stabilize the spatiotemporal distribution of events. Moreover, we construct a paired real low-light event dataset (RLED) through a co-axial imaging system, including 64,200 spatially and temporally aligned image GTs and low-light events. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art methods in terms of visual quality and generalization ability on real-world nighttime datasets. The project is available at: https://github.com/Liu-haoyue/NER-Net.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.11631 [pdf, other]

A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation

Authors: **ghai He, Haoyu Liu, Yuhang Wu, Zeyu Zheng, Tingyu Zhu

Abstract: We provide a preliminary study on utilizing GPU (Graphics Processing Unit) to accelerate computation for three simulation optimization tasks with either first-order or second-order algorithms. Compared to the implementation using only CPU (Central Processing Unit), the GPU implementation benefits from the computational advantages of parallel processing for large-scale matrix and vector operations. Numerical experiments demonstrate the computational advantages of the GPU implementation in simulation optimization problems, and show that this advantage grows further as the problem scale increases.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.11401 [pdf, other]

RainyScape: Unsupervised Rainy Scene Reconstruction using Decoupled Neural Rendering

Authors: Xianqiang Lyu, Hui Liu, Junhui Hou

Abstract: We propose RainyScape, an unsupervised framework for reconstructing clean scenes from a collection of multi-view rainy images. RainyScape consists of two main modules: a neural rendering module and a rain-prediction module that incorporates a predictor network and a learnable latent embedding that captures the rain characteristics of the scene. Specifically, based on the spectral bias property of neural networks, we first optimize the neural rendering pipeline to obtain a low-frequency scene representation. Subsequently, we jointly optimize the two modules, driven by the proposed adaptive direction-sensitive gradient-based reconstruction loss, which encourages the network to distinguish between scene details and rain streaks, facilitating the propagation of gradients to the relevant components. Extensive experiments on both the classic neural radiance field and the recently proposed 3D Gaussian splatting demonstrate the superiority of our method in effectively eliminating rain streaks and rendering clean images, achieving state-of-the-art performance. The constructed high-quality dataset and source code will be publicly available.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11036 [pdf, other]

Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement

Authors: Paras Sheth, Tharindu Kumarage, Raha Moraffah, Aman Chadha, Huan Liu

Abstract: Content moderation faces a challenging task as social media's ability to spread hate speech contrasts with its role in promoting global connectivity. With rapidly evolving slang and hate speech, the adaptability of conventional deep learning to the fluid landscape of online dialogue remains limited. In response, causality-inspired disentanglement has shown promise by segregating platform-specific peculiarities from universal hate indicators. However, its dependency on available ground-truth target labels for discerning these nuances faces practical hurdles with the incessant evolution of platforms and the mutable nature of hate speech. Using confidence-based reweighting and contrastive regularization, this study presents HATE WATCH, a novel framework of weakly supervised causal disentanglement that circumvents the need for explicit target labeling and effectively disentangles input features into invariant representations of hate. Empirical validation across platforms, two with target labels and two without, positions HATE WATCH as a novel method in cross-platform hate speech detection with superior performance. HATE WATCH advances scalable content moderation techniques towards developing safer online communities.

Submitted 16 April, 2024; originally announced April 2024.

Showing 301–350 of 7,384 results for author: Liu, H