Search | arXiv e-print repository

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

Authors: Sheng Luo, Wei Chen, Wanxin Tian, Rui Liu, Luanxuan Hou, Xiubao Zhang, Haifeng Shen, Ruiqi Wu, Shuyi Geng, Yi Zhou, Ling Shao, Yi Yang, Bojun Gao, Qun Li, Guobin Wu

Abstract: Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the context of intelligent vehicles, leveraging the power of foundation models has proven to be transformative, offering notable advancements in visual understanding. Equipped with multi-modal and multi-task learning capabilities, multi-modal multi-task visual understanding foundation models (MM-VUFMs) effectively process and fuse data from diverse modalities and simultaneously handle various driving-related tasks with powerful adaptability, contributing to a more holistic understanding of the surrounding scene. In this survey, we present a systematic analysis of MM-VUFMs specifically designed for road scenes. Our objective is not only to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques, but also to highlight their advanced capabilities in diverse learning paradigms. These paradigms include open-world understanding, efficient transfer for road scenes, continual learning, interactive and generative capability. Moreover, we provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models. To facilitate researchers in staying abreast of the latest developments in MM-VUFMs for road scenes, we have established a continuously updated repository at https://github.com/rolsheng/MM-VUFM4DS.

Submitted 26 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). 24 pages, 9 figures, 1 table

arXiv:2402.02607 [pdf, other]

Flexible Non-interactive Short-term Implicit Certificate Generation for VANETs

Authors: Rui Liu, Yun Lu, Jian** Pan

Abstract: A leading industry standard for secure and trusted communication in vehicular ad-hoc networks (VANETs) is the Security Credential Management System (SCMS). It uses anonymous certificates, functioning as pseudonyms, to preserve the privacy of vehicles. With the rapid development of advanced applications in VANETs, such as crowdsensing and federated learning, vehicles need to communicate with each other or infrastructures more frequently, leading to a higher demand for pseudonyms. However, the current approach of certificate provisioning in SCMS is not able to fully support pseudonyms, due to storage limitation, cost of connectivity establishment, and communication overhead of certificate downloading. To tackle this challenge, we propose a non-interactive approach for SCMS, allowing vehicles themselves to generate short-term key pairs and anonymous implicit certificates. Our evaluation and comparison with previous work show that our solution not only effectively reduces the communication cost, but also grants vehicles greater flexibility in certificate generation and use. On the technical side, to the best of our knowledge, this is the first work which (1) applies sanitizable signature for non-interactive anonymous certificate generation, and (2) is specifically designed for SCMS, which opens up possibilities for extensions and applications in industry.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.00996 [pdf, other]

mmID: High-Resolution mmWave Imaging for Human Identification

Authors: Sakila S. Jayaweera, Sai Deepika Regani, Yuqian Hu, Beibei Wang, K. J. Ray Liu

Abstract: Achieving accurate human identification through RF imaging has been a persistent challenge, primarily attributed to the limited aperture size and its consequent impact on imaging resolution. The existing imaging solution enables tasks such as pose estimation, activity recognition, and human tracking based on deep neural networks by estimating skeleton joints. In contrast to estimating joints, this paper proposes to improve imaging resolution by estimating the human figure as a whole using conditional generative adversarial networks (cGAN). In order to reduce training complexity, we use an estimated spatial spectrum using the MUltiple SIgnal Classification (MUSIC) algorithm as input to the cGAN. Our system generates environmentally independent, high-resolution images that can extract unique physical features useful for human identification. We use a simple classification network based on convolution layers to obtain the final identification result. From the experimental results, we show that the resolution of the image produced by our trained generator is high enough to enable human identification. Our findings indicate high-resolution accuracy with 5% mean silhouette difference to the Kinect device. Extensive experiments in different environments on multiple testers demonstrate that our system can achieve 93% overall test accuracy in unseen environments for static human target identification.

Submitted 1 February, 2024; originally announced February 2024.

Comments: This paper was published in the IEEE 9th World Forum on Internet of Things

arXiv:2401.17982 [pdf, other]

Diagnosing the particle transport mechanism in the pulsar halo via X-ray observations

Authors: Qi-Zuo Wu, Chao-Ming Li, Xuan-Han Liang, Chong Ge, Ruo-Yu Liu

Abstract: Pulsar halos (also termed 'TeV halos') are a new class of $γ$-ray sources in the Galaxy, which manifest as extended $γ$-ray emission around middle-aged pulsars, as discovered around the Geminga pulsar, the Monogem pulsar and PSR~J0622+3749 by HAWC and LHAASO. A consensus has been reached that the TeV emission comes from the inverse Compton scattering of escaping electrons/positrons from the PWN off the soft background radiation field, while the particle transport mechanism in the halo is still in dispute. Currently, there are three main interpretations: the isotropic, suppressed diffusion model; the isotropic, unsuppressed diffusion model that accounts for ballistic propagation of newly injected particles; and the anisotropic diffusion model. While the gamma-ray surface brightness profiles predicted by all three models can be more or less consistent with the observation, the implications of the three models for cosmic-ray transport mechanisms and the properties of the interstellar magnetic field are quite different. In this study, we calculate the anticipated X-ray emission of pulsar halos under the three models. We show that the synchrotron radiation of these escaping electrons can produce a corresponding X-ray halo around the pulsar, and the expected surface brightness profiles are distinct in the three models. We suggest that sensitive X-ray detectors with a large field of view (such as eROSITA and Einstein Probe) and a reasonably long exposure time are crucial for understanding the formation mechanism of pulsar halos and can serve as a probe of the properties of the interstellar turbulence.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 7 figures

arXiv:2401.17027 [pdf, other]

Heterogeneous treatment effect estimation with subpopulation identification for personalized medicine in opioid use disorder

Authors: Seungyeon Lee, Ruoqi Liu, Wenyu Song, ** Zhang

Abstract: Deep learning models have demonstrated promising results in estimating treatment effects (TEE). However, most of them overlook the variations in treatment outcomes among subgroups with distinct characteristics. This limitation hinders their ability to provide accurate estimations and treatment recommendations for specific subgroups. In this study, we introduce a novel neural network-based framework, named SubgroupTE, which incorporates subgroup identification and treatment effect estimation. SubgroupTE identifies diverse subgroups and simultaneously estimates treatment effects for each subgroup, improving the treatment effect estimation by considering the heterogeneity of treatment responses. Comparative experiments on synthetic data show that SubgroupTE outperforms existing models in treatment effect estimation. Furthermore, experiments on a real-world dataset related to opioid use disorder (OUD) demonstrate the potential of our approach to enhance personalized treatment recommendations for OUD patients.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 2023 IEEE International Conference on Data Mining (ICDM)
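
As a rough illustration of why subgroup heterogeneity matters for treatment effect estimation, the sketch below compares a pooled difference in mean outcomes with per-subgroup differences. It is not the SubgroupTE method described in the abstract: the subgroups are given rather than learned, the function names and the toy records are hypothetical, and a plain difference in means is only a valid effect estimate under randomized treatment assignment.

```python
from collections import defaultdict
from statistics import mean


def pooled_effect(records):
    """Difference in mean outcome between treated and control, ignoring subgroups."""
    treated = [y for _, t, y in records if t]
    control = [y for _, t, y in records if not t]
    return mean(treated) - mean(control)


def subgroup_effects(records):
    """Per-subgroup difference in mean outcome (treated minus control).

    records: iterable of (subgroup_label, treated_flag, outcome).
    Subgroups lacking either a treated or a control unit are skipped.
    """
    arms = defaultdict(lambda: {True: [], False: []})
    for subgroup, treated, outcome in records:
        arms[subgroup][bool(treated)].append(outcome)
    return {
        sg: mean(a[True]) - mean(a[False])
        for sg, a in arms.items()
        if a[True] and a[False]
    }


# Hypothetical toy data: subgroup "A" benefits, subgroup "B" does not.
toy_records = [
    ("A", 1, 5.0), ("A", 0, 3.0), ("A", 1, 7.0), ("A", 0, 3.0),
    ("B", 1, 2.0), ("B", 0, 2.5),
]

if __name__ == "__main__":
    print("pooled:", pooled_effect(toy_records))
    print("by subgroup:", subgroup_effects(toy_records))
```

In this toy data the pooled estimate is positive even though subgroup "B" shows a negative difference, which is the kind of variation a single population-level estimate hides.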

arXiv:2401.16923 [pdf, other]

Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation

Authors: Rui** Liu, Jiaming Zhang, Kunyu Peng, Yufan Chen, Ke Cao, Junwei Zheng, M. Saquib Sarfraz, Kailun Yang, Rainer Stiefelhagen

Abstract: Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level modality absence and sensor-level modality errors. To avoid the predominant modality reliance in multi-modal fusion, we introduce a Missing-aware Modal Switch (MMS) strategy to proactively manage missing modalities during training. Utilizing bit-level batch-wise sampling enhances the model's performance in both complete and incomplete testing scenarios. Furthermore, we introduce the Fourier Prompt Tuning (FPT) method to incorporate representative spectral information into a limited number of learnable prompts that maintain robustness against all MISS scenarios. The method achieves effects akin to fine-tuning but with fewer tunable parameters (1.1%). Extensive experiments prove the efficacy of our proposed approach, showcasing an improvement of 5.84% mIoU over the prior state-of-the-art parameter-efficient methods in modality missing. The source code is publicly available at https://github.com/Rui**L/MISS.

Submitted 10 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted to IEEE IV 2024. The source code is publicly available at https://github.com/Rui**L/MISS

arXiv:2401.16385 [pdf, ps, other]

Dipole superfluid hydrodynamics II

Authors: Akash Jain, Kristan Jensen, Ruochuan Liu, Eric Mefford

Abstract: We present a dissipative hydrodynamic theory of "s-wave dipole superfluids" that arise in phases of translation-invariant and dipole-symmetric models in which the U(1) symmetry is spontaneously broken. The hydrodynamic description is subtle on account of an analogue of dangerously irrelevant operators, which requires us to formalize an entirely new derivative counting scheme suitable for these fluids. We use our hydrodynamic model to investigate the linearized response of such a fluid, characterized by sound modes $ω\sim \pm k - ik^2$, shear modes $ω\sim-ik^2$, and magnon-like propagating modes $ω\sim \pm k^2 - ik^4$ that are the dipole-invariant version of superfluid "second sound" modes. We find that these fluids can also admit equilibrium states with "dipole superflow" that resemble a polarized medium. Finally, we couple our theory to slowly varying background fields, which allows us to compute response functions of hydrodynamic operators and Kubo formulas for hydrodynamic transport coefficients.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 54 pages; we have included a Mathematica notebook to the arXiv submission which computes dispersion relations and response functions

arXiv:2401.15287 [pdf, other]

Applications of Tao General Difference in Discrete Domain

Authors: Linmi Tao, Ruiyang Liu, Donglai Tao, Wu Xia, Feilong Ma, Yu Cheng, **gmao Cui

Abstract: Numerical difference computation is a core and indispensable operation in the modern digital era. Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space. Built on the solid theoretical foundation of the general difference in a finite interval, the TGD operators demonstrate exceptional signal processing capabilities in real-world applications. A novel smoothness property of a sequence is defined on the first- and second-order TGD. This property is used to denoise one-dimensional signals, where the noise is the non-smooth points in the sequence. Meanwhile, the center of the gradient in a finite interval can be accurately located via TGD calculation. This solves a traditional challenge in computer vision, which is the precise localization of image edges with noise robustness. Furthermore, the power of TGD operators extends to spatio-temporal edge detection in three-dimensional arrays, enabling the identification of kinetic edges in video data. These diverse applications highlight the properties of TGD in the discrete domain and the significant promise of TGD for computation across signal processing, image analysis, and video analytics.

Submitted 26 January, 2024; originally announced January 2024.

Comments: This paper is the application part of the paper "Tao General Differential and Difference: Theory and Application". The theory part of the paper has been renamed "A Theory of General Difference in Continuous and Discrete Domain" and is available as arXiv:2305.08098v2

arXiv:2401.14398 [pdf, other]

pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Authors: Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick

Abstract: We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions. By capitalizing on large-scale diffusion models and transferring their representations to this task, we learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases, including examples that break natural and physical priors, such as art. As training data, we use a synthetically curated dataset containing occluded objects paired with their whole counterparts. Experiments show that our approach outperforms supervised baselines on established benchmarks. Our model can furthermore be used to significantly improve the performance of existing object recognition and 3D reconstruction methods in the presence of occlusions.

Submitted 25 January, 2024; originally announced January 2024.

Comments: Website: https://gestalt.cs.columbia.edu/

arXiv:2401.12544 [pdf]

Correlation between magnetic domain structures and quantum anomalous Hall effect in epitaxial MnBi2Te4 thin films

Authors: Yang Shi, Yunhe Bai, Yuanzhao Li, Yang Feng, Qiang Li, Huanyu Zhang, Yang Chen, Yitian Tong, Jianli Luan, Ruixuan Liu, Pengfei Ji, Zongwei Gao, Hangwen Guo, **song Zhang, Yayu Wang, Xiao Feng, Ke He, Xiaodong Zhou, Jian Shen

Abstract: We use magnetic force microscopy (MFM) to study spatial uniformity of magnetization of epitaxially grown MnBi2Te4 thin films. Compared to films which exhibit no quantum anomalous Hall effect (QAH), films with QAH are observed to have more spatial uniformity of magnetization with larger domain size. The domain evolution upon magnetic field sweeping indicates that the magnetic domains or the spatial nonuniformity of magnetization originates from the strong pinning of the inherent sample inhomogeneity. A direct correlation between the Hall resistivity and the domain size has been established by analyzing a series of thin films with and without QAH. Our observation shows that one has to suppress the spatial nonuniformity of magnetization to allow the Hall resistivity to be quantized. The fact that a sizable longitudinal resistivity remains even for the QAH sample suggests a quantized Hall insulator scenario. Our work provides important insights into the quantization mechanism and the dissipation of the QAH state in the MnBi2Te4 system.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 14 pages, 4 figures

arXiv:2401.12369 [pdf, other]

SubgroupTE: Advancing Treatment Effect Estimation with Subgroup Identification

Authors: Seungyeon Lee, Ruoqi Liu, Wenyu Song, Lang Li, ** Zhang

Abstract: Precise estimation of treatment effects is crucial for evaluating intervention effectiveness. While deep learning models have exhibited promising performance in learning counterfactual representations for treatment effect estimation (TEE), a major limitation in most of these models is that they treat the entire population as a homogeneous group, overlooking the diversity of treatment effects across potential subgroups. This limitation restricts the ability to precisely estimate treatment effects and provide subgroup-specific treatment recommendations. In this paper, we propose a novel treatment effect estimation model, named SubgroupTE, which incorporates subgroup identification in TEE. SubgroupTE identifies heterogeneous subgroups with different treatment responses and more precisely estimates treatment effects by considering subgroup-specific causal effects. In addition, SubgroupTE iteratively optimizes subgrouping and treatment effect estimation networks to enhance both estimation and subgroup identification. Comprehensive experiments on the synthetic and semi-synthetic datasets exhibit the outstanding performance of SubgroupTE compared with the state-of-the-art models on treatment effect estimation. Additionally, a real-world study demonstrates the capabilities of SubgroupTE in enhancing personalized treatment recommendations for patients with opioid use disorder (OUD) by advancing treatment effect estimation with subgroup identification.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.12230 [pdf, other]

Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

Authors: Yao Lu, Song Bian, Lequn Chen, Yongjun He, Yulong Hui, Matthew Lentz, Beibin Li, Fei Liu, Jialin Li, Qi Liu, Rui Liu, Xiaoxuan Liu, Lin Ma, Kexin Rong, Jianguo Wang, Yingjun Wu, Yongji Wu, Huanchen Zhang, Minjia Zhang, Qizhen Zhang, Tianyi Zhou, Danyang Zhuo

Abstract: In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges like escalating costs and demand for high-end GPUs. Drawing analogies between large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we describe an AI-native computing paradigm that harnesses the power of both cloud-native technologies (e.g., multi-tenancy and serverless computing) and advanced machine learning runtime (e.g., batched LoRA inference). These joint efforts aim to optimize costs-of-goods-sold (COGS) and improve resource accessibility. The journey of merging these two domains is just at the beginning and we hope to stimulate future research and development in this area.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.12147 [pdf]

An Efficient Finite Difference-based Implicit Solver for Phase-Field Equations with Spatially and Temporally Varying Parameters

Authors: Zirui Mao, G. R. Liu, Michael J. Demkowicz

Abstract: The phase field method is an effective tool for modeling microstructure evolution in materials. Many efficient implicit numerical solvers have been proposed for phase field simulations under uniform and time-invariant model parameters. We use Eyre's theorem to develop an unconditionally stable implicit solver for spatially non-uniform and time-varying model parameters. The accuracy, unconditional stability, and efficiency of the solver are validated against benchmarking examples. In its current form, the solver requires a uniform mesh and may only be applied to problems with periodic, Neumann, or mixed periodic and Neumann boundary conditions.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.11450 [pdf]

Reentrant quantum anomalous Hall effect in molecular beam epitaxy-grown MnBi2Te4 thin films

Authors: Yuanzhao Li, Yunhe Bai, Yang Feng, Jianli Luan, Zongwei Gao, Yang Chen, Yitian Tong, Ruixuan Liu, Su Kong Chong, Kang L. Wang, Xiaodong Zhou, Jian Shen, **song Zhang, Yayu Wang, Chui-Zhen Chen, XinCheng Xie, Xiao Feng, Ke He, Qi-Kun Xue

Abstract: In this study, we investigate intrinsic magnetic topological insulator MnBi2Te4 thin films grown by molecular beam epitaxy. We observe a reentrant quantum anomalous Hall effect when the Fermi energy enters the valence band and magnetic field equals zero, indicating the emergence of the Chern Anderson insulator state. The discovery opens a new avenue for realizing the QAH effect and underscores the fundamental role of both Berry curvature and Anderson localization.

Submitted 21 January, 2024; originally announced January 2024.

Comments: 15 pages, 4 figures

arXiv:2401.11318 [pdf, ps, other]

Global well-posedness and enhanced dissipation for the 2D stochastic Nernst-Planck-Navier-Stokes equations with transport noise

Authors: Quyuan Lin, Rongchang Liu, Weinan Wang

Abstract: In this paper, we consider the 2D stochastic Nernst-Planck-Navier-Stokes equations with transport noise. By assuming the ionic species have the same diffusivity and opposite valences, we prove the global well-posedness of the system. Furthermore, we illustrate the enhanced dissipation phenomenon in the system with specific transportation noise by establishing that it enables an arbitrarily large exponential convergence rate of the solutions.

Submitted 20 January, 2024; originally announced January 2024.

Comments: 29 pages

arXiv:2401.10560 [pdf, other]

360ORB-SLAM: A Visual SLAM System for Panoramic Images with Depth Completion Network

Authors: Yichen Chen, Yiqi Pan, Ruyu Liu, Haoyu Zhang, Guodao Zhang, Bo Sun, Jianhua Zhang

Abstract: To enhance the performance and effect of AR/VR applications and visual assistance and inspection systems, visual simultaneous localization and mapping (vSLAM) is a fundamental task in computer vision and robotics. However, traditional vSLAM systems are limited by the camera's narrow field-of-view, resulting in challenges such as sparse feature distribution and lack of dense depth information. To overcome these limitations, this paper proposes a 360ORB-SLAM system for panoramic images that is combined with a depth completion network. The system extracts feature points from the panoramic image, utilizes a panoramic triangulation module to generate sparse depth information, and employs a depth completion network to obtain a dense panoramic depth map. Experimental results on our novel panoramic dataset constructed based on Carla demonstrate that the proposed method achieves superior scale accuracy compared to existing monocular SLAM methods and effectively addresses the challenges of feature association and scale ambiguity. The integration of the depth completion network enhances system stability and mitigates the impact of dynamic elements on SLAM performance.

Submitted 19 January, 2024; originally announced January 2024.

Comments: 6 pages, 9 figures

arXiv:2401.08956 [pdf, other]

doi 10.1109/TAES.2023.3260059

A Unified NOMA Framework in Beam-Hopping Satellite Communication Systems

Authors: Xuyang Zhang, Xinwei Yue, Tian Li, Zhihao Han, Yafei Wang, Yong Ding, Rongke Liu

Abstract: This paper investigates the application of a unified non-orthogonal multiple access framework in beam hopping (U-NOMA-BH) based satellite communication systems. More specifically, the proposed U-NOMA-BH framework can be applied to code-domain NOMA based BH (CD-NOMA-BH) and power-domain NOMA based BH (PD-NOMA-BH) systems. To satisfy dynamic-uneven traffic demands, we formulate the optimization problem to minimize the square of discrete difference by jointly optimizing power allocation, carrier assignment and beam scheduling. The non-convexity of the objective function and the constraint condition is solved through Dinkelbach's transform and variable relaxation. As a further development, the closed-form and asymptotic expressions of outage probability are derived for CD/PD-NOMA-BH systems. Based on approximated results, the diversity orders of a pair of users are obtained in detail. In addition, the system throughput of U-NOMA-BH is discussed in delay-limited transmission mode. Numerical results verify that: i) The gap between traffic requests of CD/PD-NOMA-BH systems appears to be more closely compared with orthogonal multiple access based BH (OMA-BH); ii) The CD-NOMA-BH system is capable of providing the enhanced traffic request and capacity provision; and iii) The outage behaviors of CD/PD-NOMA-BH are better than that of OMA-BH.

Submitted 16 January, 2024; originally announced January 2024.

Journal ref: IEEE Transactions on Aerospace and Electronic Systems, vol. 59, no. 5, pp. 5390-5404, Oct. 2023

arXiv:2401.06304 [pdf, other]

A Unified Model for Multi-epoch Neutrino Events and Broadband Spectral Energy Distribution of $\rm TXS~0506+056$

Authors: Zhen-Jie Wang, Ruo-Yu Liu, Ze-Rui Wang, Junfeng Wang

Abstract: The blazar $TXS~0506+056$ has been proposed as a high-energy neutrino emitter. However, it has been shown that the standard one-zone model cannot produce sufficiently high neutrino flux due to constraints from the X-ray data, implying more complex properties of the radiation zones in the blazar than those described by the standard one-zone model. In this work we investigate multi-epoch high-energy muon neutrino events associated with the blazar $TXS~0506+056$ that occurred in 2014-2015, 2017-2018, 2021-2022 and 2022-2023, respectively. We applied the so-called "stochastic dissipation model" to account for the neutrino-blazar associations detected in the four epochs simultaneously. This model describes a scenario in which the emission of the blazar arises from the superimposition of two components: a persistent component related to the quasi-stable state of the blazar and a transient component responsible for the sudden enhancement of the blazar's flux, either in electromagnetic radiation or in neutrino emission. The latter component could form at a random distance along the jet by a strong energy dissipation event. Under this assumption, the multi-epoch broadband spectral energy distribution (SED) can be well explained and the expected number of high-energy neutrino events is statistically realistic. The expected number of neutrino events in half a year is around 8.2, 0.07, 0.73 and 0.41, corresponding to the epochs 2014-2015, 2017-2018, 2021-2022 and 2022-2023, respectively. Hence, our model self-consistently explains the episodic neutrino emission from $TXS~0506+056$.

Submitted 17 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: 19 pages, 12 figures, accepted for publication in ApJ

arXiv:2401.05663 [pdf, other]

End-to-End Learning for SLP-Based ISAC Systems

Authors: Yixian Zheng, Rang Liu, Ming Li, Qian Liu

Abstract: Integrated sensing and communication (ISAC) is a promising wireless technology which can simultaneously perform both radar and communication functionalities by sharing the same transmit waveform, spectral resource, and hardware platform. The recently emerged symbol-level precoding (SLP) technique advances ISAC systems by leveraging the waveform design degrees of freedom (DoFs) in both temporal and spatial domains. However, traditional SLP-based ISAC systems are designed in a modular paradigm, which potentially limits the overall performance of communication and radar sensing. The high complexity of existing SLP design algorithms is another issue that hinders practical deployment. To break through the bottleneck of these approaches, in this paper we propose an end-to-end approach to jointly design the SLP-based dual-functional transmitter and receivers of communication and radar sensing. In particular, we aim to utilize deep learning-based methods to minimize the symbol error rate (SER) of communication users, maximize the detection probability, and minimize the root mean square error (RMSE) of the target angle estimation. Multi-layer perceptron (MLP) networks and a long short-term memory (LSTM) network are respectively applied to the transmitter, communication users and radar receiver. Simulation results verify the feasibility of the proposed deep-learning-based end-to-end optimization for ISAC systems and reveal the effectiveness of the proposed neural networks for the end-to-end design.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 6 pages, 7 figures, accepted by WCNC 2024

arXiv:2401.05246 [pdf, other]

Loophole-free test of macroscopic realism via high-order correlations of measurement

Authors: ** Wang, Chong Chen, Hao Liao, Vadim V. Vorobyov, Joerg Wrachtrup, and Ren-Bao Liu

Abstract: Test of macroscopic realism (MR) is key to understanding the foundation of quantum mechanics. Due to the existence of the non-invasive measurability loophole and other interpretation loopholes, however, such a test remains an open question. Here we propose a general inequality based on high-order correlations of measurements for a loophole-free test of MR at the weak signal limit. Importantly, the inequality is established using the statistics of raw data recorded by classical devices, without requiring a specific model for the measurement process, so its violation would falsify MR without the interpretation loophole. The non-invasive measurability loophole is also closed, since the weak signal limit can be verified solely by measurement data (using the relative scaling behaviors of different orders of correlations). We demonstrate that the inequality can be broken by a quantum spin model. The inequality proposed here provides an unambiguous test of the MR principle and is also useful for characterizing quantum coherence.

Submitted 15 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

arXiv:2401.05092 [pdf, other]

Nematic quantum disordered state in FeSe

Authors: Ruixian Liu, Matthew B. Stone, Shang Gao, Mitsutaka Nakamura, Kazuya Kamazawa, Aleksandra Krajewska, Helen C. Walker, Peng Cheng, Rong Yu, Qimiao Si, Pengcheng Dai, Xingye Lu

Abstract: The unusual quantum-disordered magnetic ground state intertwined with superconductivity and electronic nematicity in FeSe has been a research focus in iron-based superconductors. However, the intrinsic spin excitations across the entire Brillouin zone in detwinned FeSe, which form the basis for a microscopic understanding of the magnetic state and superconductivity, remain to be determined. Here, we use inelastic neutron scattering to map out the spin excitations of FeSe detwinned with a uniaxial-strain device. We find that the stripe spin excitations (Q=(1, 0)/(0, 1)) exhibit the $C_2$ symmetry up to $E\approx120$ meV, while the Néel spin excitations (Q=(1, 1)) retain their $C_4$ symmetry in the nematic state. The temperature dependence of the difference in the spin excitations at Q=(1, 0) and (0, 1) for temperatures above the structural phase transition unambiguously shows the establishment of the nematic quantum disordered state. The similarity of the Néel excitations in FeSe and NaFeAs suggests that the Néel excitations are driven by the enhanced electron correlations in the $3d_{xy}$ orbital. By determining the key features of the stripe excitations and fitting their dispersions using a Heisenberg Hamiltonian with biquadratic interaction ($J_1$-$K$-$J_2$), we establish a spin-interaction phase diagram and conclude that FeSe is close to a crossover region between the antiferroquadrupolar, Néel, and stripe ordering regimes. The results provide an experimental basis for establishing a microscopic theoretical model to describe the origin and intertwining of the emergent orders in iron-based superconductors.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 9 pages, 5 figures

arXiv:2401.04962 [pdf, other]

Large Model based Sequential Keyframe Extraction for Video Summarization

Authors: Kailong Tan, Yuxiang Zhou, Qianchen Xia, Rui Liu, Yong Chen

Abstract: Keyframe extraction aims to sum up a video's semantics with the minimum number of its frames. This paper puts forward a Large Model based Sequential Keyframe Extraction for video summarization, dubbed LMSKE, which contains three stages. First, we use the large model "TransNetV2" to cut the video into consecutive shots, and employ the large model "CLIP" to generate each frame's visual feature within each shot; second, we develop an adaptive clustering algorithm to yield candidate keyframes for each shot, with each candidate keyframe located nearest to a cluster center; third, we further reduce the above candidate keyframes via redundancy elimination within each shot, and finally concatenate them in accordance with the sequence of shots as the final sequential keyframes. To evaluate LMSKE, we curate a benchmark dataset and conduct rich experiments, whose results show that LMSKE performs much better than quite a few SOTA competitors, with an average F1 of 0.5311, average fidelity of 0.8141, and average compression ratio of 0.9922.

Submitted 10 January, 2024; originally announced January 2024.

Comments: This paper has been accepted for CDIVP 2024
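
The second and third stages described in the LMSKE abstract above (candidate keyframes nearest to cluster centers, then redundancy elimination within each shot, then concatenation in shot order) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the cluster labels are taken as given (the paper's adaptive clustering algorithm is not specified in the abstract), redundancy is judged by a hypothetical cosine-similarity threshold `max_sim`, and all function names are invented for this example.

```python
import math


def nearest_to_center(features, labels):
    """Return, per cluster, the index of the frame closest to the cluster mean."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    picks = []
    for lab in sorted(clusters):
        members = clusters[lab]
        dim = len(features[members[0]])
        # Cluster center = component-wise mean of the member features.
        center = [sum(features[i][d] for i in members) / len(members)
                  for d in range(dim)]
        picks.append(min(members, key=lambda i: math.dist(features[i], center)))
    return sorted(picks)  # keep temporal order inside the shot


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def drop_redundant(features, candidates, max_sim=0.95):
    """Keep a candidate only if it is not too similar to one already kept.

    The threshold rule is an assumption made for illustration.
    """
    kept = []
    for i in candidates:
        if all(cosine(features[i], features[j]) < max_sim for j in kept):
            kept.append(i)
    return kept


def sequential_keyframes(shots, max_sim=0.95):
    """shots: list of (start_frame, features, labels) in temporal order."""
    result = []
    for start, features, labels in shots:
        candidates = nearest_to_center(features, labels)
        result.extend(start + i for i in drop_redundant(features, candidates, max_sim))
    return result
```

In this sketch each shot is processed independently and the surviving frame indices are offset by the shot's first frame number, which mirrors the abstract's statement that keyframes are concatenated in accordance with the sequence of shots. In practice the feature vectors would come from an image encoder and the labels from whatever clustering method is chosen.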

arXiv:2401.04450 [pdf, other]

Recanting twins: addressing intermediate confounding in mediation analysis

Authors: Tat-Thang Vo, Nicholas Williams, Richard Liu, Kara E. Rudolph, Iván Díaz

Abstract: The presence of intermediate confounders, also called recanting witnesses, is a fundamental challenge to the investigation of causal mechanisms in mediation analysis, preventing the identification of natural path-specific effects. Proposed alternative parameters (such as randomizational interventional effects) are problematic because they can be non-null even when there is no mediation for any individual in the population; i.e., they are not an average of underlying individual-level mechanisms. In this paper we develop a novel method for mediation analysis in settings with intermediate confounding, with guarantees that the causal parameters are summaries of the individual-level mechanisms of interest. The method is based on recently proposed ideas that view causality as the transfer of information, and thus replace recanting witnesses by draws from their conditional distribution, which we call "recanting twins". We show that, in the absence of intermediate confounding, recanting twin effects recover natural path-specific effects. We present the assumptions required for identification of recanting twin effects under a standard structural causal model, as well as the assumptions under which the recanting twin identification formulas can be interpreted in the context of the recently proposed separable effects models. To estimate recanting twin effects, we develop efficient semi-parametric estimators that allow the use of data-driven methods in the estimation of the nuisance parameters. We present numerical studies of the methods using synthetic data, as well as an application to evaluate the role of new-onset anxiety and depressive disorder in explaining the relationship between gabapentin/pregabalin prescription and incident opioid use disorder among Medicaid beneficiaries with chronic pain.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.04263 [pdf, ps, other]

Two-Step Targeted Minimum-Loss Based Estimation for Non-Negative Two-Part Outcomes

Authors: Nicholas T. Williams, Richard Liu, Katherine L. Hoffman, Sarah Forrest, Kara E. Rudolph, Iván Díaz

Abstract: Non-negative two-part outcomes are defined as outcomes whose distribution has a point mass at zero but is otherwise positive. Examples, such as healthcare expenditure and hospital length of stay, are common in healthcare utilization research. Despite the practical relevance of non-negative two-part outcomes, very few methods exist to leverage knowledge of their semicontinuity to achieve improved performance in estimating causal effects. In this paper, we develop a nonparametric two-step targeted minimum-loss based estimator (denoted as hTMLE) for non-negative two-part outcomes. We present methods for a general class of interventions referred to as modified treatment policies, which can accommodate continuous, categorical, and binary exposures. The two-step TMLE uses a targeted estimate of the intensity component of the outcome to produce a targeted estimate of the binary component of the outcome that may improve finite sample efficiency. We demonstrate the efficiency gains achieved by the two-step TMLE with simulated examples and then apply it to a cohort of Medicaid beneficiaries to estimate the effect of chronic pain and physical disability on days' supply of opioids.

Submitted 22 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.03873 [pdf, other]

A Practical Beamforming Design for Active RIS-assisted MU-MISO Systems

Authors: Yun Yang, Zhi** Lu, Ming Li, Rang Liu, Qian Liu

Abstract: Reconfigurable Intelligent Surfaces (RIS) have been proposed as a revolutionary technology with the potential to address several critical requirements of 6G communication systems. Despite their powerful ability for radio environment reconfiguration, the "double fading" effect limits practical system performance enhancements due to the significant path loss. A new active RIS architecture has been recently proposed to overcome this challenge. However, existing active RIS studies rely on an ideal amplification model without considering the practical hardware limitations of amplifiers, and such inaccurate active RIS modeling may cause performance degradation. Motivated by this fact, in this paper we first investigate the amplification principle of typical active RIS and propose a more accurate amplification model based on amplifier hardware characteristics. Then, based on the new amplification model, we propose a novel joint transmit beamforming and RIS reflection beamforming design that considers the incident signal power on practical active RIS for a multiuser multi-input single-output (MU-MISO) communication system. Fractional programming (FP), majorization minimization (MM) and block coordinate descent (BCD) methods are used to solve the resulting complex problem. Simulation results indicate the importance of considering practical amplifier hardware characteristics in the joint beamforming designs and demonstrate the effectiveness of the proposed algorithm compared to other benchmarks.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 6 pages, 5 figures, accepted by WCNC 2024

arXiv:2401.03764 [pdf, other]

3D-SSGAN: Lifting 2D Semantics for 3D-Aware Compositional Portrait Synthesis

Authors: Ruiqi Liu, Peng Zheng, Ye Wang, Rui Ma

Abstract: Existing 3D-aware portrait synthesis methods can generate impressive high-quality images while preserving strong 3D consistency. However, most of them cannot support the fine-grained part-level control over synthesized images. Conversely, some GAN-based 2D portrait synthesis methods can achieve clear disentanglement of facial regions, but they cannot preserve view consistency due to a lack of 3D modeling abilities. To address these issues, we propose 3D-SSGAN, a novel framework for 3D-aware compositional portrait image synthesis. First, a simple yet effective depth-guided 2D-to-3D lifting module maps the generated 2D part features and semantics to 3D. Then, a volume renderer with a novel 3D-aware semantic mask renderer is utilized to produce the composed face features and corresponding masks. The whole framework is trained end-to-end by discriminating between real and synthesized 2D images and their semantic masks. Quantitative and qualitative evaluations demonstrate the superiority of 3D-SSGAN in controllable part-level synthesis while preserving 3D view consistency.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.03636 [pdf, other]

A Perturbed Value-Function-Based Interior-Point Method for Perturbed Pessimistic Bilevel Problems

Authors: Haimei Huo, Risheng Liu, Zhixun Su

Abstract: Bilevel optimization serves as a powerful tool for many machine learning applications. The perturbed pessimistic bilevel problem PBP$ε$, with $ε$ being an arbitrary positive number, is a variant of the bilevel problem that deals with the case where there are multiple solutions in the lower level problem. However, provably convergent algorithms for PBP$ε$ with a nonlinear lower level problem are lacking. To fill the gap, in this paper we consider the problem PBP$ε$ with a nonlinear lower level problem. By introducing a log-barrier function to replace the inequality constraint associated with the value function of the lower level problem, and approximating this value function, an algorithm named Perturbed Value-Function-based Interior-point Method (PVFIM) is proposed. We present a stationary condition for PBP$ε$, which has not been given before, and we show that PVFIM can converge to a stationary point of PBP$ε$. Finally, experiments are presented to verify the theoretical results and to show the application of the algorithm to GAN.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2401.02654 [pdf, ps, other]

doi 10.1093/mnras/stae058

PeVatron Candidate SNR G106.3+2.7 in a Low-density Cavity: a Multiwavelength Test

Authors: Yiwei Bao, Ruo-Yu Liu, Chong Ge, Yang Chen

Abstract: In this paper, we constrain the density of the interstellar medium (ISM) around the hadronic PeVatron candidate, supernova remnant (SNR) G106.3+2.7, based on X-ray and $γ$-ray observations. The purpose of this investigation is to understand the influence of the gaseous environment on this SNR as a proton PeVatron candidate. By modelling the self-regulated propagation of the CRs injected from the SNR, we calculate the $γ$-ray emission of CRs via the hadronuclear interactions with the molecular cloud and the ISM, and use the measured $γ$-ray flux to constrain the ISM density around the SNR. Our results support the picture that the SNR is expanding into a low-density ($n<0.05~\rm cm^{-3}$) cavity, enabling the SNR to be a potential proton PeVatron even though it is presently not in the very early phase.

Submitted 5 January, 2024; originally announced January 2024.

Comments: submitted to MNRAS

Journal ref: 2024MNRAS.528.5487B

arXiv:2401.02596 [pdf, other]

doi 10.1007/s11075-024-01810-2

Unconditionally positivity-preserving explicit Euler-type schemes for a generalized Ait-Sahalia model

Authors: Ruishu Liu, Yulin Cao, Xiaojie Wang

Abstract: The present work is devoted to strong approximations of a generalized Aït-Sahalia model arising from mathematical finance. The numerical study of the considered model faces essential difficulties caused by a drift that blows up at the origin, highly nonlinear drift and diffusion coefficients, and a positivity-preserving requirement. In this paper, a novel explicit Euler-type scheme is proposed, which is easily implementable and able to preserve positivity of the original model unconditionally, i.e., for any time step-size $h >0$. A mean-square convergence rate of order $0.5$ is also obtained for the proposed scheme in both non-critical and general critical cases. Our work is motivated by the need to justify the multi-level Monte Carlo (MLMC) simulations for the underlying model, where the rate of mean-square convergence is required and the preservation of positivity is desirable particularly for large discretization time steps. Numerical experiments are finally provided to confirm the theoretical findings.

Submitted 25 March, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: 25 pages, 4 figures

Journal ref: Numerical Algorithms, 2024

arXiv:2401.01207 [pdf, other]

Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

Authors: Renshuai Liu, Bowen Ma, Wei Zhang, Zhipeng Hu, Changjie Fan, Tangjie Lv, Yu Ding, Xuan Cheng

Abstract: In human-centric content generation, pre-trained text-to-image models struggle to produce the portrait images users want, which retain the identity of individuals while exhibiting diverse expressions. This paper introduces our efforts towards personalized face generation. To this end, we propose a novel multi-modal face generation framework, capable of simultaneous identity-expression control and more fine-grained expression synthesis. Our expression control is so sophisticated that it can be specialized by the fine-grained emotional vocabulary. We devise a novel diffusion model that can undertake the task of simultaneous face swapping and reenactment. Due to the entanglement of identity and expression, it is nontrivial to control them separately and precisely in one framework, and this has not yet been explored. To overcome this, we propose several innovative designs in the conditional diffusion model, including a balancing identity and expression encoder, improved midpoint sampling, and explicit background conditioning. Extensive experiments have demonstrated the controllability and scalability of the proposed framework, in comparison with state-of-the-art text-to-image, face swapping, and face reenactment methods.

Submitted 6 April, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2401.00632 [pdf, other]

TBDD: A New Trust-based, DRL-driven Framework for Blockchain Sharding in IoT

Authors: Zixu Zhang, Guangsheng Yu, Caijun Sun, Xu Wang, Ying Wang, Ming Zhang, Wei Ni, Ren ** Liu, Andrew Reeves, Nektarios Georgalas

Abstract: Integrating sharded blockchain with IoT presents a solution for trust issues and optimized data flow. Sharding boosts blockchain scalability by dividing its nodes into parallel shards, yet it's vulnerable to the $1\%$ attacks where dishonest nodes target a shard to corrupt the entire blockchain. Balancing security with scalability is pivotal for such systems. Deep Reinforcement Learning (DRL) adeptly handles dynamic, complex systems and multi-dimensional optimization. This paper introduces a Trust-based and DRL-driven (TbDd) framework, crafted to counter shard collusion risks and dynamically adjust node allocation, enhancing throughput while maintaining network security. With a comprehensive trust evaluation mechanism, TbDd discerns node types and performs targeted resharding against potential threats. The model maximizes tolerance for dishonest nodes, optimizes node movement frequency, ensures even node distribution in shards, and balances sharding risks. Rigorous evaluations prove TbDd's superiority over conventional random-, community-, and trust-based sharding methods in shard risk equilibrium and reducing cross-shard transactions.

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2401.00421 [pdf, other]

From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion

Authors: Xingyuan Li, Yang Zou, **yuan Liu, Zhiying Jiang, Long Ma, Xin Fan, Risheng Liu

Abstract: With the rapid progression of deep learning technologies, multi-modality image fusion has become increasingly prevalent in object detection tasks. Despite its popularity, the inherent disparities in how different sources depict scene content make fusion a challenging problem. Current fusion methodologies identify shared characteristics between the two modalities and integrate them within this shared domain using either iterative optimization or deep learning architectures, which often neglect the intricate semantic relationships between modalities, resulting in a superficial understanding of inter-modal connections and, consequently, suboptimal fusion outcomes. To address this, we introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images. This method capitalizes on the complementary characteristics of diverse modalities, bolstering both the accuracy and robustness of object detection. The codebook is utilized to enhance a streamlined and concise depiction of the fused intra- and inter-domain dynamics, fine-tuned for optimal performance in detection tasks. We present a bilevel optimization strategy that establishes a nexus between the joint problem of fusion and detection, optimizing both processes concurrently. Furthermore, we introduce the first dataset of paired infrared and visible images accompanied by text prompts, paving the way for future research. Extensive experiments on several datasets demonstrate that our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.

Submitted 31 December, 2023; originally announced January 2024.

Comments: 10 pages, 12 figures, 3 tables, conference

MSC Class: 68T45; ACM Class: I.4.3

arXiv:2401.00283 [pdf, other]

Near-Space Communications: the Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle

Authors: Hongshan Liu, Tong Qin, Zhen Gao, Tianqi Mao, Keke Ying, Ziwei Wan, Li Qiao, Rui Na, Zhongxiang Li, Chun Hu, Yikun Mei, Tuan Li, Guanghui Wen, Lei Chen, Zhonghuai Wu, Ruiqi Liu, Gaojie Chen, Shuo Wang, Dezhi Zheng

Abstract: This article presents a comprehensive study on the emerging near-space communications (NS-COM) within the context of the space-air-ground-sea integrated network (SAGSIN). Specifically, we first explore the recent technical developments of NS-COM, followed by a discussion of the motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis between the NS-COM network and other counterparts in SAGSIN is conducted, covering aspects of deployment, coverage, channel characteristics and unique problems of the NS-COM network. Afterwards, the technical aspects of NS-COM, including channel modeling, random access, channel estimation, array-based beam management and joint network optimization, are examined in detail. Furthermore, we explore the potential applications of NS-COM, such as structural expansion in SAGSIN communication, civil aviation communication, remote and urgent communication, weather monitoring and carbon neutrality. Finally, some promising research avenues are identified, including stratospheric satellite (StratoSat)-to-ground direct links for mobile terminals, reconfigurable multiple-input multiple-output (MIMO) and holographic MIMO, federated learning in NS-COM networks, maritime communication, electromagnetic spectrum sensing and adversarial game, integrated sensing and communications, StratoSat-based radar detection and imaging, NS-COM assisted enhanced global navigation system, NS-COM assisted intelligent unmanned system and free space optical (FSO) communication. Overall, this paper highlights that NS-COM plays an indispensable role in the SAGSIN puzzle, providing substantial performance and coverage enhancement to the traditional SAGSIN architecture.

Submitted 4 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

Comments: 28 pages, 8 figures, 2 tables

arXiv:2401.00269 [pdf]

doi 10.1109/TPWRS.2021.3081557

Sample Robust Scheduling of Electricity-Gas Systems Under Wind Power Uncertainty

Authors: Rong-Peng Liu, Yunhe Hou, Yujia Li, Shunbo Lei, Wei Wei, Xiaozhe Wang

Abstract: This paper adopts a two-stage sample robust optimization (SRO) model to address the wind power penetrated unit commitment optimal energy flow (UC-OEF) problem for IEGSs. The two-stage SRO model can be approximately transformed into a computationally efficient form. Specifically, we employ linear decision rules to simplify the proposed UC-OEF model. Moreover, we further enhance the tractability of the simplified model by exploring its structural features and, accordingly, develop a solution method.

Submitted 30 December, 2023; originally announced January 2024.

Comments: 10 pages

Journal ref: IEEE Trans. Power Syst., vol. 36, no. 6, pp. 5889-5900, Nov. 2021

arXiv:2312.17454 [pdf, ps, other]

Sparsity Exploitation via Joint Receive Processing and Transmit Beamforming Design for MIMO-OFDM ISAC Systems

Authors: Zichao Xiao, Rang Liu, Ming Li, Wei Wang, Qian Liu

Abstract: Integrated sensing and communication (ISAC) is widely recognized as a pivotal enabling technique for the advancement of future wireless networks. This paper aims to efficiently exploit the inherent sparsity of echo signals for the multi-input-multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) based ISAC system. A novel joint receive echo processing and transmit beamforming design is presented to achieve this goal. Specifically, we first propose a compressive sensing (CS)-assisted estimation approach to facilitate ISAC receive echo processing, which can not only enable accurate recovery of target information, but also allow substantial reduction in the number of sensing subcarriers to be sampled and processed. Then, based on the proposed CS-assisted processing method, the associated transmit beamforming design is formulated with the objective of maximizing the sum-rate of multiuser communications while satisfying the transmit power budget and ensuring the received signal-to-noise ratio (SNR) for the designated sensing subcarriers. In order to address the formulated non-convex problem involving high-dimensional variables, an effective iterative algorithm employing majorization minimization (MM), fractional programming (FP), and the nonlinear equality alternative direction method of multipliers (neADMM) with closed-form solutions has been developed. Finally, extensive numerical simulations are conducted to verify the effectiveness of the proposed algorithm and the superior performance of the introduced sparsity exploitation strategy.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 13 pages, 6 Figures, submitted to IEEE Trans

arXiv:2312.16712 [pdf]

Modeling Load Redistribution Attacks in Integrated Electricity-Gas Systems

Authors: Rong-Peng Liu, Xiaozhe Wang, Bo Zeng, Rawad Zgheib

Abstract: We investigate load redistribution (LR) attacks on integrated electricity-gas systems (IEGSs) and propose a bilevel mixed-integer model to identify the most severe LR attack from an economic perspective. Under a mild assumption, we prove that the proposed model does not exclude any possible upper-level attack. A modified reformulation and decomposition (R&D) algorithm is developed to solve this model in a master-subproblem framework. In particular, we design a subproblem to address infeasibility issues in the master problem. Accordingly, two types of cuts are added to the master problem to ensure algorithm feasibility and solution optimality.

Submitted 27 December, 2023; originally announced December 2023.

Comments: 12 pages

arXiv:2312.15907 [pdf, other]

Align on the Fly: Adapting Chatbot Behavior to Established Norms

Authors: Chunpu Xu, Steffi Chern, Ethan Chern, Ge Zhang, Zekun Wang, Ruibo Liu, Jing Li, Jie Fu, Pengfei Liu

Abstract: In this paper, we aim to align large language models with the ever-changing, complex, and diverse human values (e.g., social norms) across time and locations. This presents a challenge to existing alignment techniques, such as supervised fine-tuning, which internalize values within model parameters. To overcome this, we propose an On-the-fly Preference Optimization (OPO) method, which is a real-time alignment that works in a streaming way. It employs an external memory to store established rules for alignment, which can constrain LLMs' behaviors without further training, allowing for convenient updates and customization of human values. We also introduce a scalable evaluation to assess the proposed method more effectively. Experimental results on both human-annotated and auto-generated questions from legal and moral domains indicate the effectiveness of the proposed OPO method. Our code and data are released at https://github.com/GAIR-NLP/OPO.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.15644 [pdf, other]

UVAGaze: Unsupervised 1-to-2 Views Adaptation for Gaze Estimation

Authors: Ruicong Liu, Feng Lu

Abstract: Gaze estimation has become a subject of growing interest in recent research. Most of the current methods rely on single-view facial images as input. Yet, it is hard for these approaches to handle large head angles, leading to potential inaccuracies in the estimation. To address this issue, adding a second-view camera can help better capture eye appearance. However, existing multi-view methods have two limitations. 1) They require multi-view annotations for training, which are expensive. 2) More importantly, during testing, the exact positions of the multiple cameras must be known and match those used in training, which limits the application scenario. To address these challenges, we propose a novel 1-view-to-2-views (1-to-2 views) adaptation solution in this paper, the Unsupervised 1-to-2 Views Adaptation framework for Gaze estimation (UVAGaze). Our method adapts a traditional single-view gaze estimator for flexibly placed dual cameras. Here, "flexibly" means we place the dual cameras in arbitrary places regardless of the training data, without knowing their extrinsic parameters. Specifically, UVAGaze builds a dual-view mutual supervision adaptation strategy, which takes advantage of the intrinsic consistency of gaze directions between both views. In this way, our method can not only benefit from common single-view pre-training, but also achieve more advanced dual-view gaze estimation. The experimental results show that a single-view estimator, when adapted for dual views, can achieve much higher accuracy, especially in cross-dataset settings, with a substantial improvement of 47.0%. Project page: https://github.com/MickeyLLG/UVAGaze.

Submitted 25 December, 2023; originally announced December 2023.

Comments: This paper is accepted by AAAI2024. Code has been released at https://github.com/MickeyLLG/UVAGaze

arXiv:2312.13901 [pdf, ps, other]

Global dynamics for the stochastic nonlinear beam equations on the four-dimensional torus

Authors: Andreia Chapouto, Guopeng Li, Ruoyuan Liu

Abstract: We study global-in-time dynamics of the stochastic nonlinear beam equations (SNLB) with an additive space-time white noise, posed on the four-dimensional torus. The roughness of the noise leads us to introduce a time-dependent renormalization, after which we show that SNLB is pathwise locally well-posed in all subcritical and most of the critical regimes. For the (renormalized) defocusing cubic SNLB, we establish pathwise global well-posedness below the energy space, by adapting a hybrid argument of Gubinelli-Koch-Oh-Tolomeo (2022) that combines the $I$-method with a Gronwall-type argument. Lastly, we show almost sure global well-posedness and invariance of the Gibbs measure for the stochastic damped nonlinear beam equations in the defocusing case.

Submitted 21 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2005.10570 by other authors

MSC Class: 35L71; 35R60; 60H15

arXiv:2312.12749 [pdf, other]

Nematic charge-density-wave correlations in FeSe$_{1-x}$S$_{x}$

Authors: Ruixian Liu, Wenliang Zhang, Yuan Wei, Zhen Tao, Teguh C. Asmara, Vladimir N. Strocov, Thorsten Schmitt, Xingye Lu

Abstract: The occurrence of charge-density-wave (CDW) order is a common thread in the phase diagram of cuprate high-transition-temperature ($T_c$) superconductors. In iron-based superconductors (FeSCs), nematic order and fluctuations play a decisive role in driving other emergent orders. CDW order has been observed by scanning tunneling microscopy for various FeSCs such as FeSe thin films, uniaxially strained LiFeAs, and tetragonal FeSe$_{0.81}$S$_{0.19}$. However, it remains elusive if the CDW in these materials is a bulk phenomenon as well as if and how it intertwines with the electronic nematicity. Using energy-resolved resonant X-ray scattering at the Fe-L$_3$ edge, we report the discovery of a local-strain-induced incommensurate isotropic CDW order in FeSe$_{0.82}$S$_{0.18}$. A highly anisotropic CDW response under uniaxial strain unambiguously manifests that the CDW is directly coupled to the nematicity. Transforming part of Fe$^{2+}$ to Fe$^{3+}$ on the surface of FeSe$_{1-x}$S$_{x}$ reveals that the same isotropic CDW can be induced, enhanced, and stabilized in the whole nematic regime measured ($x=0-0.19$). As Fe$^{3+}$ can create local lattice distortions on the surface, the CDW could arise from the interaction between the local strain around Fe$^{3+}$ and the nematic electron correlations. Our experimental observation of a local-strain-induced CDW gives vital information for understanding the interplay between electron correlations and the electronic nematicity in FeSCs.

Submitted 21 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: 7 pages, 4 figures

arXiv:2312.11947 [pdf, other]

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

Authors: Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li

Abstract: Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. While recognising the significance of CSS task, the prior studies have not thoroughly investigated the emotional expressiveness problems due to the scarcity of emotional conversational datasets and the difficulty of stateful emotion modeling. In this paper, we propose a novel emotional CSS model, termed ECSS, that includes two main components: 1) to enhance emotion understanding, we introduce a heterogeneous graph-based emotional context modeling mechanism, which takes the multi-source dialogue history as input to model the dialogue context and learn the emotion cues from the context; 2) to achieve emotion rendering, we employ a contrastive learning-based emotion renderer module to infer the accurate emotion style for the target utterance. To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity, and annotate additional emotional information on the existing conversational dataset (DailyTalk). Both objective and subjective evaluations suggest that our model outperforms the baseline models in understanding and rendering emotions. These evaluations also underscore the importance of comprehensive emotional annotations. Code and audio samples can be found at: https://github.com/walker-hyf/ECSS.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 9 pages, 4 figures, Accepted by AAAI'2024, Code and audio samples: https://github.com/walker-hyf/ECSS

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11152 [pdf, other]

Prompt Based Tri-Channel Graph Convolution Neural Network for Aspect Sentiment Triplet Extraction

Authors: Kun Peng, Lei Jiang, Hao Peng, Rui Liu, Zhengtao Yu, Jiaqian Ren, Zhifeng Hao, Philip S. Yu

Abstract: Aspect Sentiment Triplet Extraction (ASTE) is an emerging task to extract a given sentence's triplets, which consist of aspects, opinions, and sentiments. Recent studies tend to address this task with a table-filling paradigm, wherein word relations are encoded in a two-dimensional table, and the process involves clarifying all the individual cells to extract triples. However, these studies ignore the deep interaction between neighbor cells, which we find quite helpful for accurate extraction. To this end, we propose a novel model for the ASTE task, called Prompt-based Tri-Channel Graph Convolution Neural Network (PT-GCN), which converts the relation table into a graph to explore more comprehensive relational information. Specifically, we treat the original table cells as nodes and utilize a prompt attention score computation module to determine the edges' weights. This enables us to construct a target-aware grid-like graph to enhance the overall extraction process. After that, a triple-channel convolution module is conducted to extract precise sentiment knowledge. Extensive experiments on the benchmark datasets show that our model achieves state-of-the-art performance. The code is available at https://github.com/KunPunCN/PT-GCN.

Submitted 24 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted in SIAM International Conference on Data Mining (SDM24)

arXiv:2312.10885

A novel diffusion recommendation algorithm based on multi-scale cnn and residual lstm

Authors: Yong Niu, Xing Xing, Zhichun Jia, Ruidi Liu, Mindong Xin

Abstract: Sequential recommendation aims to infer user preferences from historical interaction sequences and predict the next item that users may be interested in. The current mainstream design approach is to represent items as fixed vectors, capturing the underlying relationships between items and user preferences based on the order of interactions. However, relying on a single fixed-item embedding may weaken the modeling capability of the system, and the global dynamics and local saliency exhibited by user preferences need to be distinguished. To address these issues, this paper proposes a novel diffusion recommendation algorithm based on multi-scale cnn and residual lstm (AREAL). We introduce diffusion models into the recommender system, representing items as probability distributions instead of fixed vectors. This approach enables adaptive reflection of multiple aspects of the items and generates item distributions in a denoising manner. We use multi-scale cnn and residual lstm methods to extract the local and global dependency features of user history interactions, and use an attention mechanism to distinguish weights as the guide features of reverse diffusion recovery. The effectiveness of the proposed method is validated through experiments conducted on two real-world datasets. Specifically, AREAL obtains improvements over the best baselines by 2.63% and 4.25% in terms of HR@20 and 5.05% and 3.94% in terms of NDCG@20 on all datasets.

Submitted 20 December, 2023; v1 submitted 17 December, 2023; originally announced December 2023.

Comments: This paper needs to be further modified, including the ablation experiment, model framework and other information in Chapter 5. There are some inaccuracies in the presentation of this paper. Two datasets are used instead of three, and there are many inaccuracies in the presentation, which need to be further corrected
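
The HR@20 and NDCG@20 figures quoted in the abstract above are standard top-K ranking metrics. As a minimal sketch, and assuming the common next-item setup with exactly one held-out target item per user (the abstract does not state the evaluation protocol, so this is an assumption), they could be computed as follows. The function names `hit_rate_at_k`, `ndcg_at_k`, and `evaluate` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def hit_rate_at_k(ranked_items: Sequence[int], target: int, k: int) -> float:
    """Return 1.0 if the held-out target is among the top-k ranked items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items: Sequence[int], target: int, k: int) -> float:
    """NDCG@k for a single relevant item.

    With one relevant item the ideal DCG is 1, so NDCG reduces to
    1 / log2(rank + 2), where rank is the 0-based position of the target.
    """
    top_k = list(ranked_items[:k])
    if target not in top_k:
        return 0.0
    rank = top_k.index(target)
    return 1.0 / math.log2(rank + 2)


def evaluate(
    ranked_lists: Sequence[Sequence[int]],
    targets: Sequence[int],
    k: int = 20,
) -> Tuple[float, float]:
    """Average HR@k and NDCG@k over all users (assumed: one target per user)."""
    n = len(targets)
    hr = sum(hit_rate_at_k(r, t, k) for r, t in zip(ranked_lists, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(ranked_lists, targets)) / n
    return hr, ndcg
```

For instance, `evaluate([[3, 1, 2], [5, 4, 6]], [1, 9], k=2)` scores two users: the first target sits at position 2 of its list and the second is missing, so only the first user contributes to either metric.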

arXiv:2312.10593 [pdf, other]

A Novel RFID Authentication Protocol Based on A Block-Order-Modulus Variable Matrix Encryption Algorithm

Authors: Yan Wang, Ruiqi Liu, Tong Gao, Feng Shu, Xuemei Lei, Guan Gui, Jiangzhou Wang

Abstract: In this paper, authentication for mobile radio frequency identification (RFID) systems with low-cost tags is studied. Firstly, an adaptive modulus (AM) encryption algorithm is proposed. Subsequently, in order to enhance the security without additional storage of new key matrices, a self-updating encryption order (SUEO) algorithm is designed. Furthermore, a diagonal block local transpose key matrix (DBLTKM) encryption algorithm is presented, which effectively expands the feasible domain of the key space. Based on the above three algorithms, a novel joint AM-SUEO-DBLTKM encryption algorithm is constructed. Making full use of the advantages of the proposed joint algorithm, a two-way RFID authentication protocol, named AM-SUEO-DBLTKM-RFID, is proposed for mobile RFID systems. In addition, the Burrows-Abadi-Needham (BAN) logic and security analysis indicate that the proposed AM-SUEO-DBLTKM-RFID protocol can effectively combat various typical attacks. Numerical results demonstrate that the proposed AM-SUEO-DBLTKM algorithm can save 99.59% of tag storage over traditional algorithms. Finally, the low computational complexity as well as the low storage cost of the proposed AM-SUEO-DBLTKM-RFID protocol facilitates deployment within low-cost RFID tags.

Submitted 9 May, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.10477 [pdf]

Quantitative Measurement of adhesion energy between nanolayers and substrates using a nanowire-supported bridging method

Authors: Xiaodong Song, Lizhen Hou, Ruizhe Liu, Noman Akhtar, Peng Wang, Shiliang Wang

Abstract: The measurement of adhesion energy between nanolayers and substrates holds significant importance for the design, fabrication, and stability assessment of micro-/nanoscale devices relying on nanolayers. In this study, we propose a nanowire-supported bridging method based on an optical microscope-based nanomanipulation technique to quantitatively measure the adhesion energy between nanolayers and substrates. Using this innovative approach, we conducted adhesion energy measurements between mica nanolayers and Si substrates, revealing a value of approximately 110 J/m2. Additionally, we discuss the applicable conditions of this new method. The proposed technique allows measurements in atmospheric conditions and is, in principle, applicable to all types of nanolayers and substrates. Consequently, it holds promise as a universal method for assessing adhesion energy between nanolayers and substrates, considering environmental factors such as atmosphere and roughness.

Submitted 19 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.10162 [pdf, other]

Surface wrinkling of a film coated to a graded substrate

Authors: Rui-Cheng Liu, Yang Liu, Alain Goriely

Abstract: We study the surface wrinkling of a stiff thin elastic film bonded to a compliant graded elastic substrate subject to compressive stress generated either by compression or growth of the bilayer. Our aim is to clarify the influence of the modulus gradient on the onset and surface pattern in this bilayer. Within the framework of finite elasticity, an exact bifurcation condition is obtained using the Stroh formulation and the surface impedance matrix method. Further analytical progress is made by focusing on the short-wavelength limit, for which the Wentzel-Kramers-Brillouin method can be used to resolve the eigenvalue problem of ordinary differential equations with variable coefficients. An explicit bifurcation condition is obtained, from which asymptotic expressions for the critical buckling load and the critical wavelength are derived. In particular, we consider two distinct situations depending on the ratio $β$ of the shear modulus at the substrate surface to that at infinity. If $β$ is of $\mathcal{O}(1)$ or small, the parameters related to the modulus gradient all appear in the high-order terms and play an insignificant role in the bifurcation. In that case, it is the modulus ratio between the film and substrate surface that governs the onset of surface wrinkling. If, however, $β\gg1$, the modulus gradient affects the critical condition through leading-order terms. Through our analysis we unravel the influence of different material and geometric parameters, including the modulus gradient, on the bifurcation threshold and the associated wavelength, which can be of importance in many biological and technological settings.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.08034 [pdf, other]

Individualized Deepfake Detection Exploiting Traces Due to Double Neural-Network Operations

Authors: Mushfiqur Rahman, Runze Liu, Chau-Wai Wong, Huaiyu Dai

Abstract: In today's digital landscape, journalists urgently require tools to verify the authenticity of facial images and videos depicting specific public figures before incorporating them into news stories. Existing deepfake detectors are not optimized for this detection task when an image is associated with a specific and identifiable individual. This study focuses on the deepfake detection of facial images of individual public figures. We propose to condition the proposed detector on the identity of the identified individual given the advantages revealed by our theory-driven simulations. While most detectors in the literature rely on perceptible or imperceptible artifacts present in deepfake facial images, we demonstrate that the detection performance can be improved by exploiting the idempotency property of neural networks. In our approach, the training process involves double neural-network operations where we pass an authentic image through a deepfake simulating network twice. Experimental results show that the proposed method improves the area under the curve (AUC) from 0.92 to 0.94 and reduces its standard deviation by 17%. For evaluating the detection performance of individual public figures, a facial image dataset with individuals' names is required, a criterion not met by the current deepfake datasets. To address this, we curated a dataset comprising 32k images featuring 45 public figures, which we intend to release to the public after the paper is published.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.07258 [pdf, other]

SSTA: Salient Spatially Transformed Attack

Authors: Renyang Liu, Wei Zhou, Sixin Wu, Jun Zhao, Kwok-Yan Lam

Abstract: Extensive studies have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial attacks, which brings a huge security risk to the further application of DNNs, especially for the AI models developed in the real world. Despite the significant progress that has been made recently, existing attack methods still perform unsatisfactorily at escaping detection by the naked human eye, because the formulation of adversarial examples (AEs) relies heavily on adding noise. These challenges significantly increase the risk of exposure and can cause an attack to fail. Therefore, in this paper, we propose the Salient Spatially Transformed Attack (SSTA), a novel framework to craft imperceptible AEs, which enhances the stealthiness of AEs by estimating a smooth spatial transform metric on a most critical area to generate AEs instead of adding external noise to the whole image. Compared to state-of-the-art baselines, extensive experiments indicate that SSTA can effectively improve the imperceptibility of the AEs while maintaining a 100% attack success rate.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.07245 [pdf, other]

DTA: Distribution Transform-based Attack for Query-Limited Scenario

Authors: Renyang Liu, Wei Zhou, Xin **, Song Gao, Yuanyu Wang, Ruxin Wang

Abstract: In generating adversarial examples, conventional black-box attack methods rely on sufficient feedback from the to-be-attacked model, querying it repeatedly until the attack succeeds, which usually results in thousands of trials during an attack. This may be unacceptable in real applications, since Machine Learning as a Service (MLaaS) platforms usually return only the final result (i.e., the hard label) to the client, and a system equipped with certain defense mechanisms could easily detect malicious queries. By contrast, a feasible alternative is a hard-label attack in which the attacker is permitted only a limited number of queries. To implement this idea, in this paper, we bypass the dependency on the to-be-attacked model and exploit the characteristics of the distributions of adversarial examples to reformulate the attack problem as a distribution transform, and propose a distribution transform-based attack (DTA). DTA builds a statistical mapping from the benign example to its adversarial counterparts by tackling the conditional likelihood under the hard-label black-box setting. In this way, it is no longer necessary to query the target model frequently. A well-trained DTA model can directly and efficiently generate a batch of adversarial examples for a given input, which can be used to attack unseen models based on the assumed transferability. Furthermore, we surprisingly find that the well-trained DTA model is not sensitive to the semantic spaces of the training dataset, meaning that the model yields acceptable attack performance on other datasets. Extensive experiments validate the effectiveness of the proposed idea and the superiority of DTA over the state-of-the-art.

Submitted 12 December, 2023; originally announced December 2023.

Showing 151–200 of 2,010 results for author: Liu, R