-
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
Authors:
Sheng Luo,
Wei Chen,
Wanxin Tian,
Rui Liu,
Luanxuan Hou,
Xiubao Zhang,
Haifeng Shen,
Ruiqi Wu,
Shuyi Geng,
Yi Zhou,
Ling Shao,
Yi Yang,
Bojun Gao,
Qun Li,
Guobin Wu
Abstract:
Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the context of intelligent vehicles, leveraging the power of foundation models has proven to be transformative, offering notable advancements in visual understanding. Equipped with multi-modal and multi-task learning capabilities, multi-modal multi-task visual understanding foundation models (MM-VUFMs) effectively process and fuse data from diverse modalities and simultaneously handle various driving-related tasks with powerful adaptability, contributing to a more holistic understanding of the surrounding scene. In this survey, we present a systematic analysis of MM-VUFMs specifically designed for road scenes. Our objective is not only to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques, but also to highlight their advanced capabilities in diverse learning paradigms. These paradigms include open-world understanding, efficient transfer for road scenes, continual learning, interactive and generative capability. Moreover, we provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models. To facilitate researchers in staying abreast of the latest developments in MM-VUFMs for road scenes, we have established a continuously updated repository at https://github.com/rolsheng/MM-VUFM4DS
Submitted 26 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Flexible Non-interactive Short-term Implicit Certificate Generation for VANETs
Authors:
Rui Liu,
Yun Lu,
Jianping Pan
Abstract:
A leading industry standard for secure and trusted communication in vehicular ad-hoc networks (VANETs) is the Security Credential Management System (SCMS). It uses anonymous certificates, functioning as pseudonyms, to preserve the privacy of vehicles. With the rapid development of advanced applications in VANETs, such as crowdsensing and federated learning, vehicles need to communicate with each other or infrastructures more frequently, leading to a higher demand for pseudonyms. However, the current approach of certificate provisioning in SCMS is not able to fully support pseudonyms, due to storage limitation, cost of connectivity establishment, and communication overhead of certificate downloading. To tackle this challenge, we propose a non-interactive approach for SCMS, allowing vehicles themselves to generate short-term key pairs and anonymous implicit certificates. Our evaluation and comparison with previous work show that our solution not only effectively reduces the communication cost, but also grants vehicles greater flexibility in certificate generation and use. On the technical side, to the best of our knowledge, this is the first work which (1) applies sanitizable signature for non-interactive anonymous certificate generation, and (2) is specifically designed for SCMS, which opens up possibilities for extensions and applications in industry.
Submitted 4 February, 2024;
originally announced February 2024.
-
mmID: High-Resolution mmWave Imaging for Human Identification
Authors:
Sakila S. Jayaweera,
Sai Deepika Regani,
Yuqian Hu,
Beibei Wang,
K. J. Ray Liu
Abstract:
Achieving accurate human identification through RF imaging has been a persistent challenge, primarily attributed to the limited aperture size and its consequent impact on imaging resolution. The existing imaging solution enables tasks such as pose estimation, activity recognition, and human tracking based on deep neural networks by estimating skeleton joints. In contrast to estimating joints, this paper proposes to improve imaging resolution by estimating the human figure as a whole using conditional generative adversarial networks (cGAN). In order to reduce training complexity, we use an estimated spatial spectrum using the MUltiple SIgnal Classification (MUSIC) algorithm as input to the cGAN. Our system generates environmentally independent, high-resolution images that can extract unique physical features useful for human identification. We use a simple convolution layers-based classification network to obtain the final identification result. From the experimental results, we show that resolution of the image produced by our trained generator is high enough to enable human identification. Our finding indicates high-resolution accuracy with 5% mean silhouette difference to the Kinect device. Extensive experiments in different environments on multiple testers demonstrate that our system can achieve 93% overall test accuracy in unseen environments for static human target identification.
Submitted 1 February, 2024;
originally announced February 2024.
-
Diagnosing the particle transport mechanism in the pulsar halo via X-ray observations
Authors:
Qi-Zuo Wu,
Chao-Ming Li,
Xuan-Han Liang,
Chong Ge,
Ruo-Yu Liu
Abstract:
Pulsar halos (also termed 'TeV halos') are a new class of $γ$-ray sources in the Galaxy, which manifest as extended $γ$-ray emission around middle-aged pulsars, as discovered around the Geminga pulsar, the Monogem pulsar and PSR~J0622+3749 by HAWC and LHAASO. A consensus has been reached that the TeV emission comes from the inverse Compton scattering of escaping electrons/positrons from the PWN off the soft background radiation field, while the particle transport mechanism in the halo is still in dispute. Currently, there are mainly three interpretations, namely, the isotropic, suppressed diffusion model; the isotropic, unsuppressed diffusion model that accounts for ballistic propagation of newly injected particles; and the anisotropic diffusion model. While the gamma-ray surface brightness profiles predicted by all three models can be more or less consistent with the observation, the implications of the three models for cosmic-ray transport mechanisms and the properties of the interstellar magnetic field are quite different. In this study, we calculate the anticipated X-ray emission of pulsar halos under the three models. We show that the synchrotron radiation of these escaping electrons can produce a corresponding X-ray halo around the pulsar, and the expected surface brightness profiles are distinct in the three models. We suggest that sensitive X-ray detectors with a large field of view (such as eROSITA and Einstein Probe) and a reasonably long exposure time are crucial to understanding the formation mechanism of pulsar halos and can serve as a probe of the properties of the interstellar turbulence.
Submitted 31 January, 2024;
originally announced January 2024.
-
Heterogeneous treatment effect estimation with subpopulation identification for personalized medicine in opioid use disorder
Authors:
Seungyeon Lee,
Ruoqi Liu,
Wenyu Song,
Ping Zhang
Abstract:
Deep learning models have demonstrated promising results in treatment effect estimation (TEE). However, most of them overlook the variations in treatment outcomes among subgroups with distinct characteristics. This limitation hinders their ability to provide accurate estimations and treatment recommendations for specific subgroups. In this study, we introduce a novel neural network-based framework, named SubgroupTE, which incorporates subgroup identification and treatment effect estimation. SubgroupTE identifies diverse subgroups and simultaneously estimates treatment effects for each subgroup, improving the treatment effect estimation by considering the heterogeneity of treatment responses. Comparative experiments on synthetic data show that SubgroupTE outperforms existing models in treatment effect estimation. Furthermore, experiments on a real-world dataset related to opioid use disorder (OUD) demonstrate the potential of our approach to enhance personalized treatment recommendations for OUD patients.
Submitted 30 January, 2024;
originally announced January 2024.
-
Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation
Authors:
Ruiping Liu,
Jiaming Zhang,
Kunyu Peng,
Yufan Chen,
Ke Cao,
Junwei Zheng,
M. Saquib Sarfraz,
Kailun Yang,
Rainer Stiefelhagen
Abstract:
Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level modality absence and sensor-level modality errors. To avoid the predominant modality reliance in multi-modal fusion, we introduce a Missing-aware Modal Switch (MMS) strategy to proactively manage missing modalities during training. Utilizing bit-level batch-wise sampling enhances the model's performance in both complete and incomplete testing scenarios. Furthermore, we introduce the Fourier Prompt Tuning (FPT) method to incorporate representative spectral information into a limited number of learnable prompts that maintain robustness against all MISS scenarios, achieving effects akin to fine-tuning but with fewer tunable parameters (1.1%). Extensive experiments prove the efficacy of our proposed approach, showcasing an improvement of 5.84% mIoU over the prior state-of-the-art parameter-efficient methods in modality missing. The source code is publicly available at https://github.com/RuipingL/MISS.
Submitted 10 April, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Dipole superfluid hydrodynamics II
Authors:
Akash Jain,
Kristan Jensen,
Ruochuan Liu,
Eric Mefford
Abstract:
We present a dissipative hydrodynamic theory of "s-wave dipole superfluids" that arise in phases of translation-invariant and dipole-symmetric models in which the U(1) symmetry is spontaneously broken. The hydrodynamic description is subtle on account of an analogue of dangerously irrelevant operators, which requires us to formalize an entirely new derivative counting scheme suitable for these fluids. We use our hydrodynamic model to investigate the linearized response of such a fluid, characterized by sound modes $ω\sim \pm k - ik^2$, shear modes $ω\sim-ik^2$, and magnon-like propagating modes $ω\sim \pm k^2 - ik^4$ that are the dipole-invariant version of superfluid "second sound" modes. We find that these fluids can also admit equilibrium states with "dipole superflow" that resemble a polarized medium. Finally, we couple our theory to slowly varying background fields, which allows us to compute response functions of hydrodynamic operators and Kubo formulas for hydrodynamic transport coefficients.
Submitted 29 January, 2024;
originally announced January 2024.
-
Applications of Tao General Difference in Discrete Domain
Authors:
Linmi Tao,
Ruiyang Liu,
Donglai Tao,
Wu Xia,
Feilong Ma,
Yu Cheng,
Jingmao Cui
Abstract:
Numerical difference computation is a core, indispensable operation in the modern digital era. Tao general difference (TGD) is a novel theory and approach to difference computation for discrete sequences and arrays in multidimensional space. Built on the solid theoretical foundation of the general difference in a finite interval, the TGD operators demonstrate exceptional signal processing capabilities in real-world applications. A novel smoothness property of a sequence is defined on the first and second TGD. This property is used to denoise one-dimensional signals, where the noise is the non-smooth points in the sequence. Meanwhile, the center of the gradient in a finite interval can be accurately located via TGD calculation. This solves a traditional challenge in computer vision, which is the precise localization of image edges with noise robustness. Furthermore, the power of TGD operators extends to spatio-temporal edge detection in three-dimensional arrays, enabling the identification of kinetic edges in video data. These diverse applications highlight the properties of TGD in the discrete domain and the significant promise of TGD for computation across signal processing, image analysis, and video analytics.
Submitted 26 January, 2024;
originally announced January 2024.
-
pix2gestalt: Amodal Segmentation by Synthesizing Wholes
Authors:
Ege Ozguroglu,
Ruoshi Liu,
Dídac Surís,
Dian Chen,
Achal Dave,
Pavel Tokmakov,
Carl Vondrick
Abstract:
We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions. By capitalizing on large-scale diffusion models and transferring their representations to this task, we learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases, including examples that break natural and physical priors, such as art. As training data, we use a synthetically curated dataset containing occluded objects paired with their whole counterparts. Experiments show that our approach outperforms supervised baselines on established benchmarks. Our model can furthermore be used to significantly improve the performance of existing object recognition and 3D reconstruction methods in the presence of occlusions.
Submitted 25 January, 2024;
originally announced January 2024.
-
Correlation between magnetic domain structures and quantum anomalous Hall effect in epitaxial MnBi2Te4 thin films
Authors:
Yang Shi,
Yunhe Bai,
Yuanzhao Li,
Yang Feng,
Qiang Li,
Huanyu Zhang,
Yang Chen,
Yitian Tong,
Jianli Luan,
Ruixuan Liu,
Pengfei Ji,
Zongwei Gao,
Hangwen Guo,
Jinsong Zhang,
Yayu Wang,
Xiao Feng,
Ke He,
Xiaodong Zhou,
Jian Shen
Abstract:
We use magnetic force microscopy (MFM) to study spatial uniformity of magnetization of epitaxially grown MnBi2Te4 thin films. Compared to films which exhibit no quantum anomalous Hall effect (QAH), films with QAH are observed to have more spatial uniformity of magnetization with larger domain size. The domain evolution upon magnetic field sweeping indicates that the magnetic domains or the spatial nonuniformity of magnetization originates from the strong pinning of the inherent sample inhomogeneity. A direct correlation between the Hall resistivity and the domain size has been established by analyzing a series of thin films with and without QAH. Our observation shows that one has to suppress the spatial nonuniformity of magnetization to allow the Hall resistivity to be quantized. The fact that a sizable longitudinal resistivity remains even for the QAH sample suggests a quantized Hall insulator scenario. Our work provides important insights into the quantization mechanism and the dissipation of the QAH state in the MnBi2Te4 system.
Submitted 23 January, 2024;
originally announced January 2024.
-
SubgroupTE: Advancing Treatment Effect Estimation with Subgroup Identification
Authors:
Seungyeon Lee,
Ruoqi Liu,
Wenyu Song,
Lang Li,
Ping Zhang
Abstract:
Precise estimation of treatment effects is crucial for evaluating intervention effectiveness. While deep learning models have exhibited promising performance in learning counterfactual representations for treatment effect estimation (TEE), a major limitation in most of these models is that they treat the entire population as a homogeneous group, overlooking the diversity of treatment effects across potential subgroups that have varying treatment effects. This limitation restricts the ability to precisely estimate treatment effects and provide subgroup-specific treatment recommendations. In this paper, we propose a novel treatment effect estimation model, named SubgroupTE, which incorporates subgroup identification in TEE. SubgroupTE identifies heterogeneous subgroups with different treatment responses and more precisely estimates treatment effects by considering subgroup-specific causal effects. In addition, SubgroupTE iteratively optimizes subgrouping and treatment effect estimation networks to enhance both estimation and subgroup identification. Comprehensive experiments on the synthetic and semi-synthetic datasets exhibit the outstanding performance of SubgroupTE compared with the state-of-the-art models on treatment effect estimation. Additionally, a real-world study demonstrates the capabilities of SubgroupTE in enhancing personalized treatment recommendations for patients with opioid use disorder (OUD) by advancing treatment effect estimation with subgroup identification.
Submitted 22 January, 2024;
originally announced January 2024.
-
Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
Authors:
Yao Lu,
Song Bian,
Lequn Chen,
Yongjun He,
Yulong Hui,
Matthew Lentz,
Beibin Li,
Fei Liu,
Jialin Li,
Qi Liu,
Rui Liu,
Xiaoxuan Liu,
Lin Ma,
Kexin Rong,
Jianguo Wang,
Yingjun Wu,
Yongji Wu,
Huanchen Zhang,
Minjia Zhang,
Qizhen Zhang,
Tianyi Zhou,
Danyang Zhuo
Abstract:
In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges like escalating costs and demand for high-end GPUs. Drawing analogies between large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we describe an AI-native computing paradigm that harnesses the power of both cloud-native technologies (e.g., multi-tenancy and serverless computing) and advanced machine learning runtime (e.g., batched LoRA inference). These joint efforts aim to optimize costs-of-goods-sold (COGS) and improve resource accessibility. The journey of merging these two domains is just at the beginning and we hope to stimulate future research and development in this area.
Submitted 17 January, 2024;
originally announced January 2024.
-
An Efficient Finite Difference-based Implicit Solver for Phase-Field Equations with Spatially and Temporally Varying Parameters
Authors:
Zirui Mao,
G. R. Liu,
Michael J. Demkowicz
Abstract:
The phase field method is an effective tool for modeling microstructure evolution in materials. Many efficient implicit numerical solvers have been proposed for phase field simulations under uniform and time-invariant model parameters. We use Eyre's theorem to develop an unconditionally stable implicit solver for spatially non-uniform and time-varying model parameters. The accuracy, unconditional stability, and efficiency of the solver are validated against benchmarking examples. In its current form, the solver requires a uniform mesh and may only be applied to problems with periodic, Neumann, or mixed periodic and Neumann boundary conditions.
Submitted 22 January, 2024;
originally announced January 2024.
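Editor's illustration (not taken from the paper): the idea behind an Eyre-type convex splitting can be seen on a zero-dimensional toy problem. The sketch below assumes the double-well energy F(u) = (u^2 - 1)^2 / 4 with constant parameters and no spatial terms, splits it as F = F_c - F_e with F_c = u^4/4 + 1/4 and F_e = u^2/2, and treats the convex part implicitly and the other part explicitly. The function names `energy`, `eyre_step` and `relax` are hypothetical, and bisection is used here only as a simple way to solve the scalar cubic.

```python
def energy(u):
    """Double-well free energy F(u) = (u^2 - 1)^2 / 4 (assumed toy model)."""
    return 0.25 * (u * u - 1.0) ** 2


def eyre_step(u_old, dt):
    """One convex-splitting step for du/dt = u - u^3.

    Implicit convex part, explicit concave part:
        (v - u_old) / dt = -v**3 + u_old
    which rearranges to v + dt * v**3 = (1 + dt) * u_old.
    The left-hand side is strictly increasing in v, so bisection
    on a bracketing interval finds the unique root.
    """
    rhs = (1.0 + dt) * u_old
    bound = max(1.0, abs(u_old))  # the root always lies in [-bound, bound]
    lo, hi = -bound, bound
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + dt * mid ** 3 < rhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relax(u0, dt, steps):
    """Apply eyre_step repeatedly and return the full history of u."""
    history = [u0]
    for _ in range(steps):
        history.append(eyre_step(history[-1], dt))
    return history
```

Calling `relax(0.1, 10.0, 50)` uses a time step far larger than an explicit scheme would tolerate; in this toy setting the energy still decreases at every step, which is the property that "unconditionally stable" refers to.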
-
Reentrant quantum anomalous Hall effect in molecular beam epitaxy-grown MnBi2Te4 thin films
Authors:
Yuanzhao Li,
Yunhe Bai,
Yang Feng,
Jianli Luan,
Zongwei Gao,
Yang Chen,
Yitian Tong,
Ruixuan Liu,
Su Kong Chong,
Kang L. Wang,
Xiaodong Zhou,
Jian Shen,
Jinsong Zhang,
Yayu Wang,
Chui-Zhen Chen,
XinCheng Xie,
Xiao Feng,
Ke He,
Qi-Kun Xue
Abstract:
In this study, we investigate intrinsic magnetic topological insulator MnBi2Te4 thin films grown by molecular beam epitaxy. We observe a reentrant quantum anomalous Hall effect when the Fermi energy enters the valence band and the magnetic field equals zero, indicating the emergence of the Chern Anderson insulator state. The discovery opens a new avenue for realizing the QAH effect and underscores the fundamental role of both Berry curvature and Anderson localization.
Submitted 21 January, 2024;
originally announced January 2024.
-
Global well-posedness and enhanced dissipation for the 2D stochastic Nernst-Planck-Navier-Stokes equations with transport noise
Authors:
Quyuan Lin,
Rongchang Liu,
Weinan Wang
Abstract:
In this paper, we consider the 2D stochastic Nernst-Planck-Navier-Stokes equations with transport noise. By assuming the ionic species have the same diffusivity and opposite valences, we prove the global well-posedness of the system. Furthermore, we illustrate the enhanced dissipation phenomenon in the system with specific transportation noise by establishing that it enables an arbitrarily large exponential convergence rate of the solutions.
Submitted 20 January, 2024;
originally announced January 2024.
-
360ORB-SLAM: A Visual SLAM System for Panoramic Images with Depth Completion Network
Authors:
Yichen Chen,
Yiqi Pan,
Ruyu Liu,
Haoyu Zhang,
Guodao Zhang,
Bo Sun,
Jianhua Zhang
Abstract:
To enhance the performance and effect of AR/VR applications and visual assistance and inspection systems, visual simultaneous localization and mapping (vSLAM) is a fundamental task in computer vision and robotics. However, traditional vSLAM systems are limited by the camera's narrow field-of-view, resulting in challenges such as sparse feature distribution and lack of dense depth information. To overcome these limitations, this paper proposes a 360ORB-SLAM system for panoramic images that combines with a depth completion network. The system extracts feature points from the panoramic image, utilizes a panoramic triangulation module to generate sparse depth information, and employs a depth completion network to obtain a dense panoramic depth map. Experimental results on our novel panoramic dataset constructed based on Carla demonstrate that the proposed method achieves superior scale accuracy compared to existing monocular SLAM methods and effectively addresses the challenges of feature association and scale ambiguity. The integration of the depth completion network enhances system stability and mitigates the impact of dynamic elements on SLAM performance.
Submitted 19 January, 2024;
originally announced January 2024.
-
A Unified NOMA Framework in Beam-Hopping Satellite Communication Systems
Authors:
Xuyang Zhang,
Xinwei Yue,
Tian Li,
Zhihao Han,
Yafei Wang,
Yong Ding,
Rongke Liu
Abstract:
This paper investigates the application of a unified non-orthogonal multiple access framework in beam hopping (U-NOMA-BH) based satellite communication systems. More specifically, the proposed U-NOMA-BH framework can be applied to code-domain NOMA based BH (CD-NOMA-BH) and power-domain NOMA based BH (PD-NOMA-BH) systems. To satisfy dynamic-uneven traffic demands, we formulate the optimization problem to minimize the square of discrete difference by jointly optimizing power allocation, carrier assignment and beam scheduling. The non-convexity of the objective function and the constraint condition is solved through Dinkelbach's transform and variable relaxation. As a further development, the closed-form and asymptotic expressions of outage probability are derived for CD/PD-NOMA-BH systems. Based on approximated results, the diversity orders of a pair of users are obtained in detail. In addition, the system throughput of U-NOMA-BH is discussed in delay-limited transmission mode. Numerical results verify that: i) The gap between traffic requests of CD/PD-NOMA-BH systems appears to be more closely compared with orthogonal multiple access based BH (OMA-BH); ii) The CD-NOMA-BH system is capable of providing the enhanced traffic request and capacity provision; and iii) The outage behaviors of CD/PD-NOMA-BH are better than that of OMA-BH.
Submitted 16 January, 2024;
originally announced January 2024.
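Editor's illustration (not the paper's formulation): Dinkelbach's transform replaces a ratio objective f(x)/g(x) with a sequence of parametric problems f(x) - λ g(x). The sketch below shows the generic iteration on a small finite candidate set, assuming g(x) > 0 everywhere. The function name `dinkelbach`, the candidate list, and the example f and g are hypothetical and chosen only to keep the example self-contained.

```python
def dinkelbach(candidates, f, g, tol=1e-9, max_iter=100):
    """Maximize f(x) / g(x) over a finite candidate list, assuming g(x) > 0.

    Each iteration solves the parametric subproblem max f(x) - lam * g(x)
    and then updates lam to the ratio attained at that maximizer. The
    iteration stops when the subproblem optimum is (numerically) zero.
    """
    x = candidates[0]
    lam = f(x) / g(x)
    for _ in range(max_iter):
        # Parametric subproblem for the current ratio estimate lam.
        x = max(candidates, key=lambda c: f(c) - lam * g(c))
        gap = f(x) - lam * g(x)
        if gap < tol:
            break
        lam = f(x) / g(x)
    return x, lam


# Toy usage: maximize x / (1 + x^2) over four candidate values.
best_x, best_ratio = dinkelbach(
    [0.5, 1.0, 2.0, 3.0],
    f=lambda x: x,
    g=lambda x: 1.0 + x * x,
)
```

On this toy set the ratio values are 0.4, 0.5, 0.4 and 0.3, so the iteration settles on x = 1.0 with ratio 0.5 after two subproblem solves. In a real resource-allocation setting the subproblem would be a continuous optimization rather than a search over a list.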
-
A Unified Model for Multi-epoch Neutrino Events and Broadband Spectral Energy Distribution of $\rm TXS~0506+056$
Authors:
Zhen-Jie Wang,
Ruo-Yu Liu,
Ze-Rui Wang,
Junfeng Wang
Abstract:
The blazar $TXS~0506+056$ has been proposed as a high-energy neutrino emitter. However, it has been shown that the standard one-zone model cannot produce sufficiently high neutrino flux due to constraints from the X-ray data, implying more complex properties of the radiation zones in the blazar than that described by the standard one-zone model. In this work we investigate multi-epoch high-energy muon neutrino events associated with the blazar $TXS~0506+056$ that occurred in 2014-2015, 2017-2018, 2021-2022 and 2022-2023, respectively. We applied the so-called "stochastic dissipation model" to account for the neutrino-blazar associations detected in the four epochs simultaneously. This model describes a scenario in which the emission of the blazar arises from the superimposition of two components: a persistent component related to the quasi-stable state of the blazar and a transient component responsible for the sudden enhancement of the blazar's flux, either in electromagnetic radiation or in neutrino emission. The latter component could form at a random distance along the jet by a strong energy dissipation event. Under such an assumption, the multi-epoch broadband spectral energy distribution (SED) can be well explained and the expected number of high-energy neutrino events is statistically realistic. The expected number of neutrino events in half a year is around 8.2, 0.07, 0.73 and 0.41, corresponding to the epochs in 2014-2015, 2017-2018, 2021-2022 and 2022-2023, respectively. Hence, our model self-consistently explains the episodic neutrino emission from $TXS~0506+056$.
Submitted 17 January, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
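As a rough reading aid for the expected event counts quoted in the abstract above, the sketch below converts each expected number into the probability of detecting at least one event. It assumes the counts are Poisson distributed, which is an assumption made here for illustration and not a statement of the authors' statistical treatment; the names `expected` and `prob_at_least_one` are hypothetical.

```python
import math

# Expected neutrino counts per epoch, as quoted in the abstract.
expected = {
    "2014-2015": 8.2,
    "2017-2018": 0.07,
    "2021-2022": 0.73,
    "2022-2023": 0.41,
}


def prob_at_least_one(mu: float) -> float:
    """Poisson probability of one or more events given mean mu (assumed model)."""
    return 1.0 - math.exp(-mu)


probs = {epoch: prob_at_least_one(mu) for epoch, mu in expected.items()}

for epoch, p in probs.items():
    print(f"{epoch}: P(N >= 1) = {p:.3f}")
```

Under this assumption, a small expected count such as 0.07 still leaves a nonzero chance of a single detection, which is the sense in which a low expectation can remain compatible with an observed event.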
-
End-to-End Learning for SLP-Based ISAC Systems
Authors:
Yixian Zheng,
Rang Liu,
Ming Li,
Qian Liu
Abstract:
Integrated sensing and communication (ISAC) is a promising wireless technology that can simultaneously perform radar and communication functions by sharing the same transmit waveform, spectral resource, and hardware platform. The recently emerged symbol-level precoding (SLP) technique advances ISAC systems by leveraging the waveform design degrees of freedom (DoFs) in both the temporal and spatial domains. However, traditional SLP-based ISAC systems are designed in a modular paradigm, which potentially limits the overall performance of communication and radar sensing. The high complexity of existing SLP design algorithms is another issue that hinders practical deployment. To break through the bottleneck of these approaches, in this paper we propose an end-to-end approach to jointly design the SLP-based dual-functional transmitter and the receivers for communication and radar sensing. In particular, we aim to utilize deep learning-based methods to minimize the symbol error rate (SER) of communication users, maximize the detection probability, and minimize the root mean square error (RMSE) of the target angle estimation. Multi-layer perceptron (MLP) networks and a long short-term memory (LSTM) network are respectively applied to the transmitter, communication users and radar receiver. Simulation results verify the feasibility of the proposed deep-learning-based end-to-end optimization for ISAC systems and reveal the effectiveness of the proposed neural networks for the end-to-end design.
Submitted 10 January, 2024;
originally announced January 2024.
-
Loophole-free test of macroscopic realism via high-order correlations of measurement
Authors:
** Wang,
Chong Chen,
Hao Liao,
Vadim V. Vorobyov,
Joerg Wrachtrup,
and Ren-Bao Liu
Abstract:
Testing macroscopic realism (MR) is key to understanding the foundation of quantum mechanics. Due to the existence of the non-invasive measurability loophole and other interpretation loopholes, however, such a test remains an open question. Here we propose a general inequality based on high-order correlations of measurements for a loophole-free test of MR at the weak signal limit. Importantly, the inequality is established using the statistics of raw data recorded by classical devices, without requiring a specific model for the measurement process, so its violation would falsify MR without the interpretation loophole. The non-invasive measurability loophole is also closed, since the weak signal limit can be verified solely by measurement data (using the relative scaling behaviors of different orders of correlations). We demonstrate that the inequality can be violated by a quantum spin model. The inequality proposed here provides an unambiguous test of the MR principle and is also useful for characterizing quantum coherence.
Submitted 15 January, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
Nematic quantum disordered state in FeSe
Authors:
Ruixian Liu,
Matthew B. Stone,
Shang Gao,
Mitsutaka Nakamura,
Kazuya Kamazawa,
Aleksandra Krajewska,
Helen C. Walker,
Peng Cheng,
Rong Yu,
Qimiao Si,
Pengcheng Dai,
Xingye Lu
Abstract:
The unusual quantum-disordered magnetic ground state intertwined with superconductivity and electronic nematicity in FeSe has been a research focus in iron-based superconductors. However, the intrinsic spin excitations across the entire Brillouin zone in detwinned FeSe, which form the basis for a microscopic understanding of the magnetic state and superconductivity, remain to be determined. Here, we use inelastic neutron scattering to map out the spin excitations of FeSe detwinned with a uniaxial-strain device. We find that the stripe spin excitations (Q=(1, 0)/(0, 1)) exhibit the $C_2$ symmetry up to $E\approx120$ meV, while the Néel spin excitations (Q=(1, 1)) retain their $C_4$ symmetry in the nematic state. The temperature dependence of the difference in the spin excitations at Q=(1, 0) and (0, 1) for temperatures above the structural phase transition unambiguously shows the establishment of the nematic quantum disordered state. The similarity of the Néel excitations in FeSe and NaFeAs suggests that the Néel excitations are driven by the enhanced electron correlations in the $3d_{xy}$ orbital. By determining the key features of the stripe excitations and fitting their dispersions using a Heisenberg Hamiltonian with biquadratic interaction ($J_1$-$K$-$J_2$), we establish a spin-interaction phase diagram and conclude that FeSe is close to a crossover region between the antiferroquadrupolar, Néel, and stripe ordering regimes. The results provide an experimental basis for establishing a microscopic theoretical model to describe the origin and intertwining of the emergent orders in iron-based superconductors.
Submitted 10 January, 2024;
originally announced January 2024.
-
Large Model based Sequential Keyframe Extraction for Video Summarization
Authors:
Kailong Tan,
Yuxiang Zhou,
Qianchen Xia,
Rui Liu,
Yong Chen
Abstract:
Keyframe extraction aims to summarize a video's semantics with the minimum number of its frames. This paper puts forward a Large Model based Sequential Keyframe Extraction method for video summarization, dubbed LMSKE, which consists of three stages. First, we use the large model "TransNetV2" to cut the video into consecutive shots, and employ the large model "CLIP" to generate each frame's visual feature within each shot. Second, we develop an adaptive clustering algorithm to yield candidate keyframes for each shot, each candidate keyframe being the frame nearest to a cluster center. Third, we further reduce the candidate keyframes via redundancy elimination within each shot, and finally concatenate them in the order of the shots to form the final sequential keyframes. To evaluate LMSKE, we curate a benchmark dataset and conduct extensive experiments, whose results show that LMSKE performs much better than several SOTA competitors, with an average F1 of 0.5311, an average fidelity of 0.8141, and an average compression ratio of 0.9922.
Submitted 10 January, 2024;
originally announced January 2024.
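The second and third stages described in the abstract above (pick the frame nearest each cluster center, then drop redundant candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the LMSKE implementation: cluster labels are taken as given instead of being produced by the paper's adaptive clustering, redundancy is judged by a cosine-similarity threshold chosen here arbitrarily, and the names `nearest_to_center`, `drop_redundant`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_to_center(features, labels):
    """For each cluster, return the index of the frame closest to the cluster mean."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    dims = len(features[0])
    picks = []
    for members in clusters.values():
        center = [sum(features[i][d] for i in members) / len(members)
                  for d in range(dims)]
        picks.append(min(members, key=lambda i: math.dist(features[i], center)))
    return sorted(picks)  # keep temporal order within the shot


def drop_redundant(features, candidates, threshold=0.95):
    """Keep a candidate only if it is not too similar to an already kept one."""
    kept = []
    for i in candidates:
        if all(cosine(features[i], features[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy per-frame features for one shot (stand-ins for real visual embeddings).
features = [
    [1.0, 0.0], [0.8, 0.2], [0.9, 0.1],       # cluster 0
    [0.0, 1.0], [0.2, 0.8], [0.1, 0.9],       # cluster 1
    [0.88, 0.12], [0.92, 0.1], [0.9, 0.11],   # cluster 2, close to cluster 0
]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

candidates = nearest_to_center(features, labels)
keyframes = drop_redundant(features, candidates)
print(candidates, keyframes)
```

In this toy shot, the candidate from the third cluster is nearly identical to the one from the first cluster, so the redundancy step removes it while the order of the remaining frames is preserved.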
-
Recanting twins: addressing intermediate confounding in mediation analysis
Authors:
Tat-Thang Vo,
Nicholas Williams,
Richard Liu,
Kara E. Rudolph,
Iván Díaz
Abstract:
The presence of intermediate confounders, also called recanting witnesses, is a fundamental challenge to the investigation of causal mechanisms in mediation analysis, preventing the identification of natural path-specific effects. Proposed alternative parameters (such as randomizational interventional effects) are problematic because they can be non-null even when there is no mediation for any individual in the population; i.e., they are not an average of underlying individual-level mechanisms. In this paper we develop a novel method for mediation analysis in settings with intermediate confounding, with guarantees that the causal parameters are summaries of the individual-level mechanisms of interest. The method is based on recently proposed ideas that view causality as the transfer of information, and thus replace recanting witnesses by draws from their conditional distribution, what we call "recanting twins". We show that, in the absence of intermediate confounding, recanting twin effects recover natural path-specific effects. We present the assumptions required for identification of recanting twins effects under a standard structural causal model, as well as the assumptions under which the recanting twin identification formulas can be interpreted in the context of the recently proposed separable effects models. To estimate recanting-twin effects, we develop efficient semi-parametric estimators that allow the use of data driven methods in the estimation of the nuisance parameters. We present numerical studies of the methods using synthetic data, as well as an application to evaluate the role of new-onset anxiety and depressive disorder in explaining the relationship between gabapentin/pregabalin prescription and incident opioid use disorder among Medicaid beneficiaries with chronic pain.
Submitted 9 January, 2024;
originally announced January 2024.
-
Two-Step Targeted Minimum-Loss Based Estimation for Non-Negative Two-Part Outcomes
Authors:
Nicholas T. Williams,
Richard Liu,
Katherine L. Hoffman,
Sarah Forrest,
Kara E. Rudolph,
Iván Díaz
Abstract:
Non-negative two-part outcomes are defined as outcomes with a density function that has a zero point mass but is otherwise positive. Examples, such as healthcare expenditure and hospital length of stay, are common in healthcare utilization research. Despite the practical relevance of non-negative two-part outcomes, very few methods exist to leverage knowledge of their semicontinuity to achieve improved performance in estimating causal effects. In this paper, we develop a nonparametric two-step targeted minimum-loss based estimator (denoted as hTMLE) for non-negative two-part outcomes. We present methods for a general class of interventions referred to as modified treatment policies, which can accommodate continuous, categorical, and binary exposures. The two-step TMLE uses a targeted estimate of the intensity component of the outcome to produce a targeted estimate of the binary component of the outcome that may improve finite sample efficiency. We demonstrate the efficiency gains achieved by the two-step TMLE with simulated examples and then apply it to a cohort of Medicaid beneficiaries to estimate the effect of chronic pain and physical disability on days' supply of opioids.
Submitted 22 April, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
A Practical Beamforming Design for Active RIS-assisted MU-MISO Systems
Authors:
Yun Yang,
Zhi** Lu,
Ming Li,
Rang Liu,
Qian Liu
Abstract:
Reconfigurable Intelligent Surfaces (RIS) have been proposed as a revolutionary technology with the potential to address several critical requirements of 6G communication systems. Despite their powerful ability to reconfigure the radio environment, the ``double fading'' effect limits practical system performance gains because of the significant path loss. A new active RIS architecture has recently been proposed to overcome this challenge. However, existing active RIS studies rely on an ideal amplification model that ignores the practical hardware limitations of amplifiers, and such inaccurate active RIS modeling may cause performance degradation. Motivated by this fact, in this paper we first investigate the amplification principle of a typical active RIS and propose a more accurate amplification model based on amplifier hardware characteristics. Then, based on the new amplification model, we propose a novel joint transmit beamforming and RIS reflection beamforming design that considers the incident signal power on a practical active RIS for multiuser multi-input single-output (MU-MISO) communication systems. Fractional programming (FP), majorization minimization (MM) and block coordinate descent (BCD) methods are used to solve the resulting complex problem. Simulation results indicate the importance of considering practical amplifier hardware characteristics in the joint beamforming design and demonstrate the effectiveness of the proposed algorithm compared to other benchmarks.
Submitted 8 January, 2024;
originally announced January 2024.
-
3D-SSGAN: Lifting 2D Semantics for 3D-Aware Compositional Portrait Synthesis
Authors:
Ruiqi Liu,
Peng Zheng,
Ye Wang,
Rui Ma
Abstract:
Existing 3D-aware portrait synthesis methods can generate impressive high-quality images while preserving strong 3D consistency. However, most of them cannot support the fine-grained part-level control over synthesized images. Conversely, some GAN-based 2D portrait synthesis methods can achieve clear disentanglement of facial regions, but they cannot preserve view consistency due to a lack of 3D modeling abilities. To address these issues, we propose 3D-SSGAN, a novel framework for 3D-aware compositional portrait image synthesis. First, a simple yet effective depth-guided 2D-to-3D lifting module maps the generated 2D part features and semantics to 3D. Then, a volume renderer with a novel 3D-aware semantic mask renderer is utilized to produce the composed face features and corresponding masks. The whole framework is trained end-to-end by discriminating between real and synthesized 2D images and their semantic masks. Quantitative and qualitative evaluations demonstrate the superiority of 3D-SSGAN in controllable part-level synthesis while preserving 3D view consistency.
Submitted 8 January, 2024;
originally announced January 2024.
-
A Perturbed Value-Function-Based Interior-Point Method for Perturbed Pessimistic Bilevel Problems
Authors:
Haimei Huo,
Risheng Liu,
Zhixun Su
Abstract:
Bilevel optimization serves as a powerful tool for many machine learning applications. The perturbed pessimistic bilevel problem PBP$ε$, with $ε$ being an arbitrary positive number, is a variant of the bilevel problem that deals with the case where the lower level problem has multiple solutions. However, provably convergent algorithms for PBP$ε$ with a nonlinear lower level problem are lacking. To fill the gap, we consider in this paper the problem PBP$ε$ with a nonlinear lower level problem. By introducing a log-barrier function to replace the inequality constraint associated with the value function of the lower level problem, and approximating this value function, an algorithm named Perturbed Value-Function-based Interior-point Method (PVFIM) is proposed. We present a stationary condition for PBP$ε$, which has not been given before, and we show that PVFIM can converge to a stationary point of PBP$ε$. Finally, experiments are presented to verify the theoretical results and to show the application of the algorithm to GAN.
Submitted 7 January, 2024;
originally announced January 2024.
-
PeVatron Candidate SNR G106.3+2.7 in a Low-density Cavity: a Multiwavelength Test
Authors:
Yiwei Bao,
Ruo-Yu Liu,
Chong Ge,
Yang Chen
Abstract:
In this paper, we constrain the density of the interstellar medium (ISM) around the hadronic PeVatron candidate, supernova remnant (SNR) G106.3+2.7, based on X-ray and $γ$-ray observations. The purpose of this investigation is to understand the influence of the gaseous environment on this SNR as a proton PeVatron candidate. By modelling the self-regulated propagation of the cosmic rays (CRs) injected from the SNR, we calculate the $γ$-ray emission of CRs via hadronuclear interactions with the molecular cloud and the ISM, and use the measured $γ$-ray flux to constrain the ISM density around the SNR. Our results support the picture that the SNR is expanding into a low-density ($n<0.05~\mathrm{cm}^{-3}$) cavity, enabling the SNR to be a potential proton PeVatron even though it is presently not in the very early phase.
Submitted 5 January, 2024;
originally announced January 2024.
-
Unconditionally positivity-preserving explicit Euler-type schemes for a generalized Ait-Sahalia model
Authors:
Ruishu Liu,
Yulin Cao,
Xiaojie Wang
Abstract:
The present work is devoted to strong approximations of a generalized Aït-Sahalia model arising from mathematical finance. The numerical study of the considered model faces essential difficulties caused by a drift that blows up at the origin, highly nonlinear drift and diffusion coefficients, and a positivity-preserving requirement. In this paper, a novel explicit Euler-type scheme is proposed, which is easily implementable and able to preserve positivity of the original model unconditionally, i.e., for any time step-size $h >0$. A mean-square convergence rate of order $0.5$ is also obtained for the proposed scheme in both non-critical and general critical cases. Our work is motivated by the need to justify multi-level Monte Carlo (MLMC) simulations for the underlying model, where the rate of mean-square convergence is required and the preservation of positivity is desirable, particularly for large discretization time steps. Numerical experiments are finally provided to confirm the theoretical findings.
Submitted 25 March, 2024; v1 submitted 4 January, 2024;
originally announced January 2024.
-
Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation
Authors:
Renshuai Liu,
Bowen Ma,
Wei Zhang,
Zhipeng Hu,
Changjie Fan,
Tangjie Lv,
Yu Ding,
Xuan Cheng
Abstract:
In human-centric content generation, pre-trained text-to-image models struggle to produce the portrait images users want, which retain the identity of individuals while exhibiting diverse expressions. This paper introduces our efforts towards personalized face generation. To this end, we propose a novel multi-modal face generation framework capable of simultaneous identity-expression control and more fine-grained expression synthesis. Our expression control is fine enough that it can be specified with a fine-grained emotional vocabulary. We devise a novel diffusion model that can undertake the task of simultaneous face swapping and reenactment. Due to the entanglement of identity and expression, it is nontrivial to control them separately and precisely in one framework, and this has therefore not yet been explored. To overcome this, we propose several innovative designs in the conditional diffusion model, including a balanced identity and expression encoder, improved midpoint sampling, and explicit background conditioning. Extensive experiments have demonstrated the controllability and scalability of the proposed framework, in comparison with state-of-the-art text-to-image, face swapping, and face reenactment methods.
Submitted 6 April, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
TBDD: A New Trust-based, DRL-driven Framework for Blockchain Sharding in IoT
Authors:
Zixu Zhang,
Guangsheng Yu,
Caijun Sun,
Xu Wang,
Ying Wang,
Ming Zhang,
Wei Ni,
Ren ** Liu,
Andrew Reeves,
Nektarios Georgalas
Abstract:
Integrating sharded blockchain with IoT presents a solution for trust issues and optimized data flow. Sharding boosts blockchain scalability by dividing its nodes into parallel shards, yet it's vulnerable to the $1\%$ attacks where dishonest nodes target a shard to corrupt the entire blockchain. Balancing security with scalability is pivotal for such systems. Deep Reinforcement Learning (DRL) adeptly handles dynamic, complex systems and multi-dimensional optimization. This paper introduces a Trust-based and DRL-driven (\textsc{TbDd}) framework, crafted to counter shard collusion risks and dynamically adjust node allocation, enhancing throughput while maintaining network security. With a comprehensive trust evaluation mechanism, \textsc{TbDd} discerns node types and performs targeted resharding against potential threats. The model maximizes tolerance for dishonest nodes, optimizes node movement frequency, ensures even node distribution in shards, and balances sharding risks. Rigorous evaluations prove \textsc{TbDd}'s superiority over conventional random-, community-, and trust-based sharding methods in shard risk equilibrium and reducing cross-shard transactions.
Submitted 31 December, 2023;
originally announced January 2024.
-
From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion
Authors:
Xingyuan Li,
Yang Zou,
**yuan Liu,
Zhiying Jiang,
Long Ma,
Xin Fan,
Risheng Liu
Abstract:
With the rapid progression of deep learning technologies, multi-modality image fusion has become increasingly prevalent in object detection tasks. Despite its popularity, the inherent disparities in how different sources depict scene content make fusion a challenging problem. Current fusion methodologies identify shared characteristics between the two modalities and integrate them within this shared domain using either iterative optimization or deep learning architectures, which often neglect the intricate semantic relationships between modalities, resulting in a superficial understanding of inter-modal connections and, consequently, suboptimal fusion outcomes. To address this, we introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images. This method capitalizes on the complementary characteristics of diverse modalities, bolstering both the accuracy and robustness of object detection. The codebook is utilized to enhance a streamlined and concise depiction of the fused intra- and inter-domain dynamics, fine-tuned for optimal performance in detection tasks. We present a bilevel optimization strategy that establishes a nexus between the joint problem of fusion and detection, optimizing both processes concurrently. Furthermore, we introduce the first dataset of paired infrared and visible images accompanied by text prompts, paving the way for future research. Extensive experiments on several datasets demonstrate that our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
Submitted 31 December, 2023;
originally announced January 2024.
-
Near-Space Communications: the Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle
Authors:
Hongshan Liu,
Tong Qin,
Zhen Gao,
Tianqi Mao,
Keke Ying,
Ziwei Wan,
Li Qiao,
Rui Na,
Zhongxiang Li,
Chun Hu,
Yikun Mei,
Tuan Li,
Guanghui Wen,
Lei Chen,
Zhonghuai Wu,
Ruiqi Liu,
Gaojie Chen,
Shuo Wang,
Dezhi Zheng
Abstract:
This article presents a comprehensive study on the emerging near-space communications (NS-COM) within the context of space-air-ground-sea integrated network (SAGSIN). Specifically, we firstly explore the recent technical developments of NS-COM, followed by the discussions about motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis between the NS-COM network and other counterparts in SAGSIN is conducted, covering aspects of deployment, coverage, channel characteristics and unique problems of NS-COM network. Afterwards, the technical aspects of NS-COM, including channel modeling, random access, channel estimation, array-based beam management and joint network optimization, are examined in detail. Furthermore, we explore the potential applications of NS-COM, such as structural expansion in SAGSIN communication, civil aviation communication, remote and urgent communication, weather monitoring and carbon neutrality. Finally, some promising research avenues are identified, including stratospheric satellite (StratoSat) -to-ground direct links for mobile terminals, reconfigurable multiple-input multiple-output (MIMO) and holographic MIMO, federated learning in NS-COM networks, maritime communication, electromagnetic spectrum sensing and adversarial game, integrated sensing and communications, StratoSat-based radar detection and imaging, NS-COM assisted enhanced global navigation system, NS-COM assisted intelligent unmanned system and free space optical (FSO) communication. Overall, this paper highlights that the NS-COM plays an indispensable role in the SAGSIN puzzle, providing substantial performance and coverage enhancement to the traditional SAGSIN architecture.
Submitted 4 March, 2024; v1 submitted 30 December, 2023;
originally announced January 2024.
-
Sample Robust Scheduling of Electricity-Gas Systems Under Wind Power Uncertainty
Authors:
Rong-Peng Liu,
Yunhe Hou,
Yujia Li,
Shunbo Lei,
Wei Wei,
Xiaozhe Wang
Abstract:
This paper adopts a two-stage sample robust optimization (SRO) model to address the wind power penetrated unit commitment optimal energy flow (UC-OEF) problem for IEGSs. The two-stage SRO model can be approximately transformed into a computationally efficient form. Specifically, we employ linear decision rules to simplify the proposed UC-OEF model. Moreover, we further enhance the tractability of the simplified model by exploring its structural features and, accordingly, develop a solution method.
Submitted 30 December, 2023;
originally announced January 2024.
-
Sparsity Exploitation via Joint Receive Processing and Transmit Beamforming Design for MIMO-OFDM ISAC Systems
Authors:
Zichao Xiao,
Rang Liu,
Ming Li,
Wei Wang,
Qian Liu
Abstract:
Integrated sensing and communication (ISAC) is widely recognized as a pivotal enabling technique for the advancement of future wireless networks. This paper aims to efficiently exploit the inherent sparsity of echo signals for the multi-input-multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) based ISAC system. A novel joint receive echo processing and transmit beamforming design is presented to achieve this goal. Specifically, we first propose a compressive sensing (CS)-assisted estimation approach to facilitate ISAC receive echo processing, which can not only enable accurate recovery of target information, but also allow substantial reduction in the number of sensing subcarriers to be sampled and processed. Then, based on the proposed CS-assisted processing method, the associated transmit beamforming design is formulated with the objective of maximizing the sum-rate of multiuser communications while satisfying the transmit power budget and ensuring the received signal-to-noise ratio (SNR) for the designated sensing subcarriers. In order to address the formulated non-convex problem involving high-dimensional variables, an effective iterative algorithm employing majorization minimization (MM), fractional programming (FP), and the nonlinear equality alternative direction method of multipliers (neADMM) with closed-form solutions has been developed. Finally, extensive numerical simulations are conducted to verify the effectiveness of the proposed algorithm and the superior performance of the introduced sparsity exploitation strategy.
Submitted 28 December, 2023;
originally announced December 2023.
-
Modeling Load Redistribution Attacks in Integrated Electricity-Gas Systems
Authors:
Rong-Peng Liu,
Xiaozhe Wang,
Bo Zeng,
Rawad Zgheib
Abstract:
We investigate load redistribution (LR) attacks on integrated electricity-gas systems (IEGSs) and propose a bilevel mixed-integer model to identify the most severe LR attack from an economic perspective. Under a mild assumption, we prove that the proposed model does not exclude any possible upper-level attack. A modified reformulation and decomposition (R&D) algorithm is developed to solve this model in a master-subproblem framework. In particular, we design a subproblem to address infeasibility issues in the master problem. Accordingly, two types of cuts are added to the master problem to ensure algorithm feasibility and solution optimality.
Submitted 27 December, 2023;
originally announced December 2023.
-
Align on the Fly: Adapting Chatbot Behavior to Established Norms
Authors:
Chunpu Xu,
Steffi Chern,
Ethan Chern,
Ge Zhang,
Zekun Wang,
Ruibo Liu,
Jing Li,
Jie Fu,
Pengfei Liu
Abstract:
In this paper, we aim to align large language models with the ever-changing, complex, and diverse human values (e.g., social norms) across time and locations. This presents a challenge to existing alignment techniques, such as supervised fine-tuning, which internalize values within model parameters. To overcome this, we propose an On-the-fly Preference Optimization (OPO) method, which is a real-time alignment that works in a streaming way. It employs an external memory to store established rules for alignment, which can constrain LLMs' behaviors without further training, allowing for convenient updates and customization of human values. We also introduce a scalable evaluation to assess the proposed method more effectively. Experimental results on both human-annotated and auto-generated questions from legal and moral domains indicate the effectiveness of the proposed OPO method. Our code and data are released at https://github.com/GAIR-NLP/OPO.
Submitted 26 December, 2023;
originally announced December 2023.
-
UVAGaze: Unsupervised 1-to-2 Views Adaptation for Gaze Estimation
Authors:
Ruicong Liu,
Feng Lu
Abstract:
Gaze estimation has become a subject of growing interest in recent research. Most of the current methods rely on single-view facial images as input. Yet, it is hard for these approaches to handle large head angles, leading to potential inaccuracies in the estimation. To address this issue, adding a second-view camera can help better capture eye appearance. However, existing multi-view methods have two limitations. 1) They require multi-view annotations for training, which are expensive. 2) More importantly, during testing, the exact positions of the multiple cameras must be known and match those used in training, which limits the application scenario. To address these challenges, we propose a novel 1-view-to-2-views (1-to-2 views) adaptation solution in this paper, the Unsupervised 1-to-2 Views Adaptation framework for Gaze estimation (UVAGaze). Our method adapts a traditional single-view gaze estimator for flexibly placed dual cameras. Here, "flexibly" means that we place the dual cameras in arbitrary positions regardless of the training data, without knowing their extrinsic parameters. Specifically, UVAGaze builds a dual-view mutual supervision adaptation strategy, which takes advantage of the intrinsic consistency of gaze directions between both views. In this way, our method can not only benefit from common single-view pre-training, but also achieve more advanced dual-view gaze estimation. The experimental results show that a single-view estimator, when adapted for dual views, can achieve much higher accuracy, especially in cross-dataset settings, with a substantial improvement of 47.0%. Project page: https://github.com/MickeyLLG/UVAGaze.
Submitted 25 December, 2023;
originally announced December 2023.
-
Global dynamics for the stochastic nonlinear beam equations on the four-dimensional torus
Authors:
Andreia Chapouto,
Guopeng Li,
Ruoyuan Liu
Abstract:
We study global-in-time dynamics of the stochastic nonlinear beam equations (SNLB) with an additive space-time white noise, posed on the four-dimensional torus. The roughness of the noise leads us to introduce a time-dependent renormalization, after which we show that SNLB is pathwise locally well-posed in all subcritical and most of the critical regimes. For the (renormalized) defocusing cubic SNLB, we establish pathwise global well-posedness below the energy space, by adapting a hybrid argument of Gubinelli-Koch-Oh-Tolomeo (2022) that combines the $I$-method with a Gronwall-type argument. Lastly, we show almost sure global well-posedness and invariance of the Gibbs measure for the stochastic damped nonlinear beam equations in the defocusing case.
Submitted 21 December, 2023;
originally announced December 2023.
-
Nematic charge-density-wave correlations in FeSe$_{1-x}$S$_{x}$
Authors:
Ruixian Liu,
Wenliang Zhang,
Yuan Wei,
Zhen Tao,
Teguh C. Asmara,
Vladimir N. Strocov,
Thorsten Schmitt,
Xingye Lu
Abstract:
The occurrence of charge-density-wave (CDW) order is a common thread in the phase diagram of cuprate high-transition-temperature ($T_c$) superconductors. In iron-based superconductors (FeSCs), nematic order and fluctuations play a decisive role in driving other emergent orders. CDW order has been observed by scanning tunneling microscopy for various FeSCs such as FeSe thin films, uniaxially strained LiFeAs, and tetragonal FeSe$_{0.81}$S$_{0.19}$. However, it remains elusive if the CDW in these materials is a bulk phenomenon as well as if and how it intertwines with the electronic nematicity. Using energy-resolved resonant X-ray scattering at the Fe-L$_3$ edge, we report the discovery of a local-strain-induced incommensurate isotropic CDW order in FeSe$_{0.82}$S$_{0.18}$. A highly anisotropic CDW response under uniaxial strain unambiguously manifests that the CDW is directly coupled to the nematicity. Transforming part of Fe$^{2+}$ to Fe$^{3+}$ on the surface of FeSe$_{1-x}$S$_{x}$ reveals that the same isotropic CDW can be induced, enhanced, and stabilized in the whole nematic regime measured ($x=0-0.19$). As Fe$^{3+}$ can create local lattice distortions on the surface, the CDW could arise from the interaction between the local strain around Fe$^{3+}$ and the nematic electron correlations. Our experimental observation of a local-strain-induced CDW gives vital information for understanding the interplay between electron correlations and the electronic nematicity in FeSCs.
Submitted 21 December, 2023; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Authors:
Rui Liu,
Yifan Hu,
Yi Ren,
Xiang Yin,
Haizhou Li
Abstract:
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. While recognising the significance of the CSS task, prior studies have not thoroughly investigated the emotional expressiveness problems due to the scarcity of emotional conversational datasets and the difficulty of stateful emotion modeling. In this paper, we propose a novel emotional CSS model, termed ECSS, that includes two main components: 1) to enhance emotion understanding, we introduce a heterogeneous graph-based emotional context modeling mechanism, which takes the multi-source dialogue history as input to model the dialogue context and learn the emotion cues from the context; 2) to achieve emotion rendering, we employ a contrastive learning-based emotion renderer module to infer the accurate emotion style for the target utterance. To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity, and annotate additional emotional information on the existing conversational dataset (DailyTalk). Both objective and subjective evaluations suggest that our model outperforms the baseline models in understanding and rendering emotions. These evaluations also underscore the importance of comprehensive emotional annotations. Code and audio samples can be found at: https://github.com/walker-hyf/ECSS.
Submitted 19 December, 2023;
originally announced December 2023.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Prompt Based Tri-Channel Graph Convolution Neural Network for Aspect Sentiment Triplet Extraction
Authors:
Kun Peng,
Lei Jiang,
Hao Peng,
Rui Liu,
Zhengtao Yu,
Jiaqian Ren,
Zhifeng Hao,
Philip S. Yu
Abstract:
Aspect Sentiment Triplet Extraction (ASTE) is an emerging task to extract a given sentence's triplets, which consist of aspects, opinions, and sentiments. Recent studies tend to address this task with a table-filling paradigm, wherein word relations are encoded in a two-dimensional table, and the process involves clarifying all the individual cells to extract triples. However, these studies ignore the deep interaction between neighbor cells, which we find quite helpful for accurate extraction. To this end, we propose a novel model for the ASTE task, called Prompt-based Tri-Channel Graph Convolution Neural Network (PT-GCN), which converts the relation table into a graph to explore more comprehensive relational information. Specifically, we treat the original table cells as nodes and utilize a prompt attention score computation module to determine the edges' weights. This enables us to construct a target-aware grid-like graph to enhance the overall extraction process. After that, a triple-channel convolution module is conducted to extract precise sentiment knowledge. Extensive experiments on the benchmark datasets show that our model achieves state-of-the-art performance. The code is available at https://github.com/KunPunCN/PT-GCN.
Submitted 24 December, 2023; v1 submitted 18 December, 2023;
originally announced December 2023.
-
A novel diffusion recommendation algorithm based on multi-scale cnn and residual lstm
Authors:
Yong Niu,
Xing Xing,
Zhichun Jia,
Ruidi Liu,
Mindong Xin
Abstract:
Sequential recommendation aims to infer user preferences from historical interaction sequences and predict the next item that users may be interested in. The current mainstream design approach is to represent items as fixed vectors, capturing the underlying relationships between items and user preferences based on the order of interactions. However, relying on a single fixed-item embedding may weaken the modeling capability of the system, and the global dynamics and local saliency exhibited by user preferences need to be distinguished. To address these issues, this paper proposes a novel diffusion recommendation algorithm based on multi-scale CNN and residual LSTM (AREAL). We introduce diffusion models into the recommender system, representing items as probability distributions instead of fixed vectors. This approach enables adaptive reflection of multiple aspects of the items and generates item distributions in a denoising manner. We use multi-scale CNN and residual LSTM methods to extract the local and global dependency features of user history interactions, and use an attention mechanism to distinguish weights as the guide features of reverse diffusion recovery. The effectiveness of the proposed method is validated through experiments conducted on two real-world datasets. Specifically, AREAL obtains improvements over the best baselines by 2.63% and 4.25% in terms of HR@20 and 5.05% and 3.94% in terms of NDCG@20 on all datasets.
Submitted 20 December, 2023; v1 submitted 17 December, 2023;
originally announced December 2023.
-
A Novel RFID Authentication Protocol Based on A Block-Order-Modulus Variable Matrix Encryption Algorithm
Authors:
Yan Wang,
Ruiqi Liu,
Tong Gao,
Feng Shu,
Xuemei Lei,
Guan Gui,
Jiangzhou Wang
Abstract:
In this paper, authentication for mobile radio frequency identification (RFID) systems with low-cost tags is studied. Firstly, an adaptive modulus (AM) encryption algorithm is proposed. Subsequently, in order to enhance the security without additional storage of new key matrices, a self-updating encryption order (SUEO) algorithm is designed. Furthermore, a diagonal block local transpose key matrix (DBLTKM) encryption algorithm is presented, which effectively expands the feasible domain of the key space. Based on the above three algorithms, a novel joint AM-SUEO-DBLTKM encryption algorithm is constructed. Making full use of the advantages of the proposed joint algorithm, a two-way RFID authentication protocol, named AM-SUEO-DBLTKM-RFID, is proposed for mobile RFID systems. In addition, the Burrows-Abadi-Needham (BAN) logic and security analysis indicate that the proposed AM-SUEO-DBLTKM-RFID protocol can effectively combat various typical attacks. Numerical results demonstrate that the proposed AM-SUEO-DBLTKM algorithm can save 99.59% of tag storage over traditional algorithms. Finally, the low computational complexity as well as the low storage cost of the proposed AM-SUEO-DBLTKM-RFID protocol facilitates deployment within low-cost RFID tags.
Submitted 9 May, 2024; v1 submitted 16 December, 2023;
originally announced December 2023.
-
Quantitative Measurement of adhesion energy between nanolayers and substrates using a nanowire-supported bridging method
Authors:
Xiaodong Song,
Lizhen Hou,
Ruizhe Liu,
Noman Akhtar,
Peng Wang,
Shiliang Wang
Abstract:
The measurement of adhesion energy between nanolayers and substrates holds significant importance for the design, fabrication, and stability assessment of micro-/nanoscale devices relying on nanolayers. In this study, we propose a nanowire-supported bridging method based on an optical microscope-based nanomanipulation technique to quantitatively measure the adhesion energy between nanolayers and substrates. Using this innovative approach, we conducted adhesion energy measurements between mica nanolayers and Si substrates, revealing a value of approximately 110 J/m2. Additionally, we discuss the applicable conditions of this new method. The proposed technique allows measurements in atmospheric conditions and is, in principle, applicable to all types of nanolayers and substrates. Consequently, it holds promise as a universal method for assessing adhesion energy between nanolayers and substrates, considering environmental factors such as atmosphere and roughness.
Submitted 19 December, 2023; v1 submitted 16 December, 2023;
originally announced December 2023.
-
Surface wrinkling of a film coated to a graded substrate
Authors:
Rui-Cheng Liu,
Yang Liu,
Alain Goriely
Abstract:
We study the surface wrinkling of a stiff thin elastic film bonded to a compliant graded elastic substrate subject to compressive stress generated either by compression or growth of the bilayer. Our aim is to clarify the influence of the modulus gradient on the onset and surface pattern in such bilayers. Within the framework of finite elasticity, an exact bifurcation condition is obtained using the Stroh formulation and the surface impedance matrix method. Further analytical progress is made by focusing on the short-wavelength limit, for which the Wentzel-Kramers-Brillouin method can be used to resolve the eigenvalue problem of ordinary differential equations with variable coefficients. An explicit bifurcation condition is obtained, from which asymptotic expressions for the critical buckling load and the critical wavelength are derived. In particular, we consider two distinct situations depending on the ratio $β$ of the shear modulus at the substrate surface to that at infinity. If $β$ is of $\mathcal{O}(1)$ or small, the parameters related to modulus gradient all appear in the high order terms and play an insignificant role in the bifurcation. In that case, it is the modulus ratio between the film and substrate surface that governs the onset of surface wrinkling. If, however, $β\gg1$, the modulus gradient affects the critical condition through leading-order terms. Through our analysis we unravel the influence of different material and geometric parameters, including the modulus gradient, on the bifurcation threshold and the associated wavelength, which can be of importance in many biological and technological settings.
Submitted 15 December, 2023;
originally announced December 2023.
-
Individualized Deepfake Detection Exploiting Traces Due to Double Neural-Network Operations
Authors:
Mushfiqur Rahman,
Runze Liu,
Chau-Wai Wong,
Huaiyu Dai
Abstract:
In today's digital landscape, journalists urgently require tools to verify the authenticity of facial images and videos depicting specific public figures before incorporating them into news stories. Existing deepfake detectors are not optimized for this detection task when an image is associated with a specific and identifiable individual. This study focuses on the deepfake detection of facial images of individual public figures. We propose to condition the proposed detector on the identity of the identified individual given the advantages revealed by our theory-driven simulations. While most detectors in the literature rely on perceptible or imperceptible artifacts present in deepfake facial images, we demonstrate that the detection performance can be improved by exploiting the idempotency property of neural networks. In our approach, the training process involves double neural-network operations where we pass an authentic image through a deepfake simulating network twice. Experimental results show that the proposed method improves the area under the curve (AUC) from 0.92 to 0.94 and reduces its standard deviation by 17%. For evaluating the detection performance of individual public figures, a facial image dataset with individuals' names is required, a criterion not met by the current deepfake datasets. To address this, we curated a dataset comprising 32k images featuring 45 public figures, which we intend to release to the public after the paper is published.
Submitted 13 December, 2023;
originally announced December 2023.
-
SSTA: Salient Spatially Transformed Attack
Authors:
Renyang Liu,
Wei Zhou,
Sixin Wu,
Jun Zhao,
Kwok-Yan Lam
Abstract:
Extensive studies have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial attacks, which brings a huge security risk to the further application of DNNs, especially for the AI models developed in the real world. Despite the significant progress that has been made recently, existing attack methods still struggle to escape detection by the naked human eye, because the formulation of adversarial examples (AEs) relies heavily on adding noise. These challenges significantly increase the risk of exposure and can cause an attack to fail. Therefore, in this paper, we propose the Salient Spatially Transformed Attack (SSTA), a novel framework to craft imperceptible AEs, which enhances the stealthiness of AEs by estimating a smooth spatial transform metric on the most critical area to generate AEs instead of adding external noise to the whole image. Compared to state-of-the-art baselines, extensive experiments indicate that SSTA could effectively improve the imperceptibility of the AEs while maintaining a 100% attack success rate.
Submitted 12 December, 2023;
originally announced December 2023.
-
DTA: Distribution Transform-based Attack for Query-Limited Scenario
Authors:
Renyang Liu,
Wei Zhou,
Xin Jin,
Song Gao,
Yuanyu Wang,
Ruxin Wang
Abstract:
In generating adversarial examples, the conventional black-box attack methods rely on sufficient feedback from the to-be-attacked models by repeatedly querying until the attack is successful, which usually results in thousands of trials during an attack. This may be unacceptable in real applications since Machine Learning as a Service Platform (MLaaS) usually only returns the final result (i.e., hard-label) to the client and a system equipped with certain defense mechanisms could easily detect malicious queries. By contrast, a feasible way is a hard-label attack that simulates an attacked action being permitted to conduct a limited number of queries. To implement this idea, in this paper, we bypass the dependency on the to-be-attacked model and benefit from the characteristics of the distributions of adversarial examples to reformulate the attack problem in a distribution transform manner and propose a distribution transform-based attack (DTA). DTA builds a statistical mapping from the benign example to its adversarial counterparts by tackling the conditional likelihood under the hard-label black-box settings. In this way, it is no longer necessary to query the target model frequently. A well-trained DTA model can directly and efficiently generate a batch of adversarial examples for a certain input, which can be used to attack un-seen models based on the assumed transferability. Furthermore, we surprisingly find that the well-trained DTA model is not sensitive to the semantic spaces of the training dataset, meaning that the model yields acceptable attack performance on other datasets. Extensive experiments validate the effectiveness of the proposed idea and the superiority of DTA over the state-of-the-art.
Submitted 12 December, 2023;
originally announced December 2023.