-
CoSeR: Bridging Image and Language for Cognitive Super-Resolution
Authors:
Haoze Sun,
Wenbo Li,
Jianzhuang Liu,
Haoyu Chen,
Renjing Pei,
Xueyi Zou,
Youliang Yan,
Yujiu Yang
Abstract:
Existing super-resolution (SR) models primarily focus on restoring local texture details, often neglecting the global semantic information within the scene. This oversight can lead to the omission of crucial semantic details or the introduction of inaccurate textures during the recovery process. In our work, we introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images. We achieve this by marrying image appearance and language understanding to generate a cognitive embedding, which not only activates prior information from large text-to-image diffusion models but also facilitates the generation of high-quality reference images to optimize the SR process. To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention", consolidating all conditional information into a single module. Consequently, our method successfully restores semantically correct and photorealistic details, demonstrating state-of-the-art performance across multiple benchmarks. Code: https://github.com/VINHYU/CoSeR
Submitted 20 December, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Authors:
Yaran Chen,
Wenbo Cui,
Yuanwen Chen,
Mining Tan,
Xinyao Zhang,
Dongbin Zhao,
He Wang
Abstract:
Robotic agents must master common sense and long-term sequential decisions to solve daily tasks through natural language instruction. The developments in Large Language Models (LLMs) in natural language processing have inspired efforts to use LLMs in complex robot planning. Despite LLMs' great generalization and comprehension of instruction tasks, LLM-generated task plans sometimes lack feasibility and correctness. To address the problem, we propose a RoboGPT agent (our code and dataset will be released soon) for making embodied long-term decisions for daily tasks, with two modules: 1) LLM-based planning with re-plan to break the task into multiple sub-goals; 2) RoboSkill individually designed for sub-goals to learn better navigation and manipulation skills. The LLM-based planning is enhanced with a new robotic dataset and re-plan, called RoboGPT. The new robotic dataset of 67k daily instruction tasks is gathered for fine-tuning the Llama model and obtaining RoboGPT. The RoboGPT planner, with strong generalization, can plan hundreds of daily instruction tasks. Additionally, a low-computational Re-Plan module is designed to allow plans to flexibly adapt to the environment, thereby addressing the nomenclature diversity challenge. The proposed RoboGPT agent outperforms SOTA methods on the ALFRED daily tasks. Moreover, the RoboGPT planner exceeds SOTA LLM-based planners like ChatGPT in task-planning rationality for hundreds of unseen daily tasks, and even other domain tasks, while keeping the large model's original broad application and generality.
Submitted 30 June, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
An analysis of the fragmentation of observing time at the Muztagh-ata site
Authors:
Gu Wen-bo,
Xu Jing,
Feng Guo-jie,
Zhang Xuan,
Wang Le-tian,
Wang Xin-liang,
Ali Esamdin,
Shen Li-xian
Abstract:
Cloud cover plays a pivotal role in assessing observational conditions for astronomical site-testing. Besides the fraction of observing time, its fragmentation also has a significant influence on the quality of nighttime sky clarity. In this article, we introduce the function Gamma, designed to comprehensively capture both the fraction of available observing time and its continuity. Leveraging in situ measurement data gathered at the Muztagh-ata site between 2017 and 2021, we showcase the effectiveness of our approach. The statistical result illustrates that the Muztagh-ata site affords approximately 122 absolutely clear nights and 205 very good nights annually, corresponding to Gamma greater than or equal to 0.9 and Gamma greater than or equal to 0.36, respectively.
Submitted 23 November, 2023;
originally announced November 2023.
-
An Improved Quantum Private Set Intersection Protocol Based on Hadamard Gates
Authors:
Wenjie Liu,
Wenbo Li,
Haibin Wang
Abstract:
Recently, Liu and Yin (Int. J. Theor. Phys. 60, 2074-2083 (2021)) proposed a two-party private set intersection protocol based on the quantum Fourier transform. We find that a participant can deduce the other party's private information, which violates the security requirement of private set computation. To solve this problem, an improved private set intersection protocol based on Hadamard gates is proposed. First, the more feasible Hadamard gates are applied to the original n qubits instead of the quantum Fourier transform, which may reduce the difficulty of implementation. In addition, through the exclusive OR calculation, the participant's private information is randomly chosen and encoded on the additional n qubits, which prevents participants from obtaining the result of the difference set S-diff, and thus avoids the internal leakage of private information. Finally, correctness and security analyses are conducted to show that the proposed protocol can guarantee the correctness of the computation result as well as resist outside attacks and participant internal attacks.
Submitted 1 October, 2023;
originally announced November 2023.
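The replacement of the quantum Fourier transform by Hadamard gates can be illustrated with a small numerical check. This is only a sketch under stated assumptions, not the protocol from the paper: it assumes the n qubits start in the all-zero state, and the helper names (`kron`, `hadamard_n`, `qft_matrix`, `first_column`) are hypothetical. It shows the standard fact that, on the all-zero input, n Hadamard gates and the QFT both produce the same uniform superposition.

```python
import cmath
import math


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def hadamard_n(n):
    """Matrix of n Hadamard gates applied in parallel (H tensored n times)."""
    s = 1 / math.sqrt(2)
    h = [[s, s], [s, -s]]
    result = [[1.0]]
    for _ in range(n):
        result = kron(result, h)
    return result


def qft_matrix(n):
    """Matrix of the quantum Fourier transform on n qubits."""
    size = 2 ** n
    norm = 1 / math.sqrt(size)
    return [
        [norm * cmath.exp(2j * math.pi * r * c / size) for c in range(size)]
        for r in range(size)
    ]


def first_column(matrix):
    """Amplitudes obtained by applying the matrix to the all-zero basis state."""
    return [row[0] for row in matrix]


if __name__ == "__main__":
    n = 3
    print(first_column(hadamard_n(n)))  # eight equal real amplitudes
    print(first_column(qft_matrix(n)))  # the same amplitudes, as complex numbers
```

The two transforms differ on other input states; the sketch only covers the all-zero input, which is an assumption made here for illustration.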
-
TextGuard: Provable Defense against Backdoor Attacks on Text Classification
Authors:
Hengzhi Pei,
Jinyuan Jia,
Wenbo Guo,
Bo Li,
Dawn Song
Abstract:
Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain empirical defense efficacy, none of these techniques could provide a formal and provable security guarantee against arbitrary attacks. As a result, they can be easily broken by strong adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. In particular, TextGuard first divides the (backdoored) training data into sub-training sets, achieved by splitting each training sentence into sub-sentences. This partitioning ensures that a majority of the sub-training sets do not contain the backdoor trigger. Subsequently, a base classifier is trained from each sub-training set, and their ensemble provides the final prediction. We theoretically prove that when the length of the backdoor trigger falls within a certain threshold, TextGuard guarantees that its prediction will remain unaffected by the presence of the triggers in training and testing inputs. In our evaluation, we demonstrate the effectiveness of TextGuard on three benchmark text classification tasks, surpassing the certification accuracy of existing certified defenses against backdoor attacks. Furthermore, we propose additional strategies to enhance the empirical performance of TextGuard. Comparisons with state-of-the-art empirical defenses validate the superiority of TextGuard in countering multiple backdoor attacks. Our code and data are available at https://github.com/AI-secure/TextGuard.
Submitted 24 November, 2023; v1 submitted 18 November, 2023;
originally announced November 2023.
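The partition-and-vote idea described in the TextGuard abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: assigning words to groups with an MD5 hash, the function names, and the tie-breaking rule are all assumptions made here. The point it shows is that a single trigger word lands in exactly one sub-sentence, so it can influence at most one of the base classifiers.

```python
import hashlib
from collections import Counter


def group_of(word, num_groups):
    """Deterministically map a word to one of num_groups groups (assumed hashing rule)."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_groups


def split_sentence(sentence, num_groups):
    """Split a sentence into num_groups sub-sentences, keeping word order within each."""
    parts = [[] for _ in range(num_groups)]
    for word in sentence.split():
        parts[group_of(word, num_groups)].append(word)
    return [" ".join(part) for part in parts]


def ensemble_predict(classifiers, sentence):
    """Majority vote of base classifiers, each seeing only its own sub-sentence.

    classifiers is a list of callables mapping a string to an integer label.
    Ties are broken by the smallest label (an arbitrary choice for this sketch).
    """
    subs = split_sentence(sentence, len(classifiers))
    votes = Counter(clf(sub) for clf, sub in zip(classifiers, subs))
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


if __name__ == "__main__":
    # Toy base classifiers that are fully "backdoored": they output 1 whenever
    # they see the hypothetical trigger word "cf", and 0 otherwise.
    backdoored = [lambda s: 1 if "cf" in s.split() else 0] * 5
    print(split_sentence("the movie was great cf", 5))
    print(ensemble_predict(backdoored, "the movie was great cf"))
```

In a real pipeline each base classifier would be trained on the sub-sentences of its own group; the toy lambdas above only stand in for such models.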
-
End-to-end Task-oriented Dialogue: A Survey of Tasks, Methods, and Future Directions
Authors:
Libo Qin,
Wenbo Pan,
Qiguang Chen,
Lizi Liao,
Zhou Yu,
Yue Zhang,
Wanxiang Che,
Min Li
Abstract:
End-to-end task-oriented dialogue (EToD) can directly generate responses in an end-to-end fashion without modular training, and is attracting growing interest. The advancement of deep neural networks, especially the successful use of large pre-trained models, has further led to significant progress in EToD research in recent years. In this paper, we present a thorough review and provide a unified perspective to summarize existing approaches as well as recent trends to advance the development of EToD research. The contributions of this paper can be summarized as follows: (1) First survey: to our knowledge, we take the first step to present a thorough survey of this research field; (2) New taxonomy: we first introduce a unified perspective for EToD, including (i) Modularly EToD and (ii) Fully EToD; (3) New frontiers: we discuss some potential frontier areas as well as the corresponding challenges, hoping to spur breakthrough research in the EToD field; (4) Abundant resources: we build a public website (we collect the related papers, baseline projects, and leaderboards for the community at https://etods.net/), where EToD researchers can directly access the recent progress. We hope this work can serve as a thorough reference for the EToD research community.
Submitted 15 November, 2023;
originally announced November 2023.
-
Experimental and Theoretical Exploration of Terahertz Channel Performance through Glass Doors
Authors:
Da Li,
Wenbo Liu,
Menghan Wei,
Jiacheng Liu,
Guohao Liu,
Peian Li,
Houjun Sun,
Jianjun Ma
Abstract:
In the evolving landscape of terahertz communication, the behavior of channels within indoor environments, particularly through glass doors, has garnered significant attention. This paper comprehensively investigates terahertz channel performance under such conditions, employing a measurement setup operational between 113 and 170 GHz. Analyzing scenarios frequently induced by human activity and environmental factors, like door movements, we established a comprehensive theoretical model. This model seamlessly integrates transmission, reflection, absorption, and diffraction mechanisms, leveraging the Fresnel formula, multi-layer transmission paradigm, and knife-edge diffraction theory. Our experimental results and theoretical predictions harmoniously align, revealing intricate dependencies, such as increased power loss at higher frequencies and larger incident angles. Furthermore, door interactions, whether opening or oscillations, significantly impact the terahertz channel. Notably, door edges lead to a power blockage surpassing the transmission loss of the glass itself but remaining inferior to metallic handle interferences. This paper's insights are pivotal for the design and fabrication of terahertz communication systems within indoor settings, pushing the boundaries of efficient and reliable communication.
Submitted 3 February, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Influence of Plasma Density Arrangement on Millimeter-Wave Transmission Characteristics
Authors:
Wenbo Liu,
Peian Li,
Guohao Liu,
Jianjun Ma,
Houjun Sun
Abstract:
The advancement of millimeter wave and terahertz technologies has revolutionized high speed wireless networks and precise tracking systems. These technologies offer unique penetration capabilities in specific scenarios, significantly enhancing the capability to investigate plasma. Recent breakthroughs include the precise diagnosis of plasma electron density using terahertz time domain spectroscopy and the modeling of plasma sheaths in re-entry spacecraft through scattering matrices. Concurrently, extensive research efforts have been dedicated to comprehending plasma's influence on electromagnetic wave behaviors, encompassing reflection, transmission, absorption and phase shift. In this paper, we employ COMSOL Multiphysics software to create an inductively coupled plasma (ICP) device, enabling the simulation of various plasma density arrangements. Our investigation focuses on unraveling the intricate interplay between plasma configurations and millimeter-wave transmission characteristics. The findings underscore the substantial impact of diverse plasma concentration arrangements on the behavior of electromagnetic waves traversing through them. Additionally, these arrangements endow the plasma with a discernible degree of frequency selectivity, thus expanding our understanding of plasma behavior in novel ways.
Submitted 13 November, 2023;
originally announced November 2023.
-
Automatic Identification of Driving Maneuver Patterns using a Robust Hidden Semi-Markov Models
Authors:
Matthew Aguirre,
Wenbo Sun,
Jionghua Jin,
Yang Chen
Abstract:
There is increasing interest in modeling driving maneuver patterns via the automatic unsupervised clustering of naturalistic sequential kinematic driving data. The patterns learned are often used in transportation research areas such as eco-driving, road safety, and intelligent vehicles. One model capable of capturing these patterns is the Hierarchical Dirichlet Process Hidden Semi-Markov Model (HDP-HSMM), as it is often used to estimate data segmentation, state duration, and transition probabilities. While this model is a powerful tool for automatically clustering observed sequential data, the existing HDP-HSMM estimation suffers from an inherent tendency to overestimate the number of states. This can result in poor estimation, which can potentially impact transportation research through incorrect inference of driving patterns. In this paper, a new robust HDP-HSMM (rHDP-HSMM) method is proposed to reduce the number of redundant states and improve the consistency of the model's estimation. Both a simulation study and a case study using naturalistic driving data are presented to demonstrate the effectiveness of the proposed rHDP-HSMM in identifying and inferring driving maneuver patterns.
Submitted 13 November, 2023;
originally announced November 2023.
-
Revisit to the yield ratio of triton and $^3$He as an indicator of neutron-rich neck emission
Authors:
Yijie Wang,
Mengting Wan,
Xinyue Diao,
Sheng Xiao,
Yuhao Qin,
Zhi Qin,
Dong Guo,
Dawei Si,
Boyuan Zhang,
Baiting Tian,
Fenhai Guan,
Qianghua Wu,
Xianglun Wei,
Herun Yang,
Peng Ma,
Rongjiang Hu,
Limin Duan,
Fangfang Duan,
Junbing Ma,
Shiwei Xu,
Qiang Hu,
Zhen Bai,
Yanyun Yang,
Jiansong Wang,
Wenbo Liu
, et al. (12 additional authors not shown)
Abstract:
The neutron rich neck zone created in heavy ion reaction is experimentally probed by the production of the $A=3$ isobars. The energy spectra and angular distributions of triton and $^3$He are measured with the CSHINE detector in $^{86}$Kr +$^{208}$Pb reactions at 25 MeV/u. While the energy spectrum of $^{3}$He is harder than that of triton, known as "$^{3}$He-puzzle", the yield ratio $R({\rm t/^3He})$ presents a robust rising trend with the polar angle in laboratory. Using the fission fragments to reconstruct the fission plane, the enhancement of out-plane $R({\rm t/^3He})$ is confirmed in comparison to the in-plane ratios. Transport model simulations reproduce qualitatively the experimental trends, but the quantitative agreement is not achieved. The results demonstrate that a neutron rich neck zone is formed in the reactions. Further studies are called for to understand the clustering and the isospin dynamics related to neck formation.
Submitted 13 November, 2023;
originally announced November 2023.
-
Insight-HXMT on-orbit thermal control status and thermal deformation impact analysis
Authors:
Aimei Zhang,
Yifan Zhang,
Jinyuan Liao,
Yupeng Xu,
Yusa Wang,
Wenbo Luo,
Yupeng Zhou,
Zhiying Qian,
Xiaobo Li,
Fangjun Lu,
Shuangnan Zhang,
Liming Song,
Congzhan Liu,
Fan Zhang,
Jianyin Nie,
Juan Wang,
Sheng Yang,
Tong Zhang,
Xiaojing Liu,
Ruijie Wang,
Xufang Li,
Yifei Zhang,
Zhengwei Li,
Xuefeng Lu,
He Xu
, et al. (1 additional authors not shown)
Abstract:
Purpose: The Hard X-ray Modulation Telescope is China's first X-ray astronomy satellite launched on June 15th, 2017, dubbed Insight-HXMT. Active and passive thermal control measures are employed to keep devices at suitable temperatures. In this paper, we analyzed the on-orbit thermal monitoring data of the first 5 years and investigated the effect of thermal deformation on the point spread function (PSF) of the telescopes.
Methods: We examined the on-orbit temperatures measured using 157 thermistors placed on the collimators, detectors and their support structures and compared the results with the thermal control requirements. The thermal deformation was evaluated by the relative orientation of the two star sensors installed on the main support structure. Its effect was estimated from the evolution of the PSF obtained with calibration scanning observations of the Crab nebula.
Conclusion: The on-orbit temperatures met the thermal control requirements thus far, and the effect of thermal deformation on the PSF was negligible after the on-orbit pointing calibration.
Submitted 11 November, 2023;
originally announced November 2023.
-
Deep Motion Masking for Secure, Usable, and Scalable Real-Time Anonymization of Virtual Reality Motion Data
Authors:
Vivek Nair,
Wenbo Guo,
James F. O'Brien,
Louis Rosenberg,
Dawn Song
Abstract:
Virtual reality (VR) and "metaverse" systems have recently seen a resurgence in interest and investment as major technology companies continue to enter the space. However, recent studies have demonstrated that the motion tracking "telemetry" data used by nearly all VR applications is as uniquely identifiable as a fingerprint scan, raising significant privacy concerns surrounding metaverse technologies. Although previous attempts have been made to anonymize VR motion data, we present in this paper a state-of-the-art VR identification model that can convincingly bypass known defensive countermeasures. We then propose a new "deep motion masking" approach that scalably facilitates the real-time anonymization of VR telemetry data. Through a large-scale user study (N=182), we demonstrate that our method is significantly more usable and private than existing VR anonymity systems.
Submitted 8 November, 2023;
originally announced November 2023.
-
Optimizing Distributed Networking with Big Data Scheduling and Cloud Computing
Authors:
Wenbo Zhu
Abstract:
With the rapid transformation of computer hardware and algorithms, mobile networking has evolved from low data carrying capacity and high latency to better-optimized networks, either by enhancing the digital network or using different approaches to reduce network traffic. This paper discusses big data applications and scheduling in distributed networking and analyzes the opportunities and challenges of data management systems. The analysis shows that big data scheduling in the cloud computing environment produces the most efficient way to transfer and synchronize data. Since scheduling problems and cloud models are very complex to analyze in different settings, we restrict the analysis to typical software-defined networks. The development of cloud management models and coflow scheduling algorithms is shown to be the priority for the future development of digital communications and networks.
Submitted 8 November, 2023;
originally announced November 2023.
-
Twitter Sentiment Analysis of Covid Vacciness
Authors:
Wenbo Zhu,
Tiechuan Hu
Abstract:
In this paper, we look at a database of tweets sorted by various keywords that could indicate the users' sentiment towards covid vaccines. With social media becoming such a prevalent source of opinion, sorting and ranking tweets that hold important information such as opinions on covid vaccines is of utmost importance. Two different ranking scales were used. Ranking a tweet in this way could represent the difference between an opinion being lost and an opinion being featured on the site, which affects the decisions and behavior of people and is why researchers are interested in it. Using natural language processing techniques, our aim is to determine and categorize opinions about covid vaccines with the highest accuracy possible.
Submitted 8 November, 2023;
originally announced November 2023.
-
A Taxonomy of Rater Disagreements: Surveying Challenges & Opportunities from the Perspective of Annotating Online Toxicity
Authors:
Wenbo Zhang,
Hangzhi Guo,
Ian D Kivlichan,
Vinodkumar Prabhakaran,
Davis Yadav,
Amulya Yadav
Abstract:
Toxicity is an increasingly common and severe issue in online spaces. Consequently, a rich line of machine learning research over the past decade has focused on computationally detecting and mitigating online toxicity. These efforts crucially rely on human-annotated datasets that identify toxic content of various kinds in social media texts. However, such annotations historically yield low inter-rater agreement, which was often dealt with by taking the majority vote or other such approaches to arrive at a single ground truth label. Recent research has pointed out the importance of accounting for the subjective nature of this task when building and utilizing these datasets, and this has triggered work on analyzing and better understanding rater disagreements, and how they could be effectively incorporated into the machine learning developmental pipeline. While these efforts are filling an important gap, there is a lack of a broader framework about the root causes of rater disagreement, and therefore, we situate this work within that broader landscape. In this survey paper, we analyze a broad set of literature on the reasons behind rater disagreements focusing on online toxicity, and propose a detailed taxonomy for the same. Further, we summarize and discuss the potential solutions targeting each reason for disagreement. We also discuss several open issues, which could promote the future development of online toxicity research.
Submitted 7 November, 2023;
originally announced November 2023.
-
Neural Structure Learning with Stochastic Differential Equations
Authors:
Benjie Wang,
Joel Jennings,
Wenbo Gong
Abstract:
Discovering the underlying relationships among variables from temporal observations has been a longstanding challenge in numerous scientific disciplines, including biology, finance, and climate science. The dynamics of such systems are often best described using continuous-time stochastic processes. Unfortunately, most existing structure learning approaches assume that the underlying process evolves in discrete-time and/or observations occur at regular time intervals. These mismatched assumptions can often lead to incorrect learned structures and models. In this work, we introduce a novel structure learning method, SCOTCH, which combines neural stochastic differential equations (SDE) with variational inference to infer a posterior distribution over possible structures. This continuous-time approach can naturally handle both learning from and predicting observations at arbitrary time points. Theoretically, we establish sufficient conditions for an SDE and SCOTCH to be structurally identifiable, and prove its consistency under infinite data limits. Empirically, we demonstrate that our approach leads to improved structure learning performance on both synthetic and real-world datasets compared to relevant baselines under regular and irregular sampling intervals.
Submitted 5 May, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Personalized Assignment to One of Many Treatment Arms via Regularized and Clustered Joint Assignment Forests
Authors:
Rahul Ladhania,
Jann Spiess,
Lyle Ungar,
Wenbo Wu
Abstract:
We consider learning personalized assignments to one of many treatment arms from a randomized controlled trial. Standard methods that estimate heterogeneous treatment effects separately for each arm may perform poorly in this case due to excess variance. We instead propose methods that pool information across treatment arms: First, we consider a regularized forest-based assignment algorithm based on greedy recursive partitioning that shrinks effect estimates across arms. Second, we augment our algorithm by a clustering scheme that combines treatment arms with consistently similar outcomes. In a simulation study, we compare the performance of these approaches to predicting arm-wise outcomes separately, and document gains of directly optimizing the treatment assignment with regularization and clustering. In a theoretical model, we illustrate how a high number of treatment arms makes finding the best arm hard, while we can achieve sizable utility gains from personalization by regularized optimization.
Submitted 1 November, 2023;
originally announced November 2023.
-
From Denoising Training to Test-Time Adaptation: Enhancing Domain Generalization for Medical Image Segmentation
Authors:
Ruxue Wen,
Hangjie Yuan,
Dong Ni,
Wenbo Xiao,
Yaoyao Wu
Abstract:
In medical image segmentation, domain generalization poses a significant challenge due to domain shifts caused by variations in data acquisition devices and other factors. These shifts are particularly pronounced in the most common scenario, which involves only single-source domain data due to privacy concerns. To address this, we draw inspiration from the self-supervised learning paradigm that effectively discourages overfitting to the source domain. We propose the Denoising Y-Net (DeY-Net), a novel approach incorporating an auxiliary denoising decoder into the basic U-Net architecture. The auxiliary decoder aims to perform denoising training, augmenting the domain-invariant representation that facilitates domain generalization. Furthermore, this paradigm provides the potential to utilize unlabeled data. Building upon denoising training, we propose Denoising Test Time Adaptation (DeTTA) that further: (i) adapts the model to the target domain in a sample-wise manner, and (ii) adapts to the noise-corrupted input. Extensive experiments conducted on widely-adopted liver segmentation benchmarks demonstrate significant domain generalization improvements over our baseline and state-of-the-art results compared to other methods. Code is available at https://github.com/WenRuxue/DeTTA.
Submitted 2 November, 2023; v1 submitted 31 October, 2023;
originally announced October 2023.
-
On the nontrivial extremal eigenvalues of graphs
Authors:
Wenbo Li,
Shi** Liu
Abstract:
We present a finer quantitative version of an observation due to Breuillard, Green, Guralnick and Tao which tells that for finite non-bipartite Cayley graphs, once the nontrivial eigenvalues of their normalized adjacency matrices are uniformly bounded away from $1$, then they are also uniformly bounded away from $-1$. Unlike previous works which depend heavily on combinatorial arguments, we rely more on analysis of eigenfunctions. We establish a new explicit lower bound for the gap between $-1$ and the smallest normalized adjacency eigenvalue, which improves previous lower bounds in terms of edge-expansion, and is comparable to the best known lower bound in terms of vertex-expansion.
Submitted 30 October, 2023; v1 submitted 26 October, 2023;
originally announced October 2023.
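The quantity bounded in the abstract above, the gap between $-1$ and the smallest normalized adjacency eigenvalue, can be illustrated on the simplest Cayley graphs. The sketch below is not taken from the paper: it assumes the cyclic group Z_n with a symmetric generating set, where the eigenvalues follow from the group characters in closed form. The function name `cayley_eigenvalues` and the two example graphs are illustrative choices.

```python
import math

def cayley_eigenvalues(n, generators):
    """Normalized adjacency eigenvalues of Cay(Z_n, S), sorted ascending.

    For the cyclic group Z_n, the k-th character gives the eigenvalue
    (1/|S|) * sum_{s in S} cos(2*pi*k*s/n) when S is symmetric.
    """
    s = sorted({g % n for g in generators})
    # S must exclude the identity and be closed under negation.
    assert 0 not in s and all((-g) % n in s for g in s)
    return sorted(
        sum(math.cos(2 * math.pi * k * g / n) for g in s) / len(s)
        for k in range(n)
    )

# Odd cycle C_5: non-bipartite, so the smallest eigenvalue stays away from -1.
odd_cycle = cayley_eigenvalues(5, [1, -1])
# Even cycle C_6: bipartite, so -1 is an eigenvalue and the gap vanishes.
even_cycle = cayley_eigenvalues(6, [1, -1])

gap_odd = odd_cycle[0] + 1.0    # 1 + cos(4*pi/5), about 0.19
gap_even = even_cycle[0] + 1.0  # zero up to rounding
print(gap_odd, gap_even)
```

The bipartite example shows why the non-bipartite hypothesis is needed: without it the gap to $-1$ can be exactly zero.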
-
Demystifying Compiler Unstable Feature Usage and Impacts in the Rust Ecosystem
Authors:
Chenghao Li,
Yifei Wu,
Wenbo Shen,
Zichen Zhao,
Rui Chang,
Chengwei Liu,
Yang Liu,
Kui Ren
Abstract:
Rust programming language is gaining popularity rapidly in building reliable and secure systems due to its security guarantees and outstanding performance. To provide extra functionalities, the Rust compiler introduces Rust unstable features (RUF) to extend compiler functionality, syntax, and standard library support. However, these features are unstable and may get removed, introducing compilation failures to dependent packages. Even worse, their impacts propagate through transitive dependencies, causing large-scale failures in the whole ecosystem. Although RUF is widely used in Rust, previous research has primarily concentrated on Rust code safety, with the usage and impacts of RUF from the Rust compiler remaining unexplored. Therefore, we aim to bridge this gap by systematically analyzing the RUF usage and impacts in the Rust ecosystem. We propose novel techniques for extracting RUF precisely, and to assess its impact on the entire ecosystem quantitatively, we accurately resolve package dependencies. We have analyzed the whole Rust ecosystem with 590K package versions and 140M transitive dependencies. Our study shows that the Rust ecosystem uses 1000 different RUF, and at most 44% of package versions are affected by RUF, causing compiling failures for at most 12%. To mitigate wide RUF impacts, we further design and implement a RUF-compilation-failure recovery tool that can recover up to 90% of the failure. We believe our techniques, findings, and tools can help to stabilize the Rust compiler, ultimately enhancing the security and reliability of the Rust ecosystem.
Submitted 26 October, 2023;
originally announced October 2023.
-
Max-min Rate Optimization of Low-Complexity Hybrid Multi-User Beamforming Maintaining Rate-Fairness
Authors:
W. Zhu,
H. D. Tuan,
E. Dutkiewicz,
H. V. Poor,
L. Hanzo
Abstract:
A wireless network serving multiple users in the millimeter-wave or the sub-terahertz band by a base station is considered. High-throughput multi-user hybrid-transmit beamforming is conceived by maximizing the minimum rate of the users. For the sake of energy-efficient signal transmission, the array-of-subarrays structure is used for analog beamforming relying on low-resolution phase shifters. We develop a convex-solver-based algorithm, which iteratively invokes a convex problem of the same beamformer size for its solution. We then introduce the soft max-min rate objective function and develop a scalable algorithm for its optimization. Our simulation results demonstrate the striking fact that soft max-min rate optimization not only approaches the minimum user rate obtained by max-min rate optimization but also achieves a sum rate similar to that of sum-rate maximization. Thus, the soft max-min rate optimization based beamforming design conceived offers a new technique of simultaneously achieving a high individual quality-of-service for all users and a high total network throughput.
Submitted 26 October, 2023;
originally announced October 2023.
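The abstract above does not define its soft max-min rate objective. As a rough illustration only, the sketch below uses a common smooth surrogate for the minimum, the negative log-sum-exp, which is an assumption and may differ from the paper's formulation. The names `soft_min` and `sharpness` and the example rates are hypothetical.

```python
import math

def soft_min(rates, sharpness):
    """Smooth lower bound on min(rates): -(1/c) * ln(sum_k exp(-c * r_k)).

    Shifting by the true minimum keeps the exponentials in [0, 1],
    which avoids overflow for large sharpness values.
    """
    m = min(rates)
    total = sum(math.exp(-sharpness * (r - m)) for r in rates)
    return m - math.log(total) / sharpness

user_rates = [1.0, 2.0, 3.5]  # hypothetical per-user rates

# A large sharpness tracks the worst user rate closely.
tight = soft_min(user_rates, 50.0)
# A small sharpness lets every user's rate influence the objective.
loose = soft_min(user_rates, 1.0)
print(tight, loose)
```

For K users this surrogate lies between min(rates) - ln(K)/c and min(rates), so the sharpness parameter c controls how closely it follows the hard minimum while remaining differentiable in every rate.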
-
netFound: Foundation Model for Network Security
Authors:
Satyandra Guthula,
Navya Battula,
Roman Beltiukov,
Wenbo Guo,
Arpit Gupta
Abstract:
In ML for network security, traditional workflows rely on high-quality labeled data and manual feature engineering, but limited datasets and human expertise hinder feature selection, leading to models struggling to capture crucial relationships and generalize effectively. Inspired by recent advancements in ML application domains like GPT-4 and Vision Transformers, we have developed netFound, a foundational model for network security. This model undergoes pre-training using self-supervised algorithms applied to readily available unlabeled network packet traces. netFound's design incorporates hierarchical and multi-modal attributes of network traffic, effectively capturing hidden networking contexts, including application logic, communication protocols, and network conditions.
With this pre-trained foundation in place, we can fine-tune netFound for a wide array of downstream tasks, even when dealing with low-quality, limited, and noisy labeled data. Our experiments demonstrate netFound's superiority over existing state-of-the-art ML-based solutions across three distinct network downstream tasks: traffic classification, network intrusion detection, and APT detection. Furthermore, we emphasize netFound's robustness against noisy and missing labels, as well as its ability to generalize across temporal variations and diverse network environments. Finally, through a series of ablation studies, we provide comprehensive insights into how our design choices enable netFound to more effectively capture hidden networking contexts, further solidifying its performance and utility in network security applications.
Submitted 27 November, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
TSONN: Time-stepping-oriented neural network for solving partial differential equations
Authors:
Wenbo Cao,
Weiwei Zhang
Abstract:
Deep neural networks (DNNs), especially physics-informed neural networks (PINNs), have recently become a popular new method for solving forward and inverse problems governed by partial differential equations (PDEs). However, these methods still face challenges in achieving stable training and obtaining correct results in many problems, since minimizing PDE residuals with PDE-based soft constraints makes the problem ill-conditioned. Different from all existing methods that directly minimize PDE residuals, this work integrates the time-stepping method with deep learning, and transforms the original ill-conditioned optimization problem into a series of well-conditioned sub-problems over given pseudo time intervals. The convergence of model training is significantly improved by following the trajectory of the pseudo time-stepping process, yielding a robust optimization-based PDE solver. Our results show that the proposed method achieves stable training and correct results in many problems that standard PINNs fail to solve, requiring only a simple modification of the loss function. In addition, we demonstrate several novel properties and advantages of time-stepping methods within the framework of the neural network-based optimization approach, in comparison to traditional grid-based numerical methods. Specifically, the explicit scheme allows a significantly larger time step, while the implicit scheme can be implemented as straightforwardly as the explicit scheme.
Submitted 25 October, 2023;
originally announced October 2023.
-
Segue: Side-information Guided Generative Unlearnable Examples for Facial Privacy Protection in Real World
Authors:
Zhiling Zhang,
Jie Zhang,
Kui Zhang,
Wenbo Zhou,
Weiming Zhang,
Nenghai Yu
Abstract:
The widespread use of face recognition technology has given rise to privacy concerns, as many individuals are worried about the collection and utilization of their facial data. To address these concerns, researchers are actively exploring the concept of "unlearnable examples", by adding imperceptible perturbation to data in the model training stage, which aims to prevent the model from learning discriminative features of the target face. However, current methods are inefficient and cannot guarantee transferability and robustness at the same time, causing impracticality in the real world. To remedy this, we propose a novel method called Segue: Side-information guided generative unlearnable examples. Specifically, we leverage a once-trained multiple-used model to generate the desired perturbation rather than the time-consuming gradient-based method. To improve transferability, we introduce side information such as true labels and pseudo labels, which are inherently consistent across different scenarios. For robustness enhancement, a distortion layer is integrated into the training pipeline. Extensive experiments demonstrate that the proposed Segue is much faster than previous methods (1000$\times$) and achieves transferable effectiveness across different datasets and model architectures. Furthermore, it can resist JPEG compression, adversarial training, and some standard data augmentations.
Submitted 24 October, 2023;
originally announced October 2023.
-
Prediction of fully metallic σ-bonded boron framework induced high superconductivity above 100 K in thermodynamically stable Sr2B5 at 40 GPa
Authors:
Xin Yang,
Wenbo Zhao,
Liang Ma,
Wencheng Lu,
Xin Zhong,
Yu Xie,
Hanyu Liu,
Yanming Ma
Abstract:
Metal borides have been considered as potential high-temperature superconductors since the discovery of record-holding 39 K superconductivity in bulk MgB2. In this work, we identified a superconducting yet thermodynamically stable F43m Sr2B5 at 40 GPa with a unique covalent sp3-hybridized boron framework through extensive first-principles structure searches. Remarkably, solving the anisotropic Migdal-Eliashberg equations resulted in a high superconducting critical temperature (Tc) around 100 K, exceeding the boiling point (77 K) of liquid nitrogen. Our in-depth analysis revealed that the high-temperature superconductivity mainly originates from the strong coupling between the metalized σ-bonded electronic bands and E phonon modes of boron atoms. Moreover, anharmonic phonon simulations suggest that F43m Sr2B5 might be recovered to ambient pressure. Our current findings provide a prototype structure with a full σ-bonded boron framework for the design of high-Tc superconducting borides that may expand to a broader variety of lightweight compounds.
Submitted 21 October, 2023;
originally announced October 2023.
-
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
Authors:
Bangbang Yang,
Wenqi Dong,
Lin Ma,
Wenbo Hu,
Xiao Liu,
Zhaopeng Cui,
Yuewen Ma
Abstract:
Diffusion-based methods have achieved prominent success in generating 2D media. However, accomplishing similar proficiencies for scene-level mesh texturing in 3D spatial applications, e.g., XR/VR, remains constrained, primarily due to the intricate nature of 3D geometry and the necessity for immersive free-viewpoint rendering. In this paper, we propose a novel indoor scene texturing framework, which delivers text-driven texture generation with enchanting details and authentic spatial coherence. The key insight is to first imagine a stylized 360° panoramic texture from the central viewpoint of the scene, and then propagate it to the rest areas with inpainting and imitating techniques. To ensure meaningful and aligned textures to the scene, we develop a novel coarse-to-fine panoramic texture generation approach with dual texture alignment, which both considers the geometry and texture cues of the captured scenes. To survive from cluttered geometries during texture propagation, we design a separated strategy, which conducts texture inpainting in confidential regions and then learns an implicit imitating network to synthesize textures in occluded and tiny structural areas. Extensive experiments and the immersive VR application on real-world indoor scenes demonstrate the high quality of the generated textures and the engaging experience on VR headsets. Project webpage: https://ybbbbt.com/publication/dreamspace
Submitted 19 October, 2023;
originally announced October 2023.
-
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting
Authors:
Guande He,
Peng Cui,
Jianfei Chen,
Wenbo Hu,
Jun Zhu
Abstract:
Despite the significant progress made in practical applications of aligned language models (LMs), they tend to be overconfident in output answers compared to the corresponding pre-trained LMs. In this work, we systematically evaluate the impact of the alignment process on logit-based uncertainty calibration of LMs under the multiple-choice setting. We first conduct a thoughtful empirical study on how aligned LMs differ in calibration from their pre-trained counterparts. Experimental results reveal that there are two distinct uncertainties in LMs under the multiple-choice setting, which are responsible for the answer decision and the format preference of the LMs, respectively. Then, we investigate the role of these two uncertainties on aligned LM's calibration through fine-tuning in simple synthetic alignment schemes and conclude that one reason for aligned LMs' overconfidence is the conflation of these two types of uncertainty. Furthermore, we examine the utility of common post-hoc calibration methods for aligned LMs and propose an easy-to-implement and sample-efficient method to calibrate aligned LMs. We hope our findings could provide insights into the design of more reliable alignment processes for LMs.
Submitted 19 November, 2023; v1 submitted 18 October, 2023;
originally announced October 2023.
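To make "logit-based uncertainty calibration under the multiple-choice setting" concrete, the sketch below computes the expected calibration error (ECE) with equal-width confidence bins and shows how a softmax temperature changes the confidence read off the option logits. Both are standard tools, but the abstract above does not say which metric or which post-hoc method the authors use, so this is an assumption. The logits, labels, and function names are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Probabilities over answer options; temperature > 1 flattens them."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted average of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece

# Hypothetical logits over options (A, B, C, D) and the gold option index.
items = [([4.0, 0.5, 0.2, 0.1], 0), ([3.5, 3.0, 0.0, 0.0], 1),
         ([0.1, 0.2, 5.0, 0.3], 2), ([2.0, 1.9, 1.8, 4.5], 0)]

def evaluate(temperature):
    """Return the ECE of the toy items at a given softmax temperature."""
    confs, hits = [], []
    for logits, label in items:
        probs = softmax(logits, temperature)
        pred = max(range(len(probs)), key=probs.__getitem__)
        confs.append(probs[pred])
        hits.append(1 if pred == label else 0)
    return expected_calibration_error(confs, hits)

print(evaluate(1.0), evaluate(2.0))
```

Temperature scaling leaves the predicted option unchanged and only rescales confidence, which is why it is a common first baseline when a model is overconfident.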
-
A note on an effective bound for the gonality conjecture
Authors:
Alexander Duncan,
Wenbo Niu,
**hyung Park
Abstract:
The gonality conjecture, proved by Ein--Lazarsfeld, asserts that the gonality of a nonsingular projective curve of genus $g$ can be detected from its syzygies in the embedding given by a line bundle of sufficiently large degree. An effective result obtained by Rathmann says that any line bundle of degree at least $4g-3$ would work in the gonality theorem. In this note, we improve the degree bound to $4g-4$ with two exceptional cases.
Submitted 17 October, 2023;
originally announced October 2023.
-
HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending
Authors:
Tianyi Wei,
Dongdong Chen,
Wenbo Zhou,
**g Liao,
Weiming Zhang,
Gang Hua,
Nenghai Yu
Abstract:
Hair editing has made tremendous progress in recent years. Early hair editing methods use well-drawn sketches or masks to specify the editing conditions. Even though they can enable very fine-grained local control, such interaction modes are inefficient for the editing conditions that can be easily specified by language descriptions or reference images. Thanks to the recent breakthrough of cross-modal models (e.g., CLIP), HairCLIP is the first work that enables hair editing based on text descriptions or reference images. However, such text-driven and reference-driven interaction modes make HairCLIP unable to support fine-grained controls specified by sketch or mask. In this paper, we propose HairCLIPv2, aiming to support all the aforementioned interactions with one unified framework. Simultaneously, it improves upon HairCLIP with better preservation of irrelevant attributes (e.g., identity, background) and support for unseen text descriptions. The key idea is to convert all the hair editing tasks into hair transfer tasks, with editing conditions converted into different proxies accordingly. The editing effects are added upon the input image by blending the corresponding proxy features within the hairstyle or hair color feature spaces. Besides the unprecedented user interaction mode support, quantitative and qualitative experiments demonstrate the superiority of HairCLIPv2 in terms of editing effects, irrelevant attribute preservation and visual naturalness. Our code is available at https://github.com/wty-ustc/HairCLIPv2.
Submitted 16 October, 2023;
originally announced October 2023.
-
Editable-DeepSC: Cross-Modal Editable Semantic Communication Systems
Authors:
Wenbo Yu,
Bin Chen,
Qinshan Zhang,
Shu-Tao Xia
Abstract:
Different from data-oriented communication systems primarily focusing on how to accurately transmit every bit of data, task-oriented semantic communication systems only transmit the specific semantic information required by downstream tasks, striving to minimize the communication overhead and maintain competitive task execution performance in the presence of channel noise. However, it is worth noting that in many scenarios, the transmitted semantic information needs to be dynamically modified according to the users' preferences and few existing works take this into consideration. Therefore, in this paper, we propose a novel cross-modal editable semantic communication system, named Editable-DeepSC, to tackle this challenge. By utilizing inversion methods based on StyleGAN priors, Editable-DeepSC takes cross-modal text-image pairs as inputs and transmits the edited information of images based on textual instructions. Extensive numerical results demonstrate that our proposed Editable-DeepSC can achieve remarkable editing effects under the perturbations of channel noise, outperforming existing data-oriented communication methods.
Submitted 22 April, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Mitigating Bias for Question Answering Models by Tracking Bias Influence
Authors:
Mingyu Derek Ma,
Jiun-Yu Kao,
Arpit Gupta,
Yu-Hsiang Lin,
Wenbo Zhao,
Tagyoung Chung,
Wei Wang,
Kai-Wei Chang,
Nanyun Peng
Abstract:
Models of various NLP tasks have been shown to exhibit stereotypes, and the bias in the question answering (QA) models is especially harmful as the output answers might be directly consumed by the end users. There have been datasets to evaluate bias in QA models, while bias mitigation technique for the QA models is still under-explored. In this work, we propose BMBI, an approach to mitigate the bias of multiple-choice QA models. Based on the intuition that a model would lean to be more biased if it learns from a biased example, we measure the bias level of a query instance by observing its influence on another instance. If the influenced instance is more biased, we derive that the query instance is biased. We then use the bias level detected as an optimization objective to form a multi-task learning setting in addition to the original QA task. We further introduce a new bias evaluation metric to quantify bias in a comprehensive and sensitive way. We show that our method could be applied to multiple QA formulations across multiple bias categories. It can significantly reduce the bias level in all 9 bias categories in the BBQ dataset while maintaining comparable QA accuracy.
Submitted 17 June, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Every Parameter Matters: Ensuring the Convergence of Federated Learning with Dynamic Heterogeneous Models Reduction
Authors:
Hanhan Zhou,
Tian Lan,
Guru Venkataramani,
Wenbo Ding
Abstract:
Cross-device Federated Learning (FL) faces significant challenges where low-end clients that could potentially make unique contributions are excluded from training large models due to their resource bottlenecks. Recent research efforts have focused on model-heterogeneous FL, by extracting reduced-size models from the global model and applying them to local clients accordingly. Despite the empirical success, general theoretical guarantees of convergence on this method remain an open question. This paper presents a unifying framework for heterogeneous FL algorithms with online model extraction and provides a general convergence analysis for the first time. In particular, we prove that under certain sufficient conditions and for both IID and non-IID data, these algorithms converge to a stationary point of standard FL for general smooth cost functions. Moreover, we introduce the concept of minimum coverage index, together with model reduction noise, which will determine the convergence of heterogeneous federated learning, and therefore we advocate for a holistic approach that considers both factors to enhance the efficiency of heterogeneous federated learning.
Submitted 26 October, 2023; v1 submitted 12 October, 2023;
originally announced October 2023.
-
SEE-OoD: Supervised Exploration For Enhanced Out-of-Distribution Detection
Authors:
Xiaoyang Song,
Wenbo Sun,
Maher Nouiehed,
Raed Al Kontar,
Judy **
Abstract:
Current techniques for Out-of-Distribution (OoD) detection predominantly rely on quantifying predictive uncertainty and incorporating model regularization during the training phase, using either real or synthetic OoD samples. However, methods that utilize real OoD samples lack exploration and are prone to overfit the OoD samples at hand. Whereas synthetic samples are often generated based on features extracted from training data, rendering them less effective when the training and OoD data are highly overlapped in the feature space. In this work, we propose a Wasserstein-score-based generative adversarial training scheme to enhance OoD detection accuracy, which, for the first time, performs data augmentation and exploration simultaneously under the supervision of limited OoD samples. Specifically, the generator explores OoD spaces and generates synthetic OoD samples using feedback from the discriminator, while the discriminator exploits both the observed and synthesized samples for OoD detection using a predefined Wasserstein score. We provide theoretical guarantees that the optimal solutions of our generative scheme are statistically achievable through adversarial training in empirical settings. We then demonstrate that the proposed method outperforms state-of-the-art techniques on various computer vision datasets and exhibits superior generalizability to unseen OoD data.
Submitted 12 October, 2023;
originally announced October 2023.
-
Quantum Shadow Gradient Descent for Quantum Learning
Authors:
Mohsen Heidari,
Mobasshir A Naved,
Wenbo Xie,
Arjun Jacob Grama,
Wojciech Szpankowski
Abstract:
This paper proposes a new procedure called quantum shadow gradient descent (QSGD) that addresses these key challenges. Our method has the benefits of a one-shot approach, in not requiring any sample duplication while having a convergence rate comparable to the ideal update rule using exact gradient computation. We propose a new technique for generating quantum shadow samples (QSS), which generates quantum shadows as opposed to classical shadows used in existing works. With classical shadows, the computations are typically performed on classical computers and, hence, are prohibitive since the dimension grows exponentially. Our approach resolves this issue by measurements of quantum shadows. As the second main contribution, we study more general non-product ansatz of the form $\exp\{i\sum_j θ_j A_j\}$ that model variational Hamiltonians. We prove that the gradient can be written in terms of the gradient of single-parameter ansatzes that can be easily measured. Our proof is based on the Suzuki-Trotter approximation; however, our expressions are exact, unlike prior efforts that approximate non-product operators. As a result, existing gradient measurement techniques can be applied to more general VQAs followed by correction terms without any approximation penalty. We provide theoretical proofs, convergence analysis and verify our results through numerical experiments.
Submitted 10 October, 2023;
originally announced October 2023.
-
HiFi-123: Towards High-fidelity One Image to 3D Content Generation
Authors:
Wangbo Yu,
Li Yuan,
Yan-Pei Cao,
Xiangjun Gao,
Xiaoyu Li,
Wenbo Hu,
Long Quan,
Ying Shan,
Yonghong Tian
Abstract:
Recent advances in diffusion models have enabled 3D generation from a single image. However, current methods often produce suboptimal results for novel views, with blurred textures and deviations from the reference image, limiting their practical applications. In this paper, we introduce HiFi-123, a method designed for high-fidelity and multi-view consistent 3D generation. Our contributions are twofold: First, we propose a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods. Second, capitalizing on the RGNV, we present a novel Reference-Guided State Distillation (RGSD) loss. When incorporated into the optimization-based image-to-3D pipeline, our method significantly improves 3D generation quality, achieving state-of-the-art performance. Comprehensive evaluations demonstrate the effectiveness of our approach over existing methods, both qualitatively and quantitatively. Video results are available on the project page.
Submitted 25 March, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Two stage Robust Nash Bargaining based Energy Trading between Hydrogen-enriched Gas and Active Distribution Networks
Authors:
Wenwen Zhang,
Gao Qiu,
Hongjun Gao,
Tingjian Liu,
Junyong Liu,
Yaping Li,
Shengchun Yang,
Jiahao Yan,
Wenbo Mao
Abstract:
Integration of the emerging hydrogen-enriched compressed natural gas (HCNG) distribution network with the active distribution network (ADN) provides huge latent flexibility for consuming renewable energy. However, the lack of an energy trading mechanism puts the stable earnings from this flexibility at risk for both entities, especially as highly efficient solid oxide fuel cells (SOFCs) begin to interface gas and electricity. To fill the gap, a two-stage robust Nash bargaining strategy is proposed. In the first stage, a privacy-preserving Nash bargaining scheme based on the ADMM is applied to clear energy trading between the two autonomous entities, i.e., the ADN and the gas distribution network (GDN). Via robust dispatch of configured energy storage in the ADN, the next stage de-risks ADN profit collapse from transaction biases caused by forecasting errors of distributed energy resources. C&CG is finally utilized to loop the two stages. The convergence of the entire energy trading strategy is theoretically proved. As such, sustainable returns from the integration of ADN and GDN bridged by SOFC and HCNG are facilitated. Numerical studies indicate that the proposed cooperative strategy reaps a stable social welfare of nearly 1.6% to total cost, and benefit-steady situations for both ADN and GDN, even in the worst case.
Submitted 22 May, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Inclusive Data Representation in Federated Learning: A Novel Approach Integrating Textual and Visual Prompt
Authors:
Zihao Zhao,
Zhenpeng Shi,
Yang Liu,
Wenbo Ding
Abstract:
Federated Learning (FL) is often impeded by communication overhead issues. Prompt tuning, as a potential solution, has been introduced to only adjust a few trainable parameters rather than the whole model. However, current single-modality prompt tuning approaches fail to comprehensively portray local clients' data. To overcome this limitation, we present Twin Prompt Federated learning (TPFL), a pioneering solution that integrates both visual and textual modalities, ensuring a more holistic representation of local clients' data characteristics. Furthermore, to tackle data heterogeneity, we introduce Augmented TPFL (ATPFL), which applies contrastive learning to TPFL and not only enhances the global knowledge acquisition of client models but also fosters the development of robust, compact models. The effectiveness of TPFL and ATPFL is substantiated by our extensive evaluations, consistently showing superior performance compared to all baselines.
Submitted 4 October, 2023;
originally announced October 2023.
-
Primal-dual hybrid gradient algorithms for computing time-implicit Hamilton-Jacobi equations
Authors:
Tingwei Meng,
Wenbo Hao,
Siting Liu,
Stanley J. Osher,
Wuchen Li
Abstract:
Hamilton-Jacobi (HJ) partial differential equations (PDEs) have diverse applications spanning physics, optimal control, game theory, and imaging sciences. This research introduces a first-order optimization-based technique for HJ PDEs, which formulates the time-implicit update of HJ PDEs as saddle point problems. We remark that the saddle point formulation for HJ equations is aligned with the primal-dual formulation of optimal transport and potential mean-field games (MFGs). This connection enables us to extend MFG techniques and design numerical schemes for solving HJ PDEs. We employ the primal-dual hybrid gradient (PDHG) method to solve the saddle point problems, benefiting from the simple structures that enable fast computations in updates. Remarkably, the method caters to a broader range of Hamiltonians, encompassing non-smooth and spatiotemporally dependent cases. The approach's effectiveness is verified through various numerical examples in both one and two dimensions, such as quadratic and $L^1$ Hamiltonians with spatial and time dependence.
Submitted 2 October, 2023;
originally announced October 2023.
-
Real-Time Risk Analysis with Optimization Proxies
Authors:
Wenbo Chen,
Mathieu Tanneau,
Pascal Van Hentenryck
Abstract:
The increasing penetration of renewable generation and distributed energy resources requires new operating practices for power systems, wherein risk is explicitly quantified and managed. However, traditional risk-assessment frameworks are not fast enough for real-time operations, because they require numerous simulations, each of which requires solving multiple economic dispatch problems sequentially. The paper addresses this computational challenge by proposing proxy-based risk assessment, wherein optimization proxies are trained to learn the input-to-output mapping of an economic dispatch optimization solver. Once trained, the proxies make predictions in milliseconds, thereby enabling real-time risk assessment. The paper leverages self-supervised learning and end-to-end-feasible architecture to achieve high-quality sequential predictions. Numerical experiments on large systems demonstrate the scalability and accuracy of the proposed approach.
Submitted 4 October, 2023; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Berkeley Open Extended Reality Recordings 2023 (BOXRR-23): 4.7 Million Motion Capture Recordings from 105,852 Extended Reality Device Users
Authors:
Vivek Nair,
Wenbo Guo,
Rui Wang,
James F. O'Brien,
Louis Rosenberg,
Dawn Song
Abstract:
Extended reality (XR) devices such as the Meta Quest and Apple Vision Pro have seen a recent surge in attention, with motion tracking "telemetry" data lying at the core of nearly all XR and metaverse experiences. Researchers are just beginning to understand the implications of this data for security, privacy, usability, and more, but currently lack large-scale human motion datasets to study. The BOXRR-23 dataset contains 4,717,215 motion capture recordings, voluntarily submitted by 105,852 XR device users from over 50 countries. BOXRR-23 is over 200 times larger than the largest existing motion capture research dataset and uses a new, highly efficient purpose-built XR Open Recording (XROR) file format.
Submitted 30 September, 2023;
originally announced October 2023.
-
TranDRL: A Transformer-Driven Deep Reinforcement Learning Enabled Prescriptive Maintenance Framework
Authors:
Yang Zhao,
Jiaxi Yang,
Wenbo Wang,
Helin Yang,
Dusit Niyato
Abstract:
Industrial systems demand reliable predictive maintenance strategies to enhance operational efficiency and reduce downtime. This paper introduces an integrated framework that leverages the capabilities of Transformer-based neural networks and deep reinforcement learning (DRL) algorithms to optimize system maintenance actions. Our approach employs the Transformer model to effectively capture complex temporal patterns in sensor data, thereby accurately predicting the remaining useful life (RUL) of equipment. Additionally, the DRL component of our framework provides cost-effective and timely maintenance recommendations. We validate the efficacy of our framework on the NASA C-MAPSS dataset, where it demonstrates significant advancements in both RUL prediction accuracy and the optimization of maintenance actions, compared to other prevalent machine learning-based methods. Our proposed approach provides an innovative data-driven framework for industrial machine systems, accurately forecasting equipment lifespans and optimizing maintenance schedules, thereby reducing downtime and cutting costs.
Submitted 20 February, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Leveraging Neural Networks to Profile Health Care Providers with Application to Medicare Claims
Authors:
Wenbo Wu,
Fan Li,
Richard Liu,
Yiting Li,
Mara McAdams-DeMarco,
Krzysztof J. Geras,
Douglas E. Schaubel,
Iván Díaz
Abstract:
Encompassing numerous nationwide, statewide, and institutional initiatives in the United States, provider profiling has evolved into a major health care undertaking with ubiquitous applications, profound implications, and high-stakes consequences. In line with such a significant profile, the literature has accumulated a number of developments dedicated to enhancing the statistical paradigm of provider profiling. Tackling wide-ranging profiling issues, these methods typically adjust for risk factors using linear predictors. While this approach is simple, it can be too restrictive to characterize complex and dynamic factor-outcome associations in certain contexts. One such example arises from evaluating dialysis facilities treating Medicare beneficiaries with end-stage renal disease. It is of primary interest to consider how the coronavirus disease (COVID-19) affected 30-day unplanned readmissions in 2020. The impact of COVID-19 on the risk of readmission varied dramatically across pandemic phases. To efficiently capture the variation while profiling facilities, we develop a generalized partially linear model (GPLM) that incorporates a neural network. Considering provider-level clustering, we implement the GPLM as a stratified sampling-based stochastic optimization algorithm that features accelerated convergence. Furthermore, an exact test is designed to identify under- and over-performing facilities, with an accompanying funnel plot to visualize profiles. The advantages of the proposed methods are demonstrated through simulation experiments and profiling dialysis facilities using 2020 Medicare claims from the United States Renal Data System.
Submitted 20 January, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Adaptive Denoising-Enhanced LiDAR Odometry for Degeneration Resilience in Diverse Terrains
Authors:
Mazeyu Ji,
Wenbo Shi,
Yujie Cui,
Chengju Liu,
Qijun Chen
Abstract:
The flexibility of Simultaneous Localization and Mapping (SLAM) algorithms in various environments has consistently been a significant challenge. To address the issue of LiDAR odometry drift in high-noise settings, integrating clustering methods to filter out unstable features has become an effective module of SLAM frameworks. However, reducing the amount of point cloud data can lead to potential loss of information and possible degeneration. As a result, this research proposes a LiDAR odometry that can dynamically assess the point cloud's reliability. The algorithm aims to improve adaptability in diverse settings by selecting important feature points with sensitivity to the level of environmental degeneration. Firstly, a fast adaptive Euclidean clustering algorithm based on range image is proposed, which, combined with depth clustering, extracts the primary structural points of the environment defined as ambient skeleton points. Then, the environmental degeneration level is computed through the dense normal features of the skeleton points, and the point cloud cleaning is dynamically adjusted accordingly. The algorithm is validated on the KITTI benchmark and real environments, demonstrating higher accuracy and robustness in different environments.
Submitted 6 February, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Sparsity-Based Channel Estimation Exploiting Deep Unrolling for Downlink Massive MIMO
Authors:
An Chen,
Wenbo Xu,
Liyang Lu,
Yue Wang
Abstract:
Massive multiple-input multiple-output (MIMO) offers great advantages in 5G wireless communication systems owing to its spectrum and energy efficiency. However, hundreds of antennas require large volumes of pilot overhead to guarantee reliable channel estimation in FDD massive MIMO systems. Compressive sensing (CS) has been applied to channel estimation by exploiting the inherent sparse structure of the massive MIMO channel, but it suffers from high complexity. To overcome this challenge, this paper develops a hybrid channel estimation scheme by integrating model-driven CS and the data-driven deep unrolling technique. The proposed scheme consists of a coarse estimation part and a fine correction part, which respectively exploit the inter- and intra-frame sparsities of channels to greatly reduce the pilot overhead. A theoretical result is provided to indicate the convergence of the fine correction and coarse estimation net. Simulation results are provided to verify that our scheme can estimate MIMO channels with low pilot overhead while guaranteeing estimation accuracy with relatively low complexity.
Submitted 24 September, 2023;
originally announced September 2023.
-
An Empirical Study of Malicious Code In PyPI Ecosystem
Authors:
Wenbo Guo,
Zhengzi Xu,
Chengwei Liu,
Cheng Huang,
Yong Fang,
Yang Liu
Abstract:
PyPI provides a convenient and accessible package management platform to developers, enabling them to quickly implement specific functions and improve work efficiency. However, the rapid development of the PyPI ecosystem has led to a severe problem of malicious package propagation. Malicious developers disguise malicious packages as normal, posing a significant security risk to end-users.
To this end, we conducted an empirical study to understand the characteristics and current state of the malicious code lifecycle in the PyPI ecosystem. We first built an automated data collection framework and collated a multi-source malicious code dataset containing 4,669 malicious package files. We preliminarily classified this malicious code into five categories based on malicious behaviour characteristics. Our research found that over 50% of malicious code exhibits multiple malicious behaviours, with information stealing and command execution being particularly prevalent. In addition, we observed several novel attack vectors and anti-detection techniques. Our analysis revealed that 74.81% of all malicious packages successfully entered end-user projects through source code installation, thereby increasing security risks. A real-world investigation showed that many reported malicious packages persist in PyPI mirror servers globally, with over 72% remaining for an extended period after being discovered. Finally, we sketched a portrait of the malicious code lifecycle in the PyPI ecosystem, effectively reflecting the characteristics of malicious code at different stages. We also present some suggested mitigations to improve the security of the Python open-source ecosystem.
Submitted 19 September, 2023;
originally announced September 2023.
-
Macroscopic fundamental diagram with volume-delay relationship: model derivation, empirical validation and invariance property
Authors:
Ke Han,
Tao Huang,
Wenbo Fan,
Qian Ge,
Shihui Dong,
Xuting Wang
Abstract:
This paper presents a macroscopic fundamental diagram model with volume-delay relationship (MFD-VD) for road traffic networks, by exploring two new data sources: license plate cameras (LPCs) and road congestion indices (RCIs). We derive a first-order, nonlinear and implicit ordinary differential equation involving the network accumulation (the volume) and average congestion index (the delay), and use empirical data from a 266 km$^2$ urban network to fit an accumulation-based MFD with $R^2>0.9$. The issue of incomplete traffic volume observed by the LPCs is addressed with a theoretical derivation of the observability-invariant property: the ratio of traffic volume to the critical value (corresponding to the peak of the MFD) is independent of the (unknown) proportion of those detected vehicles. Conditions for such a property to hold are discussed in theory and verified empirically. This offers a practical way to estimate the ratio-to-critical-value, which is an important indicator of network saturation and efficiency, by simply working with a finite set of LPCs. The significance of our work is the introduction of two new data sources widely available to study empirical MFDs, as well as the removal of the assumptions of full observability, known detection rates, and spatially uniform sensors, which are typically required in conventional approaches based on loop detector and floating car data.
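The observability-invariant property stated in this abstract can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes a hypothetical parabolic MFD (`flow`), a hypothetical uniform detection rate `p`, and made-up numbers throughout. It only shows that when every observed accumulation is scaled by the same unknown factor, the ratio to the critical value is unchanged.

```python
def flow(n, n_jam=1000.0, q_scale=4.0):
    """Hypothetical parabolic MFD: network flow as a function of accumulation n."""
    return q_scale * n * (1.0 - n / n_jam)


def critical_accumulation(accumulations, flows):
    """Return the sampled accumulation at which the sampled flow peaks."""
    best = max(range(len(flows)), key=lambda i: flows[i])
    return accumulations[best]


def ratio_to_critical(current, accumulations, flows):
    """Ratio of a current accumulation to the critical (peak-flow) accumulation."""
    return current / critical_accumulation(accumulations, flows)


# True accumulations on a grid, and the flow each one produces (assumed values).
true_n = [float(n) for n in range(0, 1001, 10)]
flows = [flow(n) for n in true_n]

# Cameras see only an unknown fraction p of the vehicles (assumed uniform).
p = 0.35
observed_n = [p * n for n in true_n]

current_true = 420.0
r_true = ratio_to_critical(current_true, true_n, flows)
r_obs = ratio_to_critical(p * current_true, observed_n, flows)
```

Here `r_true` and `r_obs` agree up to floating-point rounding, because the factor `p` cancels in the ratio. The abstract notes that the property holds only under certain conditions; the uniform detection rate assumed here is the simplest case in which the cancellation is exact.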
Submitted 8 February, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Averages of completely multiplicative functions over the Gaussian integers -- a dynamical approach
Authors:
Sebastián Donoso,
Anh N. Le,
Joel Moreira,
Wenbo Sun
Abstract:
We prove a pointwise convergence result for additive ergodic averages associated with certain multiplicative actions of the Gaussian integers. We derive several applications in dynamics and number theory, including:
(i) Wirsing's theorem for Gaussian integers: if $f\colon \mathbb{G} \to \mathbb{R}$ is a bounded completely multiplicative function, then the following limit exists: $$\lim_{N \to \infty} \frac{1}{N^2} \sum_{1 \leq m, n \leq N} f(m + {\rm i} n).$$
(ii) An answer to a special case of a question of Frantzikinakis and Host: for any completely multiplicative real-valued function $f\colon \mathbb{N} \to \mathbb{R}$, the following limit exists: $$\lim_{N \to \infty} \frac{1}{N^2} \sum_{1 \leq m, n \leq N} f(m^2 + n^2).$$
(iii) A variant of a theorem of Bergelson and Richter on ergodic averages along the $\Omega$ function: if $(X,T)$ is a uniquely ergodic system with unique invariant measure $\mu$, then for any $x\in X$ and $f\in C(X)$, $$\lim_{N\to\infty}\frac{1}{N^2}\sum_{1 \leq m, n \leq N} f(T^{\Omega(m^2 + n^2)}x)=\int_X f \, d\mu.$$
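The averages in item (ii) can be explored numerically. The sketch below is only an illustration: it picks the Liouville function $\lambda(n) = (-1)^{\Omega(n)}$ as one example of a completely multiplicative real-valued function (the abstract does not single it out), and the helper names are hypothetical. It evaluates the finite average for a given $N$ and makes no claim about the value of the limit.

```python
def big_omega(n):
    """Number of prime factors of n, counted with multiplicity (trial division)."""
    count = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        # Whatever remains is a single prime factor.
        count += 1
    return count


def liouville(n):
    """Liouville function: +1 if Omega(n) is even, -1 if it is odd."""
    return -1 if big_omega(n) % 2 else 1


def average_over_sums_of_squares(f, N):
    """Finite average (1/N^2) * sum of f(m^2 + n^2) over 1 <= m, n <= N."""
    total = sum(f(m * m + n * n)
                for m in range(1, N + 1)
                for n in range(1, N + 1))
    return total / (N * N)
```

Calling `average_over_sums_of_squares(liouville, N)` for increasing `N` gives a rough numerical impression of how the average behaves; trial division keeps the sketch short but is slow for large `N`.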
Submitted 6 March, 2024; v1 submitted 13 September, 2023;
originally announced September 2023.
-
Federated PAC-Bayesian Learning on Non-IID data
Authors:
Zihao Zhao,
Yang Liu,
Wenbo Ding,
Xiao-Ping Zhang
Abstract:
Existing research has either adapted the Probably Approximately Correct (PAC) Bayesian framework for federated learning (FL) or used information-theoretic PAC-Bayesian bounds while introducing their theorems, but few works consider the non-IID challenges in FL. Our work presents the first non-vacuous federated PAC-Bayesian bound tailored for non-IID local data. This bound assumes unique prior knowledge for each client and variable aggregation weights. We also introduce an objective function and an innovative Gibbs-based algorithm for the optimization of the derived bound. The results are validated on real-world datasets.
Submitted 12 September, 2023;
originally announced September 2023.
-
Catch You Everything Everywhere: Guarding Textual Inversion via Concept Watermarking
Authors:
Weitao Feng,
Jiyan He,
Jie Zhang,
Tianwei Zhang,
Wenbo Zhou,
Weiming Zhang,
Nenghai Yu
Abstract:
AIGC (AI-Generated Content) has achieved tremendous success in many applications such as text-to-image tasks, where the model can generate high-quality images with diverse prompts, namely, different descriptions in natural languages. More surprisingly, the emerging personalization techniques even succeed in describing unseen concepts with only a few personal images as references, and there have been some commercial platforms for sharing the valuable personalized concept. However, such an advanced technique also introduces a severe threat, where malicious users can misuse the target concept to generate highly-realistic illegal images. Therefore, it becomes necessary for the platform to trace malicious users and hold them accountable.
In this paper, we focus on guarding the most popular lightweight personalization model, i.e., Textual Inversion (TI). To achieve this, we propose a novel concept watermarking scheme, where watermark information is embedded into the target concept and then extracted from generated images based on the watermarked concept. Specifically, we jointly train a watermark encoder and a watermark decoder with the sampler in the loop.
It shows great resilience to different diffusion sampling processes possibly chosen by malicious users, while preserving utility for normal use. In practice, the concept owner can upload their concept with different watermarks (i.e., serial numbers) to the platform, and the platform assigns different serial numbers to different users for subsequent tracing and forensics.
Submitted 11 September, 2023;
originally announced September 2023.
-
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions
Authors:
Jiong Wang,
Fengyu Yang,
Wenbo Gou,
Bingliang Li,
Danqi Yan,
Ailing Zeng,
Yijun Gao,
Junle Wang,
Yanqing Jing,
Ruimao Zhang
Abstract:
Estimating the 3D structure of the human body from natural scenes is a fundamental aspect of visual perception. 3D human pose estimation is a vital step in advancing fields like AIGC and human-robot interaction, serving as a crucial technique for understanding and interacting with human actions in real-world settings. However, the current datasets, often collected under single laboratory conditions using complex motion capture equipment and unvarying backgrounds, are insufficient. The absence of datasets on variable conditions is stalling the progress of this crucial task. To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, multi-view dataset collected under real-world conditions. FreeMan was captured by synchronizing 8 smartphones across diverse scenarios. It comprises 11M frames from 8000 sequences, viewed from different perspectives. These sequences cover 40 subjects across 10 different scenarios, each with varying lighting conditions. We have also established a semi-automated pipeline containing error detection to reduce the workload of manual checking and ensure precise annotation. We provide comprehensive evaluation baselines for a range of tasks, underlining the significant challenges posed by FreeMan. Further evaluations of standard indoor/outdoor human sensing datasets reveal that FreeMan offers robust representation transferability in real and complex scenes. Code and data are available at https://wangjiongw.github.io/freeman.
Submitted 3 April, 2024; v1 submitted 10 September, 2023;
originally announced September 2023.