-
Unsupervised Solution Operator Learning for Mean-Field Games via Sampling-Invariant Parametrizations
Authors:
Han Huang,
Rongjie Lai
Abstract:
Recent advances in deep learning have produced many innovative frameworks that solve high-dimensional mean-field games (MFGs) accurately and efficiently. These methods, however, are restricted to solving single-instance MFGs and demand extensive computational time per instance, limiting their practicality. To overcome this, we develop a novel framework to learn the MFG solution operator. Our model takes MFG instances as input and outputs their solutions with one forward pass. To ensure the proposed parametrization is well-suited for operator learning, we introduce and prove the notion of sampling invariance for our model, establishing its convergence to a continuous operator in the sampling limit. Our method features two key advantages. First, it is discretization-free, making it particularly suitable for learning operators of high-dimensional MFGs. Second, it can be trained without access to supervised labels, significantly reducing the computational overhead associated with creating training datasets in existing operator learning methods. We test our framework on synthetic and realistic datasets with varying complexity and dimensionality to substantiate its robustness.
Submitted 23 April, 2024; v1 submitted 27 January, 2024;
originally announced January 2024.
-
Constant-roll inflation and primordial black holes with Barrow holographic dark energy
Authors:
Qihong Huang,
He Huang,
Bing Xu,
Kaituo Zhang
Abstract:
We investigate constant-roll inflation and the evolution of primordial black holes (PBHs) with Barrow holographic dark energy (BHDE). Using the modified Friedmann equation and the constant-roll condition in the BHDE model, we calculate the constant-roll parameters, the scalar spectral index parameter, and the tensor-to-scalar ratio with the chaotic potential $V_{0}φ^{n}$. Then, using the Planck 2018 data, we show that a suitable value of the power exponent is $n=1$. Considering the accretion process and the evaporation due to Hawking radiation, we discuss the evolution of PBHs in the BHDE model and find that the PBH mass lies within the PBH mass window.
Submitted 27 January, 2024;
originally announced January 2024.
-
Multi-Trigger Backdoor Attacks: More Triggers, More Threats
Authors:
Yige Li,
Xingjun Ma,
Jiabo He,
Hanxun Huang,
Yu-Gang Jiang
Abstract:
Backdoor attacks have emerged as a primary threat to (pre-)training and deployment of deep neural networks (DNNs). While backdoor attacks have been extensively studied in a body of work, most studies have focused on single-trigger attacks that poison a dataset using a single type of trigger. Arguably, real-world backdoor attacks can be much more complex, e.g., multiple adversaries may target the same dataset if it is of high value. In this work, we investigate the practical threat of backdoor attacks under the setting of multi-trigger attacks, where multiple adversaries leverage different types of triggers to poison the same dataset. By proposing and investigating three types of multi-trigger attacks, including parallel, sequential, and hybrid attacks, we provide a set of important understandings of the coexisting, overwriting, and cross-activating effects between different triggers on the same dataset. Moreover, we show that single-trigger attacks tend to cause overly optimistic views of the security of current defense techniques, as all examined defense methods struggle to defend against multi-trigger attacks. Finally, we create a multi-trigger backdoor poisoning dataset to help future evaluation of backdoor attacks and defenses. Although our work is purely empirical, we hope it can help steer backdoor research toward more realistic settings.
Submitted 26 January, 2024;
originally announced January 2024.
-
Shadow images of compact objects in beyond Horndeski theory
Authors:
Hyat Huang,
Jutta Kunz,
Deeshani Mitra
Abstract:
A beyond Horndeski theory is considered that admits wormholes, black holes, and naked singularities. In this theory, the shadow images of the black holes and the exotic compact objects (ECOs), illuminated by an optically and geometrically thin disk, are investigated. The results show that the three kinds of objects cast distinct shadow images, in particular because the different objects possess different numbers of light rings. The different boundaries of the accretion disk also affect the images. This may provide further insight into the nature of the shadow images of massive compact objects.
Submitted 26 January, 2024;
originally announced January 2024.
-
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
Authors:
Zheqi He,
Xinya Wu,
Pengfei Zhou,
Richeng Xuan,
Guang Liu,
Xi Yang,
Qiannan Zhu,
Hua Huang
Abstract:
Multi-modal large language models (MLLMs) have achieved remarkable progress and demonstrated powerful knowledge comprehension and reasoning abilities. However, the mastery of domain-specific knowledge, which is essential for evaluating the intelligence of MLLMs, continues to be a challenge. Current multi-modal benchmarks for domain-specific knowledge concentrate on multiple-choice questions and are predominantly available in English, which limits the comprehensiveness of the evaluation. To this end, we introduce CMMU, a novel benchmark for multi-modal and multi-type question understanding and reasoning in Chinese. CMMU consists of 3,603 questions in 7 subjects, covering knowledge from primary to high school. The questions can be categorized into 3 types: multiple-choice, multiple-response, and fill-in-the-blank, bringing greater challenges to MLLMs. In addition, we propose an evaluation strategy called Positional Error Variance for assessing multiple-choice questions. The strategy aims to perform a quantitative analysis of position bias. We evaluate seven open-source MLLMs along with GPT4-V, Gemini-Pro, and Qwen-VL-Plus. The results demonstrate that CMMU poses a significant challenge to the recent MLLMs. The data and code are available at https://github.com/FlagOpen/CMMU.
Submitted 8 May, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
General Automatic Solution Generation of Social Problems
Authors:
Tong Niu,
Haoyu Huang,
Yu Du,
Weihao Zhang,
Luping Shi,
Rong Zhao
Abstract:
Given the escalating intricacy and multifaceted nature of contemporary social systems, manually generating solutions to address pertinent social issues has become a formidable task. In response to this challenge, the rapid development of artificial intelligence has spurred the exploration of computational methodologies aimed at automatically generating solutions. However, current methods for auto-generation of solutions mainly concentrate on local social regulations that pertain to specific scenarios. Here, we report an automatic social operating system (ASOS) designed for general social solution generation, which is built upon agent-based models, enabling both global and local analyses and regulations of social problems across spatial and temporal dimensions. ASOS adopts a hypergraph with extensible social semantics for a comprehensive and structured representation of social dynamics. It also incorporates a generalized protocol for standardized hypergraph operations and a symbolic hybrid framework that delivers interpretable solutions, yielding a balance between regulatory efficacy and function viability. To demonstrate the effectiveness of ASOS, we apply it to the domain of averting extreme events within international oil futures markets. By generating a new trading role supplemented by new mechanisms, ASOS can adeptly discern precarious market conditions and make front-running interventions for non-profit purposes. This study demonstrates that ASOS provides an efficient and systematic approach for generating solutions for enhancing our society.
Submitted 25 January, 2024;
originally announced January 2024.
-
Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection
Authors:
Yongwei Nie,
Hao Huang,
Chengjiang Long,
Qing Zhang,
Pradipta Maji,
Hongmin Cai
Abstract:
Without human annotations, a typical Unsupervised Video Anomaly Detection (UVAD) method needs to train two models that generate pseudo labels for each other. First, in previous work, the two models are closely entangled with each other, and it is not known how to upgrade such methods without modifying their training framework significantly. Second, previous work usually adopts fixed thresholding to obtain pseudo labels; however, the user-specified threshold is not reliable, which inevitably introduces errors into the training process. To alleviate these two problems, we propose a novel interleaved framework that alternately trains a One-Class Classification (OCC) model and a Weakly-Supervised (WS) model for UVAD. The OCC or WS models in our method can be easily replaced with other OCC or WS models, which allows our method to be upgraded with the most recent developments in both fields. To handle the fixed thresholding problem, we break through the conventional cognitive boundary and propose a weighted OCC model that can be trained on both normal and abnormal data. We also propose an adaptive mechanism for automatically finding the optimal threshold for the WS model in a loose-to-strict manner. Experiments demonstrate that the proposed UVAD method outperforms previous approaches.
Submitted 24 January, 2024;
originally announced January 2024.
-
GraphiMind: LLM-centric Interface for Information Graphics Design
Authors:
Qirui Huang,
Min Lu,
Joel Lanir,
Dani Lischinski,
Daniel Cohen-Or,
Hui Huang
Abstract:
Information graphics are pivotal in effective information dissemination and storytelling. However, creating such graphics is extremely challenging for non-professionals, since the design process requires multifaceted skills and comprehensive knowledge. Thus, despite the many available authoring tools, a significant gap remains in enabling non-experts to produce compelling information graphics seamlessly, especially from scratch. Recent breakthroughs show that Large Language Models (LLMs), especially when tool-augmented, can autonomously engage with external tools, making them promising candidates for enabling innovative graphic design applications. In this work, we propose an LLM-centric interface with the agent GraphiMind for automatic generation, recommendation, and composition of information graphics design resources, based on user intent expressed through natural language. Our GraphiMind integrates a Textual Conversational Interface, powered by a tool-augmented LLM, with a traditional Graphical Manipulation Interface, streamlining the entire design process from raw resource curation to composition and refinement. Extensive evaluations highlight our tool's proficiency in simplifying the design process, opening avenues for its use by non-professional users. Moreover, we spotlight the potential of LLMs in reshaping the domain of information graphics design, offering a blend of automation, versatility, and user-centric interactivity.
Submitted 24 January, 2024;
originally announced January 2024.
-
Style-Consistent 3D Indoor Scene Synthesis with Decoupled Objects
Authors:
Yunfan Zhang,
Hong Huang,
Zhiwei Xiong,
Zhiqi Shen,
Guosheng Lin,
Hao Wang,
Nicholas Vun
Abstract:
Controllable 3D indoor scene synthesis stands at the forefront of technological progress, offering various applications like gaming, film, and augmented/virtual reality. The capability to stylize and decouple objects within these scenarios is a crucial factor, providing an advanced level of control throughout the editing process. This control extends not just to manipulating geometric attributes like translation and scaling but also includes managing appearances, such as stylization. Current methods for scene stylization are limited to applying styles to the entire scene, without the ability to separate and customize individual objects. Addressing the intricacies of this challenge, we introduce a unique pipeline designed for synthesizing 3D indoor scenes. Our approach involves strategically placing objects within the scene, utilizing information from professionally designed bounding boxes. Significantly, our pipeline prioritizes maintaining style consistency across multiple objects within the scene, ensuring a cohesive and visually appealing result aligned with the desired aesthetic. The core strength of our pipeline lies in its ability to generate 3D scenes that are not only visually impressive but also exhibit features like photorealism, multi-view consistency, and diversity. These scenes are crafted in response to various natural language prompts, demonstrating the versatility and adaptability of our model.
Submitted 23 January, 2024;
originally announced January 2024.
-
Fourier Transporter: Bi-Equivariant Robotic Manipulation in 3D
Authors:
Haojie Huang,
Owen Howell,
Dian Wang,
Xupeng Zhu,
Robin Walters,
Robert Platt
Abstract:
Many complex robotic manipulation tasks can be decomposed into a sequence of pick and place actions. Training a robotic agent to learn this sequence over many different starting conditions typically requires many iterations or demonstrations, especially in 3D environments. In this work, we propose Fourier Transporter (FourTran), which leverages the two-fold SE(d)xSE(d) symmetry in the pick-place problem to achieve much higher sample efficiency. FourTran is an open-loop behavior cloning method trained using expert demonstrations to predict pick-place actions in new environments. FourTran is constrained to incorporate symmetries of the pick and place actions independently. Our method utilizes a fiber space Fourier transformation that allows for memory-efficient construction. We test our proposed network on the RLBench benchmark and achieve state-of-the-art results across various tasks.
Submitted 15 March, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Identifying and Analyzing Task-Encoding Tokens in Large Language Models
Authors:
Yu Bai,
Heyan Huang,
Cesare Spinoso-Di Piano,
Marc-Antoine Rondeau,
Sanxing Chen,
Yang Gao,
Jackie Chi Kit Cheung
Abstract:
In-context learning (ICL) has become an effective solution for few-shot learning in natural language processing. However, our understanding of ICL's working mechanisms is limited, specifically regarding how models learn to perform tasks from ICL demonstrations. For example, unexpectedly large changes in performance can arise from small changes in the prompt, leaving prompt design a largely empirical endeavour. In this paper, we investigate this problem by identifying and analyzing task-encoding tokens on whose representations the task performance depends. Using experiments that ablate the representations of different token types, we find that template and stopword tokens are the most prone to be task-encoding. In addition, we demonstrate experimentally that lexical meaning, repetition, and text formatting are the main distinguishing characteristics of these tokens. Our work sheds light on how large language models (LLMs) learn to perform a task from demonstrations, deepens our understanding of the varied roles different types of tokens play in LLMs, and provides insights for avoiding instability from improperly utilizing task-encoding tokens.
Submitted 16 February, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads
Authors:
Cunchen Hu,
Heyang Huang,
Liangliang Xu,
Xusheng Chen,
Jiang Xu,
Shuang Chen,
Hao Feng,
Chenxi Wang,
Sa Wang,
Yungang Bao,
Ninghui Sun,
Yizhou Shan
Abstract:
Transformer-based large language model (LLM) inference serving is now the backbone of many cloud services. LLM inference consists of a prefill phase and a decode phase. However, existing LLM deployment practices often overlook the distinct characteristics of these phases, leading to significant interference. To mitigate interference, our insight is to carefully schedule and group inference requests based on their characteristics. We realize this idea in TetriInfer through three pillars. First, it partitions prompts into fixed-size chunks so that the accelerator always runs close to its computation-saturated limit. Second, it disaggregates prefill and decode instances so each can run independently. Finally, it uses a smart two-level scheduling algorithm augmented with predicted resource usage to avoid decode scheduling hotspots. Results show that TetriInfer improves time-to-first-token (TTFT), job completion time (JCT), and inference efficiency in terms of performance per dollar by a large margin, e.g., it uses 38% fewer resources while lowering average TTFT and average JCT by 97% and 47%, respectively.
Submitted 20 January, 2024;
originally announced January 2024.
-
BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching
Authors:
Ling Jiang,
Junwen An,
Huihui Huang,
Qiyi Tang,
Sen Nie,
Shi Wu,
Yuqun Zhang
Abstract:
While third-party libraries (TPLs) are extensively reused to enhance productivity during software development, they can also introduce potential security risks such as vulnerability propagation. Software composition analysis (SCA), proposed to identify reused TPLs for reducing such risks, has become an essential procedure within modern DevSecOps. As one of the mainstream SCA techniques, binary-to-source SCA identifies the third-party source projects contained in binary files via binary source code matching, which is a major challenge in reverse engineering since binary and source code exhibit substantial disparities after compilation. The existing binary-to-source SCA techniques leverage basic syntactic features that suffer from redundancy and lack robustness in the large-scale TPL dataset, leading to inevitable false positives and compromised recall. To mitigate these limitations, we introduce BinaryAI, a novel binary-to-source SCA technique with two-phase binary source code matching to capture both syntactic and semantic code features. First, BinaryAI trains a transformer-based model to produce function-level embeddings and obtain similar source functions for each binary function accordingly. Then, by applying the link-time locality to facilitate function matching, BinaryAI detects the reused TPLs based on the ratio of matched source functions. Our experimental results demonstrate the superior performance of BinaryAI in terms of binary source code matching and the downstream SCA task. Specifically, our embedding model outperforms the state-of-the-art model CodeCMR, achieving 22.54% recall@1 and 0.34 MRR compared with 10.75% and 0.17, respectively. Additionally, BinaryAI outperforms all existing binary-to-source SCA tools in TPL detection, increasing the precision from 73.36% to 85.84% and recall from 59.81% to 64.98% compared with the well-recognized commercial SCA product Black Duck.
Submitted 23 January, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
Quasiparticle scattering in three-dimensional topological insulators near the thickness limit
Authors:
Haiming Huang,
Mu Chen,
Dezhi Song,
Jun Zhang,
Ye-Ping Jiang
Abstract:
In the ultra-thin regime, Bi2Te3 films feature two surfaces (with each surface being a two-dimensional Dirac-fermion system) with complicated spin textures and a tunneling term between them. We find in this regime that the quasiparticle scattering is completely different compared with the thick-film case and even behaves differently at each thickness. The thickness-dependent warping effect and tunneling term are found to be the two main factors that govern the scattering behaviors. The inter-band back-scattering that signals the existence of a tunneling term is found to disappear at 4 quintuple layers by the step-edge reflection approach. A four-band model is presented that captures the main features of the thickness-dependent scattering behaviors. Our work clarifies that the prohibition of back-scattering guaranteed by symmetry in topological insulators breaks down in the ultra-thin regime.
Submitted 20 January, 2024;
originally announced January 2024.
-
Hybrid Online Certificate Status Protocol with Certificate Revocation List for Smart Grid Public Key Infrastructure
Authors:
Hong-Sheng Huang,
Zhe-Yi Jiang,
Hsuan-Tung Chen,
Hung-Min Sun
Abstract:
Hsu et al. (2022) proposed a cryptographic scheme within the public key infrastructure (PKI) to bolster the security of smart grid meters. Their proposal involved developing the Certificate Management over CMS mechanism to establish the Simple Certificate Enrollment Protocol and the Enrollment over Secure Transport protocol. Additionally, they implemented Online Certificate Status Protocol (OCSP) services to independently query the status of certificates. However, their implementation featured a single OCSP server handling all query requests. Considering the typical scenario in smart grid PKI environments with tens of thousands of end-meters, we introduce a Hybrid Online Certificate Status Protocol mechanism. This approach reduces the demand for query resources from the client to OCSP servers by having them collaborate with Certificate Revocation Lists. Our simulations, mimicking meter behavior, demonstrated increased efficiency, creating a more robust architecture tailored to the smart grid meter landscape.
Submitted 26 February, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
LDReg: Local Dimensionality Regularized Self-Supervised Learning
Authors:
Hanxun Huang,
Ricardo J. G. B. Campello,
Sarah Monazam Erfani,
Xingjun Ma,
Michael E. Houle,
James Bailey
Abstract:
Representations learned via self-supervised learning (SSL) can be susceptible to dimensional collapse, where the learned representation subspace is of extremely low dimensionality and thus fails to represent the full data distribution and modalities. Dimensional collapse, also known as the "underfilling" phenomenon, is one of the major causes of degraded performance on downstream tasks. Previous work has investigated the dimensional collapse problem of SSL at a global level. In this paper, we demonstrate that representations can span a high-dimensional space globally, but collapse locally. To address this, we propose a method called local dimensionality regularization (LDReg). Our formulation is based on the derivation of the Fisher-Rao metric to compare and optimize local distance distributions at an asymptotically small radius for each data point. By increasing the local intrinsic dimensionality, we demonstrate through a range of experiments that LDReg improves the representation quality of SSL. The results also show that LDReg can regularize dimensionality at both local and global levels.
Submitted 14 March, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration
Authors:
Jinming Zhuang,
Zhuoping Yang,
Shixin Ji,
Heng Huang,
Alex K. Jones,
Jingtong Hu,
Yiyu Shi,
Peipei Zhou
Abstract:
With the increase in the computation intensity of the chip, the mismatch between computation layer shapes and the available computation resource significantly limits the utilization of the chip. Driven by this observation, prior works discuss spatial accelerators or dataflow architecture to maximize the throughput. However, using spatial accelerators could potentially increase the execution latency. In this work, we first systematically investigate two execution models: (1) sequentially (temporally) launch one monolithic accelerator, and (2) spatially launch multiple accelerators. From the observations, we find that there is a latency throughput tradeoff between these two execution models, and combining these two strategies together can give us a more efficient latency throughput Pareto front. To achieve this, we propose spatial sequential architecture (SSR) and SSR design automation framework to explore both strategies together when deploying deep learning inference. We use the 7nm AMD Versal ACAP VCK190 board to implement SSR accelerators for four end-to-end transformer-based deep learning models. SSR achieves average throughput gains of 2.53x, 35.71x, and 14.20x under different batch sizes compared to the 8nm Nvidia GPU A10G, 16nm AMD FPGAs ZCU102, and U250. The average energy efficiency gains are 8.51x, 6.75x, and 21.22x, respectively. Compared with the sequential-only solution and spatial-only solution on VCK190, our spatial-sequential-hybrid solutions achieve higher throughput under the same latency requirement and lower latency under the same throughput requirement. We also use SSR analytical models to demonstrate how to use SSR to optimize solutions on other computing platforms, e.g., 14nm Intel Stratix 10 NX.
Submitted 18 February, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
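The latency-throughput Pareto front mentioned in the abstract above can be illustrated with a minimal sketch. This is not the SSR framework: the function name `pareto_front` and the sample design points are hypothetical, and the only assumption is that each candidate design is summarized by a (latency, throughput) pair where lower latency and higher throughput are better.

```python
from typing import List, Tuple

# Hypothetical design point: (latency, throughput); units are arbitrary here.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates.

    A point dominates another if its latency is not higher and its
    throughput is not lower, with at least one of the two strictly better.
    """
    # Sort by latency ascending; on ties, put the higher throughput first.
    ordered = sorted(set(points), key=lambda p: (p[0], -p[1]))
    front: List[Point] = []
    best_throughput = float("-inf")
    for latency, throughput in ordered:
        # Keep a point only if it beats every lower-or-equal-latency point.
        if throughput > best_throughput:
            front.append((latency, throughput))
            best_throughput = throughput
    return front


# Illustrative, made-up numbers (not results from the paper).
designs = [(1.0, 10.0), (2.0, 30.0), (2.5, 20.0), (4.0, 50.0), (4.0, 40.0)]
front = pareto_front(designs)
```

In this toy input, the designs at (2.5, 20.0) and (4.0, 40.0) are dropped because another design is at least as fast and delivers more throughput.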
-
Learning shallow quantum circuits
Authors:
Hsin-Yuan Huang,
Yunchao Liu,
Michael Broughton,
Isaac Kim,
Anurag Anshu,
Zeph Landau,
Jarrod R. McClean
Abstract:
Despite fundamental interests in learning quantum circuits, the existence of a computationally efficient algorithm for learning shallow quantum circuits remains an open question. Because shallow quantum circuits can generate distributions that are classically hard to sample from, existing learning algorithms do not apply. In this work, we present a polynomial-time classical algorithm for learning the description of any unknown $n$-qubit shallow quantum circuit $U$ (with arbitrary unknown architecture) within a small diamond distance using single-qubit measurement data on the output states of $U$. We also provide a polynomial-time classical algorithm for learning the description of any unknown $n$-qubit state $\lvert ψ\rangle = U \lvert 0^n \rangle$ prepared by a shallow quantum circuit $U$ (on a 2D lattice) within a small trace distance using single-qubit measurements on copies of $\lvert ψ\rangle$. Our approach uses a quantum circuit representation based on local inversions and a technique to combine these inversions. This circuit representation yields an optimization landscape that can be efficiently navigated and enables efficient learning of quantum circuits that are classically hard to simulate.
Submitted 18 January, 2024;
originally announced January 2024.
-
An optimization-based equilibrium measure describes non-equilibrium steady state dynamics: application to edge of chaos
Authors:
Junbin Qiu,
Haiping Huang
Abstract:
Understanding neural dynamics is a central topic in machine learning, non-linear physics and neuroscience. However, the dynamics is non-linear, stochastic and particularly non-gradient, i.e., the driving force can not be written as gradient of a potential. These features make analytic studies very challenging. The common tool is the path integral approach or dynamical mean-field theory, but the drawback is that one has to solve the integro-differential or dynamical mean-field equations, which is computationally expensive and has no closed form solutions in general. From the aspect of associated Fokker-Planck equation, the steady state solution is generally unknown. Here, we treat searching for the steady states as an optimization problem, and construct an approximate potential related to the speed of the dynamics, and find that searching for the ground state of this potential is equivalent to running an approximate stochastic gradient dynamics or Langevin dynamics. Only in the zero temperature limit, the distribution of the original steady states can be achieved. The resultant stationary state of the dynamics follows exactly the canonical Boltzmann measure. Within this framework, the quenched disorder intrinsic in the neural networks can be averaged out by applying the replica method, which leads naturally to order parameters for the non-equilibrium steady states. Our theory reproduces the well-known result of edge-of-chaos, and further the order parameters characterizing the continuous transition are derived, and the order parameters are explained as fluctuations and responses of the steady states. Our method thus opens the door to analytically study the steady state landscape of the deterministic or stochastic high dimensional dynamics.
Submitted 7 June, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Hijacking Attacks against Neural Networks by Analyzing Training Data
Authors:
Yunjie Ge,
Qian Wang,
Huayang Huang,
Qi Li,
Cong Wang,
Chao Shen,
Lingchen Zhao,
Peipei Jiang,
Zheng Fang,
Shenyi Zhang
Abstract:
Backdoors and adversarial examples are the two primary threats currently faced by deep neural networks (DNNs). Both attacks attempt to hijack the model behaviors with unintended outputs by introducing (small) perturbations to the inputs. Backdoor attacks, despite the high success rates, often require a strong assumption, which is not always easy to achieve in reality. Adversarial example attacks, which put relatively weaker assumptions on attackers, often demand high computational resources, yet do not always yield satisfactory success rates when attacking mainstream black-box models in the real world. These limitations motivate the following research question: can model hijacking be achieved more simply, with a higher attack success rate and more reasonable assumptions? In this paper, we propose CleanSheet, a new model hijacking attack that obtains the high performance of backdoor attacks without requiring the adversary to tamper with the model training process. CleanSheet exploits vulnerabilities in DNNs stemming from the training data. Specifically, our key idea is to treat part of the clean training data of the target model as "poisoned data," and capture the characteristics of these data that are more sensitive to the model (typically called robust features) to construct "triggers." These triggers can be added to any input example to mislead the target model, similar to backdoor attacks. We validate the effectiveness of CleanSheet through extensive experiments on 5 datasets, 79 normally trained models, 68 pruned models, and 39 defensive models. Results show that CleanSheet exhibits performance comparable to state-of-the-art backdoor attacks, achieving an average attack success rate (ASR) of 97.5% on CIFAR-100 and 92.4% on GTSRB, respectively. Furthermore, CleanSheet consistently maintains a high ASR, when confronted with various mainstream backdoor defenses.
Submitted 19 January, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
The generative quantum eigensolver (GQE) and its application for ground state search
Authors:
Kouhei Nakaji,
Lasse Bjørn Kristensen,
Jorge A. Campos-Gonzalez-Angulo,
Mohammad Ghazi Vakili,
Haozhe Huang,
Mohsen Bagherimehrab,
Christoph Gorgulla,
FuTe Wong,
Alex McCaskey,
Jin-Sung Kim,
Thien Nguyen,
Pooja Rao,
Alan Aspuru-Guzik
Abstract:
We introduce the generative quantum eigensolver (GQE), a novel method for applying classical generative models for quantum simulation. The GQE algorithm optimizes a classical generative model to produce quantum circuits with desired properties. Here, we develop a transformer-based implementation, which we name the generative pre-trained transformer-based (GPT) quantum eigensolver (GPT-QE), leveraging both pre-training on existing datasets and training without any prior knowledge. We demonstrate the effectiveness of training and pre-training GPT-QE in the search for ground states of electronic structure Hamiltonians. GQE strategies can extend beyond the problem of Hamiltonian simulation into other application areas of quantum computing.
Submitted 17 January, 2024;
originally announced January 2024.
-
IRS-Enhanced Anti-Jamming Precoding Against DISCO Physical Layer Jamming Attacks
Authors:
Huan Huang,
Hongliang Zhang,
Yi Cai,
Yunjing Zhang,
A. Lee Swindlehurst,
Zhu Han
Abstract:
Illegitimate intelligent reflective surfaces (IRSs) can pose significant physical layer security risks to multi-user multiple-input single-output (MU-MISO) systems. Recently, a DISCO approach has been proposed that uses an illegitimate IRS with random and time-varying reflection coefficients, referred to as a "disco" IRS (DIRS). Such a DIRS can attack MU-MISO systems without relying on either jamming power or channel state information (CSI), and classical anti-jamming techniques are ineffective against DIRS-based fully-passive jammers (DIRS-based FPJs). In this paper, we propose an IRS-enhanced anti-jamming precoder against DIRS-based FPJs that requires only statistical rather than instantaneous CSI of the DIRS-jammed channels. Specifically, a legitimate IRS is introduced to reduce the strength of the DIRS-based jamming relative to the transmit signals at a legitimate user (LU). In addition, the active beamforming at the legitimate access point (AP) is designed to maximize the signal-to-jamming-plus-noise ratios (SJNRs). Numerical results are presented to evaluate the effectiveness of the proposed IRS-enhanced anti-jamming precoder against DIRS-based FPJs.
Submitted 17 January, 2024;
originally announced January 2024.
-
A multi-domain model for microcirculation in optic nerve: blood flow and oxygen transport
Authors:
Zilong Song,
Shixin Xu,
Robert Eisenberg,
Huaxiong Huang
Abstract:
Microcirculation of blood and oxygen transport play important roles in biological function of optic nerve and are directly affected by damages or pathologies. This work develops a multi-domain model for optic nerve, that includes important biological structures and various physical mechanisms in blood flow and oxygen delivery. The two sets of vasculature network are treated as five domains in the same geometric region, with various exchanges among them (such as Darcy's law for fluid flow) and with the tissue domain (such as water leak, diffusion). The numerical results of the coupled model for a uniform case of vasculature distribution show mechanisms and scales consistent with literature and intuition. The effects of various important model parameters (relevant to pathological conditions) are investigated to provide insights into the possible implications. The vasculature distribution (resting volume fractions here) has significant impacts on the blood circulation and could lead to insufficient blood supply in certain local region and in turn affect the oxygen delivery. The water leak across the capillary wall will have nontrivial effects after the leak coefficients pass a threshold. The periodic arterial pressure conditions lead to expected periodic patterns and stable spatial profiles, and the uniform case is almost the averaged version of periodic case. The effects of viscosity, the stiffness of blood vessel wall, oxygen demand, etc. have also been analyzed. The framework can be extended to include ionic transport or to study the retina when more biological structural information is available.
Submitted 16 January, 2024;
originally announced January 2024.
-
Cross Domain Early Crop Mapping using CropSTGAN
Authors:
Yiqun Wang,
Hui Huang,
Radu State
Abstract:
Driven by abundant satellite imagery, machine learning-based approaches have recently been promoted to generate high-resolution crop cultivation maps to support many agricultural applications. One of the major challenges faced by these approaches is the limited availability of ground truth labels. In the absence of ground truth, existing work usually adopts the "direct transfer strategy" that trains a classifier using historical labels collected from other regions and then applies the trained model to the target region. Unfortunately, because the spectral features of crops exhibit inter-region and inter-annual variability due to changes in soil composition, climate conditions, and crop progress, the resultant models perform poorly on new and unseen regions or years. Despite recent efforts to tackle these cross-domain challenges, such as the application of the deep adaptation neural network (DANN) model structure in the deep adaptation crop classification network (DACCN), their effectiveness diminishes significantly when there is a large dissimilarity between the source and target regions. This paper introduces the Crop Mapping Spectral-temporal Generative Adversarial Neural Network (CropSTGAN), a novel solution for cross-domain challenges that does not require target domain labels. CropSTGAN learns to transform the target domain's spectral features to those of the source domain, effectively bridging large dissimilarities. Additionally, it employs an identity loss to maintain the intrinsic local structure of the data. Comprehensive experiments across various regions and years demonstrate the benefits and effectiveness of the proposed approach. In experiments, CropSTGAN is benchmarked against various state-of-the-art (SOTA) methods. Notably, CropSTGAN significantly outperforms these methods in scenarios with large data distribution dissimilarities between the target and source domains.
Submitted 18 April, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
Improving Domain Adaptation through Extended-Text Reading Comprehension
Authors:
Ting Jiang,
Shaohan Huang,
Shengyue Luo,
Zihan Zhang,
Haizhen Huang,
Furu Wei,
Weiwei Deng,
Feng Sun,
Qi Zhang,
Deqing Wang,
Fuzhen Zhuang
Abstract:
To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method. Recent work demonstrates that adapting models using reading comprehension data formatted by regex-based patterns can significantly improve performance on domain-specific tasks. However, regex-based patterns are incapable of parsing raw corpora using domain-specific knowledge. Furthermore, question-and-answer pairs extracted directly from the corpus in predefined formats offer limited context. To address this limitation, we improve reading comprehension via LLM and clustering. The LLM focuses on leveraging domain knowledge within the corpus to refine the comprehension stage, while clustering supplies relevant knowledge by extending the context to enrich the reading stage. Additionally, our method incorporates parameter-efficient fine-tuning to improve the efficiency of domain adaptation. In comparison to AdaptLLM, our method achieves an improvement exceeding 5% in domain-specific tasks. Our code will be available at https://github.com/microsoft/LMOps.
Submitted 18 January, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
Graphical Principal Component Analysis of Multivariate Functional Time Series
Authors:
Jianbin Tan,
Decai Liang,
Yongtao Guan,
Hui Huang
Abstract:
In this paper, we consider multivariate functional time series with a two-way dependence structure: a serial dependence across time points and a graphical interaction among the multiple functions within each time point. We develop the notion of dynamic weak separability, a more general condition than those assumed in literature, and use it to characterize the two-way structure in multivariate functional time series. Based on the proposed weak separability, we develop a unified framework for functional graphical models and dynamic principal component analysis, and further extend it to optimally reconstruct signals from contaminated functional data using graphical-level information. We investigate asymptotic properties of the resulting estimators and illustrate the effectiveness of our proposed approach through extensive simulations. We apply our method to hourly air pollution data that were collected from a monitoring network in China.
Submitted 13 January, 2024;
originally announced January 2024.
-
Mini-jet Clustering Algorithm Using Transverse-momentum Seeds in High-energy Nuclear Collisions
Authors:
Hanpu Jiang,
Nanxi Yao,
Cheuk-Yin Wong,
Gang Wang,
Huan Zhong Huang
Abstract:
We propose an algorithm to detect mini-jet clusters in high-energy nuclear collisions, by selecting a high-transverse-momentum ($p_T$) particle as a seed and assigning a clustering radius ($R$) in the pseudorapidity and azimuthal-angle space. Our PYTHIA simulations for $p$+$p$ collisions show that a scheme with a seeding $p_T$ of around 0.5 GeV/$c$ and $R$ of approximately 0.6 satisfactorily identifies mini-jet clusters. The correlation between clusters obtained in PYTHIA calculations using the algorithm exhibits the proper behavior of hard-scattering-like processes, suggesting its usefulness in isolating mini-jet-like clusters from non-hard-scattering soft processes when applied to actual nuclear-collision data, thereby allowing a closer examination of both the mini-jet and the soft mechanisms.
Submitted 11 April, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
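A minimal sketch of seeded clustering in pseudorapidity and azimuthal-angle space, in the spirit of the mini-jet abstract above. The abstract only states that a high-$p_T$ particle is chosen as a seed and that a radius $R$ is assigned; the greedy ordering, the exclusive assignment of particles to the first seed that claims them, and the names `delta_r` and `seed_clusters` are assumptions made for illustration, not the authors' algorithm. The defaults 0.5 GeV/$c$ and 0.6 are the values quoted in the abstract.

```python
import math
from typing import List, Tuple

# Hypothetical particle record: (pT in GeV/c, pseudorapidity eta, azimuth phi).
Particle = Tuple[float, float, float]


def delta_r(a: Particle, b: Particle) -> float:
    """Distance in (eta, phi) space, with phi wrapped into [-pi, pi)."""
    d_eta = a[1] - b[1]
    d_phi = (a[2] - b[2] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)


def seed_clusters(
    particles: List[Particle], seed_pt: float = 0.5, radius: float = 0.6
) -> List[List[int]]:
    """Greedy seeded clustering; returns lists of particle indices.

    Assumption: seeds are visited from highest to lowest pT, and each
    particle joins at most one cluster.
    """
    order = sorted(range(len(particles)), key=lambda i: particles[i][0], reverse=True)
    assigned = set()
    clusters: List[List[int]] = []
    for i in order:
        # Skip particles already clustered or too soft to act as a seed.
        if i in assigned or particles[i][0] < seed_pt:
            continue
        members = [
            j for j in order
            if j not in assigned and delta_r(particles[i], particles[j]) < radius
        ]
        assigned.update(members)
        clusters.append(members)
    return clusters


# Made-up toy event, not simulation output.
event = [(1.2, 0.0, 0.1), (0.3, 0.2, 0.2), (0.8, 0.1, 6.2), (0.6, 2.0, 3.0), (0.2, -2.0, 1.0)]
clusters = seed_clusters(event)
```

In the toy event, particle 2 sits at $φ = 6.2$ but still joins the seed at $φ = 0.1$ because the azimuthal difference is wrapped; particle 4 is too soft to seed and too far from any seed, so it stays unclustered.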
-
EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
Authors:
Jingyuan Yang,
Jiawei Feng,
Hui Huang
Abstract:
Recent years have witnessed remarkable progress in image generation, where users can create visually astonishing, high-quality images. However, existing text-to-image diffusion models are proficient in generating concrete concepts (dogs) but encounter challenges with more abstract ones (emotions). Several efforts have been made to modify image emotions with color and style adjustments, but they face limitations in effectively conveying emotions with fixed image contents. In this work, we introduce Emotional Image Content Generation (EICG), a new task to generate semantic-clear and emotion-faithful images given emotion categories. Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space, providing a concrete interpretation of abstract emotions. Attribute loss and emotion confidence are further proposed to ensure the semantic diversity and emotion fidelity of the generated images. Our method outperforms the state-of-the-art text-to-image approaches both quantitatively and qualitatively on three custom metrics that we derive: emotion accuracy, semantic clarity, and semantic diversity. In addition to generation, our method can help emotion understanding and inspire emotional art design.
Submitted 9 January, 2024;
originally announced January 2024.
-
Sibyl: Forecasting Time-Evolving Query Workloads
Authors:
Hanxian Huang,
Tarique Siddiqui,
Rana Alotaibi,
Carlo Curino,
Jyoti Leeka,
Alekh Jindal,
Jishen Zhao,
Jesus Camacho-Rodriguez,
Yuanyuan Tian
Abstract:
Database systems often rely on historical query traces to perform workload-based performance tuning. However, real production workloads are time-evolving, making historical queries ineffective for optimizing future workloads. To address this challenge, we propose SIBYL, an end-to-end machine learning-based framework that accurately forecasts a sequence of future queries, with the entire query statements, in various prediction windows. Drawing insights from real-workloads, we propose template-based featurization techniques and develop a stacked-LSTM with an encoder-decoder architecture for accurate forecasting of query workloads. We also develop techniques to improve forecasting accuracy over large prediction windows and achieve high scalability over large workloads with high variability in arrival rates of queries. Finally, we propose techniques to handle workload drifts. Our evaluation on four real workloads demonstrates that SIBYL can forecast workloads with an $87.3\%$ median F1 score, and can result in $1.7\times$ and $1.3\times$ performance improvement when applied to materialized view selection and index selection applications, respectively.
Submitted 8 January, 2024;
originally announced January 2024.
-
Charged-current non-standard neutrino interactions at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding
, et al. (177 additional authors not shown)
Abstract:
The full data set of the Daya Bay reactor neutrino experiment is used to probe the effect of the charged current non-standard interactions (CC-NSI) on neutrino oscillation experiments. Two different approaches are applied and constraints on the corresponding CC-NSI parameters are obtained with the neutrino flux taken from the Huber-Mueller model with a $5\%$ uncertainty. For the quantum mechanics-based approach (QM-NSI), the constraints on the CC-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ are extracted with and without the assumption that the effects of the new physics are the same in the production and detection processes, respectively. The approach based on the weak effective field theory (WEFT-NSI) deals with four types of CC-NSI represented by the parameters $[\varepsilon_{X}]_{eα}$. For both approaches, the results for the CC-NSI parameters are shown for cases with various fixed values of the CC-NSI and the Dirac CP-violating phases, and when they are allowed to vary freely. We find that constraints on the QM-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ from the Daya Bay experiment alone can reach the order $\mathcal{O}(0.01)$ for the former and $\mathcal{O}(0.1)$ for the latter, while for WEFT-NSI parameters $[\varepsilon_{X}]_{eα}$, we obtain $\mathcal{O}(0.1)$ for both cases.
Submitted 19 March, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
Generating Non-Stationary Textures using Self-Rectification
Authors:
Yang Zhou,
Rongjun Xiao,
Dani Lischinski,
Daniel Cohen-Or,
Hui Huang
Abstract:
This paper addresses the challenge of example-based non-stationary texture synthesis. We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools, yielding an initial rough target for the synthesis. Subsequently, our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture, while faithfully preserving the distinct visual characteristics of the reference exemplar. Our method leverages a pre-trained diffusion network, and uses self-attention mechanisms, to gradually align the synthesized texture with the reference, ensuring the retention of the structures in the provided target. Through experimental validation, our approach exhibits exceptional proficiency in handling non-stationary textures, demonstrating significant advancements in texture synthesis when compared to existing state-of-the-art techniques. Code is available at https://github.com/xiaorongjun000/Self-Rectification
Submitted 30 January, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection
Authors:
Xiangyu Zhao,
Yicheng Chen,
Shilin Xu,
Xiangtai Li,
Xinjiang Wang,
Yining Li,
Haian Huang
Abstract:
Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications. However, despite its significance, the original Grounding-DINO model lacks comprehensive public technical details due to the unavailability of its training code. To bridge this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline, which is built with the MMDetection toolbox. It adopts abundant vision datasets for pre-training and various detection and grounding datasets for fine-tuning. We give a comprehensive analysis of each reported result and detailed settings for reproduction. The extensive experiments on the benchmarks mentioned demonstrate that our MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline. We release all our models to the research community. Codes and trained models are released at https://github.com/open-mmlab/mmdetection/tree/main/configs/mm_grounding_dino.
Submitted 5 January, 2024; v1 submitted 4 January, 2024;
originally announced January 2024.
-
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA
Authors:
Kaiyuan Yang,
Fabio Musio,
Yihui Ma,
Norman Juchler,
Johannes C. Paetzold,
Rami Al-Maskari,
Luciano Höher,
Hongwei Bran Li,
Ibrahim Ethem Hamamci,
Anjany Sekuboyina,
Suprosanna Shit,
Houjing Huang,
Chinmay Prabhakar,
Ezequiel de la Rosa,
Diana Waldmannstetter,
Florian Kofler,
Fernando Navarro,
Martin Menten,
Ivan Ezhov,
Daniel Rueckert,
Iris Vos,
Ynte Ruigrok,
Birgitta Velthuis,
Hugo Kuijf,
Julien Hämmerli
, et al. (59 additional authors not shown)
Abstract:
The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neuro-vascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two angiographic imaging modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA), but there exist limited public datasets with annotations on CoW anatomy, especially for CTA. Therefore we organized the TopCoW Challenge in 2023 with the release of an annotated CoW dataset. The TopCoW dataset was the first public dataset with voxel-level annotations for thirteen possible CoW vessel components, enabled by virtual-reality (VR) technology. It was also the first large dataset with paired MRA and CTA from the same patients. TopCoW challenge formalized the CoW characterization problem as a multiclass anatomical segmentation task with an emphasis on topological metrics. We invited submissions worldwide for the CoW segmentation task, which attracted over 140 registered participants from four continents. The top performing teams managed to segment many CoW components to Dice scores around 90%, but with lower scores for communicating arteries and rare variants. There were also topological mistakes for predictions with high Dice scores. Additional topological analysis revealed further areas for improvement in detecting certain CoW components and matching CoW variant topology accurately. TopCoW represented a first attempt at benchmarking the CoW anatomical segmentation task for MRA and CTA, both morphologically and topologically.
Submitted 29 April, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
-
Utilizing the Janus MoSSe surface polarization in designing complementary metal-oxide-semiconductor field-effect transistors
Authors:
Yun-Pin Chiu,
Hsin-Wen Huang,
Yuh-Renn Wu
Abstract:
Janus transition metal dichalcogenides (JTMDs) have attracted much attention because of their outstanding electronic and optical properties. The additional out-of-plane dipole in JTMDs can form n- and p-like Ohmic contacts, and this may be used in device applications such as pin diodes and photovoltaic cells. In this study, we exploit this property to design n- and p-type metal-oxide-semiconductor field effect transistors (MOSFETs). First, we use density-functional theory calculations to study the inherent dipole field strength in the trilayer JTMD MoSSe. The intrinsic dipole of MoSSe causes band bending at both the metal/MoSSe and MoSSe/metal interfaces, resulting in electron and hole accumulation to form n- and p-type Ohmic contact regions. We incorporate this property into a 2D finite-element-based Poisson-drift-diffusion solver to perform simulations, on the basis of which we design complementary MOSFETs. Our results demonstrate that JTMDs can be used to make n- and p-MOSFETs in the same layer without the need for any extra doping.
Submitted 16 May, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
-
I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models
Authors:
Xun Guo,
Mingwu Zheng,
Liang Hou,
Yuan Gao,
Yufan Deng,
Pengfei Wan,
Di Zhang,
Yufan Liu,
Weiming Hu,
Zhengjun Zha,
Haibin Huang,
Chongyang Ma
Abstract:
Text-guided image-to-video (I2V) generation aims to generate a coherent video that preserves the identity of the input image and semantically aligns with the input prompt. Existing methods typically augment pretrained text-to-video (T2V) models by either concatenating the image with noised video frames channel-wise before being fed into the model or injecting the image embedding produced by pretrained image encoders in cross-attention modules. However, the former approach often necessitates altering the fundamental weights of pretrained T2V models, thus restricting the model's compatibility within the open-source communities and disrupting the model's prior knowledge. Meanwhile, the latter typically fails to preserve the identity of the input image. We present I2V-Adapter to overcome such limitations. I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model. Notably, I2V-Adapter only introduces a few trainable parameters, significantly alleviating the training cost and also ensures compatibility with existing community-driven personalized models and control tools. Moreover, we propose a novel Frame Similarity Prior to balance the motion amplitude and the stability of generated videos through two adjustable control coefficients. Our experimental results demonstrate that I2V-Adapter is capable of producing high-quality videos. This performance, coupled with its agility and adaptability, represents a substantial advancement in the field of I2V, particularly for personalized and controllable applications.
Submitted 26 June, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
-
A Logically Consistent Chain-of-Thought Approach for Stance Detection
Authors:
Bowen Zhang,
Daijun Ding,
Liwen Jing,
Hu Huang
Abstract:
Zero-shot stance detection (ZSSD) aims to detect stances toward unseen targets. Incorporating background knowledge to enhance transferability between seen and unseen targets constitutes the primary approach of ZSSD. However, these methods often struggle with a knowledge-task disconnect and lack logical consistency in their predictions. To address these issues, we introduce a novel approach named Logically Consistent Chain-of-Thought (LC-CoT) for ZSSD, which improves stance detection by ensuring relevant and logically sound knowledge extraction. LC-CoT employs a three-step process. Initially, it assesses whether supplementary external knowledge is necessary. Subsequently, it uses API calls to retrieve this knowledge, which can be processed by a separate LLM. Finally, a manual exemplar guides the LLM to infer stance categories, using an if-then logical structure to maintain relevance and logical coherence. This structured approach to eliciting background knowledge enhances the model's capability, outperforming traditional supervised methods without relying on labeled data.
Submitted 26 December, 2023;
originally announced December 2023.
-
Hybrid Precoder Design for Angle-of-Departure Estimation with Limited-Resolution Phase Shifters
Authors:
Huiping Huang,
Musa Furkan Keskin,
Henk Wymeersch,
Xuesong Cai,
Linlong Wu,
Johan Thunberg,
Fredrik Tufvesson
Abstract:
Hybrid analog-digital beamforming stands out as a key enabler for future communication systems with a massive number of antennas. In this paper, we investigate the hybrid precoder design problem for angle-of-departure (AoD) estimation, where we take into account the practical constraint on the limited resolution of phase shifters. Our goal is to design a radio-frequency (RF) precoder and a base-band (BB) precoder to estimate AoD of the user with a high accuracy. To this end, we propose a two-step strategy where we first obtain the fully digital precoder that minimizes the angle error bound, and then the resulting digital precoder is decomposed into an RF precoder and a BB precoder, based on the alternating optimization and the alternating direction method of multipliers. Besides, we derive the quantization error upper bound and analyse the convergence behavior of the proposed algorithm. Numerical results demonstrate the superior performance of the proposed method over state-of-the-art baselines.
Submitted 26 December, 2023;
originally announced December 2023.
-
Order Relations of the Wasserstein mean and the spectral geometric mean
Authors:
Luyining Gan,
Huajun Huang
Abstract:
On the space of positive definite matrices, several operator means are popular and have been studied extensively. In this paper, we investigate the near order and the Löwner order relations on the curves defined by the Wasserstein mean and the spectral geometric mean. We show that the near order $\preceq $ is stronger than the eigenvalue entrywise order, and that $A\natural_t B \preceq A\diamond_t B$ for $t\in [0,1]$. We prove the monotonicity properties of the curves originated from the Wasserstein mean and the spectral geometric mean in terms of the near order. The Löwner order properties of the Wasserstein mean and the spectral geometric mean are also explored.
Submitted 5 May, 2024; v1 submitted 23 December, 2023;
originally announced December 2023.
-
Battery-Care Resource Allocation and Task Offloading in Multi-Agent Post-Disaster MEC Environment
Authors:
Yiwei Tang,
Hualong Huang,
Wenhan Zhan,
Geyong Min,
Zhekai Duan,
Yuchuan Lei
Abstract:
Being an up-and-coming application scenario of mobile edge computing (MEC), the post-disaster rescue suffers multitudinous computing-intensive tasks but unstably guaranteed network connectivity. In rescue environments, quality of service (QoS), such as task execution delay, energy consumption and battery state of health (SoH), is of significant meaning. This paper studies a multi-user post-disaster MEC environment with unstable 5G communication, where device-to-device (D2D) link communication and dynamic voltage and frequency scaling (DVFS) are adopted to balance each user's requirement for task delay and energy consumption. A battery degradation evaluation approach to prolong battery lifetime is also presented. The distributed optimization problem is formulated into a mixed cooperative-competitive (MCC) multi-agent Markov decision process (MAMDP) and is tackled with recurrent multi-agent Proximal Policy Optimization (rMAPPO). Extensive simulations and comprehensive comparisons with other representative algorithms clearly demonstrate the effectiveness of the proposed rMAPPO-based offloading scheme.
Submitted 23 December, 2023;
originally announced December 2023.
-
Self-Supervised Depth Completion Guided by 3D Perception and Geometry Consistency
Authors:
Yu Cai,
Tianyu Shen,
Shi-Sheng Huang,
Hua Huang
Abstract:
Depth completion, aiming to predict dense depth maps from sparse depth measurements, plays a crucial role in many computer vision related applications. Deep learning approaches have demonstrated overwhelming success in this task. However, high-precision depth completion without relying on the ground-truth data, which are usually costly, still remains challenging. The reason lies on the ignorance of 3D structural information in most previous unsupervised solutions, causing inaccurate spatial propagation and mixed-depth problems. To alleviate the above challenges, this paper explores the utilization of 3D perceptual features and multi-view geometry consistency to devise a high-precision self-supervised depth completion method. Firstly, a 3D perceptual spatial propagation algorithm is constructed with a point cloud representation and an attention weighting mechanism to capture more reasonable and favorable neighboring features during the iterative depth propagation process. Secondly, the multi-view geometric constraints between adjacent views are explicitly incorporated to guide the optimization of the whole depth completion model in a self-supervised manner. Extensive experiments on benchmark datasets of NYU-Depthv2 and VOID demonstrate that the proposed model achieves the state-of-the-art depth completion performance compared with other unsupervised methods, and competitive performance compared with previous supervised methods.
Submitted 23 December, 2023;
originally announced December 2023.
-
Compressing Image-to-Image Translation GANs Using Local Density Structures on Their Learned Manifold
Authors:
Alireza Ganjdanesh,
Shangqian Gao,
Hirad Alipanah,
Heng Huang
Abstract:
Generative Adversarial Networks (GANs) have shown remarkable success in modeling complex data distributions for image-to-image translation. Still, their high computational demands prohibit their deployment in practical scenarios like edge devices. Existing GAN compression methods mainly rely on knowledge distillation or convolutional classifiers' pruning techniques. Thus, they neglect the critical characteristic of GANs: their local density structure over their learned manifold. Accordingly, we approach GAN compression from a new perspective by explicitly encouraging the pruned model to preserve the density structure of the original parameter-heavy model on its learned manifold. We facilitate this objective for the pruned model by partitioning the learned manifold of the original generator into local neighborhoods around its generated samples. Then, we propose a novel pruning objective to regularize the pruned model to preserve the local density structure over each neighborhood, resembling the kernel density estimation method. Also, we develop a collaborative pruning scheme in which the discriminator and generator are pruned by two pruning agents. We design the agents to capture interactions between the generator and discriminator by exchanging their peer's feedback when determining corresponding models' architectures. Thanks to such a design, our pruning method can efficiently find performant sub-networks and can maintain the balance between the generator and discriminator more effectively compared to baselines during pruning, thereby showing more stable pruning dynamics. Our experiments on image translation GAN models, Pix2Pix and CycleGAN, with various benchmark datasets and architectures demonstrate our method's effectiveness.
Submitted 22 December, 2023;
originally announced December 2023.
-
NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views
Authors:
Han Huang,
Yulun Wu,
Junsheng Zhou,
Ge Gao,
Ming Gu,
Yu-Shen Liu
Abstract:
Recently, neural implicit functions have demonstrated remarkable results in the field of multi-view reconstruction. However, most existing methods are tailored for dense views and exhibit unsatisfactory performance when dealing with sparse views. Several latest methods have been proposed for generalizing implicit reconstruction to address the sparse view reconstruction task, but they still suffer from high training costs and are merely valid under carefully selected perspectives. In this paper, we propose a novel sparse view reconstruction framework that leverages on-surface priors to achieve highly faithful surface reconstruction. Specifically, we design several constraints on global geometry alignment and local geometry refinement for jointly optimizing coarse shapes and fine details. To achieve this, we train a neural network to learn a global implicit field from the on-surface points obtained from SfM and then leverage it as a coarse geometric constraint. To exploit local geometric consistency, we project on-surface points onto seen and unseen views, treating the consistent loss of projected features as a fine geometric constraint. The experimental results with DTU and BlendedMVS datasets in two prevalent sparse settings demonstrate significant improvements over the state-of-the-art methods.
Submitted 21 December, 2023; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Cross-Layer Optimization for Fault-Tolerant Deep Learning
Authors:
Qing Zhang,
Cheng Liu,
Bo Liu,
Haitong Huang,
Ying Wang,
Huawei Li,
Xiaowei Li
Abstract:
A fault-tolerant deep learning accelerator is the basis for highly reliable deep learning processing and is critical to deploying deep learning in safety-critical applications such as avionics and robotics. Since deep learning is known to be computing- and memory-intensive, traditional fault-tolerant approaches based on redundant computing will incur substantial overhead including power consumption and chip area. To this end, we propose to characterize deep learning vulnerability difference across both neurons and bits of each neuron, and leverage the vulnerability difference to enable selective protection of the deep learning processing components from the perspective of architecture layer and circuit layer respectively. At the same time, we observe the correlation between model quantization and bit protection overhead of the underlying processing elements of deep learning accelerators, and propose to reduce the bit protection overhead by adding an additional quantization constraint without compromising the model accuracy. Finally, we employ a Bayesian optimization strategy to co-optimize the correlated cross-layer design parameters at algorithm layer, architecture layer, and circuit layer to minimize the hardware resource consumption while fulfilling multiple user constraints including reliability, accuracy, and performance of the deep learning processing at the same time.
Submitted 21 December, 2023;
originally announced December 2023.
-
Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding
Authors:
Haifeng Huang,
Yang Zhao,
Zehan Wang,
Yan Xia,
Zhou Zhao
Abstract:
Temporal Video Grounding (TVG) aims to localize the temporal boundary of a specific segment in an untrimmed video based on a given language query. Since datasets in this domain are often gathered from limited video scenes, models tend to overfit to scene-specific factors, which leads to suboptimal performance when encountering new scenes in real-world applications. In a new scene, the fine-grained annotations are often insufficient due to the expensive labor cost, while the coarse-grained video-query pairs are easier to obtain. Thus, to address this issue and enhance model performance on new scenes, we explore the TVG task in an unsupervised domain adaptation (UDA) setting across scenes for the first time, where the video-query pairs in the source scene (domain) are labeled with temporal boundaries, while those in the target scene are not. Under the UDA setting, we introduce a novel Adversarial Multi-modal Domain Adaptation (AMDA) method to adaptively adjust the model's scene-related knowledge by incorporating insights from the target data. Specifically, we tackle the domain gap by utilizing domain discriminators, which help identify valuable scene-related features effective across both domains. Concurrently, we mitigate the semantic gap between different modalities by aligning video-query pairs with related semantics. Furthermore, we employ a mask-reconstruction approach to enhance the understanding of temporal semantics within a scene. Extensive experiments on Charades-STA, ActivityNet Captions, and YouCook2 demonstrate the effectiveness of our proposed method.
Submitted 21 December, 2023;
originally announced December 2023.
-
Federated Continual Novel Class Learning
Authors:
Lixu Wang,
Chenxi Liu,
Junfeng Guo,
Jiahua Dong,
Xiao Wang,
Heng Huang,
Qi Zhu
Abstract:
In a privacy-focused era, Federated Learning (FL) has emerged as a promising machine learning technique. However, most existing FL studies assume that the data distribution remains nearly fixed over time, while real-world scenarios often involve dynamic and continual changes. To equip FL systems with continual model evolution capabilities, we focus on an important problem called Federated Continual Novel Class Learning (FedCN) in this work. The biggest challenge in FedCN is to merge and align novel classes that are discovered and learned by different clients without compromising privacy. To address this, we propose a Global Alignment Learning (GAL) framework that can accurately estimate the global novel class number and provide effective guidance for local training from a global perspective, all while maintaining privacy protection. Specifically, GAL first locates high-density regions in the representation space through a bi-level clustering mechanism to estimate the novel class number, with which the global prototypes corresponding to novel classes can be constructed. Then, GAL uses a novel semantic weighted loss to capture all possible correlations between these prototypes and the training data for mitigating the impact of pseudo-label noise and data heterogeneity. Extensive experiments on various datasets demonstrate GAL's superior performance over state-of-the-art novel class discovery methods. In particular, GAL achieves significant improvements in novel-class performance, increasing the accuracy by 5.1% to 10.6% in the case of one novel class learning stage and by 7.8% to 17.9% in the case of two novel class learning stages, without sacrificing known-class performance. Moreover, GAL is shown to be effective in equipping a variety of different mainstream FL algorithms with novel class discovery and learning capability, highlighting its potential for many real-world applications.
Submitted 20 December, 2023;
originally announced December 2023.
-
In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging
Authors:
Xin Wang,
Lizhi Wang,
Xiangtian Ma,
Maoqing Zhang,
Lin Zhu,
Hua Huang
Abstract:
Dual-Camera Compressed Hyperspectral Imaging (DCCHI) offers the capability to reconstruct 3D Hyperspectral Image (HSI) by fusing compressive and Panchromatic (PAN) image, which has shown great potential for snapshot hyperspectral imaging in practice. In this paper, we introduce a novel DCCHI reconstruction network, the Intra-Inter Similarity Exploiting Transformer (In2SET). Our key insight is to make full use of the PAN image to assist the reconstruction. To this end, we propose using the intra-similarity within the PAN image as a proxy for approximating the intra-similarity in the original HSI, thereby offering an enhanced content prior for more accurate HSI reconstruction. Furthermore, we aim to align the features from the underlying HSI with those of the PAN image, maintaining semantic consistency and introducing new contextual information for the reconstruction process. By integrating In2SET into a PAN-guided unrolling framework, our method substantially enhances the spatial-spectral fidelity and detail of the reconstructed images, providing a more comprehensive and accurate depiction of the scene. Extensive experiments conducted on both real and simulated datasets demonstrate that our approach consistently outperforms existing state-of-the-art methods in terms of reconstruction quality and computational complexity. Code will be released.
Submitted 8 June, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
AccidentGPT: Accident Analysis and Prevention from V2X Environmental Perception with Multi-modal Large Model
Authors:
Lening Wang,
Yilong Ren,
Han Jiang,
Pinlong Cai,
Daocheng Fu,
Tianqi Wang,
Zhiyong Cui,
Haiyang Yu,
Xuesong Wang,
Hanchu Zhou,
Helai Huang,
Yinhai Wang
Abstract:
Traffic accidents, being a significant contributor to both human casualties and property damage, have long been a focal point of research for many scholars in the field of traffic safety. However, previous studies, whether focusing on static environmental assessments or dynamic driving analyses, as well as pre-accident predictions or post-accident rule analyses, have typically been conducted in isolation. There has been a lack of an effective framework for developing a comprehensive understanding and application of traffic safety. To address this gap, this paper introduces AccidentGPT, a comprehensive accident analysis and prevention multi-modal large model. AccidentGPT establishes a multi-modal information interaction framework grounded in multi-sensor perception, thereby enabling a holistic approach to accident analysis and prevention in the field of traffic safety. Specifically, our capabilities can be categorized as follows: for autonomous driving vehicles, we provide comprehensive environmental perception and understanding to control the vehicle and avoid collisions. For human-driven vehicles, we offer proactive long-range safety warnings and blind-spot alerts while also providing safety driving recommendations and behavioral norms through human-machine dialogue and interaction. Additionally, for traffic police and management agencies, our framework supports intelligent and real-time analysis of traffic safety, encompassing pedestrian, vehicles, roads, and the environment through collaborative perception from multiple vehicles and road testing devices. The system is also capable of providing a thorough analysis of accident causes and liability after vehicle collisions. Our framework stands as the first large model to integrate comprehensive scene understanding into traffic safety studies. Project page: https://accidentgpt.github.io
Submitted 28 December, 2023; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Quantum State Compression Shadow
Authors:
Chen Ding,
Xiao-Yue Xu,
Shuo Zhang,
Wan-Su Bao,
He-Liang Huang
Abstract:
Quantum state readout serves as the cornerstone of quantum information processing, exerting profound influence on quantum communication, computation, and metrology. In this study, we introduce an innovative readout architecture called Compression Shadow (CompShadow), which transforms the conventional readout paradigm by compressing multi-qubit states into single-qubit shadows before measurement. Compared to direct measurements of the initial quantum states, CompShadow achieves comparable accuracy in amplitude and observable expectation estimation while consuming similar measurement resources. Furthermore, its implementation on near-term quantum hardware with nearest-neighbor coupling architectures is straightforward. Significantly, CompShadow brings forth novel features, including the complete suppression of correlated readout noise, fundamentally reducing the quantum hardware demands for readout. It also facilitates the exploration of multi-body system properties through single-qubit probes and opens the door to designing quantum communication protocols with exponential loss suppression. Our findings mark the emergence of a new era in quantum state readout, setting the stage for a revolutionary leap in quantum information processing capabilities.
Submitted 20 December, 2023;
originally announced December 2023.
-
On the Role of Server Momentum in Federated Learning
Authors:
Jianhui Sun,
Xidong Wu,
Heng Huang,
Aidong Zhang
Abstract:
Federated Averaging (FedAvg) is known to experience convergence issues when encountering significant clients system heterogeneity and data heterogeneity. Server momentum has been proposed as an effective mitigation. However, existing server momentum works are restrictive in the momentum formulation, do not properly schedule hyperparameters and focus only on system homogeneous settings, which leaves the role of server momentum still an under-explored problem. In this paper, we propose a general framework for server momentum, that (a) covers a large class of momentum schemes that are unexplored in federated learning (FL), (b) enables a popular stagewise hyperparameter scheduler, (c) allows heterogeneous and asynchronous local computing. We provide rigorous convergence analysis for the proposed framework. To our best knowledge, this is the first work that thoroughly analyzes the performances of server momentum with a hyperparameter scheduler and system heterogeneity. Extensive experiments validate the effectiveness of our proposed framework.
Submitted 19 December, 2023;
originally announced December 2023.
-
Joint DOA estimation and distorted sensor detection under entangled low-rank and row-sparse constraints
Authors:
Huiping Huang,
Tianjian Zhang,
Feng Yin,
Bin Liao,
Henk Wymeersch
Abstract:
The problem of joint direction-of-arrival estimation and distorted sensor detection has received a lot of attention in recent decades. Most state-of-the-art work formulated such a problem via low-rank and row-sparse decomposition, where the low-rank and row-sparse components were treated in an isolated manner. Such a formulation results in a performance loss. Differently, in this paper, we entangle the low-rank and row-sparse components by exploring their inherent connection. Furthermore, we take into account the maximal distortion level of the sensors. An alternating optimization scheme is proposed to solve the low-rank component and the sparse component, where a closed-form solution is derived for the low-rank component and a quadratic programming is developed for the sparse component. Numerical results exhibit the effectiveness and superiority of the proposed method.
Submitted 21 December, 2023; v1 submitted 19 December, 2023;
originally announced December 2023.